Small Matrix (30x1) using significant time/memory to update in a loop
I have a small vector Rv (30x1) which needs to be updated in every loop iteration. Before entering the loop, I initialize the vector with zeros. On profiling (with memory and time), I find that this updating step takes a significant amount of time compared to the others. The memory usage also seems higher than what a 30x1 matrix should need. I am attaching an image of the profiler analysis. All the other variables appearing (like Kd, K_Ir_Ac, etc.) are simple scalar variables.

What am I missing? What can I do to reduce the overhead in terms of memory and time usage?
This analysis was done for only a fraction of the total time, to test the complete function. When running the code to the final point (~100-200 times more loop iterations than in this test run), these times will matter. And besides, this is partly just out of curiosity!
1 Comment
dpb
on 24 Dec 2013
Not enough info to tell for certain, as there's no context to know what's invariant and what's not, or the sizes of everything else -- presuming some of them may not simply be constants.
First, though, there is at least one redundant computation of the same values -- start by factoring the denominator entirely out of each computation and apply it once at the end.
I didn't try to read the mishmash of similarly named variables to see what else, if anything, could either be factored or at least computed only once as a subfactor and reused in multiple places.
But, need more context most of all methinks...
Accepted Answer
John D'Errico
on 24 Dec 2013
Preallocating this vector with zeros is a waste of time! At least it is if immediately after that, you replace the entire vector with other numbers. So don't waste that time. (In general, preallocation is a VERY smart thing to do.)
Next, how many times must MATLAB compute the product (Nav*V), and then divide by it? Why force a computation to be done repeatedly? In fact, you go further than that, since you form things like Ir/(Nav*V) multiple times.
Next, I'd not be at all surprised to find that there is a difference in time between
Rv = zeros(30,1);
Rv(:,1) = [stuff];
and
Rv = [stuff];
Next, are ALL of these scalar variables changing all of the time? I would sincerely bet they are not. So why not precompute some parts of this vector and save the results?
Finally, I was going to try optimizing how your line is written, but since it is an image, I cannot copy it into the MATLAB editor.
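As a rough illustration of the factoring idea -- a sketch only, with placeholder values standing in for the poster's actual data:

```matlab
% Sketch: hoist the invariant divisor out of the loop.
% Nav, V, Ir here are placeholder values, not the poster's real ones.
Nav = 6.022e23; V = 1e-15; Ir = 5;
invNV = 1/(Nav*V);       % invariant -- compute once, before the while loop
IrNV  = Ir*invNV;        % reusable wherever Ir/(Nav*V) appears
% Inside the loop, each element then costs one multiply instead of a divide:
% Rv(2) = K_Ir_Ac*A*IrNV;
```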
18 Comments
Sean de Wolski
on 24 Dec 2013
He's using the profiler which is a great first step, credit where due!
Amit
on 24 Dec 2013
Thank you for your feedback. The preallocation is done prior to entering the while loop. This removes the overhead from the while...end loop and avoids creating and recreating the Rv matrix inside the loop.
Other than Nav and V, the other scalars change quite often. I can definitely divide by Nav*V once instead of in every cell. As you can see from the image, the loop ran 570000 times in this short simulation, and in the final version I'd expect it to loop ~100-200 times as many.
My main issue was: a 30x1 matrix should not take memory on the order of kB.
Walter Roberson
on 24 Dec 2013
The copying of the right hand side into Rv(:,1) is likely to take more time than recreating Rv.
When the right hand side is computed, an unnamed variable is created that describes the block of memory and points to its contents.
Then to assign it to a variable involves releasing what was there previously and pointing the symbol table entry to the existing anonymous descriptor.
When it is instead copied into Rv(:,1), the anonymous right-hand side is still computed, and the data block associated with the left hand side is located and a memory copy is done; then the space associated with the anonymous right-hand side is released. Difference in speed: copying the data.
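A minimal timing sketch of the two assignment forms Walter describes (numbers will vary by machine and MATLAB version, so only the ratio is interesting; run it from an m-file so the JIT applies):

```matlab
% Rough timing sketch: indexed copy vs. plain reassignment of a 30x1 vector.
n  = 1e5;
Rv = zeros(30,1);
tic
for k = 1:n
    Rv(:,1) = rand(30,1);   % copy the RHS into the existing data block
end
t_copy = toc;
tic
for k = 1:n
    Rv = rand(30,1);        % rebind Rv to the anonymous RHS result
end
t_rebind = toc;
fprintf('indexed copy: %.3f s, plain assignment: %.3f s\n', t_copy, t_rebind);
```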
dpb
on 24 Dec 2013
To amplify a little on Walter's and John's point on preallocation -- IF (the proverbial "big if") you were filling an array by computing each individual array element in a loop, like so
for i=1:N
x(i)=RHS(i);
end
then preallocating would make a difference if x didn't already exist, as MATLAB would otherwise have to allocate a new, expanded array every pass through the loop, copy the old contents into it, and destroy the original. OTOH, if the whole array is there up front, that reallocation is avoided.
In your case, however, you're building a whole RHS vector at one time anyway, so whether the LHS has been previously allocated doesn't really make any big difference UNLESS Rv is more than a vector and Rv(:,2:end) is extant (which, from your description, it isn't). In that case the subscripting expression is superfluous, and it could actually take more time than simply the expression
Rv=[_RHSexpression_];
without the preallocation; since the whole array is being reassigned every time, the end result is no different.
Walter Roberson
on 24 Dec 2013
Though by my calculation the memory-copying time would be different: less in the case where everything is overwritten.
Amit
on 24 Dec 2013
Wow -- so in a way, in my attempt to make the code faster, I was not making any improvement. This is quite interesting and helpful to know. So do you guys think that if I do the calculation individually for each cell (i.e., Rv(1,1) = Kd*I, etc.), that would be better?
dpb
on 24 Dec 2013
Probably not but it's easy enough to try/test...
My thought (although, like John, I can't cut'n'paste to do anything with it without too much effort, since you posted an image rather than code) would be to
a) factor out what you can (already mentioned), then
b) since it appears that each term is essentially(*) A*B*C/D, if you were able to build A, B, and C as arrays, the whole thing could just become
Rv=A.*B.*C/D;
(*) For row 1, C==1, and there may be some others; I didn't try to read every line precisely, but you should get the idea.
John D'Errico
on 25 Dec 2013
No, I don't think doing the elements individually will help, and probably will be worse. Really though, you need to do some tests, because MATLAB's own built-in acceleration may be a factor.
I would definitely factor what you can. My gut says I'd do something like this:
NV = Nav*V;
IrNV = Ir./NV;
ArNV = Ar./NV;
BrNV = Br./NV;
Rv = [Kd*I; ...
[K_Ir_Ac;K_Ir_Bc;K_Ir_Pc;K_Ir_Arc;K_Ir_Brc;K_Ir_Prc;K_Ir_Pirc].*[A;B;P;Ar;Br;Pr;Pir].*IrNV; ...];
Etc. You should get the idea.
I'd still suggest that if many of these numbers are constant, then you don't want to recompute things over and over again.
Agreed that if there isn't any way to generalize the terms, then you have to itemize them individually -- but it's worth looking at, I'd think, as we don't know where the various terms come from. There are untold numbers of times when posters ask how to generate a zillion variables from what could and should have been handled as an array. We don't know that something of the sort couldn't have led to the above formulation, as we're not privy to the additional info -- which was my first comment: we need more context to answer more generally than with just "peephole" optimization of the specific statement.
And of course factor the common terms out so they're not recomputed, as you've pointed out.
Amit
on 26 Dec 2013
@John, I tested the approach you suggested, but it actually increases the computation time for the Rv matrix by around 4-5 times compared to the method I implemented before. I think breaking the matrix into different pieces probably creates more temporary locations for the computation and thus needs more time to update the matrix.
@dpb, the vector Rv is used to calculate a weighted probability and then determine a choice among 30 possible events. The scalar variables which change frequently are I, A, B, P, Pr, P1r, Ar, etc. The scalar variables Kd, K_xx_xx, Kp_xx_xx, Kt_xx_xx, Nav and V are constant and known before entering the loop. A simple block diagram looks like this:
while _condition_
calculate Rv
calculate weighted probability from Rv
generate a random integer (case) between 1-30 using wt. prob.
switch _case_
case xx
end
end
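The weighted-choice step in that block diagram can be sketched without toolboxes; this assumes Rv is a nonnegative 30x1 rate vector (placeholder values used here):

```matlab
% Sketch: pick one of 30 events with probability proportional to Rv.
% Rv here is a placeholder for the real nonnegative 30x1 rate vector.
Rv = rand(30,1);
edges = cumsum(Rv)/sum(Rv);     % cumulative weighted probabilities
c = find(rand() <= edges, 1);   % chosen case: an integer in 1..30
```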
The question I asked was more out of curiosity. I was surprised that such a small matrix takes significant time and memory to get updated. I understand from the replies that Rv = [_stuff_] uses some temporary locations to update Rv, resulting in increased time and memory. I was just hoping there is a way I can increase the efficiency of the code, which would be a new learning experience for me.
One approach I was thinking of was using scalars for all 30 values, which will make for ugly code; however, a simple scalar computation is something like 10 times faster than the same computation on a matrix. I am hoping this MATLAB community will teach/suggest a more elegant solution.
I am attaching the code from the image as well:
Rv(:,1) = [Kd*I;
(K_Ir_Ac*Ir*A)/(Nav*V);
(K_Ir_Bc*Ir*B)/(Nav*V);
(K_Ir_Pc*Ir*P)/(Nav*V);
Kt_Ir_Arc*Ir*Ar/(Nav*V);
Kt_Ir_Brc*Ir*Br/(Nav*V);
Kt_Ir_Prc*Ir*Pr/(Nav*V);
Kt_Ir_P1rc*Ir*P1r/(Nav*V);
Kp_Ar_Ac*Ar*A/(Nav*V);
Kp_Ar_Bc*Ar*B/(Nav*V);
Kp_Ar_Pc*Ar*P/(Nav*V);
2*Kt_Ar_Arc*Ar*(Ar-1)/(Nav*V);
Kt_Ar_Brc*Ar*Br/(Nav*V);
Kt_Ar_P1rc*Ar*P1r/(Nav*V);
Kt_Ar_Prc*Ar*Pr/(Nav*V);
Kp_Br_Ac*Br*A/(Nav*V);
Kp_Br_Bc*Br*B/(Nav*V);
Kp_Br_Pc*Br*P/(Nav*V);
2*Kt_Br_Brc*Br*(Br-1)/(Nav*V);
Kt_Br_P1rc*Br*P1r/(Nav*V);
Kt_Br_Prc*Br*Pr/(Nav*V);
Kp_P1r_Ac*P1r*A/(Nav*V);
Kp_P1r_Bc*P1r*B/(Nav*V);
Kp_P1r_Pc*P1r*P/(Nav*V);
2*Kt_P1r_P1rc*P1r*(P1r-1)/(Nav*V);
Kt_P1r_Prc*P1r*Pr/(Nav*V);
Kp_Pr_Ac*Pr*A/(Nav*V);
Kp_Pr_Bc*Pr*B/(Nav*V);
Kp_Pr_Pc*Pr*P/(Nav*V);
2*Kt_Pr_Prc*Pr*(Pr-1)/(Nav*V)]; % Choice 30
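One hedged way to apply the common divisor just once: build the numerators, then scale the tail in a single pass. This is a sketch with placeholder scalars; only the first rows are shown, and the remaining numerators follow the same pattern.

```matlab
% Sketch with placeholder scalars; the real code would keep all 30 rows.
Kd = 1; I = 2; K_Ir_Ac = 3; K_Ir_Bc = 4; Ir = 5; A = 6; B = 7;
Nav = 6.022e23; V = 1e-15;
invNV = 1/(Nav*V);              % invariant: hoist above the while loop
Rv = [Kd*I;
      K_Ir_Ac*Ir*A;
      K_Ir_Bc*Ir*B];            % ...remaining 27 numerators as posted...
Rv(2:end) = Rv(2:end)*invNV;    % one scaling pass instead of 29 divides
```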
My point is: how are the various Kxxx, etc., computed before the pseudo-code section given?
I'm looking for a generalization of these to perhaps vectorize the whole thing instead of enumerating them individually as you have. That would be the only way I see of making much difference, but again, whether that's possible is indeterminate without seeing what they consist of. It may be that you would need to restructure the earlier data to get to the vector arrangement. Then again, if they're all just independent semi-random combinations of things or wildly different computations, then you're likely "just stuck".
Also, just out of curiosity, how much does the minor expedient previously noted of writing
Rv=[stuff];
(i.e., eliminating the preallocation and the rewrite into the existing array) help or hurt?
ADDENDUM:
Just picking a few at random 'cuz they were close enough by to see at the same time on the monitor -- you've got stuff like
Kp_P1r_Ac
Kp_P1r_Bc
Kp_P1r_Pc
Are these somehow related such that even this subset of three could be combined into a single vectorized computational chunk to load a portion of A in my previous factorization of
A.*B.*C
? The more that can be generalized the better, of course, but again there's not enough context for us to know whether there's any hope; it's just a suggestion of the only way, off the top of my head, that I can see to make any real difference in the single expression posted.
Is there something specific I am missing in terms of the context?
The Kd, Kx_xx, etc. are calculated only once, before entering the loop. So I think the matrix can be factorized to get an A.
I tested Rv=[stuff] for the different scenarios, timing with tic...toc. All the values were kept constant during the test.
Case 1:
tic;
for i = 1:100000
Rv = [Kd*I;
(K_Ir_Ac*Ir*A)/(Nav*V);
(K_Ir_Bc*Ir*B)/(Nav*V);
(K_Ir_Pc*Ir*P)/(Nav*V);
Kt_Ir_Arc*Ir*Ar/(Nav*V);
Kt_Ir_Brc*Ir*Br/(Nav*V);
Kt_Ir_Prc*Ir*Pr/(Nav*V);
Kt_Ir_P1rc*Ir*P1r/(Nav*V);
Kp_Ar_Ac*Ar*A/(Nav*V);
Kp_Ar_Bc*Ar*B/(Nav*V);
Kp_Ar_Pc*Ar*P/(Nav*V);
2*Kt_Ar_Arc*Ar*(Ar-1)/(Nav*V);
Kt_Ar_Brc*Ar*Br/(Nav*V);
Kt_Ar_P1rc*Ar*P1r/(Nav*V);
Kt_Ar_Prc*Ar*Pr/(Nav*V);
Kp_Br_Ac*Br*A/(Nav*V);
Kp_Br_Bc*Br*B/(Nav*V);
Kp_Br_Pc*Br*P/(Nav*V);
2*Kt_Br_Brc*Br*(Br-1)/(Nav*V);
Kt_Br_P1rc*Br*P1r/(Nav*V);
Kt_Br_Prc*Br*Pr/(Nav*V);
Kp_P1r_Ac*P1r*A/(Nav*V);
Kp_P1r_Bc*P1r*B/(Nav*V);
Kp_P1r_Pc*P1r*P/(Nav*V);
2*Kt_P1r_P1rc*P1r*(P1r-1)/(Nav*V);
Kt_P1r_Prc*P1r*Pr/(Nav*V);
Kp_Pr_Ac*Pr*A/(Nav*V);
Kp_Pr_Bc*Pr*B/(Nav*V);
Kp_Pr_Pc*Pr*P/(Nav*V);
2*Kt_Pr_Prc*Pr*(Pr-1)/(Nav*V)]; % Choice 30
end
toc;
Case 2:
tic;
for i = 1:100000
Rv = [Kd*I;
[K_Ir_Ac;K_Ir_Bc;K_Ir_Pc;Kt_Ir_Arc; ...
Kt_Ir_Brc;Kt_Ir_Prc;Kt_Ir_P1rc].*[A;B;P;Ar;Br;Pr;P1r]*Ir; ...
Ar*([Kp_Ar_Ac;Kp_Ar_Bc;Kp_Ar_Pc;2*Kt_Ar_Arc;Kt_Ar_Brc; ...
Kt_Ar_P1rc;Kt_Ar_Prc].*[A;B;P;(Ar-1);Br;P1r;Pr]); ...
Br*([Kp_Br_Ac;Kp_Br_Bc;Kp_Br_Pc;2*Kt_Br_Brc; ...
Kt_Br_P1rc;Kt_Br_Prc].*[A;B;P;(Br-1);P1r;Pr]); ...
P1r*([Kp_P1r_Ac;Kp_P1r_Bc;Kp_P1r_Pc;2*Kt_P1r_P1rc; ...
Kt_P1r_Prc].*[A;B;P;(P1r-1);Pr]);
Pr*([Kp_Pr_Ac;Kp_Pr_Bc;Kp_Pr_Pc;2*Kt_Pr_Prc].*[A;B;P;(Pr-1)])];
Rv(2:30,1) = Rv(2:30,1)/(Nav*V);
end
toc;
Case 3: Scalar instead of vector
tic;
for i = 1:100000
a1 = Kd*I;
a2 =(K_Ir_Ac*Ir*A)/(Nav*V);
a3 =(K_Ir_Bc*Ir*B)/(Nav*V);
a4 =(K_Ir_Pc*Ir*P)/(Nav*V);
a5 =Kt_Ir_Arc*Ir*Ar/(Nav*V);
a6 =Kt_Ir_Brc*Ir*Br/(Nav*V);
a7 =Kt_Ir_Prc*Ir*Pr/(Nav*V);
a8 =Kt_Ir_P1rc*Ir*P1r/(Nav*V);
a9 =Kp_Ar_Ac*Ar*A/(Nav*V);
a10=Kp_Ar_Bc*Ar*B/(Nav*V);
a11=Kp_Ar_Pc*Ar*P/(Nav*V);
a12=2*Kt_Ar_Arc*Ar*(Ar-1)/(Nav*V);
a13=Kt_Ar_Brc*Ar*Br/(Nav*V);
a14=Kt_Ar_P1rc*Ar*P1r/(Nav*V);
a15=Kt_Ar_Prc*Ar*Pr/(Nav*V);
a16= Kp_Br_Ac*Br*A/(Nav*V);
a17=Kp_Br_Bc*Br*B/(Nav*V);
a18=Kp_Br_Pc*Br*P/(Nav*V);
a19=2*Kt_Br_Brc*Br*(Br-1)/(Nav*V);
a20=Kt_Br_P1rc*Br*P1r/(Nav*V);
a21=Kt_Br_Prc*Br*Pr/(Nav*V);
a22=Kp_P1r_Ac*P1r*A/(Nav*V);
a23=Kp_P1r_Bc*P1r*B/(Nav*V);
a24=Kp_P1r_Pc*P1r*P/(Nav*V);
a25=2*Kt_P1r_P1rc*P1r*(P1r-1)/(Nav*V);
a26=Kt_P1r_Prc*P1r*Pr/(Nav*V);
a27=Kp_Pr_Ac*Pr*A/(Nav*V);
a28=Kp_Pr_Bc*Pr*B/(Nav*V);
a29=Kp_Pr_Pc*Pr*P/(Nav*V);
a30=2*Kt_Pr_Prc*Pr*(Pr-1)/(Nav*V);
end
toc;
The elapsed times for cases 1, 2 and 3 are 0.163, 1.552 and 0.031 secs. Also, interestingly, Rv(:,1) takes more time than simply using Rv. The A.*B.*C approach might be challenging: I can probably define the A vector, but B and C might need to be computed in every loop, which I think might increase the computation time.
There are many guides to writing MATLAB code effectively for large matrices; however, I can't find one which suggests effective MATLAB programming for short matrices (probably because short matrices inherently take a short time overall, unless you loop many, many times, as in Monte Carlo simulations).
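For micro-benchmarks this small, a bare tic/toc loop can mislead; if timeit is available (built into MATLAB from R2013b, earlier as a File Exchange submission), something like this sketch is more robust:

```matlab
% Sketch: time a candidate update with timeit.
Kd = 1; I = 2;                   % placeholder scalars
f = @() [Kd*I; zeros(29,1)];     % stand-in for the real Rv expression
t = timeit(f);                   % median over several warmed-up runs
fprintf('per-call time: %.3g s\n', t);
```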
dpb
on 27 Dec 2013
Then move to the next set(s)... the point is to see if you can get those that are recalculated into a more nearly vectorized form to make use of MATLAB's internal routines.
While the last case may take less time in the above calculation, will you lose that gain later on by having to deal with that many individual variables?
Amit
on 27 Dec 2013
I think the last approach will remain faster in terms of implementation, even though it makes the overall code aesthetically displeasing. As I mentioned in an earlier comment, the Rv matrix is used to calculate a weighted probability (the probability of choosing 1 case out of 30 possibilities) and then to do further things for that case, so vectorization versus scalars will make no difference later on.
However, a few things I learned from this whole discussion (thanks to you guys):
- One needs to think about the temporary allocations MATLAB makes internally. These can become 'out of memory' issues along with speed issues for large matrices, and they have an impact on speed even for short matrices.
- Indexing in MATLAB (at least until 2012b) is something to think about: http://www.mathworks.com/matlabcentral/answers/54522-why-is-indexing-vectors-matrices-in-matlab-very-inefficient
- Scalars are faster than matrices, as matrices need memory allocation/reallocation. However, scalars get messier after a while. Also, for short matrices, the speed difference is very small unless the same calculations are done a great many times.
I am reattempting to code using scalars instead of the Rv vector, and I'll find out how big the difference between the Rv method and the scalar method is.
dpb
on 27 Dec 2013
BTW, you're doing the timing tests from within m-files, not by pasting code sections at the command line, right?
Amit
on 27 Dec 2013
Yup, all 3 sections were run in a single m-file. However, all variables had the same values in these tests.
I just now tested the scalar versus Rv-vector scenario for the real case, where the variables change in the loop. It turns out that the scalar method takes more time than the vector method in this scenario. Also, using Rv instead of Rv(:,1) is better time-wise. I'll do the test with the factorization and compare using the complete code.
dpb
on 29 Dec 2013
OK; just wanted to be sure you did get the benefit of the JIT engine included in the testing; otherwise one can draw entirely the wrong conclusion.
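One way to be sure the JIT is engaged is to wrap each timing case in its own function in an m-file, roughly like this sketch (placeholder values again):

```matlab
% Sketch: save as time_case1.m so the loop body runs JIT-compiled.
function t = time_case1(n)
    Kd = 1; I = 2;                  % placeholder values
    tic
    for k = 1:n
        Rv = [Kd*I; zeros(29,1)];   % stand-in for the real expression
    end
    t = toc;
end
```

Comparing time_case1, time_case2, etc. this way keeps each candidate inside a compiled function body, which is also roughly what timeit arranges for you.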
Those results are all consistent with what I would have expected.
Again, the only way I can see you can help much is by looking at the code that generates these values earlier on and seeing what, if anything, can be done there to vectorize it, again with the idea of eventually getting to the vectorized version of the end result. As noted previously, there may not be much else to be done, in which case one can start thinking about MEX-ing stuff if there's a serious bottleneck and this will be run more than just once or twice. Otherwise one can just let it take a couple of hours (or whatever) at some point and then be done with it.
Amit
on 29 Dec 2013
When running the real simulation, the scalar-value approach cumulatively takes more time than the vector one (probably because the values change, whereas in the test run all the values were the same). The bottleneck is not so significant that I'd try the MEX approach; my question was more out of curiosity. I learned and understood MATLAB a bit more through this discussion.
More Answers (0)