Small Matrix (30x1) using significant time/memory to update in a loop
I have a small vector Rv (30x1) which needs to be updated in every loop iteration. Before entering the loop, I initialize the vector with zeros. On profiling (with memory and time), I find that this updating step takes a significant amount of time compared to the others. The memory usage also seems higher than what a 30x1 matrix should need. I am attaching an image of the profiler analysis. All the other variables appearing (like Kd, K_Ir_Ac, etc.) are simple scalar variables.

What am I missing? What can I do to reduce the overhead in terms of memory and time usage?
This analysis was done for only a fraction of the total time, to test the complete function. When running the code to the final point (~100-200 times more loop iterations than in this test run), these times will matter. And besides, this is partly just out of curiosity!
1 Comment
dpb
on 24 Dec 2013
Not enough info to tell for certain, as there's no context to know what's invariant and what's not, or the sizes of everything else -- presuming some of them may not simply be constants.
First, though, there is at least one redundant computation of the same values -- start by factoring the denominator entirely out of each computation and apply it once at the end.
I didn't try to read the mishmash of similarly named variables to see what else, if anything, could either be factored or at least computed only once as a subfactor and reused in multiple places.
But, need more context most of all methinks...
Accepted Answer
John D'Errico
on 24 Dec 2013
Preallocating this vector with zeros is a waste of time! At least it is if immediately after that, you replace the entire vector with other numbers. So don't waste that time. (In general, preallocation is a VERY smart thing to do.)
Next, how many times must MATLAB compute the product (Nav*V), and then divide by it? Why force a computation to be done repeatedly? In fact, you go further than that, since you form things like Ir/(Nav*V) multiple times.
Next, I'd not be at all surprised to find that there is a difference in time between
Rv = zeros(30,1);
Rv(:,1) = [stuff];
and
Rv = [stuff];
Next, are ALL of these scalar variables changing all of the time? I would sincerely bet they are not. So why not precompute some parts of this vector and save the results?
Finally, I was going to try optimizing how your line is written, but since it is an image, I cannot copy it into the MATLAB editor.
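As a rough illustration of the factoring idea -- a sketch only, with placeholder values standing in for the poster's actual data:

```matlab
% Sketch: hoist the invariant divisor out of the loop.
% Nav, V, Ir here are placeholder values, not the poster's real ones.
Nav = 6.022e23; V = 1e-15; Ir = 5;
invNV = 1/(Nav*V);       % invariant -- compute once, before the while loop
IrNV  = Ir*invNV;        % reusable wherever Ir/(Nav*V) appears
% Inside the loop, each element then costs one multiply instead of a divide:
% Rv(2) = K_Ir_Ac*A*IrNV;
```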
18 Comments
Sean de Wolski
on 24 Dec 2013
He's using the profiler which is a great first step, credit where due!
Amit
on 24 Dec 2013
Thank you for your feedback. The preallocation is done prior to entering the while loop. This removes the overhead from the while...end loop and avoids creating and recreating the Rv matrix inside the loop.
Other than Nav and V, the other scalars change quite often. I can definitely divide by Nav*V once instead of in every cell. As you can see from the image, the loop ran 570000 times in this short simulation, and in the final version I'd expect it to loop ~100-200 times as many.
My main issue was: a 30x1 matrix should not take memory on the order of kB.
Walter Roberson
on 24 Dec 2013
The copying of the right hand side into Rv(:,1) is likely to take more time than recreating Rv.
When the right hand side is computed, an unnamed variable is created that describes the block of memory and points to its contents.
Then to assign it to a variable involves releasing what was there previously and pointing the symbol table entry to the existing anonymous descriptor.
When it is instead copied into Rv(:,1), the anonymous right-hand side is still computed, and the data block associated with the left hand side is located and a memory copy is done; then the space associated with the anonymous right-hand side is released. Difference in speed: copying the data.
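A minimal timing sketch of the two assignment forms Walter describes (numbers will vary by machine and MATLAB version, so only the ratio is interesting; run it from an m-file so the JIT applies):

```matlab
% Rough timing sketch: indexed copy vs. plain reassignment of a 30x1 vector.
n  = 1e5;
Rv = zeros(30,1);
tic
for k = 1:n
    Rv(:,1) = rand(30,1);   % copy the RHS into the existing data block
end
t_copy = toc;
tic
for k = 1:n
    Rv = rand(30,1);        % rebind Rv to the anonymous RHS result
end
t_rebind = toc;
fprintf('indexed copy: %.3f s, plain assignment: %.3f s\n', t_copy, t_rebind);
```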
dpb
on 24 Dec 2013
To amplify a little on Walter's and John's point on preallocation -- IF (the proverbial "big if") you were filling an array by computing each individual array element in a loop, like so
for i=1:N
x(i)=RHS(i);
end
then preallocating would make a difference if x didn't already exist, as MATLAB would otherwise have to allocate a new, expanded array every pass through the loop, copy the old contents into it, and destroy the original. OTOH, if the whole array is there up front, that reallocation is avoided.
In your case, however, you're building a whole RHS vector at one time anyway, so whether the LHS has been previously allocated doesn't really make any big difference UNLESS Rv is more than a vector and Rv(:,2:end) is extant (which, from your description, it isn't). In that case the subscripting expression is superfluous, and it could actually take more time than simply the expression
Rv=[_RHSexpression_];
without the preallocation; since the whole array is being reassigned every time, the end result is no different.
Walter Roberson
on 24 Dec 2013
Though by my calculation the memory-copying time would be different: less in the case where everything is overwritten.
Amit
on 24 Dec 2013
Wow -- so in a way, in my attempt to make the code faster, I was not making any improvement. This is quite interesting and helpful to know. So do you guys think that if I do the calculation individually for each cell (i.e., Rv(1,1) = Kd*I, etc.), that would be better?
dpb
on 24 Dec 2013
Probably not but it's easy enough to try/test...
My thought (although, like John, I can't cut'n'paste to do anything with it without too much effort, since you posted an image rather than code) would be to
a) factor out what you can (already mentioned), then
b) since it appears that each term is essentially(*) A*B*C/D, if you were able to build A, B, and C as arrays, the whole thing could just become
Rv=A.*B.*C/D;
(*) For row 1, C==1, and there may be some others; I didn't try to read every line precisely, but you should get the idea.
John D'Errico
on 25 Dec 2013
No, I don't think doing the elements individually will help, and probably will be worse. Really though, you need to do some tests, because MATLAB's own built-in acceleration may be a factor.
I would definitely factor what you can. My gut says I'd do something like this:
NV = Nav*V;
IrNV = Ir./NV;
ArNV = Ar./NV;
BrNV = Br./NV;
Rv = [Kd*I; ...
[K_Ir_Ac;K_Ir_Bc;K_Ir_Pc;K_Ir_Arc;K_Ir_Brc;K_Ir_Prc;K_Ir_Pirc].*[A;B;P;Ar;Br;Pr;Pir].*IrNV; ...];
Etc. You should get the idea.
I'd still suggest that if many of these numbers are constant, then you don't want to recompute things over and over again.
Agreed that if there isn't any way to generalize the terms, then you have to itemize them individually -- but it's worth looking at, I'd think, as we don't know where the various terms come from. There are untold numbers of times when posters ask how to generate a zillion variables from what could and should have been handled as an array. We don't know that something of the sort couldn't have led to the above formulation, as we're not privy to the additional info -- which was my first comment: we need more context to answer more generally than with just "peephole" optimization of the specific statement.
And of course factor the common terms out so they're not recomputed, as you've pointed out.
Amit
on 26 Dec 2013
@John, I tested the approach you suggested, but it actually increases the computation time for the Rv matrix by around 4-5 times compared to the method I implemented before. I think breaking the matrix into different pieces probably creates more temporary locations for the computation and thus needs more time to update the matrix.
@dpb, the vector Rv is used to calculate a weighted probability and then determine a choice among 30 possible events. The scalar variables which change frequently are I, A, B, P, Pr, P1r, Ar, etc. The scalar variables Kd, K_xx_xx, Kp_xx_xx, Kt_xx_xx, Nav and V are constant and known before entering the loop. A simple block diagram looks like this:
while _condition_
calculate Rv
calculate weighted probability from Rv
generate a random integer (case) between 1-30 using wt. prob.
switch _case_
case xx
end
end
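The weighted-choice step in that block diagram can be sketched without toolboxes; this assumes Rv is a nonnegative 30x1 rate vector (placeholder values used here):

```matlab
% Sketch: pick one of 30 events with probability proportional to Rv.
% Rv here is a placeholder for the real nonnegative 30x1 rate vector.
Rv = rand(30,1);
edges = cumsum(Rv)/sum(Rv);     % cumulative weighted probabilities
c = find(rand() <= edges, 1);   % chosen case: an integer in 1..30
```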
The question I asked was more out of curiosity. I was surprised that such a small matrix takes significant time and memory to get updated. I understand from the replies that Rv = [_stuff_] uses some temporary locations to update Rv, resulting in increased time and memory. I was just hoping there is a way I can increase the efficiency of the code, which would be a new learning experience for me.
One approach I was thinking of was using scalars for all 30 values, which will make for ugly code; however, a simple scalar computation is something like 10 times faster than the same computation on a matrix. I am hoping this MATLAB community will teach/suggest a more elegant solution.
I am attaching the code from the image as well:
Rv(:,1) = [Kd*I;
(K_Ir_Ac*Ir*A)/(Nav*V);
(K_Ir_Bc*Ir*B)/(Nav*V);
(K_Ir_Pc*Ir*P)/(Nav*V);
Kt_Ir_Arc*Ir*Ar/(Nav*V);
Kt_Ir_Brc*Ir*Br/(Nav*V);
Kt_Ir_Prc*Ir*Pr/(Nav*V);
Kt_Ir_P1rc*Ir*P1r/(Nav*V);
Kp_Ar_Ac*Ar*A/(Nav*V);
Kp_Ar_Bc*Ar*B/(Nav*V);
Kp_Ar_Pc*Ar*P/(Nav*V);
2*Kt_Ar_Arc*Ar*(Ar-1)/(Nav*V);
Kt_Ar_Brc*Ar*Br/(Nav*V);
Kt_Ar_P1rc*Ar*P1r/(Nav*V);
Kt_Ar_Prc*Ar*Pr/(Nav*V);
Kp_Br_Ac*Br*A/(Nav*V);
Kp_Br_Bc*Br*B/(Nav*V);
Kp_Br_Pc*Br*P/(Nav*V);
2*Kt_Br_Brc*Br*(Br-1)/(Nav*V);
Kt_Br_P1rc*Br*P1r/(Nav*V);
Kt_Br_Prc*Br*Pr/(Nav*V);
Kp_P1r_Ac*P1r*A/(Nav*V);
Kp_P1r_Bc*P1r*B/(Nav*V);
Kp_P1r_Pc*P1r*P/(Nav*V);
2*Kt_P1r_P1rc*P1r*(P1r-1)/(Nav*V);
Kt_P1r_Prc*P1r*Pr/(Nav*V);
Kp_Pr_Ac*Pr*A/(Nav*V);
Kp_Pr_Bc*Pr*B/(Nav*V);
Kp_Pr_Pc*Pr*P/(Nav*V);
2*Kt_Pr_Prc*Pr*(Pr-1)/(Nav*V)]; % Choice 30
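One hedged way to apply the common divisor just once: build the numerators, then scale the tail in a single pass. This is a sketch with placeholder scalars; only the first rows are shown, and the remaining numerators follow the same pattern.

```matlab
% Sketch with placeholder scalars; the real code would keep all 30 rows.
Kd = 1; I = 2; K_Ir_Ac = 3; K_Ir_Bc = 4; Ir = 5; A = 6; B = 7;
Nav = 6.022e23; V = 1e-15;
invNV = 1/(Nav*V);              % invariant: hoist above the while loop
Rv = [Kd*I;
      K_Ir_Ac*Ir*A;
      K_Ir_Bc*Ir*B];            % ...remaining 27 numerators as posted...
Rv(2:end) = Rv(2:end)*invNV;    % one scaling pass instead of 29 divides
```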
My point is: how are the various Kxxx, etc., computed before the pseudo-code section given?
I'm looking for a generalization of these to perhaps vectorize the whole thing instead of enumerating them individually as you have. That would be the only way I see of making much difference, but again, whether that's possible is indeterminate without seeing what they consist of. It may be that you would need to restructure the earlier data to get to the vector arrangement. Then again, if they're all just independent semi-random combinations of things or wildly different computations, then you're likely "just stuck".
Also, just out of curiosity, how much does the minor expedient previously noted of writing
Rv=[stuff];
(i.e., eliminating the preallocation and the rewrite into the existing array) help or hurt?
ADDENDUM:
Just picking a few at random 'cuz they were close enough by to see at the same time on the monitor -- you've got stuff like
Kp_P1r_Ac
Kp_P1r_Bc
Kp_P1r_Pc
Are these somehow related such that even this subset of three could be combined into a single vectorized computational chunk to load a portion of A in my previous factorization of
A.*B.*C
? The more that can be generalized the better, of course, but again there's not enough context for us to know whether there's any hope; it's just a suggestion of the only way, off the top of my head, that I can see to make any real difference in the single expression posted.
Is there something specific I am missing in terms of the context?
The Kd, Kx_xx, etc. are calculated only once, before entering the loop. So I think the matrix can be factorized to get an A.
I tested Rv=[stuff] for the different scenarios, timing with tic...toc. All the values were kept constant during the test.
Case 1:
tic;
for i = 1:100000
Rv = [Kd*I;
(K_Ir_Ac*Ir*A)/(Nav*V);
(K_Ir_Bc*Ir*B)/(Nav*V);
(K_Ir_Pc*Ir*P)/(Nav*V);
Kt_Ir_Arc*Ir*Ar/(Nav*V);
Kt_Ir_Brc*Ir*Br/(Nav*V);
Kt_Ir_Prc*Ir*Pr/(Nav*V);
Kt_Ir_P1rc*Ir*P1r/(Nav*V);
Kp_Ar_Ac*Ar*A/(Nav*V);
Kp_Ar_Bc*Ar*B/(Nav*V);
Kp_Ar_Pc*Ar*P/(Nav*V);
2*Kt_Ar_Arc*Ar*(Ar-1)/(Nav*V);
Kt_Ar_Brc*Ar*Br/(Nav*V);
Kt_Ar_P1rc*Ar*P1r/(Nav*V);
Kt_Ar_Prc*Ar*Pr/(Nav*V);
Kp_Br_Ac*Br*A/(Nav*V);
Kp_Br_Bc*Br*B/(Nav*V);
Kp_Br_Pc*Br*P/(Nav*V);
2*Kt_Br_Brc*Br*(Br-1)/(Nav*V);
Kt_Br_P1rc*Br*P1r/(Nav*V);
Kt_Br_Prc*Br*Pr/(Nav*V);
Kp_P1r_Ac*P1r*A/(Nav*V);
Kp_P1r_Bc*P1r*B/(Nav*V);
Kp_P1r_Pc*P1r*P/(Nav*V);
2*Kt_P1r_P1rc*P1r*(P1r-1)/(Nav*V);
Kt_P1r_Prc*P1r*Pr/(Nav*V);
Kp_Pr_Ac*Pr*A/(Nav*V);
Kp_Pr_Bc*Pr*B/(Nav*V);
Kp_Pr_Pc*Pr*P/(Nav*V);
2*Kt_Pr_Prc*Pr*(Pr-1)/(Nav*V)]; % Choice 30
end
toc;
Case 2:
tic;
for i = 1:100000
Rv = [Kd*I;
[K_Ir_Ac;K_Ir_Bc;K_Ir_Pc;Kt_Ir_Arc; ...
Kt_Ir_Brc;Kt_Ir_Prc;Kt_Ir_P1rc].*[A;B;P;Ar;Br;Pr;P1r]*Ir; ...
Ar*([Kp_Ar_Ac;Kp_Ar_Bc;Kp_Ar_Pc;2*Kt_Ar_Arc;Kt_Ar_Brc; ...
Kt_Ar_P1rc;Kt_Ar_Prc].*[A;B;P;(Ar-1);Br;P1r;Pr]); ...
Br*([Kp_Br_Ac;Kp_Br_Bc;Kp_Br_Pc;2*Kt_Br_Brc; ...
Kt_Br_P1rc;Kt_Br_Prc].*[A;B;P;(Br-1);P1r;Pr]); ...
P1r*([Kp_P1r_Ac;Kp_P1r_Bc;Kp_P1r_Pc;2*Kt_P1r_P1rc; ...
Kt_P1r_Prc].*[A;B;P;(P1r-1);Pr]);
Pr*([Kp_Pr_Ac;Kp_Pr_Bc;Kp_Pr_Pc;2*Kt_Pr_Prc].*[A;B;P;(Pr-1)])];
Rv(2:30,1) = Rv(2:30,1)/(Nav*V);
end
toc;
Case 3: Scalar instead of vector
tic;
for i = 1:100000
a1 = Kd*I;
a2 =(K_Ir_Ac*Ir*A)/(Nav*V);
a3 =(K_Ir_Bc*Ir*B)/(Nav*V);
a4 =(K_Ir_Pc*Ir*P)/(Nav*V);
a5 =Kt_Ir_Arc*Ir*Ar/(Nav*V);
a6 =Kt_Ir_Brc*Ir*Br/(Nav*V);
a7 =Kt_Ir_Prc*Ir*Pr/(Nav*V);
a8 =Kt_Ir_P1rc*Ir*P1r/(Nav*V);
a9 =Kp_Ar_Ac*Ar*A/(Nav*V);
a10=Kp_Ar_Bc*Ar*B/(Nav*V);
a11=Kp_Ar_Pc*Ar*P/(Nav*V);
a12=2*Kt_Ar_Arc*Ar*(Ar-1)/(Nav*V);
a13=Kt_Ar_Brc*Ar*Br/(Nav*V);
a14=Kt_Ar_P1rc*Ar*P1r/(Nav*V);
a15=Kt_Ar_Prc*Ar*Pr/(Nav*V);
a16= Kp_Br_Ac*Br*A/(Nav*V);
a17=Kp_Br_Bc*Br*B/(Nav*V);
a18=Kp_Br_Pc*Br*P/(Nav*V);
a19=2*Kt_Br_Brc*Br*(Br-1)/(Nav*V);
a20=Kt_Br_P1rc*Br*P1r/(Nav*V);
a21=Kt_Br_Prc*Br*Pr/(Nav*V);
a22=Kp_P1r_Ac*P1r*A/(Nav*V);
a23=Kp_P1r_Bc*P1r*B/(Nav*V);
a24=Kp_P1r_Pc*P1r*P/(Nav*V);
a25=2*Kt_P1r_P1rc*P1r*(P1r-1)/(Nav*V);
a26=Kt_P1r_Prc*P1r*Pr/(Nav*V);
a27=Kp_Pr_Ac*Pr*A/(Nav*V);
a28=Kp_Pr_Bc*Pr*B/(Nav*V);
a29=Kp_Pr_Pc*Pr*P/(Nav*V);
a30=2*Kt_Pr_Prc*Pr*(Pr-1)/(Nav*V);
end
toc;
The elapsed times for cases 1, 2 and 3 are 0.163, 1.552 and 0.031 secs. Also, interestingly, Rv(:,1) takes more time than simply using Rv. The A.*B.*C approach might be challenging: I can probably define the A vector, but B and C might need to be computed in every loop, which I think might increase the computation time.
There are many guides to writing MATLAB code effectively for large matrices; however, I can't find one which suggests effective MATLAB programming for short matrices (probably because short matrices inherently take a short time overall, unless you loop many, many times, as in Monte Carlo simulations).
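For micro-benchmarks this small, a bare tic/toc loop can mislead; if timeit is available (built into MATLAB from R2013b, earlier as a File Exchange submission), something like this sketch is more robust:

```matlab
% Sketch: time a candidate update with timeit.
Kd = 1; I = 2;                   % placeholder scalars
f = @() [Kd*I; zeros(29,1)];     % stand-in for the real Rv expression
t = timeit(f);                   % median over several warmed-up runs
fprintf('per-call time: %.3g s\n', t);
```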
dpb
on 27 Dec 2013
Then move to the next set(s)... the point is to see if you can get those that are recalculated into a more nearly vectorized form to make use of MATLAB's internal routines.
While the last case may take less time in the above calculation, will you lose that gain later on by having to deal with that many individual variables?
Amit
on 27 Dec 2013
I think the last approach will remain faster in terms of implementation, even though it makes the overall code aesthetically displeasing. As I mentioned in an earlier comment, the Rv matrix is used to calculate a weighted probability (the probability of choosing 1 case out of 30 possibilities) and then to do further things for that case, so vectorization versus scalars will make no difference later on.
However, a few things I learned from this whole discussion (thanks to you guys):
- One needs to think about the temporary allocations MATLAB makes internally. These can become 'out of memory' issues along with speed issues for large matrices, and they have an impact on speed even for short matrices.
- Indexing in MATLAB (at least until 2012b) is something to think about: http://www.mathworks.com/matlabcentral/answers/54522-why-is-indexing-vectors-matrices-in-matlab-very-inefficient
- Scalars are faster than matrices, as matrices need memory allocation/reallocation. However, scalars get messier after a while. Also, for short matrices, the speed difference is very small unless the same calculations are done a great many times.
I am reattempting to code using scalars instead of the Rv vector, and I'll find out how big the difference between the Rv method and the scalar method is.
dpb
on 27 Dec 2013
BTW, you're doing the timing tests from within m-files, not by pasting code sections at the command line, right?
Amit
on 27 Dec 2013
Yup, all 3 sections were run in a single m-file. However, all variables had the same values in these tests.
I just now tested the scalar versus Rv-vector scenario for the real case, where the variables change in the loop. It turns out that the scalar method takes more time than the vector method in this scenario. Also, using Rv instead of Rv(:,1) is better time-wise. I'll do the test with the factorization and compare using the complete code.
dpb
on 29 Dec 2013
OK; just wanted to be sure you did get the benefit of the JIT engine included in the testing; otherwise one can draw entirely the wrong conclusion.
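One way to be sure the JIT is engaged is to wrap each timing case in its own function in an m-file, roughly like this sketch (placeholder values again):

```matlab
% Sketch: save as time_case1.m so the loop body runs JIT-compiled.
function t = time_case1(n)
    Kd = 1; I = 2;                  % placeholder values
    tic
    for k = 1:n
        Rv = [Kd*I; zeros(29,1)];   % stand-in for the real expression
    end
    t = toc;
end
```

Comparing time_case1, time_case2, etc. this way keeps each candidate inside a compiled function body, which is also roughly what timeit arranges for you.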
Those results are all consistent with what I would have expected.
Again, the only way I can see you can help much is by looking at the code that generates these values earlier on and seeing what, if anything, can be done there to vectorize it, again with the idea of eventually getting to the vectorized version of the end result. As noted previously, there may not be much else to be done, in which case one can start thinking about MEX-ing stuff if there's a serious bottleneck and this will be run more than just once or twice. Otherwise one can just let it take a couple of hours (or whatever) at some point and then be done with it.
Amit
on 29 Dec 2013
When running the real simulation, the scalar-value approach cumulatively takes more time than the vector one (probably because the values change, whereas in the test run all the values were the same). The bottleneck is not so significant that I'd try the MEX approach; my question was more out of curiosity. I learned and understood MATLAB a bit more through this discussion.
More Answers (0)