Multiply large matrix by scalar - speed issue

Ok this probably doesn't have an answer, but I have a large-ish matrix (1024x1024) which I want to scale by a constant. I need to do this for two equal sized matrices. Unfortunately, when I do this using Matrix = Matrix*c and Matrix2 = Matrix2*c2, it takes 90% of the time of the entire rest of the code, (thereby slowing things to a crawl) which does several numeric integrations and calculations on vectors of size 1024 and the like. I was wondering if there's some other way to get this to go faster, though it seems doutful since the entirety of the code in question is Matrix=Matrix*c. Well any help is appreciated.

 Accepted Answer

OK, here is a mex routine that does the exact calculation you have shown IN-PLACE, and it is bare bones without any argument checking. Translation: it is dangerous to use and will crash MATLAB if you do not pass it exactly what it wants. The idea is to try this out just to see how much difference it makes in your timing. If it is significant then we can fiddle with the safety issues. To compile it do the following:
Call the file multiply1024.c
Type the following at the MATLAB prompt:
mex -setup
(wait for the prompt, then press Enter)
(enter the number of a C compiler such as lcc)
(press Enter again)
mex multiply1024.c
Then replace this line:
W_array(1:1024,1:1024)=W_array(1:1024,1:1024)*(1/tau)
with this:
multiply1024(W_array,W_array,1/tau)
and replace this line:
gNMDA_W_array(1:1024,1:1024)=W_array(1:1024,1:1024)*pyr
with this:
multiply1024(gNMDA_W_array,W_array,pyr)
Report back on timing differences and we will go from there.
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
mwSize i, j, Cm, Am;
double *Cpr, *Apr, *Cpr0, *Apr0;
double b;
Cm = mxGetM(prhs[0]);
Am = mxGetM(prhs[1]);
b = mxGetScalar(prhs[2]);
Cpr = Cpr0 = mxGetPr(prhs[0]);
Apr = Apr0 = mxGetPr(prhs[1]);
for( j=0; j<1024; j++ ) {
Cpr = Cpr0 + j * Cm;
Apr = Apr0 + j * Am;
for( i=0; i<1024; i++ ) {
*Cpr++ = *Apr++ * b;
}
}
}

10 Comments

Thanks, I will have to wait until tomorrow to try it out as I don't have my code at home.
It sort of rubs me the wrong way that I have to make a whole routine just to multiply a matrix slice, though I understand the reasoning behind it, but if it works it'll be great.
That's the price you pay for just about any array slice operation in MATLAB.
I implemented the mex routine and things have sped up considerably (8 times faster or so). Though the two lines are still much slower than the rest of the code, it is a pretty big improvement. Unfortunately, I have trouble understanding the code and what it's doing as I'm not that familiar with what looks like C. I'd also like to make it so instead of 1024 I can have some constant Ne. My guess from looking at the code is I need to do something like int Ne in the arguments and mxGetPr(prhs[4]) instead of i < 1024, j < 1024 at the loop.
Oops I had that wrong, it's more of a 3x speedup. Still pretty good
Specify what your requirements/desirements are and we can set it up. The outer j loop is looping on the column number, and the inner i loop is looping down the row indexes of each column. It will be pretty easy to make those adjustable based on inputs.
Well I'm just not clear on what the lines are doing and I like to be able to understand the code for future changes, so specifically:
What exactly is prhs, nlhs, plhs, nrhs. They don't seem to be the arguments I pass in, so I'm wondering how I can add more arguments (arrays or otherwise) and call them. One such argument would be the array size (e.g. sometimes I might decide to use larger matrices) which I guess in this case you might not need to pass and just get from mxGetM right?
So you get the rows from the arrays from prhs? Is prhs[0] one array and prhs[1] another?
So let me try to summarize what I think it's doing. prhs stores the arguments you pass. Not sure what the other three things do (nlhs, plhs nrhs, etc). mxGetM gets the row sizes, mxGetPr gets the data, which you store in both Cpr and Cpr0. Cpr represents the calculated values while Cpr0 is unchanged. Is that correct?
So which statement sets the data into the original array (e.g. W_array)?
I will not try to educate you on the C language here on this forum, so I will just throw these comments at you:
nlhs = The number of output variables requested by the caller
plhs = Array of variable pointers to the output variables your mex function creates.
nrhs = The number of input variables in the arguments list of the caller
prhs = Array of variable pointers to the input variables
Then the mex code works with the prhs array. prhs[0] is the pointer to the first input argument. prhs[1] is the pointer to the second input argument. etc.
You get at the contents of the prhs[etc] variable with functions that MATLAB provides in the External Interfaces API (see the doc). mxGetM gets the number of rows, mxGetN gets the number of columns, mxGetPr gets a pointer to the data of a double, etc.
*Cpr++ = *Apr++ * b; This takes the value that Apr currently points to, multiplies it by b, and then assigns the result to the value that Cpr currently points to. It also increments each of the pointers by 1 (points to the next element in memory) after use.
If you wanted to pass in the size of the block to multiply, you could pass them in as the 4th and 5th arguments, prhs[3] and prhs[4]. Then you could get the values of these with mxGetScalar. Then use these values as limits in the loops. Etc.
I accepted this answer since it speeds up x3. Your other answer also helped make the issue clear, and I'll see if I can restructure the code such that W_array and the like are split into 3 or 4 arrays each instead of taking slices of them. Thanks for the explanation of what it's doing, i get it now, didn't realize those things were pointers even though the * for some reason.
Thanks, but I will again caution you that the above code has no argument checking and it requires that the 1st input is not shared with another non-temporary variable. I know that statement probably doesn't make much sense to you at this time, but it is an important restriction. If you plan on using it just let me know exactly the interface you need and I will post a mex function with some argument checks.
Well, as you guessed, I'm not sure what the restriction means to me so I will describe the use instead. W_array and gNMDA_W_array have the same use, in that they represent weights in a network with recursive connections. MxN matrix is the weight of node M on N (So W_array(1,2) is weight of 1 on 2, while W_array(2,1) is weight of 2 on 1).
Both arrays are initialized as MxN matrices of constants to start, with gNMDA_W_array being initialized as a scaled version of W_array.
The only lines of code that change the contents of the arrays are the two lines I posted above (which are now changed to use the mex function) as well as the following for W_array:
W_array:
Ne=1024;
for ii = TMP
W_array(ii,1:Ne)=W_array(ii,1:Ne).*(1-rate_dw)+(((n_bind-n_ubind)./(1+(abs(fired_vector-fired_vector(ii)))))+n_ubind).*(rate_dw);
end
I'm not sure if this answers your question about the 'interface' that I need, but basically the goal is just to simply scale a chunk of the array (e.g. one class of nodes scales connections to eachother over time but not to the other class of nodes, where their incoming and outgoing connections stay constant).

Sign in to comment.

More Answers (2)

MATLABs matrix * scalar is pretty fast and multi-threaded to boot. Unless you have complex numbers involved I don't think you will be able to speed this up, even with a mex routine. It is possible that Matrix and Matrix2 are shared and that is why the scalar multiplications are taking so long because it involves a data-copy and can't be done in-place. Can't tell without seeing more of your code.

4 Comments

What do you mean by shared? I'm not sure what else of the code you want, but the two lines that are slow are of the form Matrix = Matrix*c and are:
W_array(1:1024,1:1024)=W_array(1:1024,1:1024)*(1/tau)
***More code here that modifies W_array but only a few select elements and is therefore fast****
gNMDA_W_array(1:1024,1:1024)=W_array(1:1024,1:1024)*pyr
where tau and pyr are constants
What size is W_array? Every time you do an array slice in MATLAB an entire data set copy takes place. i.e., the simple expression W_array(1:1024,1:1024) is an array slice where 1024*1024 data elements must first be copied to a new memory location, then the data at this new location gets multiplied by 1/tau, then that result gets copied back into W_array(1:1024,1:1024). In all likelihood it is the data copies that are using the most time, not the multiplies themselves. In that case a mex routine *can* avoid the data copy and run faster. Which gets me back to my first question: What size are your arrays?
Both W_array and gNMDA_W_array are 1280x1280. If that's the case, it might be trivial to split off the arrays e.g. W_array1 having the 1024x1024 values multiplied and W_array2 having the remaining 256x256. Would that be about the same as using mex?
Though that might be tricky since I'd need to break W_array into 4 arrays and gNMDA_W_array into another 4. Then doing the math with them might be annoying or slow. I'll look into it.

Sign in to comment.

Tags

Asked:

CP
on 18 Apr 2011

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!