gpuArray column-wise operations on a matrix?

nah

on 13 Sep 2013
I have a custom function that takes an m-by-2 matrix (2 columns) and operates on it. It is a fairly complicated function: it performs a sequence of matrix multiplications, stepping through one of the column vectors in a for loop and, depending on the corresponding value in the other column, choosing which matrix to multiply by. It is essentially a cumulative matrix product over the elements of one column, conditional on the values in the other column.
For example:

```
col1  col2
0     0.03
0     0.04
1     0.02
0     0.1
1     0.004
```
If the value in col1 is 0, one matrix is chosen for the multiplication; if it is 1, a different one is chosen. A cumulative matrix product is then taken, i.e.:

```matlab
Values = diag(Valuesmat);
cumulMatProduct = ini;
for ix = 1:length(col2)
    % choose the matrix according to the 0/1 flag in col1
    if col1(ix) == 0
        matrixToMultiply = matrix1;
    elseif col1(ix) == 1
        matrixToMultiply = matrix2;
    end
    % diagonal matrix built from Values, scaled by the current col2 entry
    anotherMatrixtoMultiply = diag(exp(Values) .* col2(ix));
    cumulMatProduct = matrixToMultiply * anotherMatrixtoMultiply * cumulMatProduct;
end
```

and so on. That is basically what the function does.
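Written as a recurrence (a reading aid only; here $M_0$ = matrix1, $M_1$ = matrix2, $v$ = Values with $e^{v}$ taken elementwise, and $a_n$, $b_n$ are the $n$-th entries of col1 and col2), the loop computes:

$$
P_0 = \mathrm{ini}, \qquad
P_n = M_{a_n}\, D_n\, P_{n-1}, \qquad
D_n = \operatorname{diag}\!\left(e^{v} b_n\right) = b_n \operatorname{diag}\!\left(e^{v}\right),
$$

$$
P_N = M_{a_N} D_N \, M_{a_{N-1}} D_{N-1} \cdots M_{a_1} D_1 \, \mathrm{ini}.
$$

Each $P_n$ depends on $P_{n-1}$, so the steps of a single column pair must be combined in order; separate column pairs, however, are independent of each other and can be processed in parallel.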
Now I have a large number of such column pairs, so I would like to know whether I could use GPU computation for them. (I have access to MATLAB R2013a with the Parallel Computing Toolbox (PCT) and a Tesla S2050.)
I would like to do something like:

```matlab
% one column pair per column of the data matrices
DataMatrix1 = [col1, col1, col1];
DataMatrix2 = [col2, col2, col2];
gpuDat1 = gpuArray(DataMatrix1);
gpuDat2 = gpuArray(DataMatrix2);
[resultVect] = myFuncCall(gpuDat1, gpuDat2, ValueMat, ini);
% (ValueMat and ini are not sliced; each processor gets its own copy)
```
That is, hand the columns of the matrix out to the GPU processors and have each processor run my function to produce the cumulative matrix product for its input columns (like independent, per-column parallelization across CPU nodes/workers, but on GPUs). Or, more generally, what is the best way to do this in parallel, even just with CPUs/workers? Is matlabpool the best option?
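For the CPU-worker route asked about above, a minimal sketch, assuming each column pair is independent and col1 holds only 0 and 1: store one pair per column and let parfor hand the columns to matlabpool workers, while the steps inside each column stay sequential. The function name cumulProductAllColumns and the nSteps-by-nSeries layout of col1s/col2s are hypothetical, not from the thread.

```matlab
function results = cumulProductAllColumns(col1s, col2s, matrix1, matrix2, Valuesmat, ini)
% Hypothetical helper (not from the thread): col1s and col2s are
% nSteps-by-nSeries, and column j holds one independent col1/col2 pair.
% Page results(:,:,j) is the cumulative matrix product for pair j.
nSteps = size(col1s, 1);
nSeries = size(col1s, 2);
expV = exp(diag(Valuesmat));                  % same as exp(Values) in the question
results = zeros(size(ini, 1), size(ini, 2), nSeries);
% parfor distributes iterations over matlabpool workers; with no pool
% open it runs the iterations serially on the client.
parfor j = 1:nSeries
    c1 = col1s(:, j);                         % sliced: each worker gets only its column
    c2 = col2s(:, j);
    P = ini;
    for ix = 1:nSteps                         % sequential part: order matters
        if c1(ix) == 0
            A = matrix1;
        else
            A = matrix2;                      % assumes col1 holds only 0/1
        end
        P = A * diag(expV * c2(ix)) * P;
    end
    results(:, :, j) = P;
end
end
```

With a pool open (matlabpool open in R2013a), the columns are processed concurrently; the per-column work is unchanged from the original loop.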

Answer by Jill Reese

on 18 Sep 2013

It looks like you could set up all the data in one pass. You might try organizing your data such that the matrix to use for col1(ix) was stored in matrixToMultiply(:,:,ix) and the matrix corresponding to col2(ix) was stored in anotherMatrixToMultiply(:,:,ix). You haven't mentioned the size of your data, so this may very well cause you to run out of memory on your GPU. However, if these variables can fit on your GPU as gpuArrays then you can use
```matlab
pagefun(@mtimes, matrixToMultiply, anotherMatrixToMultiply)
```

to perform all of the matrix multiplications at once in an efficient way.
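A minimal sketch of the data layout suggested above, which also shows one way to do the conditional selection without a loop: stack the two candidate matrices as pages and index them with col1 + 1. The name buildStepMatrices is hypothetical, col1 is assumed to hold only 0 and 1, and the CPU branch exists only so the sketch also runs without a GPU.

```matlab
function stepMatrices = buildStepMatrices(col1, col2, matrix1, matrix2, Valuesmat)
% Hypothetical helper (not from the thread): page n of the result is
% matrixToMultiply(:,:,n) * anotherMatrixToMultiply(:,:,n) for step n.
N = numel(col1);
k = size(matrix1, 1);
if isa(col1, 'gpuArray')
    col1 = gather(col1);                      % flags are only used as page indices
end
expV = exp(diag(Valuesmat));                  % k-by-1, same as exp(Values)
% Conditional selection by indexing: page 1 for col1 == 0, page 2 for col1 == 1
M = cat(3, matrix1, matrix2);
matrixToMultiply = M(:, :, col1(:) + 1);      % k-by-k-by-N
% Page n is diag(expV * col2(n)): one diagonal matrix scaled per page
anotherMatrixToMultiply = bsxfun(@times, diag(expV), reshape(col2, 1, 1, N));
if isa(matrixToMultiply, 'gpuArray') || isa(anotherMatrixToMultiply, 'gpuArray')
    % GPU path: all N products in one call (pagefun needs R2013b or later)
    stepMatrices = pagefun(@mtimes, matrixToMultiply, anotherMatrixToMultiply);
else
    % CPU fallback so the sketch also runs without a GPU
    stepMatrices = zeros(k, k, N);
    for n = 1:N
        stepMatrices(:, :, n) = matrixToMultiply(:, :, n) * anotherMatrixToMultiply(:, :, n);
    end
end
end
```

The pages still have to be combined in order afterwards (P = ini; for n = 1:N, P = stepMatrices(:,:,n) * P; end), because each product depends on the previous one: pagefun parallelizes the independent per-step products, not the cumulative chain.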

nah

on 19 Sep 2013
The gpuDevice output says:

```
TotalMemory: 2.8180e+09
FreeMemory: 2.7364e+09
```

so I guess that will be sufficient for data of 1000 x 1000 (~10^6 points in total).
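A rough check of that guess, assuming double precision (8 bytes per element), $k \times k$ matrices (the thread does not state $k$), one page per data point ($N = 10^6$), and three page arrays in GPU memory at once (matrixToMultiply, anotherMatrixToMultiply and the pagefun result):

$$
\text{bytes} \approx 3 \cdot 8 \cdot N k^2 = 2.4 \times 10^{7}\, k^2 \le 2.7364 \times 10^{9}
\;\Longrightarrow\; k^2 \le 114 \;\Longrightarrow\; k \le 10.
$$

Under these assumptions, the one-page-per-point layout fits only for small matrices; larger $k$ would require processing the data in chunks.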
However, the pagefun command is not found. Is it part of a particular toolbox?
Could you elaborate a bit more on the conditional selection part?
Mainly, since I need to walk through every value in the column and choose a matrix to multiply based on that value, I don't understand how this is done in parallel on the gpuArray data.
Anand

on 19 Sep 2013
You'll probably need the latest release, R2013b, to be able to use pagefun. It's in the Parallel Computing Toolbox.

nah

on 19 Sep 2013
This looks like the equivalent of Jacket's GFOR. I need to check whether my institution has updated to R2013b. Even so, this would not solve my actual problem of doing sequential (cumulative) multiplications along the column vectors, so I would appreciate your input on that.
Even in the simplest form:

```matlab
for ix = 1:length(col1)
    % apply the matrix selected by the 0/1 flag in col1
    if col1(ix) == 0
        cumulProduct = simpleMatrix0 * cumulProduct;
    elseif col1(ix) == 1
        cumulProduct = simpleMatrix2 * cumulProduct;
    end
end
```

how would I do this for columns distributed/sliced as pages of a gpuArray (so that pagefun might be applicable)?
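A minimal sketch of one way to combine the two ideas, under stated assumptions: the loop over steps (rows) stays sequential, because each product depends on the previous one, but each data column is treated as a page, so every iteration advances all columns with a single pagefun call. The names col1s, col2s and cumulProductPaged are hypothetical, col1 is assumed to contain only 0 and 1, and the non-GPU branch is a plain loop so the function also runs without a GPU.

```matlab
function P = cumulProductPaged(col1s, col2s, matrix1, matrix2, Valuesmat, ini)
% Hypothetical helper (not from the thread): col1s and col2s are
% nSteps-by-nSeries, one independent series per column. Page P(:,:,j)
% is the cumulative matrix product for series j.
[nSteps, nSeries] = size(col2s);
if isa(col1s, 'gpuArray')
    col1s = gather(col1s);                    % flags are only used as page indices
end
expV = exp(diag(Valuesmat));                  % k-by-1, same as exp(Values)
% matrix*diag(expV) just scales column i of matrix by expV(i)
M = cat(3, bsxfun(@times, matrix1, expV.'), bsxfun(@times, matrix2, expV.'));
P = repmat(ini, [1, 1, nSeries]);             % running product, one page per series
useGpu = isa(M, 'gpuArray') || isa(col2s, 'gpuArray') || isa(P, 'gpuArray');
for ix = 1:nSteps                             % steps stay sequential...
    A = M(:, :, col1s(ix, :) + 1);            % ...but each step picks a matrix per series
    A = bsxfun(@times, A, reshape(col2s(ix, :), 1, 1, nSeries));  % scale by col2
    if useGpu
        P = pagefun(@mtimes, A, P);           % advance all series in one call
    else
        for j = 1:nSeries                     % CPU fallback, same arithmetic
            P(:, :, j) = A(:, :, j) * P(:, :, j);
        end
    end
end
end
```

With data on the GPU, e.g. P = cumulProductPaged(gpuArray(col1s), gpuArray(col2s), matrix1, matrix2, Valuesmat, ini), the pages hold only nSeries running products instead of one matrix per data point, which also eases the memory concern. Whether this beats a parfor loop over CPU workers likely depends on the matrix size and the number of series; for very small matrices, the per-call overhead of pagefun may dominate.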