Complex matrix multiplication with pagemtimes
29 views (last 30 days)
Show older comments
I'm optimizing some function and I end up having to do voxelwise quadratic matrix multiplication a'Ba on complex mats. While testing out the speed of different codes, I came across a weird coincidance, having a or B be real and the other complex, my matrix multiplication is an order of magnitude slower! Does anyone have an explaination for this? The work around is just to add 1i*eps but you'd think this shouldn't need to be done in a software as mature as matlab...
%clc
a = rand(50,1)+1i*eps;
b = rand(50,50,10000); %+1i*eps ;
tic
for n=1:100
c= pagemtimes( a',pagemtimes(b,a));
end
toc
tic
for n=1:100
c = sum(pagemtimes(b,a).*conj(a),1);
end
toc
b=b+1i*eps;
tic
for n=1:100
c= pagemtimes( a',pagemtimes(b,a));
end
toc
tic
for n=1:100
c = sum(pagemtimes(b,a).*conj(a),1);
end
toc
3 Comments
Accepted Answer
James Tursa
on 7 Jul 2023
Edited: James Tursa
on 7 Jul 2023
This really has nothing to do with the maturity of MATLAB ... every language that uses BLAS routines in the background will have the same issue. The explanation is because of the way real and complex variables are stored, and how they are multiplied in the background by the BLAS libraries. Real variables are stored in a contiguous block of memory with all the elements next to each other. Since R2018a Complex variables are stored in a contiguous block of memory with the real and imaginary parts interleaved. Matrix multiplication of arrays in MATLAB is accomplished by calling the appropriate BLAS library routine in the background. But there are no complex-real routines in this library. You either have to pass in two real variables, or two complex variables. So when you have a complex times real situation, the real variable first has to be deep copied to new memory in an interleaved fashion with 0's filling the imaginary spots, then the BLAS library is called with these inputs. So these deep copies (to twice the memory size) and all those 0 multiplies are the culprit, but if you have mixed complexity types and want to use BLAS routines in the background this penalty can't be avoided (and would be the same in any other language that used BLAS routines).
BTW, if possible you should be careful doing a' or conj(a) ahead of time as these can cause deep copies also. It might be best to use the transpX and transpY options of pagemtimes( ) to essentially get the same result without making explicit deep copies of the inputs.
Side Note: In MATLAB R2017b and earlier, complex variables were stored with the real and imaginary parts in separate blocks of memory, so mixed matrix multiplies could be accomplished by calling the real*real routine on the individual parts in the background, avoiding the deep copies and avoiding a lot of 0 multiplies. Bottom line is that mixed matrix multiplies in R2017b and earlier is faster than R2018a and later.
*** EDIT ***
If "a" is a complex vector and B is real with symmetric pages, you can mimic what happens in R2017b by splitting up "a" into its real and imaginary parts and doing the computations separately. This avoids all unnecessary deep data copies and 0 multiplies and should be significantly faster. E.g., in this particular case you would have
ar = real(a); ai = imag(a);
Then do the pagewise calculations ar'*B*ar + ai'*B*ai to get the result. Because of the B symmetry, all of the imaginary result can be shown to be 0, so this method avoids explicitly computing it entirely. And you don't even need pagemtimes for this. You can turn your B array into one big 2D array and do normal matrix multiplies. E.g.,
N = numel(a);
BN = reshape(B,N,[]);
result = ar' * reshape( ar' * BN, N, [] ) + ai' * reshape( ai' * BN, N, [] );
Test using reshape and normal matrix multiply:
a = rand(50,1) + rand(50,1)*1i; % a complex vector
B = rand(50,50,100000); % make it big and real
B = B + pagetranspose(B); % make it symmetric
tic
N = numel(a);
ar = real(a); ai = imag(a);
BN = reshape(B,N,[]);
result_r = ar' * reshape( ar' * BN, N, [] ) + ai' * reshape( ai' * BN, N, [] );
toc
Test using pagemtimes:
tic
result_p = pagemtimes( a,'ctranspose',pagemtimes(B,a),'none');
toc
Compare:
result_r(1:10)
reshape(result_p(1:10),1,[])
So, same result but a lot faster, and without those residual imaginary values that are only non-zero because of numerical effects. E.g.,
norm(imag(result_p(:)))
If the B pages are not symmetric, then the imaginary part of the result will not be 0. However, you can still probably save time by splitting things up and avoiding the intermediate 0 multiplies. E.g., just do the pagewise equivalent of:
ar'*B*ar + ai'*B*ai + (ar'*B*ai - ai'*B*ar)*1i
which would be:
tic
N = numel(a);
ar = real(a); ai = imag(a);
BN = reshape(B,N,[]);
result = complex(ar' * reshape( ar' * BN, N, [] ) + ai' * reshape( ai' * BN, N, [] ),...
ai' * reshape( ar' * BN, N, [] ) + ar' * reshape( ai' * BN, N, [] ));
toc
Still quite a bit faster than the pagemtimes( ) method with all those unnecessary 0 multiplies. When the B pages are symmetric, that last imaginary part is identically 0 and drops out.
3 Comments
Bruno Luong
on 8 Jul 2023
Edited: Bruno Luong
on 8 Jul 2023
Few methods and timings I play with on my Laptop, seemps twice faster than TMW online machine.
I don't know whereas BLLAS is behid tensorprod, if someone knows please inform. But it is competitive to pagemtimes.
a = complex(rand(50,1),rand(50,1));
b = rand(50,50,100000);
tic
c = pagemtimes( a',pagemtimes(b,a));
toc % Elapsed time is 0.809244 seconds.
tic
c = pagemtimes( a',pagemtimes(complex(b),a));
toc % Elapsed time is 0.846693 seconds.
tic
tmp = tensorprod(a',b,2,1);
c = tensorprod(tmp,a,2,1);
c = reshape(c,1,1,size(b,3));
toc % Elapsed time is 0.821394 seconds.
tic
A = conj(a)*a.';
c = tensorprod(A,b,[1 2],[1 2]);
c = reshape(c,1,1,size(b,3));
toc % Elapsed time is 0.796169 seconds.
[m,n,p] = size(b);
tic
ar = reshape(real(a),1,m);
ai = reshape(imag(a),1,m);
b = reshape(b,m,n*p);
d = [ar; -ai]*b;
d = complex(d(1,:),d(2,:));
d = reshape(d,n,p);
c = reshape(a.'*d,1,1,p);
b = reshape(b,m,n,p);
toc % Elapsed time is 0.089465 seconds.
Bruno Luong
on 8 Jul 2023
Edited: Bruno Luong
on 8 Jul 2023
One more method, so far the fatest
a = complex(rand(50,1),rand(50,1));
b = rand(50,50,100000);
tic
[m,n,p] = size(b);
A = conj(a)*a.';
b = reshape(b,m*n,p);
A = reshape(A,1,m*n);
Ar = real(A);
Ai = imag(A);
c = [Ar; Ai]*b;
c = reshape(complex(c(1,:),c(2,:)),1,1,p);
b = reshape(b,m,n,p);
toc % Elapsed time is 0.059821 seconds.
More Answers (0)
See Also
Categories
Find more on Logical in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!