# Matrix multiplication optimization for MUSIC DOA

3 views (last 30 days)
Jayant chouragade on 13 Aug 2020
Commented: Jayant chouragade on 16 Aug 2020
Hi,
I am computing Direction of Arrival of wideband signal using MUSIC. Parameters are as below:
Antenna Array: L shape.
Signal freq. range: 20-80 MHz ( Narrow pulsed signals).
I am computing MUSIC Spatial spectrum as below code:
for dir=1:Total_dir
P_2D_MUSIC(dir,1:128)=mtimesx(mtimesx(A_all(1:3,dir,1:128),'c',V(1:3:,2:end,1:128),'SPEED'),mtimesx(V(1:3,2:end,1:128),'c',A_all(1:3,dir,1:128),'SPEED'),'SPEED');
end
Where P_2D_MUSIC is Music Spatial spectrum in each direction 'dir'. 'dir ' direction pair (Azimuth ,elevation) and there are about 55000 such pairs. V is eigen vectors of received signals by three antenna. A is direction array .
Anyway , A_all is of size 3x55000x128 , V is of 3x 2x128 matix. And above loop mulitply these matirices and compute P_2D_MUSIC for each direction at each frequency points (1:128). P= A'*V*V''*A
Problem: Present loop is taking about 3 sec for execution. However I need to run these loop for several data points (~ 500000) which will take a lot of time to finish.
I tried using 'parfor' but that takes more time than present 3 sec.
I am looking to optimise this loop processing in terms of speed.
regards
jayant

Show 1 older comment
Jayant chouragade on 13 Aug 2020
Walter you are right V" is typo it meant to be V'. I want to compute a 55000x128 matirx (expected size of P_2D_MUSIC).
In side the loop is one iteration ,
first Multiplication A' * V is of size (1x 3x128 ) *(3x2x128)=1x2x128.
Second multiplication is V'*A is of size (2x3x128)*(3x1x128)= 2x1x128.
And outer multiplication is P= (A'*V)*(V'*A) of size (1x2x128)*(2x1x128)= 1x128.
Hence when loop runs for 55000 times P_2D_Music would have size of 55000x128.
Walter Roberson on 13 Aug 2020
A'*V is 55000 x 3 x 128 * 3 x 2 x 128 -> 55000 x 2 x 128
V' * A is 2 x 3 x 128 * 3 x 55000 x 128 -> 2 x 55000 x 128
and then when you multiply the two you get 55000 x 55000 x 128
That is, if you had 1 layer instead of 128 layers then the formula given would have a 55000 x 55000 output, not a 55000 x 1 output.
Jayant chouragade on 13 Aug 2020
For more clarification I would like to highlight below points. 1. I am using A as 1x 3x128 in each iteration. 2. Output of each iteration is 1x 1 x 128. However due to assignment P_2D_Music(dir,:) RHS 1x1x128 is being squeezed to 1x128. 3. Initially I planned to use full matrix A of size 55000x3x128 without loop and then squeeze output to desired size of 55000x 128 but , Matlab runs out of memory.

Dana on 13 Aug 2020
Edited: Dana on 13 Aug 2020
Since your variable A_all is 3x55000x128, this would seem to suggest that, for iteration dir and frequency j, you want to set P_2D_MUSIC(dir,j)=A*Vj*Vj'*A', where A=A_all(:,dir,j)' (note the transpose), which is 1x3, and Vj=V(:,:,j), which is 3x2, so that A*Vj*Vj'*A' is 1x1. Is that right?
Assuming I've got this right, one thing that can make your code a bit faster is that you don't need to compute both A*Vj and Vj'*A', since these are just the (conjugate) transposes of each other. Instead, you can just compute and store Bj=A*Vj, and then set P_2D_MUSIC(dir,j)=Bj*Bj'.
Second, I see above you wrote, "Initially I planned to use full matrix A of size 55000x3x128 without loop and then squeeze output to desired size of 55000x 128 but , Matlab runs out of memory." Are you sure you did this right? Note that you're trying to compute 55000*128 individual matrix multiplications here, each of which evaluates to a 1x1. This means that the dimensions 55000 and 128 should correspond to the third and fourth dimensions of your full matrix A, and in particular 55000 should not be the first dimension as you've stated: if it were, then it would be one of the dimensions involved in the individual matrix multiplications. I'm guessing this is what you tried to do, in which case you would have been computing 128 55000x55000 matrices (or something similar). I can absolutely see how that might run you out of memory.
So one way to do this properly would (I think) be:
AA = permute(A_all,[4,1,3,2]) % add an initial singleton dimension and swap 55000 and 128
% dimensions; result is 1x3x128x55000
B = mtimesx(AA,V,'SPEED');
P_2D_MUSIC = squeeze(mtimesx(B,B,'c','SPEED'))';
I don't have mtimesx on this computer so I can't verify that I've done this properly, but hopefully you get the idea in any case.

Jayant chouragade on 14 Aug 2020
Suggested conjugate method P_2D_MUSIC(dir,j)=Bj*Bj' improves the performance by 1 second . Computation time using 4D permuted AA matrix is about 0.8 sec. But P_2D_MUSIC values are not same as obtained from sequencial compuation using P_2D_MUSIC(dir,j)=Bj*Bj' . I could not figure out the reasons.
Futher I would like to improve the compuation time even better and would like to have it better then 0.01 second.
Dana on 14 Aug 2020
Regarding why using the permuted matrix AA is giving you a different answer, I see a couple of things. First, in your original post, I missed that you're excluding the first column of V: your matrix multiplications involve V(:,2:end,:), not V(:,:,:)=V(:,1:end,:). Assuming that's what you intended, the code I posted would need to be altered appropriately.
Second, it occurs to me is that the code I wrote out before isn't robust to dealing with complex variables (I typically work with real numbers, so I'm not used to this being an issue). Is your data complex? If so, recall that, using the iterative method (rather than the big AA matrix), you have Bj=A_all(:,dir,j)'*V(:,2:end,j). If A_all is complex, then A_all(:,dir,j)' is the complex conjugate transpose (i.e., transpose + complex conjugate) of a column of A_all. Assuming again that's actually what you want, the code I posted would also need to be adjusted for this, since my method only effectively took the non-conjugate transpose of A_all, and then at the end took a conjugate transpose of something that it shouldn't have.
With both of these fixes in place, the code should be something like:
AA = conj(permute(A_all,[4,1,3,2])) % add an initial singleton dimension and swap 55000
% and 128 dimensions, then take complex conjugate; result
% is 1x3x128x55000
B = mtimesx(AA,V(:,2:end,:),'SPEED');
P_2D_MUSIC = squeeze(mtimesx(B,B,'c','SPEED')).';
So first, I've applied a conjugate operation at the first step to deal with the complex transpose issue. I've also changed the conjugate transpose (') after the squeeze on the last line to a non-conjugate transpose (.') to deal with the same issue. Finally, the V matrix used to get B now excludes the first column as you did.
Hopefully, between these two fixes, you'll now be getting the same answer (though whether that's actually the right answer is something you need to confirm for yourself).
Lastly, regarding computation times, based on my experience I would guess that you're highly unlikely to get the times down to what you're looking for. 55,000x128 = 7+ million matrix multiplications is going to take some time, and if you can get it down to 0.8 seconds I'd say you're doing pretty well. Since you need to do this 500,000 times, your best bet may be to parallelize at that step (i.e., run a parfor across the 500,000 data points), but even then, depending on how many cores your machine has, it'll probably take at least a day to run, maybe longer. Sometimes that's just how it goes.
Jayant chouragade on 16 Aug 2020
Thank you . This serves the purpose.