Since your variable A_all is 3x55000x128, this would seem to suggest that, for iteration dir and frequency j, you want to set P_2D_MUSIC(dir,j)=A*Vj*Vj'*A', where A=A_all(:,dir,j)' (note the transpose), which is 1x3, and Vj=V(:,:,j), which is 3x2, so that A*Vj*Vj'*A' is 1x1. Is that right?
Assuming I've got this right, one thing that can make your code a bit faster is that you don't need to compute both A*Vj and Vj'*A', since these are just the (conjugate) transposes of each other. Instead, you can just compute and store Bj=A*Vj, and then set P_2D_MUSIC(dir,j)=Bj*Bj'.
Second, I see above you wrote, "Initially I planned to use full matrix A of size 55000x3x128 without loop and then squeeze output to desired size of 55000x 128 but , Matlab runs out of memory." Are you sure you did this right? Note that you're trying to compute 55000*128 individual matrix multiplications here, each of which evaluates to a 1x1. This means that the dimensions 55000 and 128 should correspond to the third and fourth dimensions of your full matrix A, and in particular 55000 should not be the first dimension as you've stated: if it were, then it would be one of the dimensions involved in the individual matrix multiplications. I'm guessing this is what you tried to do, in which case you would have been computing 128 55000x55000 matrices (or something similar). I can absolutely see how that might run you out of memory.
So one way to do this properly would (I think) be:
AA = permute(A_all,[4,1,3,2])
B = mtimesx(AA,V,'SPEED');
P_2D_MUSIC = squeeze(mtimesx(B,B,'c','SPEED'))';
I don't have mtimesx on this computer so I can't verify that I've done this properly, but hopefully you get the idea in any case.