Fast dot product in three dimensions

Dear community I am trying to speed up my code, and the bottleneck is the following operation.
A=rand(3,3,3);
B=rand(3,1);
tic
for i=1:1000000
C=B(1).*A(:,:,1)+B(2).*A(:,:,2)+B(3).*A(:,:,3);
end
toc
I wrote 1000000, but it can be much more. Anyone getting a faster solution? Another way I found to avoid to repeat A, but is slower since it involves reshape and 3D multiplications is
sum(bsxfun(@times,A,reshape(B,[1 1 3])),3);
Thank you in advance
Florian Matlab R2017a

8 Comments

The reshape command is very fast, but in sum(bsxfun()) you create a large temporary array.
The body of the loop does not depend on the loop counter i. Therefore it is not clear what the loop is doing at all. Please post a more realistic example to let the readers know, what's going on.
C=reshape((reshape(A, 9, 3)*B), 3, 3);
may be slightly (10%) faster.
Florian
Florian on 19 Sep 2017
Edited: Florian on 19 Sep 2017
Indeed, 11% faster, always good to take!
Good idea to convert the sum into this matrix product
Thank you very much
@Jan, the loop is here to benchmarck the time computation, but I cannot avoid it.
Florian
Generally, you may want to minimize the number of explicit MATLAB operations, even if this means reshaping matrices (which, however, is also an operation) or creating intermediate matrices.
Unfortunately, I can't think of a good way to get rid of one of the two reshapes.
If you thoroughly need to optimized this specific operation, you could look into writing it as a C/C++ function and using MEX.
One more thing to thing about - can you eliminate the third dimension? This may speed up accessing and reshaping the A matrix.
Jan
Jan on 19 Sep 2017
Edited: Jan on 19 Sep 2017
@Florian: Sorry, I do not get it. Currently you have a loop, which is required, but you have inserted it for the benchmark only? The expression
C = B(1).*A(:,:,1)+B(2).*A(:,:,2)+B(3).*A(:,:,3);
is a constant - at least in the code example you have posted. If you explain, which parts are variable and changing in the loop, further suggestions about the optimization are possible. Currently there might be a large potential fur further improvements, but the code is too much simplified to see them. It is usually not productive to optimize some pseudo code, which does not perform the actual calculations.
Sorry Jan, my code was not clear, A and B are not constant, I should have written it from the begining as
tic
for i=1:1000000
A=rand(3,3,3);
B=rand(3,1);
C=B(1).*A(:,:,1)+B(2).*A(:,:,2)+B(3).*A(:,:,3);
end
toc
The loop "for" here is just here for benchmarking. I can put bellow one of my term (which is part of a function called several thousand times), put I am not sure that this will be helpful, the reason why I simplified the problem to the first post.
Anyway thank you for your time, and I hope that it is more clear now
Kappa0d2xi0et=(-1).*(a0_3*C0Bis(1,:)*d4Rdxiet(:,1:3)'+a0_3*C0Bis(2,:)*d4Rdxiet(:,3:5)'+2.*a0_3*C0Bis(3,:)*d4Rdxiet(:,2:4)'+2.*(a03d1_1*C0Bis(1,:)+a03d1_2*C0Bis(3,:))*d3Rdxiet(:,1:3)'+2.*(a03d1_2*C0Bis(2,:)+a03d1_1*C0Bis(3,:))*d3Rdxiet(:,2:4)'+(a03d2(:,1)*C0Bis(1,:)+a03d2(:,2)*C0Bis(2,:)+2.*a03d2(:,3)*C0Bis(3,:))*d2Rdxiet'+...
((C0(1,1).*crod2_1(:,:,1)+C0(1,2).*crod2_1(:,:,2)+2.*C0(1,3).*crod2_1(:,:,3))+(C0(2,1).*crod2_2(:,:,1)+C0(2,2).*crod2_2(:,:,2)+2.*C0(2,3).*crod2_2(:,:,3))+2.*(C0(3,1).*crod2_3(:,:,1)+C0(3,2).*crod2_3(:,:,2)+2.*C0(3,3).*crod2_3(:,:,3)))*d1Rdxiet'+...
+2.*((C0(1,1).*crod1_1(:,:,1)+C0(1,2).*crod1_1(:,:,2)+2.*C0(1,3).*crod1_1(:,:,3))+(C0(3,1).*crod1_2(:,:,1)+C0(3,2).*crod1_2(:,:,2)+2.*C0(3,3).*crod1_2(:,:,3)))*d2RdxietV12+2.*((C0(2,1).*crod1_2(:,:,1)+C0(2,2).*crod1_2(:,:,2)+2.*C0(2,3).*crod1_2(:,:,3))+(C0(3,1).*crod1_1(:,:,1)+C0(3,2).*crod1_1(:,:,2)+2.*C0(3,3).*crod1_1(:,:,3)))*d2RdxietV23+...
+(C0(1,1).*cro(:,:,1)+C0(1,2).*cro(:,:,2)+2.*C0(1,3).*cro(:,:,3))*d3Rdxiet(:,1:2)'+(C0(2,1).*cro(:,:,1)+C0(2,2).*cro(:,:,2)+2.*C0(2,3).*cro(:,:,3))*d3Rdxiet(:,3:4)'+2.*(C0(3,1).*cro(:,:,1)+C0(3,2).*cro(:,:,2)+2.*C0(3,3).*cro(:,:,3))*d3Rdxiet(:,2:3)')';
@Christoph, for mex files, what I am doing is once my function is optimized in matlab, I am using matlab coder to get the C function.

Sign in to comment.

Answers (2)

Matt J
Matt J on 19 Sep 2017
Edited: Matt J on 19 Sep 2017
As below, but you shouldn't be doing this in a loop. You should gather all the data you want to process in one big array first and then do the multiplication in a single, vectorized command.
[m,n,p]=size(A);
C=reshape(A,[],3)*B;
C=reshape(C,m,n);

2 Comments

Florian's comment:
Dear Matt,
Thank you for your answer, but I cannot avoid the loop. I am award that loops should be avoid as much as possible, but the code I show is a simplified version of the one I am using. In reality, there are something like 100 lines inside this loop, and full vectorization is not possible. From the profiler, it is this operation that turns to be time consuming, the reason why I want to speed it up.
Best Florian
The next question is why do you need this loop? Maybe working on this rather than on the product would be beneficial.

Sign in to comment.

OCDER
OCDER on 19 Sep 2017
Edited: OCDER on 19 Sep 2017
Here's a bunch of options. Not sure if A or B is a constant, or changes with loop, so the solution you want depends on what the "real" loop is. Also consider using parfor if you have to do many independent calculations.
A = rand(3, 3, 3);
B = rand(3, 1);
tic
%If B is a constant
Bm = repmat(reshape(B, 1, 1, 3), 3, 3);
for i = 1:1000000
C = sum(Bm.*A, 3);
end
toc %Elapsed time is 0.826048 seconds.
tic
%If you want to restructure A to be accessed as A(1, :, :), A(2, :, :) so you can use times(A, B)
As = permute(A, [3 1 2]);
for i = 1:1000000
C1 = sum(times(As, B), 1);
%But your C is really C1(1, :, :), different dimension
end
toc %Elapsed time is 1.152548 seconds.
tic
%Same as above, but shifting dimension of C to 2D
As = permute(A, [3 1 2]);
C = zeros(3, 3);
for i = 1:1000000
C2 = sum(times(As, B), 1);
C(:) = C2(:);
end
toc %Elapsed time is 1.713429 seconds.
tic
%Your original post
for i = 1:1000000
C = B(1).*A(:,:,1) + B(2).*A(:,:,2) + B(3).*A(:,:,3);
end
toc %Elapsed time is 1.818561 seconds.

2 Comments

Dear Donald
I am sorry for not being clear enough in my first post, but A and B are not constant (see my answer to Jan). Unfortunately, when I take that in account in your propositions, I don’t get any computation gains. Thank you anyway.
Best
Florian
OCDER
OCDER on 19 Sep 2017
Edited: OCDER on 19 Sep 2017
I see now. hmm, I think taking that 3rd dimension out might be the best option.
You may need to reconsider a different example that best captures your real loop and shows how A and B change with the real loop counter, and also how C is used in the loop. This will help us figure out if we can remove that 3rd dimension.

Sign in to comment.

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Asked:

on 19 Sep 2017

Edited:

on 19 Sep 2017

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!