Computing the outer product (BLAS 2 operation) raised to the power k using the least number of flops

Explain how (xy^T)^k can be computed using the least number of flops, where x, y in R^n are vectors. Then write the corresponding MATLAB algorithm
Here is my algorithm:
function M = outer(x, y, k)
n = length(x);
n = length(y);
A = zeros(n, n);
for j = 1 : n
    for i = 1 : n
        A(i, j) = x(i) * y(j);
        M(i,j) = A(i,j)^k;
    end
end
So, for example, if I have x = [1 2], y = [2 4], and k = 2, then xy^T = [2 4; 4 8], and my algorithm gives (xy^T)^2 = [4 16; 16 64], while (xy^T)^2 should be equal to [20 40; 40 80]. What is wrong with my algorithm? (And I looped over the columns first and put M(i,j) outside the for loop to use the least number of flops, as the question wants. Is it correct?)
  3 Comments
Pascale Bou Chahine on 19 Sep 2020
Yeah, I should end up with a matrix, i.e. if k = 2, I should have (xy^T)^k = (xy^T)^2 = [20 40; 40 80], but with my algorithm I'm getting the element-wise power, i.e. (xy^T)^2 = [4 16; 16 64].
Bjorn Gustavsson on 19 Sep 2020
If you want the [20 40; 40 80] result, then you should take the matrix power, not the elementwise power. The matrix power is (for exponents of 2 and 3):
M2 = M*M; % k = 2
M3 = M*M*M; % k = 3
You have to modify your algorithm and read up on efficient calculation of matrix powers...
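For reference, one possible fix is sketched below (a minimal rewrite of the original function, assuming x and y have the same length): build the outer product in the loops as before, then take the matrix power once at the end.
function M = outer(x, y, k)
% Build the outer product, then take the MATRIX power (^),
% not the elementwise power (.^).
n = length(x);
A = zeros(n, n);
for j = 1 : n          % loop over columns first (column-major order)
    for i = 1 : n
        A(i, j) = x(i) * y(j);
    end
end
M = A^k;               % matrix power: A*A*...*A, k factors
With x = [1 2], y = [2 4], and k = 2, this returns [20 40; 40 80], as expected.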


Answers (2)

Gaurav Garg on 22 Sep 2020
Hi Pascale,
MATLAB supports LAPACK and BLAS functions/operations, which can help you reduce the number of FLOPs in your function(s). To know more about LAPACK and BLAS, kindly refer to the doc here, and to know how you can use BLAS operations from MATLAB in your script/function, kindly refer here.

Walter Roberson on 22 Sep 2020
LAPACK and BLAS are designed for computational efficiency, which is not the same thing as reducing the number of FLOPs. The least number of FLOPs is not necessarily the most efficient computation. The textbook algorithm for matrix multiplication can involve a lot of cache misses on the right-hand matrix; it is a lot more efficient in processor time to do block calculations, which are still "pretty" efficient but might have more FLOPs. For example, blocking might require (say) 2*n+m additional additions compared to the textbook count, and yet a block multiplication might be 100 times faster due to not having as many cache misses.
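As a toy illustration of the blocking idea (just a sketch; the function name and block-size parameter bs are made up here, and MATLAB's built-in A*B calls optimized BLAS that does this far better in compiled code):
function C = blockmul(A, B, bs)
% Toy blocked multiply of square matrices: each small block of A and B
% is reused while it is still in cache.
n = size(A, 1);
C = zeros(n, n);
for ii = 1 : bs : n
    for jj = 1 : bs : n
        for kk = 1 : bs : n
            i = ii : min(ii + bs - 1, n);
            j = jj : min(jj + bs - 1, n);
            k = kk : min(kk + bs - 1, n);
            C(i, j) = C(i, j) + A(i, k) * B(k, j);
        end
    end
end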
A matrix raised to an integer power can be computed by decomposing the exponent into binary. For example, if k = 9, then instead of 8 matrix multiplications, M*M*M*M*M*M*M*M*M, you can take M2 = M*M; M4 = M2*M2; M8 = M4*M4; M9 = M8*M -> only 4 matrix multiplies. Or M2 = M*M; M3 = M2*M; M6 = M3*M3; M9 = M6*M3 -> again 4 matrix multiplies.
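In MATLAB that square-and-multiply scheme might look like this (a sketch assuming a square M and an integer k >= 1; the function name is made up):
function P = matpow(M, k)
% Binary (square-and-multiply) exponentiation: roughly 2*log2(k)
% matrix multiplies instead of the k-1 of naive repeated multiplication.
P = eye(size(M));
while k > 0
    if mod(k, 2) == 1      % low bit of k is set: fold current M into P
        P = P * M;
    end
    k = floor(k / 2);
    if k > 0
        M = M * M;         % square M for the next bit of k
    end
end
For k = 9 this reproduces essentially the M2/M4/M8/M9 chain above.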
However, for large enough k even this is not necessarily the most efficient. Instead you can do [U,S,V] = svd(M) and form U * S^k * V', and S is guaranteed to be diagonal, so S^k is just S.^k. (Caveat: U*S^k*V' equals M^k only when U = V, i.e. when M is symmetric positive semidefinite, which happens to hold in the example above since y is a positive multiple of x; for a general diagonalizable M you would use the eigendecomposition, [V,D] = eig(M); Mk = V*D^k/V.)
But to know whether that is going to be more efficient than the binary decomposition, you have to know the cost of the SVD.
I have a suspicion that the problem might have been structured in a way that makes it easier to do the SVD. Experimentally, in the tests I did, S was always non-zero only in its (1,1) entry, which makes sense because x*y' has rank 1. The values involved are not at all obvious to me at the moment, though, so I do not know whether there would be an "inexpensive" SVD that could be done.
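For what it's worth, the rank-1 structure gives a closed form that avoids both the SVD and repeated multiplication entirely: (x*y')^k = (y'*x)^(k-1) * (x*y'), because every interior y'*x pair collapses to a scalar. A sketch (the function name is made up; x, y are vectors and k >= 1):
function M = outerpow(x, y, k)
% (x*y')^k = (y'*x)^(k-1) * (x*y') for column vectors and integer k >= 1.
% Cost: one dot product, one scalar power, and one scaled outer product,
% i.e. O(n^2) flops regardless of k.
x = x(:);                        % force column shape
y = y(:);
M = (y' * x)^(k - 1) * (x * y');
For the example in the question, outerpow([1 2], [2 4], 2) returns [20 40; 40 80].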
  1 Comment
Bjorn Gustavsson on 22 Sep 2020
Well, since y is simply 2*x, I think we can cook up the eigenvectors in 2-D space rather easily (x/norm(x) and one perpendicular to it), and the nonzero eigenvalue works out to be the product norm(x)*norm(y). But since the problem is clearly designed for someone to work out, it surely is a homework task?
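(A quick numeric check of that claim, as a sketch:)
x = [1 2]';  y = 2 * x;          % y parallel to x, as in the question
M = x * y';                      % symmetric rank-1 matrix [2 4; 4 8]
eig(M)                           % eigenvalues 0 and 10
norm(x) * norm(y)                % also 10, since y is parallel to x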

