The most efficient way to compute the inversion of an iteratively growing correlation matrix in MATLAB

At each iteration I have to divide one large correlation matrix into another (a backslash solve). The code may be summarised as:
nvar = 5; % number of variables
ns = 200; % number of samples
npop = 1000; % population size
S = randi(10,ns,nvar);
maxit = 1000; % maximum number of iterations
for it = 1 : maxit
    X = randi(10,npop,nvar); % a new population is drawn every iteration
    [m,n] = size(S);
    [mx,nx] = size(X);
    D = pdist(S,'squaredeuclidean'); % squared Euclidean distances between the samples
    R = exp(D*(-10)); % Gaussian correlation
    mu = (10+m)*eps; % small jitter for the diagonal
    A = triu(squareform(R))'; % only the upper-triangular elements are needed to build the correlation matrix
    A(logical(eye(m,m))) = 1+mu; % unit diagonal plus jitter
    d = reshape((pdist2(S,X,'squaredeuclidean')),mx*m,1); % squared distances between samples and population
    r = exp(d*(-10)); % Gaussian correlation
    B = reshape(r, m, mx); % the correlation matrix between samples and population
    C = A\B; % the time-consuming part, which also requires a lot of memory
    idx = randi([1 npop]); % select the best individual from the population (chosen randomly here)
    S(ns+it,:) = X(idx,:); % one sample is appended every iteration, taken from the population
end
From the above code you can see that the correlation matrices A and B grow at every iteration; solving A\B becomes really time-consuming, and it is an O(n^3) computation. In my real code the matrices A and B are non-singular. I have tried using gpuArray and svd to compute A\B in MATLAB, but unfortunately there was no improvement, and it was even slower.
Can anyone share some ideas that could alleviate this problem, or suggest some articles/books on solving such problems? Many thanks in advance!
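To make the structure explicit: at each iteration only one sample is appended to S, so the only genuinely new entries of A are the correlations between that new sample and the existing ones (B changes completely because X is redrawn every iteration). A rough sketch of that incremental part, for illustration only (snew, dnew and rnew are made-up names):
snew = X(idx,:); % the sample appended to S
dnew = pdist2(S,snew,'squaredeuclidean'); % m-by-1 squared distances of the new sample to the existing ones
rnew = exp(dnew*(-10)); % the new correlation column
% conceptually, A then grows as [A, rnew; rnew', 1+mu]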
  3 Comments
Zhida DENG on 6 Nov 2017
Hi Jan, I have updated the problem statement. You are right, the EXP is very expensive and I am trying to find a way to alleviate it. More importantly, the computation of B\A is also very expensive. It would be much appreciated if someone could suggest a strategy for improving the computational efficiency.


Answers (1)

John D'Errico on 6 Nov 2017
Edited: John D'Errico on 6 Nov 2017
SVD will essentially NEVER speed things up. That is not the purpose of the SVD. When used on linear systems of equations, it may help you to solve the problem in a more robust way, but it will be slower than backslash.
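To illustrate the point, a quick sketch on an arbitrary 500-by-500 system (nothing to do with your matrices):
M = randn(500); y = randn(500,1);
tic; x1 = M\y; toc % backslash
tic; [U,Sv,V] = svd(M); x2 = V*(Sv\(U'*y)); toc % SVD-based solve: the same answer, but slower
norm(x1 - x2) % the two solutions agree to round-off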
Next, I'm not sure what you are doing here, and I'm probably not going to spend the time to go into your code in sufficient depth to really know. But I am not at all confident that you know either, at least not in terms of the numerical computations.
Here is part of your matrix B. I've just dumped out the first 10 rows and columns.
B(1:10,1:10)
ans =
0 3.9475e-183 0 0 1.4685e-226 0 0 0 0 0
0 5.4629e-270 1.134e-126 0 5.8793e-105 0 9.9296e-153 0 4.2184e-170 7.1246e-218
3.0268e-235 0 3.0268e-235 0 0 0 4.1996e-322 0 0 0
6.2386e-244 4.2184e-170 2.6504e-261 0 3.2346e-222 0 9.8597e-305 0 0 0
4.7836e-296 0 1.4685e-226 0 0 0 0 0 2.1871e-148 1.3742e-239
1.9152e-174 1.5693e-213 0 0 0 0 0 0 0 0
0 0 2.3208e-287 3.7201e-44 0 9.2263e-318 0 0 2.6504e-261 0
0 0 0 5.8793e-105 0 2.1717e-300 0 3.7201e-44 9.8597e-305 0
0 3.4566e-209 3.2346e-222 0 2.8524e-96 2.3208e-287 2.4801e-274 1.5693e-213 6.6669e-231 4.7836e-296
1.4685e-226 4.4763e-309 3.4566e-209 0 9.2263e-318 0 0 0 2.3208e-287 0
So most of your matrix is zero. All of those tiny numbers are zero, at least they are effectively so in double precision arithmetic.
max(abs(B(:)))
ans =
1
In fact, while your matrix may seem moderately dense, it is quite sparse. All of those numbers with several hundred negative powers of 10 are ZERO.
spy(abs(B)*1e16 > max(abs(B(:))))
So you are doing a lot of "work" there for no purpose at all.
Instead, you might want to work with sparse matrices. While the conversions below may not be the best way to do things, they will serve to show the difference in time one can expect.
size(B)
ans =
202 1000
Bs = B;
Bs(abs(B)*1e17 < max(abs(B(:)))) = 0;
Bs = sparse(Bs);
nnz(Bs)
ans =
202
nnz(B)
ans =
97146
timeit(@() Bs\A)
ans =
0.019564
timeit(@() B\A)
ans =
0.19618
The point is, you should NEVER just throw a matrix at any numerical method. Never do an integration, an optimization, a matrix factorization, etc., without thinking about whether what you are doing is numerical nonsense. I'm sorry, but in this case, that B\A computation is mostly doing a lot of work for no purpose at all.
As well, you commented that the exponential was expensive. But if most of the exponent arguments fall below the largest one by roughly 50 or more, then computing the exponential of numbers that negative is a waste of time: the results are effectively zero.
exp(-50)
ans =
1.9287e-22
This is a number that is well below the cutoff for what I would call zero in double precision when you are doing linear algebra. So if you computed the exponential only for the elements of B that will be of significant size, you could save some more time. As well, you could create a sparse matrix with non-zeros only where the elements are significantly non-zero.
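For example, a minimal sketch of that idea, reusing d, m and mx from your code (the cutoff of -50 on the exponent is an assumption that you would need to tune):
e = d*(-10); % the exponent arguments, mx*m-by-1
k = find(e > -50); % only these entries are non-negligible in double precision
[i,j] = ind2sub([m,mx],k); % a linear index of d maps to (row,col) of B
Bsp = sparse(i,j,exp(e(k)),m,mx); % build B directly as sparse, calling EXP only where needed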
So look carefully at the computations. Your matrices are effectively sparse matrices, but you are creating and working with them as full matrices. That simply is not an efficient way to do things.
  1 Comment
Zhida DENG on 6 Nov 2017
Edited: Zhida DENG on 6 Nov 2017
Hi @John D'Errico, many thanks for the detailed explanations, which are very clear and useful. I simply tried:
B(abs(B)*1e12 < max(abs(B(:)))) = 0;
B = sparse(B);
However, after adding these lines to my code, the time spent computing A\B was 1000 seconds, much more than the 100 seconds of my original code.
