Most Efficient Way to Construct the Matrices to Extract the Lower and Upper Triangle from a Vectorized Matrix

Question

1 vote

Given a matrix X and its vector form

I am after the most efficient way to build the matrices L and U which extracts the lower and upper triangle from X.

So in MATLAB code it would be something like that:

clear();
numRows = 3;
numCols = numRows;
mX = randn(numRows, numCols);
vX = mX(:);
% Lower Triangle are indices 2, 3, 6
mL = [  0, 1, 0, 0, 0, 0, 0, 0, 0   ; ...
        0, 0, 1, 0, 0, 0, 0, 0, 0   ; ...
        0, 0, 0, 0, 0, 1, 0, 0, 0   ];
% Upper Triangle are indices 4, 7, 8
mU = [  0, 0, 0, 1, 0, 0, 0, 0, 0   ; ...
        0, 0, 0, 0, 0, 0, 1, 0, 0   ; ...
        0, 0, 0, 0, 0, 0, 0, 1, 0   ];
assert(isequal(mL * vX, mX(logical(tril(mX, -1)))));
assert(isequal(mU * vX, mX(logical(triu(mX, 1)))));

I am after sparse represenation of mU and mL in the most efficient way.

My current implementation is given by:

function [ mLU ] = GenerateTriangleExtractorMatrix( numRows, triangleFlag, diagFlag )
EXTRACT_LOWER_TRIANGLE = 1;
EXTRACT_UPPER_TRIANGLE = 2;
INCLUDE_DIAGONAL = 1;
EXCLUDE_DIAGONAL = 2;
switch(diagFlag)
    case(INCLUDE_DIAGONAL)
        numElements = 0.5 * numRows * (numRows + 1);
        diagIdx = 0;
    case(EXCLUDE_DIAGONAL)
        numElements = 0.5 * (numRows - 1) * numRows;
        diagIdx = 1;
end
vJ = zeros(numElements, 1);
if(triangleFlag == EXTRACT_LOWER_TRIANGLE)
    elmntIdx = 0;
    for jj = 1:numRows
        for ii = (jj + diagIdx):numRows
            elmntIdx = elmntIdx + 1;
            vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
        end
    end
elseif(triangleFlag == EXTRACT_UPPER_TRIANGLE)
    elmntIdx = numElements + 1;
    for jj = numRows:-1:1
        for ii = (jj - diagIdx):-1:1
            elmntIdx = elmntIdx - 1;
            vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
        end
    end
end
mLU = sparse(1:numElements, vJ, 1, numElements, numRows * numRows, numElements);
end

Is there a more efficient way to generate vJ without extensive allocation of memory (In order to allow generating really large matrices)?

Thank You.

24 Comments
Show 22 older comments Hide 22 older comments

Matt J on 23 Apr 2020

Edited: Matt J on 23 Apr 2020

Open in MATLAB Online

@Matt, there are many cases for using those matrices.

I can't think of any. You can pursue this for hypothetical interest if you want, of course.

Regarding fmincon(), Solving problem which have dedicate solver with general solver is usually a really bad chice.

The situations when that is true are those where calculating the objective and derivatives are faster in matrix form than in operator form. For large problems that won't always be the case, because the computational cost of implementing things in matrix form can start to outweigh the benefits of using a specialized algorithm.

Not to speak defining Linear Constarint in the form of Non Linear (Think time spent on calculating the Derivataive when it is so well defined).

I don't think mU and mL are helpful for defining either linear or nonlinear constraints. A linear constraint on the lower triangle of your unknown matrix X will always be of the form sum(T.*X, 'all')<=b, where T is some lower triangular matrix that you know in advance. The matrix form of the constraint gradient is simply T(:), which doesn't require mL at all to set up.

For non-linear constraints c(mL*X)<=0 on the lower triangular part of X, the gradient can be expressed mL.'*gradc(mL*X), but this could implemented efficeintly and without mL as follows:

B=B=tril(true(numRows),-1);
Bd=double(B);
g=Bd;
g(B)=gradc(X(B));

So, for this, you really only need to pre-compute B and Bd, which can be done with much less time and memory allocation than mL:

N=3000;
tic;
B=tril(true(N),-1);
Bd=double(B);
toc
%Elapsed time is 0.056492 seconds.
tic;
mL=GenerateTriangleExtractorMatrix( N, 1, 2);
toc
%Elapsed time is 0.264385 seconds.

>> whos B Bd mL
  Name   Size                Kilobytes     Class     Attributes
                                                               
  B      3000x3000                8790     logical             
  Bd     3000x3000               70313     double              
  mL     4498500x9000000        140602     double    sparse    

Matt J on 23 Apr 2020

But that would mean your constraints are of the form mL*X(:)<=b. But since each row of mL contains only a single non-zero element, this means the constraint is equivalent to a simple bound X(j)<=b. In Matlab, you would never have to construct a matrix to represent such a constraint. You would use the vector input arguments lb and ub to specify those. I assume Gurobi has something similar.

Royi Avital on 23 Apr 2020

@Matt, I know that. Whenever I can use other features of the solver I user. I have cases I need those extractors in Matrix Form. I appericiate the dialogue. I think other who will read it will gain something. I still hope someone will bring a different point of view to the pattern of vJ. Though I guess @James' solution as practically as good as it gets (Also appericate if there is something to make it even faster).

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

James Tursa on 22 Apr 2020

Edited: James Tursa on 22 Apr 2020

Open in MATLAB Online

2 votes

Here is a mex routine that generates the sparse double matrices mL and mU directly, so no wasted memory in creating them. Seems to run about 3x-5x faster than m-code for somewhat large sizes.

/* S = GenerateTriangleExtractorMatrixMex(numRows,triangleFlag,diagFlag)
 *
 * S = double sparse matrix
 * numRows = integer > 0
 * triangleFlag = 1 , extract lower triangle
 *                2 , extract upper triangle
 * diagFlag = 1 , include diagonal
 *            2 , exclude diagonal
 * where
 *
 * M = an numRows X numRows matrix of non-zero terms
 * assert(isequal(S * M(:), mX(logical(tril(M, -1))))); % for lower
 * assert(isequal(S * M(:), mX(logical(triu(M,  1))))); % for upper
 *
 * Programmer: James Tursa
 * Date: 2020-April-22
*/
        
#include "mex.h"
void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize numRows, triangleFlag, diagFlag, numElements;
    mwIndex *Ir, *Jc;
    mwIndex i, j, k, m;
    double *pr;
    
    if( nrhs != 3 || !mxIsNumeric(prhs[0]) || !mxIsNumeric(prhs[1]) || !mxIsNumeric(prhs[2]) ||
        mxGetNumberOfElements(prhs[0]) != 1 || mxGetNumberOfElements(prhs[1]) != 1 ||
        mxGetNumberOfElements(prhs[2]) != 1 ) {
        mexErrMsgTxt("Need three numeric scalar inputs");
    }
    if( nlhs > 1 ) {
        mexErrMsgTxt("Too many outputs");
    }
    numRows = mxGetScalar(prhs[0]);
    triangleFlag = mxGetScalar(prhs[1]);
    diagFlag = mxGetScalar(prhs[2]);
    if( numRows < 1 ) {
        mexErrMsgTxt("Invalid numRows, should be > 0");
    }
    if( triangleFlag != 1 && triangleFlag != 2 ) {
        mexErrMsgTxt("Invalid triangleFlag, should be 1 or 2");
    }
    if( diagFlag != 1 && diagFlag != 2 ) {
        mexErrMsgTxt("Invalid diagFlag, should be 1 or 2");
    }
    if( diagFlag == 1 ) {
        numElements = numRows * (numRows + 1) / 2; /* include diagonal */
    } else {
        numElements = (numRows - 1) * numRows / 2; /* exclude diagonal */
    }
    plhs[0] = mxCreateSparse(numElements, numRows*numRows, numElements, mxREAL);
    pr = (double *) mxGetData(plhs[0]);
    Ir = mxGetIr(plhs[0]);
    Jc = mxGetJc(plhs[0]);
    Jc[0] = 0;
    diagFlag--;
    k = 0;
    m = 1;
    if( triangleFlag == 1 ) { /* Lower */
        for( j=0; j<numRows; j++ ) {
            for( i=0; i<numRows; i++ ) {
                if( i >= j+diagFlag ) {
                    *pr++ = 1.0;
                    *Ir++ = k++;
                    Jc[m] = Jc[m-1] + 1;
                } else {
                    Jc[m] = Jc[m-1];
                }
                m++;
            }
        }
    } else { /* Upper */
        for( j=0; j<numRows; j++ ) {
            for( i=0; i<numRows; i++ ) {
                if( i+diagFlag <= j ) {
                    *pr++ = 1.0;
                    *Ir++ = k++;
                    Jc[m] = Jc[m-1] + 1;
                } else {
                    Jc[m] = Jc[m-1];
                }
                m++;
            }
        }
    }
}

You mex the routine as follows (you need a supported C compiler installed):

mex GenerateTriangleExtractorMatrixMex.c

And some test code:

% GenerateTriangleExtractorMatrix_test.m
n = 300;
disp('m-code timing')
tic
GenerateTriangleExtractorMatrix(10000,1,1);
toc
disp('mex code timing')
tic
GenerateTriangleExtractorMatrixMex(10000,1,1);
toc
for k=1:n
    numRows = ceil(rand*5000+100);
    numCols = numRows;
    triangleFlag = (rand<0.5) + 1;
    diagFlag = (rand<0.5) + 1;
    Mm = GenerateTriangleExtractorMatrix(numRows,triangleFlag,diagFlag);
    Mx = GenerateTriangleExtractorMatrixMex(numRows,triangleFlag,diagFlag);
    if( ~isequal(Mm,Mx) )
        error('Not equal');
    end
end
disp('Random tests passed')

With a sample run:

>> GenerateTriangleExtractorMatrix_test
m-code timing
Elapsed time is 9.964882 seconds.
mex code timing
Elapsed time is 1.901741 seconds.
Random tests passed

4 Comments
Show 2 older comments Hide 2 older comments

Royi Avital on 22 Apr 2020

@James, I meant I want to use all the abstractions of the MATLAB C API. I just want to use it in my own C files. Not for MEX but for general computing. Yet I guess MATLAB blocks that kind of use.

Royi Avital on 23 Apr 2020

Edited: Royi Avital on 23 Apr 2020

Open in MATLAB Online

By the way, I tried optimizing the code:

	if( triangleFlag == 1 ) { // Lower Triangle
		for( jj = 1; jj < numRows + 1; jj++ ) {
			for( ii = 1; ii < jj + diagFlag; ii++ ) {
				ll++;
				Jc[ll] = Jc[ll - 1];
			}
			for( ii = jj + diagFlag; ii < numRows + 1; ii++ ) {
				ll++;
				Jc[ll] = Jc[ll - 1] + 1;
				vV[kk] = 1.0;
				Ir[kk] = kk;
				kk++;
			}
		}
	} else { // Upper Triangle
		for( jj = 1; jj < numRows + 1; jj++ ) {
			for( ii = 1; ii < jj + 1 - diagFlag; ii++ ) {
				ll++;
				Jc[ll] = Jc[ll - 1] + 1;
				vV[kk] = 1.0;
				Ir[kk] = kk;
				kk++;
			}
			for( ii = jj + 1 - diagFlag; ii < numRows + 1; ii++ ) {
				ll++;
				Jc[ll] = Jc[ll - 1];
			}
		}
	}

But for some reason even removing the branching inside the loop didn't improve results.

Really Nice! If nothing comes up I will mark this as an answer. Thank You!

Sign in to comment.

Answer 2

Matt J on 23 Apr 2020

Edited: Matt J on 23 Apr 2020

Open in MATLAB Online

1 vote

Another approach to consider is to use my MatrixObj class

https://www.mathworks.com/matlabcentral/fileexchange/26611-on-the-fly-definition-of-custom-matrix-objects

to construct an object that has the same effect as the operations mL*X and mL.'*Y, but doesn't require you to actually build the matrix,

    N=5000;
    
    tic;
     mL0=GenerateTriangleExtractorMatrix( N, 1, 2);
    toc
    %Elapsed time is 0.678702 seconds.
         
    tic;
    
        B=tril(true(N),-1);
        Bd=double(B(:));
        
        mL=MatrixObj;
        mL.Params.B=B;
        mL.Params.Bd=Bd;
        mL.Ops.mtimes=@(obj,z) z(obj.Params.B);
        mL.Trans.mtimes=@mtimesT;
    
    toc;
    %Elapsed time is 0.086228 seconds.
    
    function out=mtimesT(obj,z) 
        
        out=obj.Params.Bd;
        out(obj.Params.B)=z;
        
    end

In addition to requiring less time to construct, you can verify that it gives the same results as multiplications with mL and mL.',

        >> X=rand(N^2,1);   isequal(mL0.'*(mL0*X),mL.'*(mL*X))
        
        ans =
    
          logical
        
           1

but with considerably less memory consumption:

  >> whos mL mL0
  
  Name   Size                  Kilobytes     Class       Attributes
                                                                   
  mL     1x1                      219739     MatrixObj             
  mL0    12497500x25000000        390586     double      sparse    

1 Comment
Show -1 older comments Hide -1 older comments

Royi Avital on 24 Apr 2020

This is really nice. Thank you for the effort!

Sign in to comment.

Answer 3

Royi Avital on 21 Apr 2020

Open in MATLAB Online

0 votes

My current solution:

function [ mLU ] = GenerateTriangleExtractorMatrix( numRows, triangleFlag, diagFlag )
EXTRACT_LOWER_TRIANGLE = 1;
EXTRACT_UPPER_TRIANGLE = 2;
INCLUDE_DIAGONAL = 1;
EXCLUDE_DIAGONAL = 2;
switch(diagFlag)
    case(INCLUDE_DIAGONAL)
        numElements = 0.5 * numRows * (numRows + 1);
        diagIdx = 0;
    case(EXCLUDE_DIAGONAL)
        numElements = 0.5 * (numRows - 1) * numRows;
        diagIdx = 1;
end
vJ = zeros(numElements, 1);
if(triangleFlag == EXTRACT_LOWER_TRIANGLE)
    elmntIdx = 0;
    for jj = 1:numRows
        for ii = (jj + diagIdx):numRows
            elmntIdx = elmntIdx + 1;
            vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
        end
    end
elseif(triangleFlag == EXTRACT_UPPER_TRIANGLE)
    elmntIdx = numElements + 1;
    for jj = numRows:-1:1
        for ii = (jj - diagIdx):-1:1
            elmntIdx = elmntIdx - 1;
            vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
        end
    end
end
mLU = sparse(1:numElements, vJ, 1, numElements, numRows * numRows, numElements);
end

I like the memory allocation is kept to a minimum.

I wonder if there is a more efficient way to generate vJ. It is trivial to remove the inner loop and just count the number of elements yet in MATLAB it will mean each iteration will allocate memory (As we don't have iterators).

2 Comments
Show None Hide None

Tommy on 21 Apr 2020

The two methods are fairly similar - I also like that yours minimizes memory allocation. I ran a few simple fun tests:

I didn't dare try higher than 20,000 for numRows. It seems that your code may possibly perform better at higher values of numRows. In the second case (calculating both the upper and lower triangles) I had your code running both sets of for loops, one after the other (shown in red). In green is the result from your code if only the first set of loops runs, and you recognize that vJ for one triangle is easy to determine if you have vJ for the other triangle (N^2+1-flip(vJ)). So the only thing I'll conclude from this is, if you will eventually calculate both the lower and upper triangle matrices for a given size, it might be better to calculate them together and only find vJ once. I suppose it depends on how expensive N^2+1-flip(vJ) is.

Royi Avital on 21 Apr 2020

@Tommy, Really liked your analysis. Yes, when dealing with sparse matrices the whole point it making sure allocation is kept to minimum. I agree if one wants both, it is better to do the trick you mentioned.

Let's see if someone can think on a different pattern to populate vJ which is more efficient.

Sign in to comment.

Most Efficient Way to Construct the Matrices to Extract the Lower and Upper Triangle from a Vectorized Matrix

24 Comments
Show 22 older comments Hide 22 older comments

Accepted Answer

4 Comments
Show 2 older comments Hide 2 older comments

More Answers (2)

1 Comment
Show -1 older comments Hide -1 older comments

2 Comments
Show None Hide None

Categories

Products

Tags

Community Treasure Hunt

Most Efficient Way to Construct the Matrices to Extract the Lower and Upper Triangle from a Vectorized Matrix

24 Comments Show 22 older comments Hide 22 older comments

Accepted Answer

4 Comments Show 2 older comments Hide 2 older comments

More Answers (2)

1 Comment Show -1 older comments Hide -1 older comments

2 Comments Show None Hide None

Categories

Products

Tags

See Also

Community Treasure Hunt

24 Comments
Show 22 older comments Hide 22 older comments

4 Comments
Show 2 older comments Hide 2 older comments

1 Comment
Show -1 older comments Hide -1 older comments

2 Comments
Show None Hide None