MEX implementation of tensor product multiplications
Dear all,
I am searching for a more efficient way to calculate tensor product multiplications of arbitrary sizes. I have already searched the web for inspiration from different solutions, e.g. here Tensorproduct at stackexchange:
1.) External libraries, e.g. Eigen C++.
2.) Pure MATLAB solution based on reshape and shiftdim.
However, so far I have ended up with a MEX-file implementation with OpenMP parallelization.
--
In the following, I will give a short introduction to the math behind the implementation:
Suppose you have a matrix-vector multiplication where the matrix C is constructed as the Kronecker product of a matrix A of size (n x m) and a matrix B of size (p x q). More information can be found on Wikipedia.
![](https://www.mathworks.com/matlabcentral/answers/uploaded_files/718719/image.png)
In two dimensions this operation can be performed with O(npq + qnm) operations instead of O(mqnp) operations, because the two small factors can be applied one after the other instead of forming C explicitly.
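The identity used here is kron(A,B)*X(:) = reshape(B*X*A.',[],1) for an X of size (q x m); a small throw-away check in MATLAB (arbitrary sizes, just to illustrate the operation counts):
A = rand(4,3); B = rand(5,2); X = rand(2,3);   % n=4, m=3, p=5, q=2
lhs = kron(A,B) * X(:);                        % full Kronecker matrix: O(n*p*m*q)
rhs = reshape(B * (X * A.'), [], 1);           % X*A.' is O(q*n*m), then B*(...) is O(n*p*q)
max(abs(lhs - rhs))                            % agrees up to round-off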
--
So far so good. In the following example the sizes of the arrays are
matrix_xi = 4x4,
matrix_eta = 4x4,
vector = 16x500000x5,
(new version: vector = 16x2500000)
and they may be initialized with ones or zeros. Moreover, rather than calculating only one Kronecker matrix-vector multiplication, I want to calculate e.g. 500000x5 such operations with one call (see the size of vector).
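Written out in plain MATLAB, the computation I am after is essentially (reference formulation with the sizes above; this is what the MEX file should reproduce):
matrix_xi  = ones(4,4);
matrix_eta = ones(4,4);
vector     = ones(16,500000,5);
% apply the Kronecker matrix to every 16-element column of vector at once
vector_out = reshape(kron(matrix_xi,matrix_eta) * reshape(vector,16,[]), size(vector));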
--
My current MEX C-file implementation:
/*************************************************
 * CALLING:
 *
 *   out = tensorproduct(A,B,vector)
 *
 *************************************************/
#include "mex.h"
#include "omp.h"

#define PP_A      prhs[0]
#define PP_B      prhs[1]
#define PP_vector prhs[2]
#define PP_out    plhs[0]

/* matrix sizes, fixed at compile time: A is n-by-m, B is p-by-q */
#define n 4
#define m 4
#define p 4
#define q 4

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    const mwSize *dim;
    int i,j,k,s,l;
    double temp[n*q];
    double *A      = mxGetPr(PP_A);
    double *B      = mxGetPr(PP_B);
    double *vector = mxGetPr(PP_vector);

    dim = mxGetDimensions(PP_vector);
    l   = dim[1];                          /* number of slices, taken from the second dimension */

    PP_out = mxCreateDoubleMatrix(l*n*p,1,mxREAL);   /* zero-initialized output */
    double *out = mxGetPr(PP_out);

    #pragma omp parallel for private(i,j,k,s,temp)
    for(k=0; k<l; k++){
        /* clear the per-slice workspace */
        for(i=0; i<n; i++){
            for(j=0; j<q; j++){
                temp[i+j*n] = 0;
            }
        }
        /* temp = A * X, with X the k-th slice of vector (column-major) */
        for(s=0; s<m; s++){
            for(i=0; i<n; i++){
                for(j=0; j<m; j++){
                    temp[i+s*n] += A[i+j*n]*vector[j+s*m+k*m*p];
                }
            }
        }
        /* k-th slice of out = temp * B.' */
        for(s=0; s<n; s++){
            for(i=0; i<p; i++){
                for(j=0; j<q; j++){
                    out[i*n+s+k*n*p] += B[i+j*p]*temp[j*n+s];
                }
            }
        }
    }
}
--
The code can be compiled with:
mex FFLAGS='$FFLAGS -fopenmp -Ofast -funroll-loops' ...
CFLAGS='$CFLAGS -fopenmp -Ofast -funroll-loops' ...
LDFLAGS='$LDFLAGS -fopenmp -Ofast -funroll-loops' ...
COPTIMFLAGS='$COPTIMFLAGS -fopenmp -Ofast -funroll-loops' ...
LDOPTIMFLAGS='$LDOPTIMFLAGS -fopenmp -Ofast -funroll-loops' ...
DEFINES='$DEFINES -fopenmp -Ofast -funroll-loops' ...
tensorproduct.c
--
Current timings on my notebook (Ubuntu 18.04, GCC 7.5.0, 4 cores):
matrix_xi = ones(4,4);
matrix_eta = ones(4,4);
vector = ones(16,500000,5);
n = 50;
tic
for i=1:n
vector_out = reshape(tensorproduct(matrix_xi,matrix_eta,vector),size(vector));
end
toc
% Elapsed time is 8.036490 seconds.
--
A quite simple MATLAB implementation that does not exploit the Kronecker structure gives
matrix=kron(matrix_xi,matrix_eta);
tic
for i=1:n
vector_out = reshape(matrix*vector(:,:),size(vector));
end
toc
% Elapsed time is 10.489606 seconds.
--
With a different size n,m,p,q = 7, the relative speedup surprisingly does not improve and even shrinks (for this, the constants in the C-file have to be changed):
matrix_xi = ones(7,7);
matrix_eta = ones(7,7);
vector = ones(49,500000,5);
n = 50;
tic
for i=1:n
vector_out = reshape(tensorproduct(matrix_xi,matrix_eta,vector),size(vector));
end
toc
% Elapsed time is 31.436541 seconds.
matrix=kron(matrix_xi,matrix_eta);
tic
for i=1:n
vector_out = reshape(matrix*vector(:,:),size(vector));
end
toc
% Elapsed time is 35.829957 seconds.
Luckily, I am still faster than MATLAB.
--
Note that n,m,p,q are only available as constants at compile time. To make the code usable for arbitrary sizes, I have written a macro that generates many C-files with different constants n,m,p,q, which are later included via a header and a main C-file (a stripped-down sketch of the idea follows). This helps the compiler to unroll and inline the code.
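For illustration, a simplified MATLAB sketch of such a generator (the file names and the included kernel file are only placeholders here, not my actual macro):
sizes = [4 7 10];                                        % sizes to instantiate
for s = sizes
    fid = fopen(sprintf('tensorproduct_%d.c', s), 'w'); % one C-file per size
    fprintf(fid, '#define n %d\n#define m %d\n#define p %d\n#define q %d\n', s, s, s, s);
    fprintf(fid, '#include "tensorproduct_body.c"\n');  % shared kernel without the #defines
    fclose(fid);
end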
This is the most efficient approach I have achieved so far. I hope there are some experts in this forum who can help to optimize these few lines. I am also open to other solutions, e.g. MATLAB based or based on external libraries.
Thank you!
4 Comments
Bruno Luong on 6 Dec 2020 (edited 6 Dec 2020)
Yes, I do know tensors and kron. But your notation is confusing. Your statement
A = matrix_xi
B = matrix_eta
is wrong. Actually it is
B = matrix_xi.';
A = matrix_eta;
and your vector consists of a set of multiple X matrices:
each reshape(vector(:,p,q), size(A,2), size(B,1)) is an X.
I know what you want to compute, but for the general reader the description is careless and confusing.
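To spell the mapping out, it can be checked numerically (small random data, purely for illustration):
matrix_xi  = rand(4,4);
matrix_eta = rand(4,4);
X = rand(4,4);                              % one slice of "vector", reshaped
B = matrix_xi.';
A = matrix_eta;
lhs = kron(matrix_xi, matrix_eta) * X(:);
rhs = reshape(A*X*B, [], 1);                % = vec(matrix_eta * X * matrix_xi.')
max(abs(lhs - rhs))                         % round-off only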
Accepted Answer
Bruno Luong on 6 Dec 2020 (edited 22 Aug 2021)
I propose two other MATLAB methods.
The first simply replaces (:,:) by RESHAPE: the (:,:) indexing makes a data copy, whereas RESHAPE does not. It seems to be the fastest.
The second method uses MATLAB's pagemtimes, introduced in R2020b. (Edit: Jan corrected the release.)
function benchtensor
matrix_xi = rand(7,7);
matrix_eta = rand(7,7);
vector = rand(49,500000,5);
n = 50;

tic
for i=1:n
    vector_out = reshape(tensorproduct(matrix_xi,matrix_eta,vector),size(vector));
end
toc % Elapsed time is 19.429296 seconds.

matrix = kron(matrix_xi,matrix_eta);
tic
for i=1:n
    vector_out = reshape(matrix*vector(:,:),size(vector));
end
toc % Elapsed time is 28.08963 seconds.

tic
for i=1:n
    vector_out = reshape(matrix*reshape(vector,49,[]),size(vector));
end
toc % Elapsed time is 14.30025 seconds.

B = matrix_xi.';
A = matrix_eta;
tic
for i=1:n
    X = reshape(vector, size(A,2), size(B,1), []);
    CX = pagemtimes(pagemtimes(A, X), B); % Reshape CX if needed
end
toc % Elapsed time is 29.296709 seconds.
end