Similar thing also happen for using cblas_dgemm to do matrix product, e.g., 3x3 matrix multiply 3x3 matrix produce correct result while 2x3 with 3x2 will give wrong result.
Wrong result when calling CBLAS dgemv function in a mex file
4 views (last 30 days)
Show older comments
Code is as follows,
extern "C" {
#include "cblas.h"
#include "mex.h"
}
#include<iostream>
using namespace std;
void mexFunction(int nlhs,mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
if(nrhs!=2) {
mexErrMsgTxt("Two input required.");
} else if(nlhs>1){
mexErrMsgTxt("Too many outputs!");
}
mwSignedIndex row0 = mxGetM(prhs[0]);
mwSignedIndex col0 = mxGetN(prhs[0]);
mwSignedIndex row1 = mxGetM(prhs[1]);
mwSignedIndex col1 = mxGetN(prhs[1]);
if(col1!=1){
mexErrMsgTxt("Column vector required!");}
if(col0!=row1){
mexErrMsgTxt("Dimension mismatch!");}
double* A = mxGetPr(prhs[0]);
double* b = mxGetPr(prhs[1]);
plhs[0] = mxCreateDoubleMatrix(row0,1,mxREAL);
double* y = mxGetPr(plhs[0]);
cout << row0 << "," << col0 <<endl;
ptrdiff_t b_stride = 1;
ptrdiff_t y_stride = 1;
cblas_dgemv(CblasColMajor, CblasNoTrans,
row0, col0,
1.0,
A, col0,
b, b_stride,
0.,
y, y_stride);
}
Then, if I call,
mv([1,2;3,4;5,6;7,8],[3;4])
it just give result,
ans =
0
0
0
0
However, it give correct result for 2 by 2 matrice,
mv([1,2;3,4;],[3;4])
ans =
11
25
Very confused, if I call cblas_dgemv in cpp and compile it with Visual Studio, everything just works well.
Any suggestion and help is appreciated!
I compile the Git version of OpenBLAS. Since it works well programming in C++, I believe there is nothing to do with the compiled library.
2 Comments
Jan
on 26 Mar 2018
Edited: Jan
on 26 Mar 2018
It would be easier if you explain, how you compile the code: I'm not using CBLAS, but the standard BLAS libraries shipped with Matlab usually. This does not allow the smart CblasColMajor method, such that I cannot reproduce your problem directly.
Maybe this is the bug:
cblas_dgemv(CblasColMajor, CblasNoTrans,
row0, col0,
1.0,
A, row0, % Not: col0
b, b_stride,
0.,
y, y_stride);
Answers (1)
Jan
on 26 Mar 2018
Edited: Jan
on 26 Mar 2018
This does not answer your question directly, but this is a C-Mex function to call DGEMM and DGEMV:
#include "mex.h"
// Management of trailing underscore in BLAS and LAPACK calls:
#if defined(__WINDOWS__) || defined(WIN32) || defined(_WIN32) || defined(_WIN64)
#define BLASCALL(f) f
#else
#define BLASCALL(f) f ## _
#endif
// Integers in BLAS calls:
// #define int_BLAS ptrdiff_t
#define int_BLAS mwSignedIndex
int BLASCALL(dgemm)(char *transa, char *transb, int_BLAS *m, int_BLAS *n,
int_BLAS *k, double *alpha, double *a, int_BLAS *lda,
double *b, int_BLAS *ldb, double *beta, double *c,
int_BLAS *ldc);
int BLASCALL(dgemv)(char *transa, int_BLAS *m, int_BLAS *n,
double *alpha, double *a, int_BLAS *lda,
double *b, int_BLAS *incb, double *beta, double *c,
int_BLAS *incc);
void mexFunction(int nlhs,mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
int_BLAS mA, nA, mB, nB, i1 = 1;
double *A, *B, *C;
double d1 = 1.0, d0 = 0.0;
if (nrhs != 2) {
mexErrMsgTxt("Two inputs required.");
} else if (nlhs > 1) {
mexErrMsgTxt("Too many outputs!");
}
mA = (int_BLAS) mxGetM(prhs[0]);
nA = (int_BLAS) mxGetN(prhs[0]);
mB = (int_BLAS) mxGetM(prhs[1]);
nB = (int_BLAS) mxGetN(prhs[1]);
if (nA != mB) {
mexErrMsgTxt("Dimension mismatch!");
}
A = mxGetPr(prhs[0]);
B = mxGetPr(prhs[1]);
plhs[0] = mxCreateDoubleMatrix(mA, nB, mxREAL);
C = mxGetPr(plhs[0]);
// C = A * B:
if (nB == 1) {
BLASCALL(dgemv) ("N", &mA, &nA,
&d1, A, &mA, B, &i1, &d0, C, &i1);
} else {
BLASCALL(dgemm) ("N", "N", &mA, &nB,
&nA, &d1, A, &mA, B, &mB,
&d0, C, &mA);
}
return;
}
Compile this by:
mex -O DGEMM.c libmwblas.lib -largeArrayDims
For matix input, this has the same speed as calling the matrix multiplication in Matlab directly, while for matrix-vector multiplication Matlab is faster:
A = rand(200, 300);
B = rand(300, 200);
for k = 1:100; C = A*B; end; % Warmup!
tic; for k = 1:1000; C = A*B; end; toc
tic; for k = 1:1000; C = DGEMM(A, B); end; toc
A = rand(200, 300);
b = rand(300, 1);
tic; for k = 1:10000; C = A*b; end; toc
tic; for k = 1:10000; C = DGEMM(A, b); end; toc
Matlab R2016b, Core2Duo, Win7:
Elapsed time is 3.166592 seconds.
Elapsed time is 3.148328 seconds.
Elapsed time is 0.263720 seconds.
Elapsed time is 0.304909 seconds.
By the way:
- A*B and DGEMM(A,B) use multiple cores.
- Defining alpha for alpha*A * B let the C-Mex function be 2% faster than Matlab.
- It took me 20 minutes to find out, that the default -compatibleArrayDims let the BLAS library crash, because it worked sometimes by accident. My conclusion: Linear Algebra functions are implemented efficiently in Matlab and I do not spend time in performing them in C.
0 Comments
See Also
Categories
Find more on Sparse Matrices in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!