GPU CUDA kernel malloc error

Gaszton on 10 May 2011
Hello, I have a GeForce 425M card with compute capability 2.1. I wrote a kernel that uses malloc inside the kernel. At first the PTX file did not compile. Then I set the nvcc parameter -arch=sm_21 (nvcc -I "D:\...VC\include" -arch=sm_21 -use_fast_math -ptx SR2.cu), and with this it compiled successfully. I was just wondering why I need to specify that. After that I tried to create the kernel in MATLAB:
ckernel=parallel.gpu.CUDAKernel('SR2.ptx', 'SR2.cu');
But I get the error:
??? Error using ==> parallel.gpu.CUDAKernel
An error occurred during PTX compilation of <image>.
The information log was:
: Considering profile 'compute_20' for gpu='sm_21' in
'cuModuleLoadDataEx_2a9
The error log was:
The CUDA error code was: CUDA_ERROR_INVALID_IMAGE.
Before I modified the kernel to use malloc (and without passing -arch=sm_21 to nvcc), I was able to run my kernel from MATLAB without any problem.
I think there is some configuration problem with CUDA. I hope someone has an idea how to solve this.
Thanks,
Gaszton
Gaszton on 10 May 2011
It seems there is no option in cuModuleLoadDataEx for compute capability 2.1. The possible values of CUjit_target_enum are:
CU_TARGET_COMPUTE_10
CU_TARGET_COMPUTE_11
CU_TARGET_COMPUTE_12
CU_TARGET_COMPUTE_13
CU_TARGET_COMPUTE_20
http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/online/group__CUDA__MODULE_g9e8047e9dbf725f0cd7cafd18bfd4d12.html#g9e8047e9dbf725f0cd7cafd18bfd4d12
But in the CUDA Toolkit 3.2 release notes I found:
Added CU_TARGET_COMPUTE_21 to JIT options.


Accepted Answer

Edric Ellis on 11 May 2011
You can get that error message if you have a mismatch between the CUDA runtime in use by Parallel Computing Toolbox and the version of nvcc that you're using. If you're using R2010b, you need to use CUDA-3.1; for R2011a, you can use CUDA-3.2. I was able to compile and use the following trivial kernel:
// simple.cu
__global__ void fcn( double * out ) {
    int * x = (int *) malloc( 1024 );
    out[0] = x[0];
    free( x );
}
By compiling like so:
$ /usr/local/cuda32/cuda/bin/nvcc -arch compute_20 -ptx simple.cu
and then using it within MATLAB R2011a like so:
>> k = parallel.gpu.CUDAKernel( 'simple.ptx' );
>> gather(k.feval(0))
ans =
1.768515945000000e+09
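The kernel above reads x[0] without ever writing to it, so the value returned (1.768515945000000e+09) is just whatever happened to be in the freshly allocated memory; device-side malloc does not zero its result. A slightly more defensive variant is sketched below as a hypothetical example: the file name safe_malloc.cu, the extra parameter n and the host main are additions for illustration, not something from the thread. It checks malloc's return value and initialises the buffer before reading it back.

```cuda
// safe_malloc.cu: hypothetical variant of simple.cu above.
#include <cstdio>

// In-kernel malloc/free require compute capability 2.0 or later, so compile
// for such a target (e.g. -arch=sm_20 with the CUDA 3.2 toolchain, or any
// architecture a current nvcc supports). Assumes n >= 1.
__global__ void fcn(double *out, int n)
{
    int *x = (int *)malloc(n * sizeof(int));
    if (x == NULL) {            // device heap exhausted: signal failure
        out[0] = -1.0;
        return;
    }
    for (int i = 0; i < n; ++i) // initialise before reading anything back
        x[i] = i;
    out[0] = x[n - 1];          // deterministic result: n - 1
    free(x);
}

// Small host driver for testing outside MATLAB; nvcc -ptx emits only the
// device code, so this main does not end up in the PTX.
int main()
{
    double *d_out = NULL;
    double h_out = 0.0;
    cudaMalloc((void **)&d_out, sizeof(double));
    fcn<<<1, 1>>>(d_out, 256);  // 256 ints = the original 1024 bytes
    cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    printf("%g\n", h_out);      // 255 if the allocation succeeded
    return 0;
}
```

Device malloc returns NULL once the device heap is used up; in current CUDA versions that heap is 8 MB by default and can be enlarged from the host with cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...) before the first kernel launch.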
Gaszton on 11 May 2011
Thank you for your help.
I have R2010b and CUDA Toolkit 3.2.
Everything worked until I specified the -arch option to nvcc.
If I don't specify it, what is the default? I wonder why it is not 2.1, since I have a card with compute capability 2.1.
If I compile my .cu file with -arch compute_20 or sm_20, I still get an error from MATLAB.
Should I install CUDA Toolkit 3.1 and check whether it works?
With CUDA 3.1, am I able to use malloc inside a kernel?
Thank you,
Gaszton
Gaszton on 11 May 2011
It seems CUDA 3.1 does not support malloc inside kernels.
Otherwise, with 3.1 I am able to use sm_21 code in MATLAB.
