MATLAB Answers

Using a templated CUDA kernel via MATLAB

Hello,
Is it possible to use a C++-style templated CUDA kernel via MATLAB's GPU Computing interface?
For example, consider the following (useless) toy code:
template<typename T>
__global__ void get_nans(T*, const int*);

template<>
__global__ void get_nans<double>(double* out, const int* dims)
{
    // Column-major indexing to match MATLAB: dims[0] = rows, dims[1] = columns
    const int tx = blockIdx.x*blockDim.x + threadIdx.x;
    const int ty = blockIdx.y*blockDim.y + threadIdx.y;
    if ((tx < dims[1]) && (ty < dims[0]))
        out[tx*dims[0] + ty] = nan("");   // nan() takes a string tag, not 0
}

template<>
__global__ void get_nans<float>(float* out, const int* dims)
{
    const int tx = blockIdx.x*blockDim.x + threadIdx.x;
    const int ty = blockIdx.y*blockDim.y + threadIdx.y;
    if ((tx < dims[1]) && (ty < dims[0]))
        out[tx*dims[0] + ty] = nanf("");  // single-precision quiet NaN
}
I then compile this into PTX code, but when I try to instantiate the kernel object in MATLAB I get the following error:
>> k = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'get_nans.cu' );
Error using handleKernelArgs (line 61)
Found multiple matching entries in the PTX code. Matches found:
_Z16get_nansIdEvPT_PKS0_S3_S3_PKiS5_
_Z16get_nansIfEvPT_PKS0_S3_S3_PKiS5_
Thank you,
Alex


Accepted Answer

Narfi on 29 Jan 2013
Alex,
Allow me to quote from the help for parallel.gpu.CUDAKernel and try to parse it for you. (If you have suggestions for how to improve the wording, please let me know!)
"If specified, FUNC must be a string that unambiguously defines the appropriate kernel entry name in the PTX file. If FUNC is omitted, the PTX file must contain only a single entry point."
In your case, the get_nans.cu defines two global functions:
  1. get_nans<double>
  2. get_nans<float>
and the get_nans.ptx defines the corresponding two entry points:
  1. _Z16get_nansIdEvPT_PKS0_S3_S3_PKiS5_ (for the double function)
  2. _Z16get_nansIfEvPT_PKS0_S3_S3_PKiS5_ (for the float function)
When you create the parallel.gpu.CUDAKernel, it is ambiguous whether you want to invoke the double or the float function. Therefore, you must provide the name of the entry point you want to use and construct either the double or the float version:
kDouble = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'get_nans.cu', '_Z16get_nansIdEvPT_PKS0_S3_S3_PKiS5_' );
kFloat = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'get_nans.cu', '_Z16get_nansIfEvPT_PKS0_S3_S3_PKiS5_' );
Now, this almost works, but not quite: the parser in parallel.gpu.CUDAKernel cannot parse the templated function definitions in the .cu file. Therefore, we stop using this form of constructing the CUDAKernel:
KERN = parallel.gpu.CUDAKernel(PTXFILE, CUFILE, FUNC)
and use this one instead:
KERN = parallel.gpu.CUDAKernel(PTXFILE, CPROTO, FUNC)
We then end up with:
kDouble = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'double* out, const int* dims', '_Z16get_nansIdEvPT_PKS0_S3_S3_PKiS5_');
kFloat = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'float* out, const int* dims', '_Z16get_nansIfEvPT_PKS0_S3_S3_PKiS5_');
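For completeness, here is a sketch of how one of these kernels might then be launched (untested; the array size and block shape are arbitrary assumptions, and a CUDA-capable GPU is required):

```
% Hypothetical launch of the float version: fill a 64x64 single array with NaNs
dims = int32([64 64]);                         % [rows cols], as the kernel expects
kFloat.ThreadBlockSize = [16 16 1];            % 16x16 threads per block
kFloat.GridSize = [ceil(64/16) ceil(64/16)];   % x covers columns, y covers rows
out = gpuArray.zeros(64, 64, 'single');
out = feval(kFloat, out, dims);                % every element should now be NaN
```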
Does this make sense?
Best,
Narfi

