MATLAB Answers

Using a templated CUDA kernel via MATLAB

Hello,
Is it possible to use a C++-style templated CUDA kernel via MATLAB's GPU Computing interface?
For example, consider the following (useless) toy code:
template<typename T>
__global__ void get_nans(T*, const int*);

template<>
__global__ void get_nans<double>(double* out, const int* dims)
{
    // Column-major indexing to match MATLAB: dims[0] = rows, dims[1] = columns
    const int tx = blockIdx.x*blockDim.x + threadIdx.x;
    const int ty = blockIdx.y*blockDim.y + threadIdx.y;
    if ((tx < dims[1]) && (ty < dims[0]))
        out[tx*dims[0] + ty] = nan("");   // nan() takes a string tag, not 0
}

template<>
__global__ void get_nans<float>(float* out, const int* dims)
{
    const int tx = blockIdx.x*blockDim.x + threadIdx.x;
    const int ty = blockIdx.y*blockDim.y + threadIdx.y;
    if ((tx < dims[1]) && (ty < dims[0]))
        out[tx*dims[0] + ty] = nanf("");  // single-precision quiet NaN
}
I then compile this into PTX code, but when I try to instantiate the kernel object in MATLAB I get the following error:
>> k = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'get_nans.cu' );
Error using handleKernelArgs (line 61)
Found multiple matching entries in the PTX code. Matches found:
_Z16get_nansIdEvPT_PKS0_S3_S3_PKiS5_
_Z16get_nansIfEvPT_PKS0_S3_S3_PKiS5_
Thank you,
Alex


Accepted Answer

Narfi on 29 Jan 2013
Alex,
Allow me to quote from the help for parallel.gpu.CUDAKernel and try to parse it for you. (If you have suggestions for how to improve the wording, please let me know!)
"If specified, FUNC must be a string that unambiguously defines the appropriate kernel entry name in the PTX file. If FUNC is omitted, the PTX file must contain only a single entry point."
In your case, the get_nans.cu defines two global functions:
  1. get_nans<double>
  2. get_nans<float>
and the get_nans.ptx defines the corresponding two entry points:
  1. _Z16get_nansIdEvPT_PKS0_S3_S3_PKiS5_ (for the double function)
  2. _Z16get_nansIfEvPT_PKS0_S3_S3_PKiS5_ (for the float function)
When you create the parallel.gpu.CUDAKernel, it is ambiguous whether you want to invoke the double or the float function. Therefore, you must provide the name of the entry point you want to use and construct either the double or the float version:
kDouble = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'get_nans.cu', '_Z16get_nansIdEvPT_PKS0_S3_S3_PKiS5_' );
kFloat = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'get_nans.cu', '_Z16get_nansIfEvPT_PKS0_S3_S3_PKiS5_' );
Now, this almost works, but not quite: the parser in parallel.gpu.CUDAKernel cannot parse the templated function definitions in the .cu file. Therefore, we stop using this form of constructing the CUDAKernel:
KERN = parallel.gpu.CUDAKernel(PTXFILE, CUFILE, FUNC)
and use this one instead:
KERN = parallel.gpu.CUDAKernel(PTXFILE, CPROTO, FUNC)
We then end up with:
kDouble = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'double* out, const int* dims', '_Z16get_nansIdEvPT_PKS0_S3_S3_PKiS5_');
kFloat = parallel.gpu.CUDAKernel( 'get_nans.ptx', 'float* out, const int* dims', '_Z16get_nansIfEvPT_PKS0_S3_S3_PKiS5_');
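For completeness, here is a sketch of how one of these kernels might then be launched (untested; the array size and block shape are arbitrary assumptions, and a CUDA-capable GPU is required):

```
% Hypothetical launch of the float version: fill a 64x64 single array with NaNs
dims = int32([64 64]);                         % [rows cols], as the kernel expects
kFloat.ThreadBlockSize = [16 16 1];            % 16x16 threads per block
kFloat.GridSize = [ceil(64/16) ceil(64/16)];   % x covers columns, y covers rows
out = gpuArray.zeros(64, 64, 'single');
out = feval(kFloat, out, dims);                % every element should now be NaN
```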
Does this make sense?
Best,
Narfi

