
parallel.gpu.CUDAKernel

Create GPU CUDA kernel object from PTX and CU code

Description


kern = parallel.gpu.CUDAKernel(ptxFile,cuFile) creates a CUDAKernel object using the PTX code ptxFile and the CUDA® source file cuFile. The PTX file must contain only a single entry point.

Use kern to execute a CUDA kernel on the GPU. For information on executing your kernel object, see Run a CUDAKernel.

kern = parallel.gpu.CUDAKernel(ptxFile,cuFile,func) creates a CUDAKernel object for the function entry point specified by func. Use this syntax when the PTX file contains more than one entry point. func must unambiguously identify the appropriate kernel entry point in the PTX file.
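As a minimal sketch, assuming simpleEx.ptx and simpleEx.cu from the Examples section are on the MATLAB path, you can name the entry point explicitly. Because nvcc compiles .cu files as C++, the entry point recorded in the PTX is typically a mangled name such as _Z11addToVectorPffi. This sketch assumes that the plain name 'addToVector' is accepted when it identifies exactly one entry point, which is what the "unambiguously" requirement suggests. The EntryPoint property of the kernel object shows which entry point was selected.

```matlab
% Sketch: select the kernel entry point by name.
% Assumes simpleEx.ptx and simpleEx.cu (from the Examples section) are on the path.
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', 'simpleEx.cu', 'addToVector');

% Display the entry point that was matched in the PTX code
% (for a kernel compiled as C++ this is usually the mangled name).
disp(kern.EntryPoint)
```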


kern = parallel.gpu.CUDAKernel(ptxFile,cProto) creates a CUDAKernel object using the PTX file ptxFile and the C prototype cProto. cProto is the C function prototype for the kernel call that kern represents. The PTX file must contain only a single entry point.

kern = parallel.gpu.CUDAKernel(ptxFile,cProto,func) creates a CUDAKernel object using the PTX file ptxFile and the C prototype cProto for the function entry point specified by func. func must unambiguously identify the appropriate kernel entry point in the PTX file.
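If you are unsure which name to pass as func, you can list the entry points yourself. The following is a sketch, assuming simpleEx.ptx from the Examples section is on the path: every kernel in PTX is declared with an .entry directive, so a regular expression over the file text recovers the (possibly mangled) names. Passing a full name found this way identifies exactly one entry point.

```matlab
% Sketch: list the .entry directives in a PTX file, then pick one as func.
% Assumes simpleEx.ptx from the Examples section is on the path.
ptxText = fileread('simpleEx.ptx');

% Each kernel appears in PTX as ".entry <name>(" ; capture <name>.
tokens = regexp(ptxText, '\.entry\s+(\w+)', 'tokens');
entryNames = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);
disp(entryNames)

% Use the full (possibly mangled) name of the first entry point as func.
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', 'float *,float,int', entryNames{1});
```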

Examples


This example shows how to create a CUDAKernel object using a PTX file and a CU file, or using a PTX file and the function prototype.

The CUDA source file simpleEx.cu contains the following code:

/*
* Add a constant to a vector.
*/
__global__ void addToVector(float * pi, float c, int vecLen)  {
   int idx = blockIdx.x * blockDim.x + threadIdx.x;
   if (idx < vecLen) {
       pi[idx] += c;
   }
}

Compile the CU file into a PTX file using the nvcc compiler in the NVIDIA® CUDA Toolkit by executing the following shell command.

nvcc -ptx simpleEx.cu

The preceding command generates a generic PTX file for the default virtual architecture of your nvcc version, which the driver can compile at run time for GPUs of that compute capability or newer. To generate code optimized for specific GPU devices, specify additional options using the -arch or -code flags. For information about nvcc options, see the nvcc documentation.
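As one possible way to target the GPU you are actually using, the sketch below runs nvcc from MATLAB with system and builds the -arch value from the ComputeCapability property of the selected gpuDevice. It assumes nvcc is on the system PATH, simpleEx.cu is in the current folder, and the installed toolkit supports that architecture. The resulting PTX may not run on GPUs with a lower compute capability.

```matlab
% Sketch: compile simpleEx.cu for the compute capability of the current GPU.
% Assumes nvcc is on the system PATH and simpleEx.cu is in the current folder.
dev = gpuDevice;                                   % currently selected GPU
arch = strrep(dev.ComputeCapability, '.', '');     % e.g. '7.5' -> '75'

cmd = sprintf('nvcc -ptx -arch=sm_%s simpleEx.cu -o simpleEx.ptx', arch);
[status, cmdout] = system(cmd);
if status ~= 0
    error('nvcc failed:\n%s', cmdout);
end
```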

Create a CUDA kernel using the PTX file and the CU file.

kern = parallel.gpu.CUDAKernel('simpleEx.ptx','simpleEx.cu');

Create a CUDA kernel using the PTX file and the function prototype of the addToVector function.

kern = parallel.gpu.CUDAKernel('simpleEx.ptx','float *,float,int');

Both of the preceding statements return a kernel object that you can use to call the addToVector CUDA kernel.
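A minimal sketch of actually calling addToVector follows, assuming simpleEx.ptx and simpleEx.cu are on the path. The kernel launches one thread per element, so ThreadBlockSize and GridSize must cover the vector length; the idx < vecLen guard in the kernel discards the surplus threads. The non-const float * argument is returned as the output of feval, and the inputs are cast explicitly to single and int32 to match float and int. The values of N and the block size of 256 are arbitrary choices for illustration.

```matlab
% Sketch: launch addToVector on a vector of N elements.
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', 'simpleEx.cu');

N = 1000;
x = gpuArray(single(1:N));          % float * argument: single precision on the GPU

% One thread per element; extra threads in the last block are skipped by the kernel.
kern.ThreadBlockSize = 256;
kern.GridSize = ceil(N / kern.ThreadBlockSize(1));

% The non-const pointer argument comes back as the output.
y = feval(kern, x, single(2), int32(N));

result = gather(y);                 % copy the result back to host memory
```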

Input Arguments


ptxFile: Name of a PTX file or PTX code, specified as a character vector.

You can provide either the name of a PTX file or the contents of a PTX file (PTX code).
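For example, if the PTX code is already in memory (here it is simply read back from disk), you can pass the text itself rather than a file name. This is a sketch assuming simpleEx.ptx from the Examples section is on the path.

```matlab
% Sketch: pass the PTX code itself instead of the file name.
% Assumes simpleEx.ptx from the Examples section is on the path.
ptxCode = fileread('simpleEx.ptx');        % PTX text as a character vector
kern = parallel.gpu.CUDAKernel(ptxCode, 'float *,float,int');
```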

cuFile: Name of a CUDA source file, specified as a character vector.

The function examines the CUDA source file to find the function prototype for the CUDA kernel that is defined in the PTX code. The CUDA source file must contain a kernel definition starting with '__global__'.
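To check that the prototype read from the __global__ definition matches a hand-written one, you can compare two kernel objects. This sketch assumes the ArgumentTypes property of the CUDAKernel object describes each kernel argument, and that simpleEx.ptx and simpleEx.cu from the Examples section are on the path.

```matlab
% Sketch: the prototype parsed from simpleEx.cu should describe the same
% arguments as the hand-written C prototype for addToVector.
kFromCu    = parallel.gpu.CUDAKernel('simpleEx.ptx', 'simpleEx.cu');
kFromProto = parallel.gpu.CUDAKernel('simpleEx.ptx', 'float *,float,int');

disp(kFromCu.ArgumentTypes)
sameArgs = isequal(kFromCu.ArgumentTypes, kFromProto.ArgumentTypes)
```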

func: Function entry point, specified as a character vector. func must unambiguously identify the appropriate kernel entry point in the PTX file.

cProto: C prototype for the kernel call, specified as a character vector. Separate the types of multiple kernel arguments with commas, for example 'float *,float,int'.
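When a kernel has many arguments, it can be convenient to keep one type per cell and join them. The sketch below does this for addToVector from the Examples section and assumes the NumRHSArguments property of the kernel object reports one input per comma-separated type.

```matlab
% Sketch: build the C prototype for addToVector from one type per argument.
argTypes = {'float *', 'float', 'int'};   % pi, c, vecLen in simpleEx.cu
cProto = strjoin(argTypes, ',');          % 'float *,float,int'

kern = parallel.gpu.CUDAKernel('simpleEx.ptx', cProto);
disp(kern.NumRHSArguments)                % expected: one input per listed type
```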

Introduced in R2010b