Main Content

gpucoder.sort

Optimized GPU implementation of the MATLAB sort function

Description

B = gpucoder.sort(A) sorts the elements of A in ascending order. The sort operation is performed on the GPU with the help of Thrust library. Thrust is a C++ template library for CUDA® and is shipped with CUDA Toolkit. The sorted output in B has the same type and size as A. If A is a vector, gpucoder.sort(A) sorts the elements of A in ascending order. If A is a matrix, gpucoder.sort(A) sorts each column of A in ascending order. If A is an N-dimensional array, gpucoder.sort(A) sorts along the first non-singleton dimension.

B = gpucoder.sort(A,dim) has the optional argument dim that specifies the dimension along which the sort operation is performed.

B = gpucoder.sort(A,direction) has the optional argument direction that specifies the sort direction. direction can take one of two values:

  • 'ascend' - Sorts in the ascending order. This is the default option

  • 'descend' - Sorts in the descending order.

[B,I] = gpucoder.sort(A,...) returns a sort index I which specifies how the elements of A were rearranged to obtain the sorted output B.

  • If A is a vector, then B = A(I).

  • If A is an m-by-n matrix and dim = 1, then

    for j = 1:n
      B(:,j) = A(I(:,j),j);
    end

The sort ordering is stable. Namely, when more than one element has the same value, the order of the equal elements is preserved in the sorted output B and the indices I relating to equal elements are ascending.

When gpucoder.sort is called from MATLAB®, it uses the built-in sort function.

example

Examples

collapse all

This example generates CUDA code to sort the columns of a matrix in descending order.

In one file, write an entry-point function mySort that accepts a matrix inputs A. Use the gpucoder.sort function to sort the columns of A in descending order.

function B = mySort(A)
     B = gpucoder.sort(A, 1, 'descend');
end

Use the codegen function to generate CUDA MEX function.

codegen -config coder.gpuConfig('mex') -args {ones(1024,1024,'double')} -report mySort

The following is a snippet of the generated code. The Thrust library call is denoted by thrustSortImpl

...
cudaMalloc(&gpu_inDims, 8ULL);
cudaMalloc(&gpu_B, 8388608ULL);
cudaMalloc(&gpu_A, 8388608ULL);
mySort_kernel1<<<dim3(1U, 1U, 1U), dim3(32U, 1U, 1U)>>>(*gpu_inDims);
cudaMemcpy(gpu_A, (void *)&A[0], 8388608ULL, cudaMemcpyHostToDevice);
mySort_kernel2<<<dim3(2048U, 1U, 1U), dim3(512U, 1U, 1U)>>>(*gpu_A, *gpu_B);
cudaMemcpy(&inDims[0], gpu_inDims, 8ULL, cudaMemcpyDeviceToHost);
thrustSortImpl(&(*gpu_B)[0], 2, &inDims[0], 1, 'd', false);
cudaMemcpy(&B[0], gpu_B, 8388608ULL, cudaMemcpyDeviceToHost);
...

Input Arguments

collapse all

Input array, specified as a vector, matrix, or multidimensional array.

Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char

Dimension to operate along, specified as a positive integer scalar. If no value is specified, then the default is the first array dimension whose size does not equal 1.

sort returns A if dim is greater than ndims(A). dim is not supported when A is a cell array, that is, sort only operates along the first array dimension whose size does not equal 1.

Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Sorting direction, specified as 'ascend' or 'descend'. direction is not supported when A is a cell array, that is, sort only sorts in ascending order.

Output Arguments

collapse all

Sorted array, returned as a vector, matrix, or multidimensional array. B is the same size and type as A. The order of the elements in B preserves the order of equal elements in A.

Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char

Sort index, returned as a vector, matrix, or multidimensional array. I is the same size as A. The index vectors are oriented along the same dimension that sort operates on. For example, if A is a 2-by-3 matrix, then [B,I] = sort(A,2) sorts the elements in each row of A. The output I is a collection of 1-by-3 row index vectors describing the rearrangement of each row of A.

Limitations

  • gpucoder.sort does not support complex numbers.

  • gpucoder.sort does not support 'MissingPlacement' and 'ComparisonMethod' name-value pairs supported by the MATLAB sort function.

Version History

Introduced in R2018b