How Shared GPU Memory Manager Improves Performance of Generated MEX
You can use the GPU memory manager to allocate and manage GPU memory efficiently and to improve run-time performance. The GPU memory manager creates a collection of large GPU memory pools and manages the allocation and deallocation of memory blocks within these pools. By creating large memory pools, the memory manager reduces the number of calls to the CUDA® memory APIs, improving run-time performance. See GPU Memory Allocation and Minimization.
In particular, when you generate CUDA MEX code, GPU Coder™ creates a single universal memory manager that handles the memory management for all running CUDA MEX functions, further improving the performance of the MEX functions.
To view the shared MEX memory manager properties and manage allocation, create a gpucoder.MemoryManager object by using the cudaMemoryManager function. To free GPU memory that is not in use, call the freeUnusedMemory function. This topic explains how the shared memory manager works, using an example.
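For example, you can inspect and manage the shared memory manager from the MATLAB command line. This is a minimal sketch; the properties displayed by the object depend on your GPU Coder release.
memMgr = cudaMemoryManager    % returns a gpucoder.MemoryManager object
freeUnusedMemory(memMgr)      % release pooled GPU memory that is not currently in use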
Obtain Fog Rectification Example Files
This example uses the design file fog_rectification.m and the image file foggyInput.png from the Fog Rectification example. To create a folder that contains these files, run this command.
openExample('gpucoder/FogRectificationGPUExample')
Generate and Profile CUDA MEX with GPU Memory Manager Disabled
Create a GPU code configuration object for generating a MEX function. To generate code that does not use the memory manager, set the EnableMemoryManager property to false.
cfg = coder.gpuConfig("mex");
cfg.GpuConfig.EnableMemoryManager = false;
Generate and profile CUDA MEX code for the design file fog_rectification.m by using the gpuPerformanceAnalyzer function. Specify the input type by using an example value, inputImage, which is the variable into which you loaded the foggyInput.png image file. Run the GPU Performance Analyzer with the default iteration count of 2.
inputImage = imread("foggyInput.png");
gpuPerformanceAnalyzer("fog_rectification",{inputImage},Config=cfg);
In the Performance Analyzer report, observe that a significant portion of the execution time is spent on memory allocation and deallocation.
Generate and Profile CUDA MEX with GPU Memory Manager Enabled
Enable the GPU memory manager. Then, generate and profile the CUDA MEX function again.
cfg.GpuConfig.EnableMemoryManager = true;
gpuPerformanceAnalyzer("fog_rectification",{inputImage},Config=cfg);
Observe that most memory allocation and deallocation events have disappeared from the profiling report, so the generated MEX now has improved run-time performance. The remaining memory allocation and deallocation activity originates from a call to the Thrust library, which cannot use the GPU memory manager.
Shared Memory Manager Allocations and Deallocations
To see when the shared GPU memory manager allocates large GPU memory pools, select the first run of fog_rectification_mex in the profiling report. Observe that, compared to the second run, the first run has three extra GPU memory allocation events in the timeline graph. These events correspond to the allocation of three memory pools by the shared GPU memory manager. Subsequent runs of fog_rectification_mex reuse the memory pools allocated in the first run, thereby improving run-time performance.
For MEX code generation, the memory pools allocated for fog_rectification_mex are preserved after fog_rectification_mex finishes its first execution. This allows subsequent MEX functions to reuse the memory pools allocated for fog_rectification_mex. However, for standalone CUDA code generation, the memory pools are private to the target (executable or static or dynamic library) and are deallocated when the standalone target is unloaded from memory.
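To see the preserved pools in action at the MATLAB command line, you could run the generated MEX function more than once and then release the pooled memory when you are done. This sketch assumes that fog_rectification_mex has been generated as shown earlier in this topic and that inputImage is still in the workspace.
fog_rectification_mex(inputImage);   % first run allocates the memory pools
fog_rectification_mex(inputImage);   % later runs reuse the same pools
memMgr = cudaMemoryManager;
freeUnusedMemory(memMgr)             % release pooled GPU memory that is no longer in use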