Assigning gpuArrays to different graphics cards

In the example below I use a parfor loop to assign a different 256x256x256 (random k-space) matrix to each of my 2 GPUs. Theoretically, I can then process these matrices in parallel on the 2 GPUs (here I've done an ifftn). The problem is that the parfor loop is very slow (presumably because of overhead): it takes 6.5 seconds on my machine. If I replace the 'parfor' with a simple 'for' (so both matrices are processed sequentially on 1 GPU), it takes 0.2 seconds.
Is there an easy (fast) way to assign a gpuArray to a specific graphics card, such that future operations on this gpuArray will use the specified graphics card? Rather than using parfor, I would prefer to simply use asynchronous CUDA kernels (invoking one directly after the other in MATLAB code) to run both GPUs in parallel.
Dims = [256,256,256,2];
Kspace = complex(rand(Dims,'single'),rand(Dims,'single'));
Image = gpuArray(complex(zeros(Dims,'single')));
tic
parfor n = 1:2
    gKspace = gpuArray(Kspace(:,:,:,n));
    Image(:,:,:,n) = ifftn(fftshift(gKspace));
end
toc

Accepted Answer

Joss Knight on 22 Feb 2020
There is no way to do what you ask. Selecting a GPU is the only way to move data there, and selecting a GPU resets all GPU data.
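As a rough illustration of that point, the sketch below selects one GPU, creates a gpuArray on it, then selects the other GPU and checks whether the first array's data is still reachable. It assumes Parallel Computing Toolbox and two supported GPUs; the variable names are only for illustration, and the behaviour noted in the comments is what the statement above implies rather than something measured here.

```matlab
% Sketch only: assumes Parallel Computing Toolbox and two supported GPUs.
if gpuDeviceCount >= 2
    gpuDevice(1);                        % select GPU 1 (this also resets it)
    A = rand(4, 'single', 'gpuArray');   % A is created on the selected GPU
    validBefore = existsOnGPU(A);        % data is reachable while GPU 1 is selected

    gpuDevice(2);                        % selecting GPU 2 deselects GPU 1
    validAfter = existsOnGPU(A);         % expected to be false: A's data is gone

    fprintf('valid before: %d, valid after: %d\n', validBefore, validAfter);
end
```

This is why a single MATLAB session cannot keep live gpuArrays on two cards at once, and why one worker per GPU is the usual pattern.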
The issue here is the way you're sending all the data to each worker and then indexing it; this is your bottleneck (and, equivalently, so is moving all the results back). You need to amortise this communication cost, either by doing more work inside the loop or by loading the data you need directly onto each worker without first loading it onto the client.
Presumably you have more than two 256^3 arrays. Put another loop inside your parfor and process all those arrays together. Move the results back to the CPU to save GPU memory. Eventually the communication overhead will be irrelevant and you'll see the benefit of using both your GPUs.
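A minimal sketch of that suggestion, assuming Parallel Computing Toolbox, a pool with one worker per GPU (by default each worker then gets its own GPU), and hypothetical values for numVolumes and volSize. The random data generated on the worker stands in for loading each volume directly on the worker. Note that parfor does not guarantee which worker runs which iteration, so an even split across the GPUs is likely but not assured.

```matlab
% Sketch only: assumes Parallel Computing Toolbox and at least one supported GPU.
numGpus    = gpuDeviceCount;      % GPUs visible to MATLAB
numVolumes = 8;                   % hypothetical total number of volumes
volSize    = [64 64 64];          % hypothetical; smaller than 256^3 to keep the demo quick

if isempty(gcp('nocreate'))
    parpool(numGpus);             % one worker per GPU
end

% One contiguous chunk of volume indices per outer iteration.
edges   = round(linspace(0, numVolumes, numGpus + 1));
results = cell(1, numGpus);

tic
parfor w = 1:numGpus
    idx = edges(w) + 1 : edges(w + 1);
    out = complex(zeros([volSize, numel(idx)], 'single'));
    for k = 1:numel(idx)
        % Stand-in for loading volume idx(k) directly on the worker.
        gKspace = complex(rand(volSize, 'single', 'gpuArray'), ...
                          rand(volSize, 'single', 'gpuArray'));
        % Same per-volume operation as in the question; gather copies the
        % result to CPU memory so GPU memory is not held across volumes.
        out(:, :, :, k) = gather(ifftn(fftshift(gKspace)));
    end
    results{w} = out;
end
toc

Image = cat(4, results{:});       % CPU array of size volSize x numVolumes
```

With only a couple of volumes per worker the pool overhead will still dominate; the gain should appear as numVolumes grows.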
  6 Comments
Adam Karboski on 15 May 2020
Same result. I also created a profile from scratch, same again.
Adam Karboski on 19 May 2020
Solved, solution was to disable nvidia-persistenced
