Fetching outputs from different GPUs results in an error?
I have two GPUs in my computer and want to use both of them to run a function, so I send part of the array to one GPU and the remaining part to the second GPU:
Agpu1=gpuArray(A(:,:,1:n/2)); %chunk #1 : send to GPU with device index 1
Agpu2=gpuArray(A(:,:,n/2+1:n)); %chunk #2 : send to GPU with device index 2
F(1)=parfeval(@Function,2,Agpu1,1);
F(2)=parfeval(@Function,2,Agpu2,2);
[o1,o2] = fetchOutputs(F,'UniformOutput',false); % Blocks until complete
When I fetch the outputs with the last statement, I get the error "Error using parallel.Future/fetchOutputs : One or more futures resulted in an error".
1) Does this mean that fetchOutputs is trying to fetch the output while the other GPU is still performing the operation? How can I solve this?
2) In the above link, when I print the gpuDevice in use, it always shows that GPU 2 is being used and GPU 1 is idle. How can I confirm that both GPUs are being used?
Thank you!
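Because fetchOutputs already blocks until every future has finished (as the comment in the code above says), this error is not a timing problem: it means the function itself threw an error on at least one worker. A minimal sketch for seeing the underlying error, assuming a parallel pool can be started; the two anonymous functions are made-up stand-ins for the real work, and the second one fails on purpose:

```matlab
pool = gcp();   % current pool, or a new one if automatic pool creation is enabled

% Stand-in futures: the second multiplies incompatible sizes and therefore errors
F(1) = parfeval(pool, @() sqrt(4), 1);
F(2) = parfeval(pool, @() ones(2) * ones(3), 1);

wait(F);   % block until both futures have finished, successfully or not

% Each future keeps the MException raised on the worker (empty if it succeeded)
for k = 1:numel(F)
    if isempty(F(k).Error)
        fprintf('Future %d succeeded\n', k);
    else
        fprintf('Future %d failed: %s\n', k, F(k).Error.message);
    end
end
```

Running the same loop over the F from the question shows the actual message coming from Function on each worker.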
Joss Knight on 30 Jan 2019 (edited 30 Jan 2019)
You can try to use the same GPU on more than one parallel worker, but it's pointless: the work will happen serially. If you have two GPUs, open a pool with two workers. If you want to do some work on the GPU and some on the CPU, take a look at the answer to this question.
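One way to check which device each worker is actually using is to query gpuDevice inside an spmd block; called with no argument, gpuDevice only reports the currently selected device and does not reset it. This is a sketch assuming a local pool with one worker per GPU. MATLAB should assign a different GPU to each worker by default in that setup, and the printout makes it visible on your own machine:

```matlab
% Start a pool with one worker per GPU if none is running yet
if isempty(gcp('nocreate'))
    parpool('local', gpuDeviceCount);
end

spmd
    d = gpuDevice;   % query only: no argument, so the device is not reset
    fprintf('Worker %d is using GPU %d (%s)\n', labindex, d.Index, d.Name);
    devIdx = d.Index;
end

idx = [devIdx{:}];   % collect the Composite into an ordinary vector on the client
```

If idx contains each GPU index once, every GPU has its own worker.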
The error is a pretty simple one. Every time you select the device using gpuDevice, you are resetting it, clearing all gpuArray variables in memory, including the ones you passed in. As I said, there is no point in moving the data to the GPU on the client MATLAB and then sending it to your worker in a parfeval call. All that happens is that the data gets transferred back to the system memory, then transmitted to the other process, then deserialised and put back on whatever device is currently selected. Create your data on your worker or send it as a CPU array and then transfer it to the GPU at the other end. You could also try using a parallel.pool.Constant to define data on your workers that persists from call to call.
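A sketch of the parfeval version with that advice applied: the chunks travel as ordinary CPU arrays, and each worker selects its GPU before creating any gpuArray, so the reset caused by gpuDevice can no longer wipe the inputs. The names twoGpuQr and qrChunkOnGpu are hypothetical, and the code assumes two GPUs, a 3-D array with at least two pages, and that it is saved as twoGpuQr.m:

```matlab
function [Q, R] = twoGpuQr(A)
% Hypothetical helper: page-wise economy QR of a 3-D CPU array split across two GPUs.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool('local', gpuDeviceCount);   % one worker per GPU
end
n = size(A, 3);
half = floor(n/2);
% Send plain CPU chunks; each worker moves its own chunk to its own GPU
F(1) = parfeval(pool, @qrChunkOnGpu, 2, A(:,:,1:half), 1);
F(2) = parfeval(pool, @qrChunkOnGpu, 2, A(:,:,half+1:n), 2);
[qParts, rParts] = fetchOutputs(F, 'UniformOutput', false);   % 2x1 cell arrays
Q = cat(3, qParts{:});
R = cat(3, rParts{:});
end

function [q, r] = qrChunkOnGpu(Achunk, devIdx)
% Hypothetical worker function (local to twoGpuQr.m).
gpuDevice(devIdx);   % select (and reset) the device BEFORE creating any gpuArray
for j = size(Achunk, 3):-1:1   % backwards so the first assignment sizes q and r
    [qg, rg] = qr(gpuArray(Achunk(:,:,j)), 0);   % economy-size QR on this GPU
    q(:,:,j) = gather(qg);
    r(:,:,j) = gather(rg);
end
end
```

For example, [Q, R] = twoGpuQr(rand(500,500,20)); Passing the device index explicitly means the result does not depend on which worker happens to pick up which future.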
If I were doing page-wise QR on two GPUs, as you are, I'd probably use spmd, and I'd limit the GPU work to just the call to qr; I don't think there's any advantage to doing all that indexing and storage on the GPU:
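parpool('local', gpuDeviceCount);   % one worker per GPU
spmd
    nPages = size(A,3);
    blocksize = ceil(nPages/numlabs);
    strt = (labindex-1)*blocksize + 1;
    fnsh = min(nPages, strt+blocksize-1);   % last page of this block; no overlap with the next worker
    % Loop backwards so the first assignment sizes q and r in one go
    for j = fnsh:-1:strt
        Agpu = gpuArray(A(:,:,j));
        [qgpu,rgpu] = qr(Agpu, 0);   % economy-size QR on this worker's GPU
        i = j-strt+1;
        q(:,:,i) = gather(qgpu);
        r(:,:,i) = gather(rgpu);
    end
end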
% q and r are now Composites so need to be indexed to recreate result
Q = cat(3, q{:});
R = cat(3, r{:});
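The page ranges used in the spmd loop can be checked on the CPU without any GPUs. With fnsh = strt + blocksize - 1, consecutive blocks meet without overlapping, so every page is processed exactly once; with strt + blocksize instead, the first page of each block is also given to the previous worker and cat(3, q{:}) ends up with duplicate pages. The sizes below are simply the 500-page, two-GPU case from this thread:

```matlab
nPages = 500;     % number of pages in A
nWorkers = 2;     % one worker per GPU
blocksize = ceil(nPages/nWorkers);
covered = zeros(1, nPages);   % how many workers process each page

for w = 1:nWorkers
    strt = (w-1)*blocksize + 1;
    fnsh = min(nPages, strt + blocksize - 1);
    fprintf('Worker %d: pages %d to %d\n', w, strt, fnsh);   % 1 to 250, then 251 to 500
    covered(strt:fnsh) = covered(strt:fnsh) + 1;
end

assert(all(covered == 1))   % every page handled exactly once
```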
By the way, I hope you're not actually doing this:
for i=1:500
A(:,:,i)=rand(500,500);
end
It's equivalent to A = rand(500,500,500), just much slower.
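A quick way to see the difference, using smaller pages so that both copies fit comfortably in memory (n = 200 is an arbitrary choice). With the same seed, both versions should produce the same values, because rand fills an array in column-major order, that is, page by page:

```matlab
n = 200;   % arbitrary example size; the thread uses 500
clear A1   % make sure the loop starts from an undefined variable

rng(0);    % same seed for both versions
tic
for i = 1:n
    A1(:,:,i) = rand(n, n);   % no preallocation: A1 grows on every iteration
end
tLoop = toc;

rng(0);
tic
A2 = rand(n, n, n);           % one call fills the whole array
tSingle = toc;

fprintf('loop: %.3f s, single call: %.3f s, identical: %d\n', ...
        tLoop, tSingle, isequal(A1, A2));
```

Timings depend on the machine; the point is that the single call avoids both the per-iteration overhead and the repeated reallocation of A1.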