Effective GPU Bandwidth Nvidia Quadro 6000
Hello, I would like to use GPU acceleration to speed up the computation of fft2 in my code. The GPU I'm using is an Nvidia Quadro 6000 with a theoretical bandwidth of 144 GB/s. However, the effective bandwidth I measure is almost 100 times lower, which makes using the GPU barely worthwhile:
Test : 2048 x 2048
Elapsed CPU time is : 0.109062 sec
Elapsed GPU time is : 0.007661 sec
Elapsed GPU time with CPU transfer is : 0.079723 sec
Speed up : 14.236 without memory transfer
1.36801 with memory transfer
Test : 4096 x 4096
Elapsed CPU time is : 0.356208 sec
Elapsed GPU time is : 0.026819 sec
Elapsed GPU time with CPU transfer is : 0.29406 sec
Speed up : 13.2819 without memory transfer
1.21134 with memory transfer
Test : 8192 x 8192
Elapsed CPU time is : 1.30381 sec
Elapsed GPU time is : 0.121605 sec
Elapsed GPU time with CPU transfer is : 1.17194 sec
Speed up : 10.7217 without memory transfer
1.11252 with memory transfer
If I compute the effective bandwidth (see benchmark below), it's about 1.45 GB/s.
Could this be due to the version of Matlab I'm using (R2011a), or is such poor performance to be expected?
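To make the cost of the memory transfer explicit, the timings listed above can be split into compute time and transfer time. The following is only a rough sketch: the numbers are copied from the three test runs, the variable names are made up for illustration, and it assumes the difference between the two GPU timings is entirely due to host/device transfer.

```matlab
% Timings copied from the three test runs above (seconds).
n        = [2048 4096 8192];              % matrix side length
tGpu     = [0.007661 0.026819 0.121605];  % fft2 on the GPU only
tGpuXfer = [0.079723 0.29406  1.17194];   % fft2 plus host/device transfer

% Assumption: the whole difference is transfer time.
tXfer    = tGpuXfer - tGpu;               % time attributed to transfer
xferFrac = tXfer ./ tGpuXfer;             % share of the total spent on transfer

for k = 1:numel(n)
    fprintf('%d x %d: transfer %.3f s (%.0f%% of total)\n', ...
            n(k), n(k), tXfer(k), 100*xferFrac(k));
end
```

Under that assumption, roughly 90% of the "with transfer" time in each test goes to moving data rather than to the FFT itself.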
Benchmark used to measure the bandwidth:
sizes = power(2, 12:26);      % transfer sizes in bytes (uint8 data)
repeats = 10;                 % keep the best of several runs
D = gpuDevice                 % no semicolon: displays the device properties
sendTimes = inf(size(sizes));
gatherTimes = inf(size(sizes));
for ii = 1:numel(sizes)
    data = randi([0 255], sizes(ii), 1, 'uint8');
    for rr = 1:repeats
        % Host -> device transfer
        timer = tic();
        gdata = gpuArray(data);
        sendTimes(ii) = min(sendTimes(ii), toc(timer));
        % Device -> host transfer
        timer = tic();
        data2 = gather(gdata);
        gatherTimes(ii) = min(gatherTimes(ii), toc(timer));
    end
end
% Bandwidth in GB/s = bytes transferred / best time / 1e9
sendBandwidth = (sizes./sendTimes)/1e9
[maxSendBandwidth,maxSendIdx] = max(sendBandwidth);
fprintf('Peak send speed is %g GB/s\n',maxSendBandwidth)
gatherBandwidth = (sizes./gatherTimes)/1e9
[maxGatherBandwidth,maxGatherIdx] = max(gatherBandwidth);
fprintf('Peak gather speed is %g GB/s\n',maxGatherBandwidth)
Answers (2)
Edric Ellis
on 19 Mar 2013
Your experiment measures the transfer bandwidth across the PCI bus between the host and the GPU, not the device's global memory bandwidth, which is what the 144 GB/s figure refers to. The PCI bus bandwidth is discussed in an entry on Loren's blog here: http://blogs.mathworks.com/loren/#1fa09fa2-c99c-4bb0-8b11-eb805fdd7040.
We have made various performance improvements to the gpuArray code since R2011a, so it would be best for you to upgrade if you can.
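For comparison with the transfer benchmark in the question, here is a minimal sketch of how the device's own memory bandwidth could be estimated instead. It assumes a newer release that provides gputimeit (which R2011a does not have); the test size and the variable names are arbitrary choices for illustration, not something taken from the question.

```matlab
% Sketch only: needs Parallel Computing Toolbox, a supported GPU, and a
% release that provides gputimeit (not available in R2011a).
numBytes = 2^26;                                   % arbitrary test size in bytes
gdata = gpuArray(randi([0 255], numBytes, 1, 'uint8'));

% plus() reads every element once and writes every element once,
% all within device memory (no host transfer inside the timed call).
plusFcn = @() plus(gdata, 1);
t = gputimeit(plusFcn);

% One read plus one write per byte, hence the factor of 2.
deviceBandwidth = 2 * numBytes / t / 1e9;
fprintf('On-device read+write bandwidth: %g GB/s\n', deviceBandwidth);
```

A figure obtained this way is the one to compare against the theoretical 144 GB/s; the gpuArray/gather timings in the question are bounded by the PCI bus instead.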
Domenico
on 19 Mar 2013
1 Comment
Edric Ellis
on 19 Mar 2013
Those figures were published using R2012b and show that 8 GB/s is not achieved; however, they do show a decent improvement over your measured speed. It's hard to say exactly how much of the difference is due to the software and how much is due to the different hardware.