GPU much slower in existing code

Nate on 21 Nov 2011
Hi,
I'm working on an aerodynamics simulation, and have a function that needs speeding up. I have rewritten that function to utilize a GPU, and compared the results to a case run without the GPU, and the results are terrible. I used the profiler to compare time spent in particular functions...as a simple example, at one point I take the square root of the sum of the squares:
r1=sqrt(pxx1.^2+pyy1.^2_pzz1.^2);
and in the code without the GPU, MATLAB spends 0.22 seconds on this calculation (over the entire simulation), whereas in the code with the GPU, MATLAB spends 2.31 seconds on this calculation. An order of magnitude higher!
The only difference is that one set of variables is on the GPU, and the other is not. This does not include any time spent gathering the variable from the GPU...this is strictly time spent on the calculation.
And to make matters even more confusing, I pulled out the square root of the sum of the squares to compare performance of that function alone, and I get a completely different result. The GPU is 30% faster when calculating that function alone.
What reasons, within a larger body of code, would cause a GPU to perform so poorly compared to the CPU? Why would the GPU perform so much better on a single calculation?
Daniel Shub on 21 Nov 2011
I assume it is a typo, but .^2_pzz1 is not valid syntax. Also, what do you mean by "pulling out" the square root? Do you mean something like:
temp = pxx1.^2+pyy1.^2+pzz1.^2;
r1 = sqrt(temp);
If so, which steps take more time and which take less?

Answers (3)

Jan on 21 Nov 2011
Do the temporary arrays fit into the available RAM on the graphics card? 0.22 sec on the CPU suggests that the arrays are large.
Please measure the time for this also:
r1 = sqrt(pxx1 .* pxx1 + pyy1 .* pyy1 + pzz1 .* pzz1);
and
r1 = pxx1 .* pxx1 + pyy1 .* pyy1 + pzz1 .* pzz1;
r1 = sqrt(r1);
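A minimal sketch of how these variants could be timed side by side on the CPU. The array length `n` and the repetition count `reps` are assumptions for illustration, not values from the original simulation; for a GPU run the inputs would be created as gpuArray objects instead.

```matlab
% Assumed problem size and repetition count (not from the original post).
n    = 1e6;
reps = 20;

pxx1 = rand(n, 1);
pyy1 = rand(n, 1);
pzz1 = rand(n, 1);

% Variant 1: element-wise power in a single expression.
tic;
for r = 1:reps
    rA = sqrt(pxx1.^2 + pyy1.^2 + pzz1.^2);
end
tPower = toc / reps;

% Variant 2: explicit multiplication in a single expression.
tic;
for r = 1:reps
    rB = sqrt(pxx1 .* pxx1 + pyy1 .* pyy1 + pzz1 .* pzz1);
end
tTimes = toc / reps;

% Variant 3: sum of squares first, square root as a separate step.
tic;
for r = 1:reps
    rC = pxx1 .* pxx1 + pyy1 .* pyy1 + pzz1 .* pzz1;
    rC = sqrt(rC);
end
tSplit = toc / reps;

fprintf('power: %.3g s   times: %.3g s   split: %.3g s\n', tPower, tTimes, tSplit);
```

Averaging over several repetitions keeps a single slow call from dominating the comparison.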

Jason Ross on 21 Nov 2011
What GPU are you using? Computing power varies considerably across GPU hardware.

Nate on 21 Nov 2011
@Daniel - Yes, that was a typo... and to be clear, I created a separate m file, into which I copied only the sqrt of the sum of the squares, then created arrays of random numbers with which to time the different methods (i.e. gpu vs. cpu) using tic and toc.
@Jan - In the tests I did, this rewrite made a difference of around 5% on this particular calculation...which is nice to know (thanks!), but not the order of magnitude difference I observed in the larger code.
@Jason - This is our test setup, and as such, it is only a GTS250, which has 1GB RAM and 192 cores. Though it is light on computational resources, we're still seeing a speedup of 30% or so on basic operations (squaring a matrix, etc.). Where it gets slow is when it has to interact with other code.
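The standalone tic/toc test described in the reply to Daniel could look roughly like the sketch below. The array length `n` and repetition count `reps` are assumed values. The `wait(dev)` calls are an addition that was not mentioned in the post: gpuArray operations may return before the device has finished, so in releases that provide `wait(gpuDevice)` the device is synchronized before the clock is stopped. The GPU branch is skipped when no supported device is found.

```matlab
n    = 1e6;    % assumed array length
reps = 50;     % assumed repetition count

px = rand(n, 1);
py = rand(n, 1);
pz = rand(n, 1);

% CPU timing.
tic;
for r = 1:reps
    rCpu = sqrt(px.^2 + py.^2 + pz.^2);
end
tCpu = toc / reps;
fprintf('CPU: %.3g s per call\n', tCpu);

% Detect a usable GPU; any error (for example, no toolbox) means "no GPU".
try
    hasGpu = gpuDeviceCount > 0;
catch
    hasGpu = false;
end

if hasGpu
    dev = gpuDevice;
    gx = gpuArray(px);
    gy = gpuArray(py);
    gz = gpuArray(pz);

    % Warm-up call, excluded from the timing.
    rGpu = sqrt(gx.^2 + gy.^2 + gz.^2);
    wait(dev);

    tic;
    for r = 1:reps
        rGpu = sqrt(gx.^2 + gy.^2 + gz.^2);
    end
    wait(dev);  % finish queued GPU work before stopping the clock
    tGpu = toc / reps;
    fprintf('GPU: %.3g s per call\n', tGpu);
end
```

The transfer to the GPU happens before `tic`, which matches the post's statement that only the calculation itself is being timed.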
With some further experimentation, I was able to determine that passing the function and the code around the function to the GPU (as a separate function using arrayfun) was the only way to get the 30% speedup.
The statement was inside 3 nested for loops (with some other minor operations), and once I put all this code into a separate function, I see the speedup.
As an example (where all variables except i,j,k are on GPU):
This is slow:
for i = 1:numloops
    for j = 1:numtimes
        for k = 1:numits
            g1pxx1=px-ga1;
            g1pyy1=py-ga2;
            g1pzz1=pz-ga3;
            r2=sqrt(g1pxx1.^2+g1pyy1.^2+g1pzz1.^2);
        end
    end
end
And this is fast:
res=arrayfun(@gpuloop, numloops, numtimes, numits, px, py, pz, ga1, ga2, ga3, out, r4);
where
function [ out ] = gpuloop(numloops, numtimes, numits, px,py,pz,ga1,ga2,ga3,out,r4)
    for i = 1:numloops
        for j = 1:numtimes
            for k = 1:numits
                g1pxx1=px-ga1;
                g1pyy1=py-ga2;
                g1pzz1=pz-ga3;
                r4=sqrt(g1pxx1.^2+g1pyy1.^2+g1pzz1.^2);
            end
        end
    end
    out = r4;
end
Here are the results:
ARRAYFUN: avg time per iteration: 0.0011235
ARRAYFUN ALL GPU: avg time per iteration: 1.5829e-005
DIRECT GPU: avg time per iteration: 0.0019522
MATLAB NO GPU: avg time per iteration: 0.00048984
Where "ARRAYFUN ALL GPU" is the second example and "DIRECT GPU" is the first.
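One plausible reading of these numbers is that arrayfun on gpuArray inputs runs the whole function body as a single element-wise pass, whereas the direct version issues a separate GPU operation for every subtraction, power, and sum. The sketch below shows the same distance calculation written as one scalar function and applied with arrayfun. The handle name `dist3`, the array length `n`, and the random inputs are hypothetical; without a supported GPU the code falls back to CPU arrays, which gives the same values but not the speedup.

```matlab
% Hypothetical element-wise kernel: arrayfun calls it once per element,
% so every operation inside acts on scalars.
dist3 = @(px, py, pz, ga1, ga2, ga3) ...
    sqrt((px - ga1).^2 + (py - ga2).^2 + (pz - ga3).^2);

n   = 1e4;   % assumed array length
px  = rand(n, 1);
py  = rand(n, 1);
pz  = rand(n, 1);
ga1 = rand(n, 1);
ga2 = rand(n, 1);
ga3 = rand(n, 1);

% Detect a usable GPU; any error (for example, no toolbox) means "no GPU".
try
    hasGpu = gpuDeviceCount > 0;
catch
    hasGpu = false;
end

if hasGpu
    % All inputs live on the GPU; the result is copied back with gather.
    r2 = gather(arrayfun(dist3, gpuArray(px), gpuArray(py), gpuArray(pz), ...
                         gpuArray(ga1), gpuArray(ga2), gpuArray(ga3)));
else
    % CPU fallback: same values, no GPU involved.
    r2 = arrayfun(dist3, px, py, pz, ga1, ga2, ga3);
end
```

Keeping the loop-free arithmetic in one function means the intermediate differences never have to be stored as full arrays between steps.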
