gpuArray slower on newer graphics card in double precision

I have been running the following speed test in R2015a on two computers with different graphics cards:
>> A=gpuArray(rand(5e3));
>> T=gputimeit(@()A*A)
The first computer is an older model (Dell Precision T7500) running an older graphics card (GTX 580). The second, newer computer (Dell Precision Tower 7910) is running a newer graphics card (Titan X).
Oddly, I find that the older configuration outperforms the newer by about 20%. The GTX 580 gives T=1.1178 seconds, whereas the Titan X gives T=1.3097 seconds. When I redo the test in single precision,
>> A=gpuArray(rand(5e3,'single'));
>> T=gputimeit(@()A*A)
the results are more in line with my expectations. The GTX 580 gives T=0.2121 seconds, whereas the Titan X gives T=0.0491 seconds.
I'm wondering what could account for this difference. One thing that might be worth mentioning is that the Titan X is not using a fully updated driver. At the time of this writing, there is some bug in its newest driver release, making it unusable, and I am instead using driver version 353.62. Could this be the reason? If not, any other ideas?

7 Comments

Surprising. My older EVGA Titan Black gives T=1.0277 in double, and T=0.0737 in single.
No answer but an anecdote: when I first installed the Titan Black, the power supply could not handle the load and I got terrible performance, crashes, etc. I spent a good 4 hours cursing at Windows, MATLAB (not proud of that though ;-)), EVGA, until I got lucky enough that a heavy computation triggered a power-off of the machine, which helped me find the cause.
Thanks, Cedric. It makes me wonder if I should roll back to the Titan Black.
Not sure why the power failure was a "lucky" thing, though. I'm seeing a fair number of crashes on both the Titan X and the GTX 580 as well. What was the solution? You just needed a computer with a stronger power supply?
Hi Matt, it was a good thing because otherwise I would never have thought about the power supply. My PSU has two pairs of 6- and 8-pin PCI-E power outputs; one pair is white-black (6+8) and the other is blue-black (6+8). I used all white-black at first and it crashed. Then I mixed them and it worked (I also tried with dual 4 + adapter and it went well), which seems to indicate that they are wired to separate circuits internally and mixing just splits the load.
PS :
The Titan X is a terrible card to use for GPGPU as it was designed as a cheaper alternative to other Titans with a focus on single precision (gaming). You will see that the GFLOPS for double precision is about 1/32 that of single precision on the Maxwell chips. Compare that with the Fermi architecture used on the GTX 580 which has 1/5 the GFLOPS for double precision compared with its single precision. If you intended to use this for double precision I would highly recommend using the Titan Z (or Black) which uses the Kepler architecture. Therefore if you have a Titan Black, this would not be rolling back at all, but rather using a card which considered double precision as being important.
This looks like an answer!
Brendan's response does indeed look like an answer, and is supported by this article, so, Brendan, if you resubmit it as an Answer, I will accept it.
Ultimately, though, my computationally intensive work will mainly be single precision. I was just curious about the behavior I was seeing, and whether it might be due to a bad driver. So, I don't know if "the Titan X is a terrible card" is applicable to me.
Added double precision to that terrible line :)

Accepted Answer

The Titan X is a terrible card to use for double precision GPGPU, as it was designed as a cheaper alternative to other Titans with a focus on single precision (gaming). You will see that the GFLOPS for double precision is about 1/32 that of single precision on the Maxwell chips. Compare that with the Fermi architecture used on the GTX 580, which has 1/5 the GFLOPS for double precision compared with its single precision. If you intended to use this for double precision, I would highly recommend using the Titan Z (or Black), which uses the Kepler architecture. Therefore, if you have a Titan Black, this would not be rolling back at all, but rather using a card that was designed with double precision in mind.
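
As a rough check on these ratios, the timings from the question can be converted into achieved throughput. The sketch below assumes the usual estimate of about 2*n^3 floating-point operations for a dense n-by-n matrix product; the variable names are only illustrative, and the timings are the ones quoted in the question.

```matlab
% Rough throughput estimate from the timings quoted in the question.
% Assumption: a dense n-by-n matrix product costs about 2*n^3 flops.
n = 5e3;
flops = 2*n^3;

% Columns: [GTX 580, Titan X]; times in seconds, as reported in the question.
tDouble = [1.1178, 1.3097];
tSingle = [0.2121, 0.0491];

% Achieved throughput in GFLOPS for each card and precision.
gflopsDouble = flops ./ tDouble / 1e9;
gflopsSingle = flops ./ tSingle / 1e9;

% Measured single-over-double speed ratio for each card.
ratio = tDouble ./ tSingle;

fprintf('GTX 580: %7.1f GFLOPS double, %7.1f GFLOPS single, ratio %.1f\n', ...
    gflopsDouble(1), gflopsSingle(1), ratio(1));
fprintf('Titan X: %7.1f GFLOPS double, %7.1f GFLOPS single, ratio %.1f\n', ...
    gflopsDouble(2), gflopsSingle(2), ratio(2));
```

With those numbers the GTX 580 comes out at roughly 224 GFLOPS in double and 1179 in single (ratio about 5.3), and the Titan X at roughly 191 and 5092 (ratio about 26.7), which is broadly in line with the 1/5 and 1/32 figures quoted above.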

1 Comment

More info can be found here as well: NVIDIA Comparison Wiki.
For single precision work, the Titan X is the card to use, so it looks like you made a good choice. It does have fewer cores than the Titan Z, but a higher clock rate and a lower price point.

Asked: 31 Jul 2015
Edited: 3 Aug 2015
