gpuArray slower on newer graphics card in double precision

Question

2 votes

I ran the following speed test in R2015a on two different computers with two different graphics cards:

    >> A=gpuArray(rand(5e3));
    >> T=gputimeit(@()A*A)

The first computer is an older model (Dell Precision T7500) running an older graphics card (GTX 580). The second, newer computer (Dell Precision Tower 7910) is running a newer graphics card (Titan X).

Oddly, I find that the older configuration outperforms the newer by about 20%. The GTX 580 gives T=1.1178 seconds, whereas the Titan X gives T=1.3097 seconds. When I redo the test in single precision,

    >> A=gpuArray(rand(5e3,'single'));
    >> T=gputimeit(@()A*A)

the results are more in line with my expectations. The GTX 580 gives T=0.2121 seconds, whereas the Titan X gives T=0.0491 seconds.

I'm wondering what could account for this difference. One thing that might be worth mentioning is that the Titan X is not using a fully updated driver. At the time of this writing, there is some bug in its newest driver release, making it unusable, and I am instead using driver version 353.62. Could this be the reason? If not, any other ideas?

7 Comments

Matt J on 3 Aug 2015

Edited: Matt J on 3 Aug 2015

Brendan's response does indeed look like an answer, and it is supported by this article. So, Brendan, if you resubmit it as an Answer, I will accept it.

Ultimately, though, my computationally intensive work will mainly be single precision. I was just curious about the behavior I was seeing, and whether it might be due to a bad driver. So, I don't know if "the Titan X is a terrible card" is applicable to me.

Brendan Hamm on 3 Aug 2015

Added double precision to that terrible line :)

Answer 1

Brendan Hamm on 3 Aug 2015

2 votes

The Titan X is a terrible card to use for double precision GPGPU, as it was designed as a cheaper alternative to other Titans with a focus on single precision (gaming). You will see that the GFLOPS for double precision is about 1/32 that of single precision on the Maxwell chips. Compare that with the Fermi architecture used on the GTX 580, which has 1/5 the GFLOPS for double precision compared with its single precision. If you intend to use this for double precision, I would highly recommend the Titan Z (or Black), which uses the Kepler architecture. So if you have a Titan Black, switching to it would not be rolling back at all, but rather using a card that was designed with double precision in mind.
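As a rough check, the timings reported in the question can be converted into achieved throughput. The sketch below assumes an operation count of about 2*N^3 for an N-by-N matrix product (an approximation, not something stated in the thread) and uses only the four timings from the question; the variable names are illustrative.

```matlab
% Timings reported in the question (seconds) for A*A, with A of size 5000-by-5000.
N = 5e3;
flops = 2*N^3;                          % assumed approximate operation count

tDouble = [1.1178, 1.3097];             % [GTX 580, Titan X], double precision
tSingle = [0.2121, 0.0491];             % [GTX 580, Titan X], single precision

gflopsDouble = flops ./ tDouble / 1e9;  % achieved GFLOPS in double precision
gflopsSingle = flops ./ tSingle / 1e9;  % achieved GFLOPS in single precision
ratio = gflopsSingle ./ gflopsDouble;   % single-to-double throughput ratio per card

cards = {'GTX 580', 'Titan X'};
for k = 1:numel(cards)
    fprintf('%-8s double: %7.1f GFLOPS, single: %7.1f GFLOPS, ratio: %4.1f\n', ...
        cards{k}, gflopsDouble(k), gflopsSingle(k), ratio(k));
end
```

With these numbers the single-to-double ratio comes out at roughly 5.3 for the GTX 580 and roughly 26.7 for the Titan X, which is in the same ballpark as the 1/5 and 1/32 figures quoted above. Measured ratios need not match the peak hardware ratios exactly.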

1 Comment

Brendan Hamm on 3 Aug 2015

Edited: Brendan Hamm on 3 Aug 2015

More info can be found here as well: NVidia Comparison Wiki.

For single precision work, the Titan X is the card to use, so it looks like you made a good choice. It does have fewer cores than the Titan Z, but a higher clock rate and a lower price point.
