Apply different functions to the same data using the GPU simultaneously

Hi, everyone.
I want to use my GPU for computation. For example, I want to compute sin, cos, and plus on the same data simultaneously using the GPU.
a = rand(1000, 1);
b = sin(a);
c = cos(a);
d = plus(a, 1);
This code runs in series, so it takes time. How can I use my GPU to speed up this code?
With pagefun, I know I can apply the same function to different sets of data, but how do I do the opposite?
In addition, how can I run a user-defined function on the GPU?
Thank you~~!

Answers (1)

Adam on 30 Sep 2016
Edited: Adam on 30 Sep 2016
As far as I am aware, the GPU is not well optimised for running different functions on the same data in parallel, so that would not be faster; I imagine it would be slower.
If your data is sufficiently big then running each of those instructions on the GPU, still in sequence, might be faster, but you would have to test it. I have tried GPU programming a few times, but I always either run into memory issues or the data transfer times are too long to make it worthwhile, so I have never really found a use for it in what I do.
The GPU is optimised to do the same operation on a lot of data in parallel, so, for example, it could run all 1000 of your sin calculations in parallel very quickly, except that the time taken to transfer the data to and from the GPU would, in this case, make it take a lot longer than the CPU, I imagine.
To do anything on the GPU in MATLAB you need the Parallel Computing Toolbox, though.
Your code above takes on the order of 0.0005 s on my machine on the CPU. How long is it taking you that you need it sped up?!
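To illustrate the "still in sequence, but on the GPU" approach from the answer, here is a minimal sketch (assuming the Parallel Computing Toolbox and a supported NVIDIA GPU; timings will vary by hardware):

```matlab
% Generate the data directly on the GPU to avoid a host-to-device transfer.
a = gpuArray.rand(1000, 1);

b = sin(a);     % each call still runs one after the other,
c = cos(a);     % but every element within a call is
d = a + 1;      % processed in parallel on the GPU

% Bring results back to host memory only when needed,
% since transfers are often the dominant cost.
b = gather(b); c = gather(c); d = gather(d);

% Time GPU code with gputimeit rather than tic/toc, so that
% asynchronous kernel launches are measured correctly.
t = gputimeit(@() sin(gpuArray.rand(1000, 1)));
```

For data this small, the kernel-launch and transfer overheads usually dominate, which is why the CPU version is so hard to beat here.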

4 Comments

Hi, Adam. Thank you for your reply.
Frankly, I expected that one CUDA core would act like a CPU thread, so the calculation would take less time than on the CPU. But I have found that MATLAB's CUDA support isn't that useful for my code. The example above is just an example. In reality, I use about 90,000 double-precision values and repeat the calculation about 10 million times. It is very hard to parallelise my code, because it requires applying a very large number of different functions to the same data.
GPU cores are very different from CPU threads. CPU cores are optimised for sequential processing and GPU cores for parallel processing, so doing sequential-type operations on a GPU is not efficient.
If you are running a function like sin on 90,000 doubles then it may be faster on the GPU than the CPU, but parallelised by itself, not alongside cosines and other unrelated operations. I don't know how the CPU vectorisation of something like sin works, though, as it is obviously a lot faster than running individual sines in a for loop.
It's hard to answer your points without knowing more about your real use case. Usually there's a way to write your code using out-of-the-box gpuArray functionality, or tools like arrayfun, pagefun, and accumarray, to get the GPU to execute all of your operations at once so you don't have to use a loop. If your use case is fundamentally task parallel - you want to execute sin() in one thread and cos() in another, for instance - then the GPU is not appropriate, but that is precisely what the Parallel Computing Toolbox's other features, such as parfor and batch, are for. If your loop is fundamentally serial - each time round the loop you use the result from the time before - then perhaps no parallel solution can help you.
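To make the arrayfun and task-parallel suggestions concrete, here is a hedged sketch (again assuming the Parallel Computing Toolbox; myElementOp is a hypothetical user-defined function, not something from this thread). arrayfun on a gpuArray compiles the element-wise function into a single GPU kernel, so sin, cos, and the addition all happen in one pass over the data:

```matlab
a = gpuArray.rand(90000, 1);
[b, c, d] = arrayfun(@myElementOp, a);  % one fused kernel launch
b = gather(b);                          % copy back only when needed

% If the work truly is task-parallel (different functions running
% concurrently), parfeval runs them on parallel pool workers instead:
x = rand(90000, 1);
p = gcp;                        % get or start the current parallel pool
f1 = parfeval(p, @sin, 1, x);   % schedule sin(x) on one worker
f2 = parfeval(p, @cos, 1, x);   % schedule cos(x) on another
s = fetchOutputs(f1);           % block until the sin result is ready

function [s, c, p] = myElementOp(x)
    % element-wise, GPU-compatible user-defined function
    s = sin(x);
    c = cos(x);
    p = x + 1;
end
```

The arrayfun route answers the "user-defined function on the GPU" question directly; the parfeval route is the task-parallel alternative when the functions genuinely cannot be fused.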


Asked: on 30 Sep 2016
Commented: on 1 Oct 2016
