Solving a PDE using the GPU

Hello
I am currently solving a nonlinear time-dependent PDE numerically using the Euler method. When I run my code on the CPU, everything seems to work fine. However, when I use the GPU (converting all my variables with gpuArray), the solution changes significantly and at some point "explodes". (I compared the CPU and GPU solutions at the same time and they were totally different.) Has anyone seen something like this before? Thank you.

7 Comments

You're going to have to show us some example code before we have any hope of answering your question. On the face of it, no, there shouldn't be any significant difference between the CPU and GPU behaviour. Most likely the problem you are presenting to the GPU isn't actually the same.
kevin k on 9 Mar 2016
Edited: kevin k on 10 Mar 2016
Here is an example of the problem that you can try:
clear
T=rand(100,100);
% T=gpuArray(T)   % uncomment to run on the GPU
disp(max(max(abs(imag(ifft2(fft2(T)))))))
The result differs depending on whether or not you use the GPU.
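For a more quantitative comparison, the same round trip can be measured on both devices. This is only a sketch: it assumes Parallel Computing Toolbox and a supported GPU, and it uses canUseGPU (available from R2019b) to skip the GPU part otherwise. The variable names R_cpu and R_gpu are made up for illustration.

```matlab
% Sketch: measure the CPU/GPU round-trip discrepancy (assumes a supported GPU).
rng(0);                          % fixed seed so both runs see the same data
T = rand(100, 100);

R_cpu = ifft2(fft2(T));          % reference round trip on the CPU
fprintf('CPU: max |imag| = %g, max |error| = %g\n', ...
    max(abs(imag(R_cpu(:)))), max(abs(R_cpu(:) - T(:))));

if canUseGPU()                   % requires R2019b or later
    R_gpu = gather(ifft2(fft2(gpuArray(T))));   % same round trip on the GPU
    fprintf('GPU: max |imag| = %g, max |error| = %g\n', ...
        max(abs(imag(R_gpu(:)))), max(abs(R_gpu(:) - T(:))));
    fprintf('CPU vs GPU: max |difference| = %g\n', ...
        max(abs(R_gpu(:) - R_cpu(:))));
else
    disp('No supported GPU found; skipping the GPU comparison.');
end
```

If all three numbers are near machine precision, a single round trip is behaving as expected on both devices, and any large discrepancy must come from how the error is amplified over many time steps.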
kevin k on 9 Mar 2016
Edited: kevin k on 10 Mar 2016
It seems that the bug has been found; I just do not understand why it is a problem. When I used T=real(ifft2(T)) instead of the line T=ifft2(T), it worked fine. I also noticed that after taking ifft2(T) the solution has an imaginary part that grows at each time step until eventually the solution explodes. (This imaginary part vanishes when I use the CPU.) Is this a known issue? Does the GPU have a problem with complex variables?
@ Joss Knight
Could something similar to what is happening here be happening in this case?
Also, NVCC has the --use_fast_math option, which is faster but less accurate. Are any of the underlying libraries compiled with that switch enabled, or compiled in a way that is equivalent to it?
kevin k on 10 Mar 2016
Edited: kevin k on 10 Mar 2016
Thank you for your answer.
I applied the changes following your link, using
T_hat=fftGPUWorkaround(fftGPUWorkaround(T, NN, 2).', NN, 2).';
instead of
T_hat=fft2(T)
in my code.
However, the inaccuracy still affects my results significantly, which does not happen on the CPU.
T_hat=fftGPUWorkaround(fftGPUWorkaround(T, NN, 2).', NN, 2).'; agrees with the CPU value of fft2(T) to within 1e-14, instead of 1e-13 before.
T is an NN-by-NN matrix.
About your question: I don't know. How can I check it?
That bug workaround is for the FFT, not for FFT2. And it only does anything for vector inputs of certain lengths, so it will literally be doing nothing in your case. All you've done is make the FFT2 implementation more like the CPU one - so less efficient, but the results will be closer.
Looks like all we're talking about here is numerical accuracy. You haven't actually shown what it is you are iterating on, but it's a fair assumption that you are continually calling ifft2(fft2(X))? This will inevitably have the effect of causing slight numerical offsets to grow, which is why you need to insert real(X) into the loop to remove the extraneous imaginary part.
I can't exactly explain how the CPU's version of the FFT can perfectly reflect the result of FFT with the IFFT and end up with something with a zero imaginary part - no doubt it just falls out in the equations. However, the GPU FFT is computed in parallel and won't be able to provide the same perfect mirroring properties.
In short, it is perfectly valid for you to remove the imaginary part when you know the result is supposed to be real.
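As an illustration of where real() goes in the loop, here is a minimal sketch. The equation (u_t = nu*lap(u) + u - u.^3), the grid size, and every parameter are placeholders chosen so that the example runs; they are not the original poster's problem, which was not shown.

```matlab
% Sketch: explicit Euler with a pseudo-spectral Laplacian, keeping u real.
% The PDE and all parameters below are assumptions for illustration only.
N  = 64;                          % grid points per dimension
nu = 0.01;                        % diffusion coefficient (assumed)
dt = 0.01;                        % time step (assumed; stable for these values)
k  = [0:N/2-1, -N/2:-1];          % integer wavenumbers on a 2*pi-periodic box
[KX, KY] = meshgrid(k, k);
K2 = KX.^2 + KY.^2;               % Fourier symbol of minus the Laplacian

rng(0);
u = 0.1 * (rand(N) - 0.5);        % small real initial condition
% u = gpuArray(u); K2 = gpuArray(K2);   % uncomment to run on the GPU

for step = 1:200
    lap_u = real(ifft2(-K2 .* fft2(u)));   % discard the round-off imaginary part
    u = u + dt * (nu * lap_u + u - u.^3);  % forward Euler update
end

disp(isreal(u))                   % u stays real at every step
disp(max(abs(u(:))))              % and remains bounded for these parameters
```

Taking the real part immediately after each ifft2 means the imaginary round-off never feeds back into the nonlinear term, so it cannot accumulate from one step to the next.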
kevin k on 15 Mar 2016
Edited: kevin k on 15 Mar 2016
Thank you. Yes, I do continually call ifft2(fft2(X)).
Discarding the imaginary part did not help, since there also seems to be a loss of accuracy in the real part.
I compared a CPU solution with a GPU solution in which the imaginary part was removed. The results differ by an O(1) error, so this still prevents me from using the GPU.
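One way to narrow this down, offered as a suggestion and not as a diagnosis, is to check on the CPU alone whether the time stepper amplifies round-off-sized differences. In the sketch below, stepFn is a hypothetical stand-in for one step of your own scheme, and the PDE and parameters are the same kind of placeholder as before. Two CPU runs start from initial data that differ by about 1e-15, and the gap between them is tracked.

```matlab
% Sketch: does the time stepper amplify round-off-sized perturbations?
% stepFn is a placeholder; replace it with one time step of your own scheme.
N  = 64; nu = 0.01; dt = 0.01;          % assumed values for illustration
k  = [0:N/2-1, -N/2:-1];
[KX, KY] = meshgrid(k, k);
K2 = KX.^2 + KY.^2;
stepFn = @(u) u + dt * (nu * real(ifft2(-K2 .* fft2(u))) + u - u.^3);

rng(0);
uA = 0.1 * (rand(N) - 0.5);             % reference initial condition
uB = uA + 1e-15 * randn(N);             % same data plus a tiny perturbation

nSteps = 200;
gap = zeros(nSteps, 1);
for step = 1:nSteps
    uA = stepFn(uA);
    uB = stepFn(uB);
    gap(step) = max(abs(uA(:) - uB(:)));   % how far the two runs have drifted
end
fprintf('gap after %d steps: %g\n', nSteps, gap(end));
```

With this placeholder problem the gap stays tiny. If the same test on your own scheme grows to O(1), then two CPU runs would disagree just as much as CPU and GPU do, which would point at the stability of the scheme or the sensitivity of the problem and not at the GPU.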


Answers (0)

Asked: 8 Mar 2016

Edited: 15 Mar 2016
