
Here's my code:

% for j=1:4
tic;
reset(gpuDevice(1)); clear all; % clean up
format long; % show double precision
R_i_gpu=gpuArray(6); % initial radius
dL_gpu=gpuArray(1.e-5); % delta length
n_gpu=R_i_gpu/dL_gpu; % calculate the number of steps or intervals
theta_i_gpu=dL_gpu/R_i_gpu; % calculate the initial theta
R_final_gpu=gpuArray(2); % final radius
dR_gpu=R_final_gpu-R_i_gpu; % calculate delta radius
d_theta_gpu=gpuArray(3*pi/2); % angle that the radius varies over (i.e. (R_final-R_initial)/d_theta)
dR_d_theta_gpu=dR_gpu/d_theta_gpu; % calculate the rate of change of radius with respect to theta
% Ri=R_i-(dR_d_theta*theta_i)
% thetai=dL/Ri
Ri_gpu=R_i_gpu; % initialise radius
thetai_gpu=theta_i_gpu; % initialise theta
n=gather(n_gpu);
itime=toc
tic;
for i=1:n-1 % loop over the integration steps
    Ri_gpu=Ri_gpu+(dR_d_theta_gpu*thetai_gpu); % update radius
    thetai_gpu=dL_gpu/Ri_gpu; % update theta
    R_gpu(i)=Ri_gpu; % put the radius into a row array (grown each iteration)
    theta_gpu(i)=thetai_gpu; % put the theta into a row array (grown each iteration)
    % A_gpu=[R_gpu; theta_gpu]'; % create the radius/theta array
end
rtime=toc
% tic
% R_gpu=[R_i_gpu R_gpu]'; % horizontally concatenate the initial radius with the radius array calculated at each interval step
theta_gpu=[theta_i_gpu theta_gpu]'; % horizontally concatenate the initial theta with the theta array, then transpose
theta_sum_gpu=sum(theta_gpu); % sum the theta (in radians)
theta_sum_deg_gpu=theta_sum_gpu*360/(2*pi); % convert the theta sum to degrees
% ptime=toc
% end

I'm running this on a 3930K with an NVIDIA GTX 660 Superclocked. I noticed that as dL_gpu goes from 1.e-4 to 1.e-5, the effective computational rate decreases, so I started using GPU-Z to monitor the memory usage, the GPU load, and the GPU memory controller load. I found that both the GPU load and the GPU memory controller load increase as time goes on, and now I'm trying to figure out why.

For a 1-D integration like this Newton-style approximation, shouldn't each step/iteration take the same amount of time?

I'm also trying to understand how MATLAB builds arrays with A(i)=B. Does it rebuild the entire array at each iteration, or does it just append the latest entry to the end?

And if it just appends, why does the memory controller load keep rising over time?

Any assistance in understanding what's going on behind the scenes would be greatly appreciated. Thank you!

Joss Knight on 16 Jun 2014 (Edited: 16 Jun 2014)

A(i) = B without pre-allocation will add data to the end of your array until it runs out of space; then it will allocate more space, copy the existing array across, and continue. This takes a lot of time, and it is where all your little spikes come from. The larger the array, the longer each resize operation takes: with smaller deltas your array gets longer and longer, so everything runs slower and slower.
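A quick way to see this effect is to time a growing assignment against a pre-allocated one. Here is a minimal CPU-only sketch (the variable names and the size n are illustrative, not from the original post):

n = 1e6;
% Growing the array one element at a time: MATLAB must periodically
% allocate a bigger block and copy the existing data across.
tic;
a = [];
for i = 1:n
    a(i) = i; %#ok<SAGROW> array grows inside the loop
end
t_grow = toc;
% Pre-allocating once: every assignment writes into existing storage.
tic;
b = zeros(1, n);
for i = 1:n
    b(i) = i;
end
t_prealloc = toc;
fprintf('grown: %.3fs, pre-allocated: %.3fs\n', t_grow, t_prealloc);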

Note that what you are doing here is not appropriate for GPU computation. The GPU is useful for operating in parallel on large arrays of data. You are not doing anything in parallel here, so the GPU is mostly idle and you've wasted a lot of time sending data over to it.
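To illustrate, here is a sketch of the same recurrence run entirely on the CPU with pre-allocated arrays. It is not a drop-in replacement, just the computation from the question restated under that advice; since each step depends on the previous one, there is no parallelism for a GPU to exploit anyway:

R_i = 6;                                  % initial radius
R_final = 2;                              % final radius
dL = 1e-5;                                % delta length
dR_d_theta = (R_final - R_i) / (3*pi/2);  % rate of change of radius w.r.t. theta
n = round(R_i / dL);                      % number of steps
R = zeros(n-1, 1);                        % pre-allocate the outputs once
theta = zeros(n-1, 1);
Ri = R_i;
thetai = dL / R_i;
for i = 1:n-1
    Ri = Ri + dR_d_theta * thetai;        % update radius
    thetai = dL / Ri;                     % update theta
    R(i) = Ri;
    theta(i) = thetai;
end
theta_sum_deg = (dL/R_i + sum(theta)) * 180/pi; % include initial theta, convert to degrees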

Another tip: don't put scalar data onto the GPU (e.g. R_i_gpu=gpuArray(6)). Only put arrays on the GPU. GPU code will automatically bring scalars across to the GPU if and when it is necessary.
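For example, under that advice the scalar parameters stay as plain doubles on the host, and only genuinely large data becomes a gpuArray (the array below is purely illustrative):

R_i = 6;                       % plain host-side scalar
dL = 1e-5;                     % plain host-side scalar
x = rand(1e6, 1, 'gpuArray');  % a large array is what belongs on the GPU
y = R_i * x + dL;              % scalars are transferred automatically as needed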
