GPU: Iteratively adding 2D Gaussians to a 2D array

Hi everyone.
First I'd like to thank you for taking the time to consider my problem.
I am currently building a novel microscope for imaging samples (cells, nano-particles, graphene, etc.) at sub-diffraction-limited resolution. I won't go into too many details here, so here is what you need to know. I have three columns of time-dependent data:
1) Px(t) = A pointer to the x-coordinate of the image (ranging from 1 to 2048) [float]
2) Py(t) = A pointer to the y-coordinate of the image (ranging from 1 to 2048) [float]
3) I(t) = The measured intensity [float]
The objective is to produce a 2D Gaussian centred at (Px, Py) with intensity I for each point in time. Here is what I have written so far:
% Two pixel arrays and one intensity array
Px_t = [x1, x2, ..., xn];
Py_t = [y1, y2, ..., yn];
I_t = [I1, I2, ..., In];
% Define the span of the gaussian in x & y
x = linspace(1,2048,2048);
y = linspace(1,2048,2048);
% Let stdev = 1 for simplicity
sgma = 1;
% Initialize the image space (note: the name 'image' shadows MATLAB's built-in image function)
image = zeros(2048);
% Loop through the time data
for t = 1:n
    x0 = Px_t(t);
    y0 = Py_t(t);
    I = I_t(t);
    % Generate a new Gaussian spread around x0 & y0
    gx = exp(-(x-x0).^2/(2*sgma^2));
    gy = exp(-(y-y0).^2/(2*sgma^2));
    gxy = gx'*gy;  % outer product of the two 1D profiles
    % Add the intensity-scaled Gaussian to the image array
    image = image + I*gxy;
end
My question is whether there is any way to perform this operation in parallel on the GPU. It is impossible to store n 2048x2048 arrays in memory and sum them afterwards, so I've resorted to this iterative approach.
All the best,
- Luke

Accepted Answer

Joss Knight on 2 Dec 2015
You are already getting a lot of data parallelism out of what you have: you are performing multiple element-wise operations on 4 million elements at once. Just stick your input data x, y, and image in gpuArrays and see what happens.
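For instance (a minimal sketch; gpuArray and gather are the standard Parallel Computing Toolbox conversions):
% Move the inputs onto the GPU once, up front
x = gpuArray(x);
y = gpuArray(y);
image = gpuArray(image);
% ... run the existing loop unchanged; the element-wise maths now executes on the GPU ...
image = gather(image);  % copy the finished image back to host memory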
Two improvements come to mind. First, you might get a speedup on the GPU if you use arrayfun to package the contents of your loop into a single kernel:
function im = addGaussian(im, xin, yin, x0in, y0in, intensity, sgma)
    gx = exp(-(xin-x0in).^2/(2*sgma^2));
    gy = exp(-(yin-y0in).^2/(2*sgma^2));
    gxy = gx*gy;
    im = im + intensity*gxy;
end
[x,y] = ndgrid(gpuArray.linspace(1,2048,2048));
image = gpuArray.zeros(2048);
sgma = 1;  % standard deviation, as in the question
for t = 1:n
    image = arrayfun(@addGaussian, image, x, y, Px_t(t), Py_t(t), I_t(t), sgma);
end
Secondly, can you generate one Gaussian image that is twice as big in each dimension and then index into it inside the loop to crop the portion you need centered on the pixel of interest? That would save a lot of repeated computation of the same values.
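For example, one way to realise that (a sketch only; it rounds each centre to the nearest pixel, so the sub-pixel positioning of your original version is lost, and bigG is just an illustrative name):
N = 2048;
sgma = 1;
[X, Y] = ndgrid(-(N-1):N);               % 2N-by-2N grid with the peak at index (N, N)
bigG = exp(-(X.^2 + Y.^2)/(2*sgma^2));   % one precomputed Gaussian
image = zeros(N);
for t = 1:n
    x0 = round(Px_t(t));                 % nearest-pixel centre
    y0 = round(Py_t(t));
    rows = (N - x0 + 1):(2*N - x0);      % N-long crop whose peak lands on row x0
    cols = (N - y0 + 1):(2*N - y0);
    image = image + I_t(t)*bigG(rows, cols);
end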
  2 Comments
Luke Melo on 2 Dec 2015
Thank you for the feedback. Option one didn't provide much of a performance increase. The 2D Gaussians were produced about an order of magnitude faster than before; however, the rate-limiting step was the cumulative addition into the image matrix.
I have implemented an algorithm similar to your second suggestion, and it's actually much faster. What I've done is create a small array that encapsulates 99.9% of the Gaussian volume and add it to a sub-region of the image matrix. This has cut computation time considerably.
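Roughly, it looks something like this (a sketch; the +/-4*sgma window size and the nearest-pixel rounding here are illustrative choices):
sgma = 1;
r = ceil(4*sgma);                          % window half-width (captures well over 99.9% of the volume)
[dX, dY] = ndgrid(-r:r);
stamp = exp(-(dX.^2 + dY.^2)/(2*sgma^2));  % small precomputed Gaussian "stamp"
image = zeros(2048);
for t = 1:n
    x0 = round(Px_t(t));
    y0 = round(Py_t(t));
    xs = max(1, x0-r):min(2048, x0+r);     % clip the window at the image borders
    ys = max(1, y0-r):min(2048, y0+r);
    image(xs, ys) = image(xs, ys) + I_t(t)*stamp(xs-x0+r+1, ys-y0+r+1);
end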
Thanks for the prompt response.
Joss Knight on 3 Dec 2015
Edited: Joss Knight on 3 Dec 2015
When you say "the rate-limiting step was the cumulative addition of the image matrix": in my suggestion I moved the addition inside the arrayfun function, so you wouldn't be able to distinguish the cost of creating the Gaussians from the cost of the addition. There's no particular reason why the addition would be any slower than the Gaussian-creation steps when run in parallel, except that it has to write to global memory.
However, the second option does seem more efficient.

Sign in to comment.

More Answers (0)
