GPU: Iteratively adding 2D Gaussians to a 2D array

Hi everyone.
First I'd like to thank you for taking the time to consider my problem.
I am currently building a novel microscope for imaging samples (cells, nano-particles, graphene, etc.) at sub-diffraction-limited resolution. I won't go into too many details here, so here is what you need to know. I have three columns of time-dependent data:
1) Px(t) = A pointer to the x-coordinate of the image (ranging from 1 to 2048) [float]
2) Py(t) = A pointer to the y-coordinate of the image (ranging from 1 to 2048) [float]
3) I(t) = The measured intensity [float]
The objective is to produce a 2D Gaussian centred at (Px, Py) with intensity I for each point in time. Here is what I have written so far:
% Two pixel arrays and one intensity array
Px_t = [x1, x2, ..., xn];
Py_t = [y1, y2, ..., yn];
I_t = [I1, I2, ..., In];
% Define the span of the gaussian in x & y
x = linspace(1,2048,2048);
y = linspace(1,2048,2048);
% Let stdev = 1 for simplicity
sgma = 1;
% Initialize the image space (note: the name 'image' shadows MATLAB's built-in image function)
image = zeros(2048);
% Loop through the time data
for t = 1:n
    x0 = Px_t(t);
    y0 = Py_t(t);
    I = I_t(t);
    % Generate a new Gaussian spread around x0 & y0
    gx = exp(-(x-x0).^2/(2*sgma^2));
    gy = exp(-(y-y0).^2/(2*sgma^2));
    gxy = gx'*gy;  % outer product of the two 1D profiles
    % Add the intensity-scaled Gaussian to the image array
    image = image + I*gxy;
end
My question is whether there is any way to perform this operation in parallel on the GPU. It is impossible to store n 2048x2048 arrays in memory and sum them afterwards, so I've resorted to this iterative approach.
All the best,
- Luke

Accepted Answer

Joss Knight on 2 Dec 2015
You are already getting a lot of data parallelism out of what you have: you are performing multiple element-wise operations on 4 million elements at once. Just stick your input data x, y, and image in gpuArrays and see what happens.
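For instance (a minimal sketch; gpuArray and gather are the standard Parallel Computing Toolbox conversions):
% Move the inputs onto the GPU once, up front
x = gpuArray(x);
y = gpuArray(y);
image = gpuArray(image);
% ... run the existing loop unchanged; the element-wise maths now executes on the GPU ...
image = gather(image);  % copy the finished image back to host memory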
Two improvements come to mind. First, you might get a speedup on the GPU if you use arrayfun to package the contents of your loop into a single kernel:
function im = addGaussian(im, xin, yin, x0in, y0in, intensity, sgma)
    gx = exp(-(xin-x0in).^2/(2*sgma^2));
    gy = exp(-(yin-y0in).^2/(2*sgma^2));
    gxy = gx*gy;
    im = im + intensity*gxy;
end
[x,y] = ndgrid(gpuArray.linspace(1,2048,2048));
image = gpuArray.zeros(2048);
sgma = 1;  % standard deviation, as in the question
for t = 1:n
    image = arrayfun(@addGaussian, image, x, y, Px_t(t), Py_t(t), I_t(t), sgma);
end
Secondly, can you generate one Gaussian image that is twice as big in each dimension and then index into it inside the loop to crop the portion you need centered on the pixel of interest? That would save a lot of repeated computation of the same values.
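For example, one way to realise that (a sketch only; it rounds each centre to the nearest pixel, so the sub-pixel positioning of your original version is lost, and bigG is just an illustrative name):
N = 2048;
sgma = 1;
[X, Y] = ndgrid(-(N-1):N);               % 2N-by-2N grid with the peak at index (N, N)
bigG = exp(-(X.^2 + Y.^2)/(2*sgma^2));   % one precomputed Gaussian
image = zeros(N);
for t = 1:n
    x0 = round(Px_t(t));                 % nearest-pixel centre
    y0 = round(Py_t(t));
    rows = (N - x0 + 1):(2*N - x0);      % N-long crop whose peak lands on row x0
    cols = (N - y0 + 1):(2*N - y0);
    image = image + I_t(t)*bigG(rows, cols);
end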
  2 Comments
Luke Melo on 2 Dec 2015
Thank you for the feedback. Option one didn't provide much of a performance increase. The 2D Gaussians were produced about an order of magnitude faster than before; however, the rate-limiting step was the cumulative addition into the image matrix.
I have implemented an algorithm similar to your second suggestion, and it's actually much faster. What I've done is create a small array that encapsulates 99.9% of the Gaussian volume and add it to a sub-region of the image matrix. This has cut computation time considerably.
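Roughly, it looks something like this (a sketch; the +/-4*sgma window size and the nearest-pixel rounding here are illustrative choices):
sgma = 1;
r = ceil(4*sgma);                          % window half-width (captures well over 99.9% of the volume)
[dX, dY] = ndgrid(-r:r);
stamp = exp(-(dX.^2 + dY.^2)/(2*sgma^2));  % small precomputed Gaussian "stamp"
image = zeros(2048);
for t = 1:n
    x0 = round(Px_t(t));
    y0 = round(Py_t(t));
    xs = max(1, x0-r):min(2048, x0+r);     % clip the window at the image borders
    ys = max(1, y0-r):min(2048, y0+r);
    image(xs, ys) = image(xs, ys) + I_t(t)*stamp(xs-x0+r+1, ys-y0+r+1);
end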
Thanks for the prompt response.
Joss Knight on 3 Dec 2015
Edited: Joss Knight on 3 Dec 2015
When you say "the rate-limiting step was the cumulative addition of the image matrix": in my suggestion I moved the addition inside the arrayfun function, so you wouldn't be able to distinguish the cost of creating the Gaussians from the cost of the addition. There's no particular reason why the addition would be any slower than the Gaussian-creation steps when run in parallel, except that it has to write to global memory.
However, the second option does seem more efficient.

Sign in to comment.

More Answers (0)
