how can I output different array size than input by GPU arrayfun?

Question

Hao Zhang on 8 Dec 2018

0
Link

Direct link to this question

https://au.mathworks.com/matlabcentral/answers/434594-how-can-i-output-different-array-size-than-input-by-gpu-arrayfun

Commented: Hao Zhang on 10 Dec 2018

I am doing particle simulations using matlab. So I have very large data sets which is the position of each particles, for example, [rx ry], where rx and ry are large column vecotrs containing the x and y coordinates of all the particles. So suppose there are 1e6 particles then the dimension of rx and ry both are 1e6 by 1. In the simulation I have to find the neighbor particles within a certain range R for each particles. Using CPU I wrote something like this using bruto force search

function [idx_Neighbor,d_Neighbor]=find_neighbors_Naive_func(rx,ry,N_par,r_neighbor,h_smo)

idx_Neighbor=cell(N_par,1);

d_Neighbor=cell(N_par,1);

for j=1:N_par

dj=(rx-rx(j)).^2+(ry-ry(j)).^2;

[idx_Neighbor_j]=find(dj<=r_neighbor^2);

d_Neighbor_j=((dj(idx_Neighbor_j)).^0.5)./h_smo;

idx_Neighbor{j}=idx_Neighbor_j;

d_Neighbor{j}=d_Neighbor_j;

end

This function output a N_par by 1 cell array idx_Neighbor whose individual elements are the indices of the neigbor particels for every particles in the domain. This works fine using CPU, but it is very slow so I want to write a GPU version of the code by using arrayfun. But as far as I know, the GPU arrayfun requires the output array the same size as the input array, and it does not support cell array. Is there any way to get around this or there is no simple way other than write a cuda c code? Thanks!

p.s. I realize the kd tree rangesearch function, but this function does not support GPU. And I want to avoid transfering data between GPU and CPU since I implement other part of the simulation code also on GPU.

some following up, through some experimental on Matlab GPU coder, I managed to translate the following matlab code to cuda knernal (mex):

function [Numpar_neighbor]=find_neighbors_Naive_func_test(rx,ry,r_neighbor)

Numpar_neighbor=zeros(size(rx),'single'); %%% number of neighbor particles for each particle j

coder.gpu.kernelfun;

for j=1:size(rx,1)

dj=(rx-rx(j)).^2+(ry-ry(j)).^2;

[idx_Neighbor_j]=find(dj<=r_neighbor^2);

Numpar_neighbor(j)=length(idx_Neighbor_j);

end

The cuda mex file runs fine, but it returns an ordinary array instead of gpuArray although the code runs on the GPU. So if I use the returned array Numpar_neighbor in the following part of the code I need to transfer it to the GPU, adding unnecessary overhead. Why is this and any way to force the mex file return gpuArray?

Thanks again!

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Joss Knight on 8 Dec 2018

0
Link

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/434594-how-can-i-output-different-array-size-than-input-by-gpu-arrayfun#answer_351215

Open in MATLAB Online

You can't output a variable sized output from GPU arrayfun, which would require atomic operations. You're going to have to compute all those distances or you're going to need to sort your data into bins so that you can reduce the search space.

Computing every distance is easy, just use pdist2, that's what it's for.

distances = pdist2([rx ry], [rx ry]);

Now threshold to get the entries within the range, and index (or call find) to get the list you're after

neighbours = distances < r_neighbour;
neighboursList = cellfun(@(i)find(neighbours(:,i)), 1:N_par, 'UniformOutput', false);

Now, if you can't fit the whole N_par-by-N_par array distances into memory, you can compute in batches by looping over one of the inputs to pist2 in chunks of a more reasonable size.

If this is still too slow then you're going to need to bin your data so that you can reduce the search space for each particle.

Another option might involve using knnsearch. You could find the K nearest neighbours of each point, and then threshold the distance output from that to find the points within range. K would need to be large enough to ensure you got all of the neighbours in range.

Others may have more efficient solutions.

1 Comment
Show -1 older commentsHide -1 older comments

Hao Zhang on 10 Dec 2018

Thank you for your answer! It is very helpful! I am going to test the idea and get back to here if I got positive speed up!

Sign in to comment.

how can I output different array size than input by GPU arrayfun?

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

1 Comment
Show -1 older commentsHide -1 older comments

More Answers (0)

See Also

Categories

Tags

Community Treasure Hunt

how can I output different array size than input by GPU arrayfun?

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

1 Comment Show -1 older commentsHide -1 older comments

More Answers (0)

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

1 Comment
Show -1 older commentsHide -1 older comments