Deleting X-Y points that are not near other points on a field of data points
12 views (last 30 days)
I have a set of data. This data is around 900 rows of two columns. Each row has an X and a Y value which specifies a point on the X-Y plane. The X-Y plane is from 0 to 100 and 0 to 100 respectively. All of these points are randomly scattered throughout the X-Y plane. My problem is there are too many X-Y points cluttering up the scatter plot. So what I want to do is have Matlab look at each point and say: Is this point a distance of 10 or less to another point. If it is then keep it. If it isn’t then delete the row containing that X, Y value. A shortened example of my data:
X=[1 2 3 4 20];
Y=[1 3 4 3 59];
Since (20,59) is more than a distance of 10 away from the other points, delete it and return the following:
X2=[1 2 3 4];
Y2=[1 3 4 3];
If anyone knows how I could do this, It would be a very great thing.
per isakson on 6 Jun 2012
See Doug's video Advanced: making a 2d or 3d histogram to visualize data density and search the FEX for "hist2"
I failed to find a solution in the FEX. Here is a naive code with "10" hard-coded in the magic number "100".
to_be_removed = false(size(X));
for ii = 1 : length(X)
is = (X-X(ii)).^2+(Y-Y(ii)).^2 <= 100;
is(ii) = false;
if not( any( is ) )
to_be_removed(ii) = true;
More Answers (2)
Geoff on 6 Jun 2012
Naive (brute force) implementation given by per isakson looks sufficient for this problem. O(N^2) is okay for 900 rows. For larger sets, I'd consider partitioning the points into a quad tree.
However, without making things complicated, I would say that the number of candidates for removal will be small due to your X and Y range. You could easily speed up the naive algorithm by first approximating the local point-density into a 21x21 array (cel-sizes of 5 with extra one for the ends) and then only do a search on points that are unique to a cel address.