# Deleting X-Y points that are not near other points on a field of data points

12 views (last 30 days)
charles atlas on 6 Jun 2012
I have a set of data. This data is around 900 rows of two columns. Each row has an X and a Y value which specifies a point on the X-Y plane. The X-Y plane is from 0 to 100 and 0 to 100 respectively. All of these points are randomly scattered throughout the X-Y plane. My problem is there are too many X-Y points cluttering up the scatter plot. So what I want to do is have Matlab look at each point and say: Is this point a distance of 10 or less to another point. If it is then keep it. If it isn’t then delete the row containing that X, Y value. A shortened example of my data:
X=[1 2 3 4 20];
Y=[1 3 4 3 59];
Since (20,59) is more than a distance of 10 away from the other points, delete it and return the following:
X2=[1 2 3 4];
Y2=[1 3 4 3];
If anyone knows how I could do this, It would be a very great thing.

per isakson on 6 Jun 2012
See Doug's video Advanced: making a 2d or 3d histogram to visualize data density and search the FEX for "hist2"
I failed to find a solution in the FEX. Here is a naive code with "10" hard-coded in the magic number "100".
X=[1,2,3,4,20];
Y=[1,3,4,3,59];
to_be_removed = false(size(X));
for ii = 1 : length(X)
is = (X-X(ii)).^2+(Y-Y(ii)).^2 <= 100;
is(ii) = false;
if not( any( is ) )
to_be_removed(ii) = true;
end
end
X(to_be_removed)=[];
Y(to_be_removed)=[];
charles atlas on 7 Jun 2012
I had to tweak the code a little bit to fit my actual data,.. but it worked well, Thank you

Geoff on 6 Jun 2012
Naive (brute force) implementation given by per isakson looks sufficient for this problem. O(N^2) is okay for 900 rows. For larger sets, I'd consider partitioning the points into a quad tree.
However, without making things complicated, I would say that the number of candidates for removal will be small due to your X and Y range. You could easily speed up the naive algorithm by first approximating the local point-density into a 21x21 array (cel-sizes of 5 with extra one for the ends) and then only do a search on points that are unique to a cel address.
charles atlas on 7 Jun 2012
Sorry I havent been able to get into the office and test the code until today.
the read data is latitude and longitudes but for simplicity's sake, I said it was 0 to 100 on the X and Y axis (which would actually be the longitude and latitude axes respectively.
The code did what it was supposed to do when I tested it, but It neglected half the values that were jumbled together (that is at a distance of about 600 yards away, aka <= .005 as a difference in lat and long squared, added and then square rooted.
charles atlas on 7 Jun 2012
That is; I used isakson's code from above.

Image Analyst on 7 Jun 2012
If you displayed it as an image instead of a scatterplot, you wouldn't have that problem. Why not give it a try?
per isakson on 7 Jun 2012
That's what Doug shows