Plot a curve that splits data into two sets

RPatel on 4 Aug 2017
Commented: Image Analyst on 14 Aug 2017
Hello,
I have data points that represent two classes (collisions avoided and probable collisions). My goal is to plot a curve (a polynomial equation) that splits the data points in a chosen ratio, say 90% collisions avoided to 10% probable collisions. Note that the data points of the two classes are very close to each other.
I have tried using the 'fit' function in MATLAB, and for a polynomial of degree 8, here is what I get (refer to the image). But it doesn't split the data as required.
I am looking at Support Vector Machines for Binary Classification, but I am not an expert in this domain and I am not sure whether it would help. How can I get the data segregation I want?
Best,
Raj
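To check whether a candidate curve (for example, the degree-8 polynomial from 'fit') actually achieves a target split such as 90/10, one can count the classes on one side of it. The Python sketch below is illustrative only. It assumes each point is an (x, y) pair in the plotted axes, a hypothetical label encoding of 1 = collision avoided and 0 = probable collision, and coefficients ordered from highest degree to lowest, as MATLAB's polyval expects.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial with coefficients from highest to lowest degree (Horner's rule)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def split_ratio(points, labels, coeffs):
    """Share of class-1 points among all points lying strictly below y = p(x).

    points: list of (x, y) pairs
    labels: list of 0/1 labels (hypothetical: 1 = collision avoided)
    Returns (fraction of class 1 below the curve, number of points below).
    """
    below = [lab for (x, y), lab in zip(points, labels) if y < polyval(coeffs, x)]
    if not below:
        return 0.0, 0
    return sum(below) / len(below), len(below)


# Usage: test the straight line y = x + 1 against four toy points.
pts = [(0.0, 0.5), (1.0, 0.2), (2.0, 3.0), (3.0, 5.0)]
labs = [1, 1, 0, 1]
print(split_ratio(pts, labs, [1.0, 1.0]))  # → (1.0, 2)
```

A fit that minimises squared error never looks at the labels, which is why the ratio it produces is essentially accidental.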

Answers (3)

Greg Heath on 4 Aug 2017
Your data is extremely discontinuous. The best you can hope for is a decision tree.
Hope this helps
Thank you for formally accepting my answer
Greg
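A decision tree splits the plane with a sequence of axis-aligned thresholds, so it can carve out irregular regions that no single smooth curve would. The following is a minimal from-scratch Python sketch of a CART-style tree using Gini impurity, for illustration only (in MATLAB, fitctree in the Statistics and Machine Learning Toolbox fills this role). The point layout, the label encoding (hypothetical: 1 = avoided, 0 = probable collision), max_depth and min_size are assumptions, not values from the thread.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(points, labels):
    """Return (impurity, feature, threshold) minimising weighted Gini, or None."""
    best = None
    n = len(points)
    for f in range(len(points[0])):
        for t in sorted(set(p[f] for p in points)):
            left = [lab for p, lab in zip(points, labels) if p[f] <= t]
            right = [lab for p, lab in zip(points, labels) if p[f] > t]
            if not left or not right:
                continue  # a split must put points on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(points, labels, max_depth=3, min_size=2):
    """Recursively build a tree; a leaf is simply the majority class label."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(points) < min_size or gini(labels) == 0.0:
        return majority
    split = best_split(points, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [(p, lab) for p, lab in zip(points, labels) if p[f] <= t]
    right = [(p, lab) for p, lab in zip(points, labels) if p[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([p for p, _ in left], [lab for _, lab in left], max_depth - 1, min_size),
        "right": build_tree([p for p, _ in right], [lab for _, lab in right], max_depth - 1, min_size),
    }


def predict(tree, point):
    """Walk down the tree until a leaf (a plain class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if point[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree
```

Limiting max_depth trades flexibility for robustness; with heavily overlapping classes a deep tree will mostly memorise noise.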
  2 Comments
RPatel on 14 Aug 2017
Thanks, Greg, for your suggestion, but it will not help my study.
Image Analyst on 14 Aug 2017
Too bad, because I think that's your best shot at a possible solution. Since your data overlaps so much, I think those two parameters are not enough to discriminate between the classes. You'd best look for a third or fourth parameter, like acceleration, velocity vector angles, or something similar. If you can't, then I think a TreeBagger/random forest/decision tree type of approach is the best you can hope for, like Greg said. The scatterplot example at https://www.mathworks.com/help/stats/ensemble-methods.html#bsx62vu shows this. Actually, your ad hoc convex hull example is somewhat related to a TreeBagger type of solution. It also sounds a bit like DBSCAN, see https://en.wikipedia.org/wiki/DBSCAN
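TreeBagger and random forests rely on bootstrap aggregation: many trees are trained on resampled copies of the data and then vote on the class. The Python sketch below shows only the bagging idea, using depth-1 trees (decision stumps) to keep it short; real random forests grow deeper trees and also sample features at each split. The function names, the number of models and the seed are illustrative assumptions, and labels are assumed to be 0/1.

```python
import random
from collections import Counter


def fit_stump(points, labels):
    """Depth-1 decision tree: the feature/threshold pair with the fewest training errors."""
    overall = Counter(labels).most_common(1)[0][0]
    best = (len(labels) + 1, 0, float("inf"), overall, overall)  # (errors, f, t, left, right)
    for f in range(len(points[0])):
        for t in sorted(set(p[f] for p in points)):
            left = [lab for p, lab in zip(points, labels) if p[f] <= t]
            right = [lab for p, lab in zip(points, labels) if p[f] > t]
            lc = Counter(left).most_common(1)[0][0] if left else overall
            rc = Counter(right).most_common(1)[0][0] if right else overall
            errors = sum(lab != lc for lab in left) + sum(lab != rc for lab in right)
            if errors < best[0]:
                best = (errors, f, t, lc, rc)
    return best[1:]


def stump_predict(stump, point):
    f, t, left_class, right_class = stump
    return left_class if point[f] <= t else right_class


def fit_bagged_stumps(points, labels, n_models=25, seed=0):
    """Bootstrap aggregation: each stump is trained on a resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(points)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([points[i] for i in idx], [labels[i] for i in idx]))
    return models


def bagged_predict(models, point):
    """Majority vote over all stumps."""
    return Counter(stump_predict(m, point) for m in models).most_common(1)[0][0]
```

Averaging many models fitted to resampled data smooths out the jagged boundary a single tree would draw through overlapping classes.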



John D'Errico on 4 Aug 2017
But why would a polynomial regression fit have any chance of satisfying this goal? It would be pure random chance if it came even close. It is especially wrong to hope that such a fit, based purely on distance as the independent variable, would have a chance.
It seems you are looking for a nonlinear discriminant curve based on both velocity and distance. I'd suggest neural nets, but just because you want to see a 90% success rate does not mean any such function exists. You could just as easily have insisted on a 99.99% target success rate. If wishes were horses, beggars would ride.
What you need to be modeling is a boolean result, thus collision or not, as a function of TWO independent variables, vehicle velocity and inter-vehicle distance. Again, use a tool of your choice. But a polynomial regression is still NOT the tool I would ever advise here.
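One way to model the boolean outcome as a function of the two variables is logistic regression, which estimates a probability of "collision avoided" and behaves like a single-neuron network. This is a swapped-in technique for illustration, not something proposed in the thread. The quadratic feature map, learning rate, epoch count and the 1 = avoided label encoding are all assumptions; the set of points where the predicted probability equals a chosen cut-off forms a curve in the velocity-distance plane.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(velocity, distance):
    """Hypothetical quadratic feature map so the decision boundary can curve."""
    v, d = velocity, distance
    return [1.0, v, d, v * d, v * v, d * d]


def fit_logistic(samples, labels, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; labels are 0/1."""
    X = [features(v, d) for v, d in samples]
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(X, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def predict_proba(w, velocity, distance):
    """Estimated probability that the sample belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(velocity, distance))))
```

Raising the cut-off above 0.5 shrinks the accepted region toward points where avoided collisions dominate; on real data the features should also be scaled before gradient descent.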
  1 Comment
RPatel on 4 Aug 2017
Hello John,
I have never had to do something of this sort before, and I have no idea about the diverse tools MATLAB offers to solve this kind of problem. Polynomial regression fitting is just one that I came across and tried.
Indeed, I would like to have a different curve for each target success rate (90%, 99%, etc.).
I will have a look at neural nets to see if it helps. Thanks for your comments :)
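Given any per-point score (for example a predicted probability from a classifier), a separate curve for each target success rate can be obtained by choosing a different score cut-off. The hypothetical helper below returns the lowest cut-off at which the accepted region (score at or above the cut-off) still meets the target share of label-1 points. Whether a region reaching 99% exists at all depends entirely on the data, as John noted.

```python
def threshold_for_purity(scores, labels, target):
    """Lowest cut-off whose accepted set (score >= cut-off) has at least
    `target` fraction of label-1 points; None if no cut-off qualifies.
    """
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    best = None
    positives = 0
    for k, (s, lab) in enumerate(ranked, start=1):
        positives += lab
        # Only cut between distinct scores so tied points stay on the same side.
        if k < len(ranked) and ranked[k][0] == s:
            continue
        if positives / k >= target:
            best = s
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 1, 0, 1, 0]
print(threshold_for_purity(scores, labels, 0.9))   # → 0.7
print(threshold_for_purity(scores, labels, 0.75))  # → 0.5
```

A stricter target generally yields a higher cut-off and therefore a smaller region.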



RPatel on 14 Aug 2017
As there doesn't seem to be any solution to this, here is what I did:
I found the centroid and chose the x% of points closest to it. Then I plotted a convex hull around those points. Next, I check whether a particular point of interest lies inside or outside the convex hull. Using this, I get the ratio of collisions avoided to probable collisions among the points inside the hull.
Hope this helps others who might face a similar situation.
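The convex hull procedure described above might be sketched in Python as follows. It assumes the centroid is taken over all points (the post does not say which points were used), a hypothetical 1 = collision avoided label encoding, and Andrew's monotone chain for the hull; in MATLAB, convhull and inpolygon would presumably play these roles.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, p):
    """True if p lies inside or on a counter-clockwise convex polygon."""
    if len(hull) < 3:
        return p in hull  # degenerate hull: only its own vertices count
    n = len(hull)
    for i in range(n):
        a, b = hull[i], hull[(i + 1) % n]
        if (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) < 0:
            return False
    return True


def hull_class_ratio(points, labels, fraction):
    """Hull the `fraction` of points nearest the centroid, then return
    (hull, share of label-1 points among all points inside the hull)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    by_dist = sorted(points, key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
    k = max(1, round(fraction * len(points)))
    hull = convex_hull(by_dist[:k])
    inside = [lab for p, lab in zip(points, labels) if inside_hull(hull, p)]
    return hull, sum(inside) / len(inside)
```

The chosen points always lie inside their own hull, so the ratio is well defined; sweeping `fraction` gives one region per target rate.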