# How can I better classify my data

Izem on 25 Aug 2020
Commented: Izem on 25 Aug 2020
Hi everyone,
I am trying to separate data into two groups using clustering (k-means), as you can see in the attached pictures. My algorithm works perfectly on some data (see pic1), but for other data it fails in some regions (see pic2). Do you have any suggestions on how I can better filter out the upper points? Do you think a regression on the lower points would solve the problem? If so, how can I run it only on these lower points and not on all the data?

Izem on 25 Aug 2020
@Adam Danz I clustered the data with k-means and fitted the lower group using:
[pop,gof] = fit(X1,Y1,'poly2')
After that, I can only plot the residuals using:
plot(pop,X1,Y1,'residuals')
but I can't access the computed residual of each point, which I need in order to set a threshold and classify the points. Can you help, please?
Adam Danz on 25 Aug 2020
You can access the residuals using
[pop,gof,output] = fit(X1,Y1,'poly2');
resid = output.residuals;
However, that contains only the residuals of the data used for fitting (the blue dots). You need the residuals for all of the dots, so you'll have to compute them yourself:
[pop,gof,output] = fit(X1,Y1,'poly2');
resids = pop(XALL) - YALL;
where XALL and YALL are the coordinates of all of the points. pop(XALL) produces the y-estimates. Subtract the actual y-coordinates from the y-estimates to get the residuals.
For thresholding, use absolute values of the residuals.
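Putting the steps above together, a minimal sketch could look like the following. The data are synthetic stand-ins, `isLowerGuess` stands in for the k-means result, and the `threshold` value is a hypothetical number that would need tuning on the real data. It assumes the Curve Fitting Toolbox, since it uses `fit` as in the comments above.

```matlab
% Synthetic stand-in data (hypothetical): a lower quadratic trend plus lifted points
rng(0);                                     % reproducible random numbers
XALL = linspace(0, 10, 200)';               % fit() expects column vectors
YALL = 0.05*XALL.^2 + 0.02*randn(200, 1);   % lower points along a parabola
upperIdx = randperm(200, 30);               % pick 30 points to lift off the curve
YALL(upperIdx) = YALL(upperIdx) + 1 + rand(30, 1);

% Stand-in for the k-means result: points assumed to belong to the lower group
isLowerGuess = true(200, 1);
isLowerGuess(upperIdx) = false;
X1 = XALL(isLowerGuess);
Y1 = YALL(isLowerGuess);

% Fit on the lower group only, then evaluate the fit at every point
pop = fit(X1, Y1, 'poly2');
resids = pop(XALL) - YALL;

% Classify by absolute residual; the threshold is a hypothetical value to tune
threshold = 0.2;
isLower = abs(resids) <= threshold;
```

Points with `isLower == false` are the ones to filter out. A histogram of `abs(resids)` on the real data is one way to pick a sensible threshold.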
Izem on 25 Aug 2020
Alright! thank you !

Image Analyst on 25 Aug 2020
I'd use fitPolynomialRANSAC(), if you have the Computer Vision Toolbox. Train it with values less than 1 and fit a second- or third-order polynomial. RANSAC has the advantage over "dumb" regressions in that it can fit a polynomial through extremely noisy neighborhoods, so the points in that shotgun-blast cluster on the right won't all end up in one class.
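A rough sketch of that suggestion, assuming the Computer Vision Toolbox is installed. The data are synthetic stand-ins, "values less than 1" is interpreted here as y-values below 1, and `maxDistance` is a hypothetical tolerance to tune. Because RANSAC samples points at random, the coefficients can vary slightly between runs.

```matlab
% Synthetic stand-in data (hypothetical): a lower quadratic trend plus lifted points
rng(0);
XALL = linspace(0, 10, 200)';
YALL = 0.008*XALL.^2 + 0.02*randn(200, 1);  % lower curve stays below 1
upperIdx = randperm(200, 30);
YALL(upperIdx) = YALL(upperIdx) + 1 + rand(30, 1);

% Train only on points with y < 1 (one reading of "values less than 1")
trainMask = YALL < 1;
xyTrain = [XALL(trainMask), YALL(trainMask)];

% Robust second-order fit; maxDistance is the inlier tolerance (assumed value)
N = 2;
maxDistance = 0.2;
[P, inlierIdx] = fitPolynomialRANSAC(xyTrain, N, maxDistance);

% Classify every point by its distance to the fitted curve
yFit = polyval(P, XALL);
isLower = abs(yFit - YALL) <= maxDistance;
```

Here `inlierIdx` refers to the rows of `xyTrain` only, which is why the final classification re-evaluates the polynomial at all points.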

#### 1 Comment

Izem on 25 Aug 2020