How to remove the outliers
I have a sequence of data that I believe contains some outliers, which I have shaded in red in my Excel file. I have attached the Excel file with my data.
Which MATLAB function can detect and delete the data in the red shading?
If anyone can help, I would appreciate it.
Thanks
Accepted Answer
Steven Lord
on 11 Jul 2019
3 Comments
Jon
on 11 Jul 2019
Maybe you are running an old version of MATLAB that does not have the filloutliers function.
filloutliers was introduced in MATLAB R2017a.
What version of MATLAB are you running? To find out you can type the ver command.
In the future, it is good to use the code button in the MATLAB Answers toolbar for inserting code. That way it comes out nicely formatted and is easier to read, use, and copy.
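As a quick check alongside the ver command mentioned above, here is a minimal sketch that reports the release and tests whether filloutliers is on the path. The variable names rel and hasFill are only illustrative.

```matlab
% Release string of the running MATLAB, for example '2013a'
rel = version('-release');
disp(rel)

% exist(...,'file') returns 2 when the name resolves to a file on the path,
% so this is true only if filloutliers ships with the installed release
hasFill = exist('filloutliers', 'file') == 2;
disp(hasFill)
```

If hasFill comes back false, the function is not available in that installation and a manual approach is needed.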
More Answers (1)
Jon
on 11 Jul 2019
Edited: Jon
on 11 Jul 2019
Since you do not have filloutliers and rmoutliers in your version of MATLAB, I would first recommend updating to a more recent version of MATLAB if possible, as there have been many advances since 2013.
If that is not possible, you can look at the documentation in the link that Steven provided.
It gives MATLAB's default definition of an outlier as:
Outliers are defined as elements more than three scaled MAD from the median. The scaled MAD is defined as c*median(abs(A-median(A))), where c=-1/(sqrt(2)*erfcinv(3/2)).
So you could easily implement this in your code. For example, if you had vectors x and y and you wanted to make a plot with the outliers removed, you could do the following:
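scaledMAD = -1/(sqrt(2)*erfcinv(3/2))*median(abs(y - median(y))); % scaled MAD as defined above
isOutlier = abs(y - median(y)) > 3*scaledMAD; % more than three scaled MAD from the median
plot(x(~isOutlier),y(~isOutlier))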
I would recommend, though, implementing isOutlier as a small function so you don't have to keep repeating this code.
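One way such a function might look is sketched below. The name isOutlierMAD and the optional nMAD argument are my own assumptions, not part of MATLAB; the name is chosen so it does not clash with the built-in isoutlier in newer releases. It assumes y is a real vector without NaN values and that the code is saved as isOutlierMAD.m.

```matlab
function tf = isOutlierMAD(y, nMAD)
% isOutlierMAD  Flag elements more than nMAD scaled MADs from the median.
%   Hypothetical helper, not a built-in. Save as isOutlierMAD.m.
%   y    : real vector without NaN values (assumption)
%   nMAD : threshold in scaled MADs, defaults to 3 as in the definition above
if nargin < 2
    nMAD = 3;
end
c = -1/(sqrt(2)*erfcinv(3/2));             % scaling constant, about 1.4826
scaledMAD = c*median(abs(y - median(y)));  % scaled median absolute deviation
tf = abs(y - median(y)) > nMAD*scaledMAD;  % logical mask, same size as y
end
```

With that file on the path, the plot becomes tf = isOutlierMAD(y); plot(x(~tf),y(~tf)).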
Another simple way to remove outliers is to sort your data using the sort command and then remove the first and last n values from the sorted list, where you choose n according to how conservative you want to be with the outlier removal. For example, given vectors x and y and n = 5, you could implement this with something like
n = 5;
[ySrt,iSrt] = sort(y);
iKeep = sort(iSrt(n+1:length(y)-n)); % drop the n smallest and n largest values, keep original order
plot(x(iKeep),y(iKeep))
Note that n/length(y) is the fraction of the data that you discard as outliers at each end of the sorted list. So you might want to choose n so that n/length(y) is approximately 0.025, which keeps 100*(1 - 2*0.025) = 95% of your data and treats the extremes as outliers.
This method, although simple, assumes that you usually have some outliers at the extremes. Otherwise you are just throwing away good data merely because it sits at the lower and upper ends of the sorted list.
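To tie n to the fraction discussed above rather than fixing it by hand, something like the following sketch could be used. The data here is made up purely for illustration, and frac is an assumed variable name.

```matlab
% Illustrative data (invented for this sketch): a smooth curve with two spikes
x = 1:40;
y = sin(x/5);
y([7 30]) = [25 -25];

frac = 0.025;                    % fraction to discard at each end
n = round(frac*numel(y));        % number of points trimmed per end
[~, iSrt] = sort(y);             % indices that put y in ascending order
iKeep = sort(iSrt(n+1:end-n));   % drop n smallest and n largest, restore original order
plot(x(iKeep), y(iKeep))
```

Note that for short vectors n can round down to 0, in which case nothing is trimmed.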
2 Comments
Jon
on 12 Jul 2019
Edited: Jon
on 12 Jul 2019
Glad to hear it is working now. If you feel the question is answered, it would be good to "accept" the answer so that anyone else with the same issue can see that a solution is available. If you are still waiting to see whether there are other approaches, then you should leave it open.