Regression Outliers

Removes outliers from X and Y variables based on regression residuals
Updated 18 Jun 2012

This function takes two vectors of variables intended for a bivariate linear regression and removes outliers from both. Because outliers are detected from the regression residual vector, only the records that lie farthest from the 1:1 regression line are detected and removed. If more than one outlier is requested, the regression residuals are recalculated before each subsequent removal to avoid swamping and masking effects; the next farthest point from the 1:1 line is then removed, and so on. The method distinguishes two kinds of points: those that may be outliers in a single variable (X or Y) but still fit the 1:1 regression line well, and those that stay within the acceptable range of each individual input variable (X, Y) but emerge as outliers once the two variables are fitted with a regression line. Outliers are detected in the residual vector by a subfunction, which is an enhancement of work by Vince Petaccio (2009) and is also available as a stand-alone function, "outliers", from the MATLAB File Exchange.
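The iterative procedure described above can be sketched roughly as follows. This is a minimal illustration, not the author's code: it assumes that "farthest from the line" means the largest absolute residual of a first-degree `polyfit`, which stands in for the `outliers` subfunction whose actual test is not described here. The data, `nRemove`, and `removedIdx` are hypothetical names chosen for the demo.

```matlab
% Hypothetical demo data: points on y = 2x + 1 with two injected outliers.
x = (1:20)';
y = 2*x + 1;
y(5)  = 60;    % injected outlier (assumption for the demo)
y(14) = -10;   % second injected outlier

nRemove = 2;                    % plays the role of noutliers
removedIdx = zeros(nRemove, 1); % indices of removed records, in removal order

for k = 1:nRemove
    ok  = ~isnan(x) & ~isnan(y);     % records still in play
    p   = polyfit(x(ok), y(ok), 1);  % refit the line on the remaining records
    res = y - polyval(p, x);         % residuals (NaN where already removed)
    [~, worst] = max(abs(res));      % max skips NaN values
    removedIdx(k) = worst;
    x(worst) = NaN;                  % mark the record as removed in both vectors
    y(worst) = NaN;
end
```

Refitting inside the loop is what keeps one extreme point from distorting the line enough to hide (mask) a second outlier or to implicate (swamp) a well-behaved point.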

--Inputs:
X0: vector of the independent variable in the bivariate linear regression
Y0: vector of the dependent variable in the bivariate linear regression
noutliers: number of outliers to remove (defaults to 1 if not provided)
plotOp: plotting option (0: don't plot, 1: plot). If 1, a scatterplot of the two input variables is drawn before and after each iteration of outlier removal (up to noutliers), with the plots arranged as subplots

--Outputs:
X: vector of the independent variable after removal of the outliers
Y: vector of the dependent variable after removal of the outliers
rSquares: vector of r-square values, calculated from the original inputs and after the removal of each outlier
outliers_idx: indices of the outliers; the records at these indices are set to NaN in the X and Y outputs
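Because removed records come back as NaN, downstream code usually needs to drop them before further analysis. The sketch below shows one way to do that and to recompute an r-square value. The vectors are made-up stand-ins for the function's outputs, and computing r-square as the squared Pearson correlation is an assumption about the definition (it holds for a simple linear fit with an intercept), not a statement about how the function computes `rSquares` internally.

```matlab
% Hypothetical outputs shaped like those described above:
% the removed record is NaN in both vectors.
X = [1 2 3 NaN 5 6];
Y = [2 4 6 NaN 10 12];

keep = ~isnan(X) & ~isnan(Y);  % records that survived outlier removal
Xc = X(keep);
Yc = Y(keep);

% For a bivariate linear fit, r-square equals the squared correlation.
R  = corrcoef(Xc, Yc);
r2 = R(1, 2)^2;
```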

--Dependency:
the outliers subfunction, which is included in the code after the main function

--Example:
X0=10.2:0.2:30; %first vector
Y0=0.1:0.1:10; %second vector
idx=randi(length(Y0),4,1); %pick 4 random positions for the noise
Y0(idx)=randn(4,1)*10; %replace them with 4 random noise values
noutliers=3; %number of outliers to remove
plotOp=1; %0: don't plot, 1: plot
[X,Y,rSquares,outliers_idx]=regoutliers(X0,Y0,noutliers,plotOp);
rSquares %print r-square values calculated from the original
%data and after each outlier removal; these
%should increase progressively, otherwise the
%number of outliers to remove should be decreased or,
%in some cases, increased.
outliers_idx %print indices of the outliers in both input vectors
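The check suggested in the example's comments, that the r-square values should rise with each removal, can be automated with a one-line test. The values below are hypothetical placeholders, not output from the function; in practice `rSquares` would come from the call above.

```matlab
% Hypothetical r-square values: the original data, then after each of 3 removals.
rSquares = [0.62 0.81 0.93 0.97];

% True only if every removal improved the fit.
isIncreasing = all(diff(rSquares) > 0);

if ~isIncreasing
    disp('r-square did not increase at every step; consider changing noutliers.');
end
```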

Cite As

M Sohrabinia (2024). Regression Outliers (https://www.mathworks.com/matlabcentral/fileexchange/37212-regression-outliers), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R2008b
Compatible with any release
Platform Compatibility
Windows macOS Linux
Acknowledgements

Inspired by: Remove Outliers

Version: 1.0.0.0