
Exclude data from fit


outliers = excludedata(xdata,ydata,MethodName,MethodValue)


outliers = excludedata(xdata,ydata,MethodName,MethodValue) identifies data to be excluded from a fit using the specified MethodName and MethodValue. outliers is a logical vector, with 1 marking predictors (xdata) to exclude and 0 marking predictors to include. Supported MethodName and MethodValue pairs are given in the table below.

You can use the output outliers as an input to the fit function in the Exclude name-value pair argument. You can alternatively use the Exclude argument to specify excluded data as:

  1. An expression describing a logical vector, e.g., x > 10.

  2. A vector of integers indexing the points you want to exclude, e.g., [1 10 25].
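The ways of passing exclusions to fit can be compared side by side. The following is a minimal sketch on synthetic data; the variables x and y and the 'poly1' library model are assumptions chosen for illustration and are not part of the example on this page.

```matlab
% Synthetic data (illustrative only): a noisy line sampled at x = 1..30
rng(0);                                % make the noise reproducible
x = (1:30)';
y = 2*x + 1 + 0.5*randn(30,1);

% 1) Logical vector produced by excludedata:
%    exclude points outside the interval [0 10] on the x-axis
outliers = excludedata(x,y,'domain',[0 10]);
fit1 = fit(x,y,'poly1','Exclude',outliers);

% 2) Logical expression written directly
fit2 = fit(x,y,'poly1','Exclude',x > 10);

% 3) Integer indices of the points to exclude
fit3 = fit(x,y,'poly1','Exclude',[1 10 25]);
```

Because every x value in this sketch is positive, the first two calls should exclude the same points and therefore produce the same coefficients.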




'box'
A four-element vector specifying the edges of a closed box in the xy-plane. Data outside the box is excluded from the fit. The vector has the form [xmin xmax ymin ymax].


'domain'
A two-element vector specifying the endpoints of a closed interval on the x-axis. Data outside the interval is excluded from the fit. The vector has the form [xmin xmax].


'indices'
A vector of indices specifying the data points to be excluded.


'range'
A two-element vector specifying the endpoints of a closed interval on the y-axis. Data outside the interval is excluded from the fit. The vector has the form [ymin ymax].
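
The four exclusion rules described above can be checked against plain logical expressions. This is a sketch on five made-up points, assuming the closed-interval reading given in the descriptions (points on an endpoint are kept); the sameAs... variables are hypothetical names introduced only for comparison.

```matlab
% Five illustrative points (not from this page's data set)
x = [0 1 2 3 4]';
y = [10 20 30 40 50]';

% 'domain': keep x in [1 3], exclude everything else
byDomain     = excludedata(x,y,'domain',[1 3]);
sameAsDomain = (x < 1 | x > 3);

% 'range': keep y in [20 40], exclude everything else
byRange     = excludedata(x,y,'range',[20 40]);
sameAsRange = (y < 20 | y > 40);

% 'box': keep points with x in [1 4] and y in [10 30]
byBox     = excludedata(x,y,'box',[1 4 10 30]);
sameAsBox = (x < 1 | x > 4 | y < 10 | y > 30);

% 'indices': exclude the 2nd and 5th points
byIndices            = excludedata(x,y,'indices',[2 5]);
sameAsIndices        = false(size(x));
sameAsIndices([2 5]) = true;
```

If the closed-interval reading is right, a comparison such as isequal(logical(byDomain(:)), sameAsDomain) should return true for each pair.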


Load the vote counts and county names for the state of Florida from the 2000 U.S. presidential election:

load flvote2k

Use the vote counts for the two major party candidates, Bush and Gore, as predictors for the vote counts for third-party candidate Buchanan, and plot the scatters:

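plot(bush,buchanan,'rs')     % Bush counts as predictor
hold on
plot(gore,buchanan,'bo')     % Gore counts as predictor
legend('Bush data','Gore data')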

Assume a model where a fixed proportion of Bush or Gore voters choose to vote for Buchanan:

f = fittype({'x'})
f =
     Linear model:
       f(a,x) = a*x
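
For intuition about the model f(a,x) = a*x: under ordinary (non-robust) least squares, the single coefficient has the closed form a = sum(x.*y)/sum(x.^2). The sketch below uses made-up numbers and base MATLAB only. It is not the bisquare robust fit used later in this example.

```matlab
% Made-up counts, for illustration only
x = [1000 2000 4000 8000]';   % predictor, e.g. votes for a major-party candidate
y = [  11   19   41   79]';   % response, e.g. votes for a third-party candidate

% Closed-form least-squares slope for the model y = a*x (no intercept)
a = sum(x.*y) / sum(x.^2);

% The backslash operator solves the same least-squares problem
a_backslash = x \ y;
```

The two values should agree to within rounding error, and a can be read as the estimated proportion of x that appears in y.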

Exclude the data from absentee voters, who did not use the controversial “butterfly” ballot:

absentee = find(strcmp(counties,'Absentee Ballots'));
nobutterfly = excludedata(bush,buchanan,...

Perform a bisquare weights robust fit of the model to the two data sets, excluding absentee voters:

bushfit = fit(bush,buchanan,f,...
gorefit = fit(gore,buchanan,f,...

Robust fits give outliers a low weight, so large residuals from a robust fit can be used to identify the outliers:

hold on

The residuals in the plot above can be computed as follows:

bushres = buchanan - feval(bushfit,bush);
goreres = buchanan - feval(gorefit,gore);

Large residuals can be identified as those outside the range [-500 500]:

bushoutliers = excludedata(bush,bushres,...
                           'range',[-500 500]);
goreoutliers = excludedata(gore,goreres,...
                           'range',[-500 500]);

The outliers for the two data sets correspond to the following counties:

ans = 
    'Palm Beach'

ans = 
    'Palm Beach'

Miami-Dade and Broward counties correspond to the largest predictor values. Palm Beach County, the only county in the state to use the “butterfly” ballot, corresponds to the largest residual values.
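
The residual-thresholding step used above, followed by a label lookup, can be sketched with base MATLAB on synthetic data. All names and numbers below are invented for illustration, and the slope a is assumed rather than fitted.

```matlab
% Synthetic example (names and numbers are invented for illustration)
names = {'A'; 'B'; 'C'; 'D'};
x     = [100; 200; 300; 400];
y     = [  1;   2; 903;   4];     % the third point is far from the trend
a     = 0.01;                     % assumed slope of the model y = a*x

% Residuals: observed minus predicted
res = y - a*x;

% Flag residuals outside [-500 500], the same rule as the 'range' method
flagged = abs(res) > 500;

% Logical indexing recovers the labels of the flagged points
flaggedNames = names(flagged);
```

Here only the third residual (900) falls outside the interval, so flaggedNames contains the single label 'C'.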


You can combine data exclusion rules using logical operators. For example, to exclude data inside the box [-1 1 -1 1] or outside the domain [-2 2], use:

outliers1 = excludedata(xdata,ydata,'box',[-1 1 -1 1]);
outliers2 = excludedata(xdata,ydata,'domain',[-2 2]);
outliers = ~outliers1|outliers2;
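
The effect of the negation can be traced on three hand-picked points. This is a small sketch with hypothetical variable names (insideBox, outsideDomain, excluded) that mirrors the rule above.

```matlab
% Three illustrative points along the x-axis
x = [0 1.5 2.5];
y = [0 0   0  ];

% excludedata flags points OUTSIDE the box, so negate to get points inside it
insideBox     = ~excludedata(x,y,'box',[-1 1 -1 1]);

% Points outside the interval [-2 2] on the x-axis
outsideDomain =  excludedata(x,y,'domain',[-2 2]);

% Exclude a point if either condition holds
excluded      = insideBox | outsideDomain;
```

Under the rules in the table, the first point (inside the box) and the third point (outside the domain) should be excluded, while the second is kept.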

You can visualize the combined exclusion rule using random data:

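xdata = -3 + 6*rand(1,1e4);
ydata = -3 + 6*rand(1,1e4);
% Compute outliers1, outliers2, and outliers for this data as shown above,
% then plot only the points that are kept
plot(xdata(~outliers),ydata(~outliers),'.')
axis([-3 3 -3 3])
axis square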



Introduced before R2006a