sequentialfs
Sequential feature selection using custom criterion
Syntax
tf = sequentialfs(fun,X,y)
tf = sequentialfs(___,Name,Value)
Description
tf = sequentialfs(fun,X,y) selects a subset of features in X that are important for predicting y. The function defines a random nonstratified partition for 10-fold cross-validation using X and y, and then sequentially selects features based on the cross-validated prediction criterion values computed by the fun function. The initial feature set includes no features. sequentialfs adds one feature to the set at each iteration, until adding a feature does not decrease the criterion value by more than the termination tolerance value. The output tf is a logical vector that indicates the selected features. For more details, see Algorithms.
tf = sequentialfs(___,Name,Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, specify "Direction","backward" to perform recursive feature elimination (RFE). The initial feature set includes all features. sequentialfs removes one feature from the set at each iteration, until removing a feature does not decrease the prediction criterion.
Examples
Forward Feature Selection
Find important features by performing forward sequential feature selection using the wrapper type.
Load the fisheriris data set.
load fisheriris
Display the variables in the data set.
whos
  Name           Size            Bytes  Class     Attributes

  meas         150x4              4800  double
  species      150x1             18100  cell

The matrix meas contains four measurements from three species of iris flowers for 150 different flowers. The variable species lists the species for each flower.
Specify the predictor data X and the response data y. Define X to include the four measurements and six random variables. Place the measurement variables in columns 1, 3, 5, and 7.
rng("default") % For reproducibility
X = randn(150,10);
X(:,[1 3 5 7]) = meas;
y = species;
Define the function handle myfun for an anonymous function that takes four inputs: training data (XTrain and yTrain) and test data (XTest and yTest). The anonymous function trains a classification model by using the training data, and returns a loss value on the test data for the trained model.
myfun = @(XTrain,yTrain,XTest,yTest) ...
size(XTest,1)*loss(fitcecoc(XTrain,yTrain),XTest,yTest);
The loss function of a classification model object returns an average loss value, but sequentialfs also divides the sum of the criterion values returned by myfun by the total number of test observations. Therefore, the anonymous function must return the loss value multiplied by the number of test observations.
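The scaling can be checked with plain arithmetic. The following sketch uses made-up fold sizes and average losses (hypothetical numbers, not results from fisheriris) to show that returning count-scaled losses lets the final division recover the overall average loss, whereas unscaled losses would be shrunk by the division.

```matlab
% Hypothetical per-fold test-set sizes and average losses (illustrative values only)
nTest   = [15 15 20];          % number of test observations in each fold
avgLoss = [0.20 0.00 0.10];    % average loss that loss() would report per fold

% Value a count-scaled criterion returns per fold: average loss times fold size
foldCriterion = nTest .* avgLoss;

% Sum the fold values and divide by the total number of test observations
cvCriterion = sum(foldCriterion) / sum(nTest);

% The same division applied to unscaled average losses gives a much smaller number
unscaled = sum(avgLoss) / sum(nTest);
```

Here cvCriterion equals the observation-weighted mean of the fold losses, which is the quantity you usually want to compare across candidate feature sets.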
Create a random partition for stratified 10-fold cross-validation.
cv = cvpartition(y,"KFold",10);
Use the sequentialfs function to sequentially select important features in X based on the criterion value returned by myfun. Specify to use the stratified partition cv, and set the iteration option to display information about the feature selection process at each iteration.
opts = statset("Display","iter");
tf = sequentialfs(myfun,X,y,"CV",cv,"Options",opts);
Start forward sequential feature selection:
Initial columns included:  none
Columns that can not be included:  none
Step 1, added column 7, criterion value 0.04
Step 2, added column 5, criterion value 0.0333333
Step 3, added column 1, criterion value 0.0266667
Step 4, added column 3, criterion value 0.0133333
Final columns included:  1 3 5 7
sequentialfs correctly finds the important predictors in columns 1, 3, 5, and 7.
Backward Feature Selection
Find important features by performing backward sequential feature selection, or recursive feature elimination (RFE), using the wrapper type.
Load the hald data set, which measures the effect of cement composition on its hardening heat.
load hald
This data set includes the variables ingredients and heat. The matrix ingredients contains the percent composition of four chemicals present in the cement. The vector heat contains the values for the heat hardening after 180 days for each cement sample.
Use the sequentialfs function to perform backward sequential feature selection based on the criterion value returned by myfun. The code for the helper function myfun appears at the end of this example. Specify the Direction name-value argument as "backward" to include all features in the initial feature set and then sequentially exclude one feature at each iteration. Set the iteration option to display information about the feature selection process at each iteration.
rng("default") % For reproducibility
opts = statset("Display","iter");
tf = sequentialfs(@myfun,ingredients,heat, ...
    "Direction","backward","Options",opts);
Start backward sequential feature selection:
Initial columns included:  all
Columns that must be included:  none
Step 1, used initial columns, criterion value 12.4989
Step 2, removed column 3, criterion value 6.25866
Final columns included:  1 2 4
sequentialfs excludes the third variable from the features in ingredients.
Helper Function
The myfun function takes four inputs: training data (XTrain and yTrain) and test data (XTest and yTest). The function trains a regression model by using the training data, and returns the sum of squared errors on the test data for the trained model.
function criterion = myfun(XTrain,yTrain,XTest,yTest)
mdl = fitrlinear(XTrain,yTrain);
predictedYTest = predict(mdl,XTest);
e = yTest - predictedYTest;
criterion = e'*e;
end
Filter Type Feature Selection
Perform filter type feature selection based on the correlation coefficients for the features.
Load the carsmall data set.
load carsmall
Create the feature matrix X containing six variables.
X = [Acceleration Cylinders Displacement ...
    Horsepower Model_Year Weight];
Compute the matrix of the pairwise linear correlation coefficients between each pair of features in X by using the corr function. Specify the Rows name-value argument as "pairwise" to omit any rows containing NaN on a pairwise basis for each two-column correlation coefficient calculation.
corr(X,"Rows","pairwise")
ans = 6×6
1.0000 -0.6473 -0.6947 -0.6968 0.4843 -0.4879
-0.6473 1.0000 0.9512 0.8622 -0.6053 0.8844
-0.6947 0.9512 1.0000 0.9134 -0.5779 0.8895
-0.6968 0.8622 0.9134 1.0000 -0.6082 0.8733
0.4843 -0.6053 -0.5779 -0.6082 1.0000 -0.4964
-0.4879 0.8844 0.8895 0.8733 -0.4964 1.0000
X contains highly correlated features. For example, the correlation between the second and third features (Cylinders and Displacement) is 0.9512.
Use the sequentialfs function to rank the features in X based on the correlation values. Specify these options when you call the sequentialfs function:
- Use the helper function mycorr, which returns the maximum absolute value of the off-diagonal elements in the matrix of correlation coefficients. The code for this helper function appears at the end of this example.
- Specify "Direction","backward" and "NullModel",true so that sequentialfs starts from the initial feature set containing all features and then excludes all features from the set, one feature at a time.
- Specify "CV","none" to perform feature selection without cross-validation.
- Set the iteration option to display information about the feature selection process at each iteration.
opts = statset("Display","iter");
[~,history] = sequentialfs(@mycorr,X, ...
    "Direction","backward","NullModel",true, ...
    "CV","none","Options",opts);
Start backward sequential feature selection:
Initial columns included:  all
Columns that must be included:  none
Step 1, used initial columns, criterion value 0.951167
Step 2, removed column 3, criterion value 0.884401
Step 3, removed column 6, criterion value 0.862164
Step 4, removed column 4, criterion value 0.647346
Step 5, removed column 2, criterion value 0.484253
Step 6, removed column 1, criterion value 0
Step 7, removed column 5, criterion value 0
Final columns included:  none
sequentialfs returns the structure array history with two fields (In and Crit) containing information about the feature selection process. The In field contains a logical matrix where row i indicates the features selected at iteration i. A true (logical 1) entry in a row indicates that the corresponding feature is in the feature set after the iteration.
history.In
ans = 7x6 logical array
1 1 1 1 1 1
1 1 0 1 1 1
1 1 0 1 1 0
1 1 0 0 1 0
1 0 0 0 1 0
0 0 0 0 1 0
0 0 0 0 0 0
The Crit field contains the criterion values computed at each iteration.
history.Crit
ans = 1×7
0.9512 0.8844 0.8622 0.6473 0.4843 0 0
The last two criterion values are zero because the mycorr function returns 0 if the input contains fewer than two features.
Extract the indices of the excluded features from the matrix in the In field.
p = size(X,2);
idx = NaN(1,p);
for i = 1 : p
    idx(i) = find(history.In(i,:)~=history.In(i+1,:));
end
idx
idx = 1×6
3 6 4 2 1 5
Find the set of features whose criterion value is less than 0.8.
threshold = 0.8;
iter_last_exclude = find(history.Crit(2:end)<threshold,1);
idx_selected = idx(iter_last_exclude+1:end)
idx_selected = 1×3
2 1 5
Compute the correlation coefficient matrix for the selected features.
corr(X(:,idx_selected),"Rows","pairwise")
ans = 3×3
1.0000 -0.6473 -0.6053
-0.6473 1.0000 0.4843
-0.6053 0.4843 1.0000
The absolute values of the off-diagonal elements are less than the threshold value 0.8.
Helper Function
The mycorr function takes a matrix that contains features in columns, and returns the maximum absolute value of the off-diagonal elements in the matrix of correlation coefficients. The off-diagonal elements are the correlations between two distinct features in the input data. Therefore, mycorr returns zero if the input data does not have at least two distinct features.
function criterion = mycorr(X)
if size(X,2) < 2
    criterion = 0;
else
    p = size(X,2);
    R = corr(X,"Rows","pairwise");
    R(logical(eye(p))) = NaN;
    criterion = max(abs(R),[],"all");
end
end
Select Features in Table
Convert a table that contains both numeric and categorical variables to an array by using the onehotencode and table2array functions. Then, select important features in the array by using the sequentialfs function.
Load the carbig data set.
load carbig
This data set contains variables that describe several aspects of cars, such as miles per gallon (MPG), country of origin (Origin), and number of cylinders (Cylinders). You can create a regression model of MPG using the other variables.
Specify the predictor data tblX in a table, and specify the response data y.
tblX = table(Acceleration,Cylinders,Displacement, ...
Horsepower,Model_Year,Weight,Origin);
y = MPG;
All variables in tblX are numeric except the Origin variable.
One-hot encode the Origin variable by using the onehotencode function.
tblOrigin = table(categorical(string(Origin)));
tblOrigin = onehotencode(tblOrigin);
Remove the Origin variable from tblX, and add the encoded values to tblX.
tblX.Origin = [];
tblX = [tblX tblOrigin];
Convert the table tblX to an array.
X = table2array(tblX);
Define the function handle myfun for an anonymous function that takes four inputs: training data (XTrain and yTrain) and test data (XTest and yTest). The anonymous function trains a regression model by using the training data, and returns a loss value on the test data for the trained model.
myfun = @(XTrain,yTrain,XTest,yTest) ...
size(XTest,1)*loss(fitrtree(XTrain,yTrain),XTest,yTest);
The loss function of a regression model object returns the mean squared error (MSE), but sequentialfs also divides the sum of the criterion values returned by myfun by the total number of test observations. Therefore, the anonymous function must return the loss value multiplied by the number of test observations.
Use the sequentialfs function to sequentially select important features in X based on the criterion value returned by myfun.
rng("default") % For reproducibility
tf = sequentialfs(myfun,X,y);
Display the variable names of the selected features.
tblX.Properties.VariableNames(tf)'
ans = 6x1 cell
{'Cylinders' }
{'Displacement'}
{'Model_Year' }
{'Weight' }
{'Germany' }
{'Italy' }
Input Arguments
fun — Function to compute feature selection criterion
function handle
Function to compute the feature selection criterion, specified as a function handle.
For each candidate feature set, sequentialfs computes the cross-validated criterion value by repeatedly calling the fun function as follows:
1. For each fold (a group of training and test data sets) defined by the CV name-value argument, sequentialfs calls the fun function to get the criterion value for the fold.
2. sequentialfs divides the sum of the criterion values by the total number of test observations.
If you specify X and y, then the fun function must have this form:
criterion = fun(XTrain,yTrain,XTest,yTest)
- The fun function accepts the training data (XTrain and yTrain) and test data (XTest and yTest). XTrain and XTest contain a subset of the columns of X that corresponds to the current candidate feature set.
- The fun function returns a scalar value criterion.
- Typically, fun trains a model by using the training data (XTrain, yTrain), predicts response values for XTest, and returns a loss of the predicted values compared to yTest. Common loss measures include the sum of squared errors for regression models and the number of misclassified observations for classification models.
For example, you can define the myFun function as follows, and then specify fun as @myFun.
function criterion = myFun(XTrain,yTrain,XTest,yTest)
mdl = fitcsvm(XTrain,yTrain);
predictedYTest = predict(mdl,XTest);
criterion = sum(~strcmp(yTest,predictedYTest));
end
Alternatively, you can define the function handle myFunHandle for an anonymous function as follows, and then specify fun as myFunHandle.
myFunHandle = @(XTrain,yTrain,XTest,yTest) ...
    loss(fitcsvm(XTrain,yTrain),XTest,yTest)*size(XTest,1);
- sequentialfs divides the sum of the criterion values returned by fun by the total number of test observations. So, fun must not divide the loss value by the number of test observations. The loss function of a classification or regression object returns an averaged loss value. Therefore, fun must return the loss value multiplied by the number of test observations. If you define the fun function to return the sum of squared errors or the number of misclassified observations, then the cross-validated criterion value is the mean squared error or the misclassification rate, respectively.
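As a toolbox-independent illustration of the required four-input form, the following sketch defines a criterion handle that fits ordinary least squares with the backslash operator and returns the unnormalized sum of squared errors on the test set. The name sseFun and the small data set are hypothetical and are not part of the sequentialfs documentation.

```matlab
% Hypothetical criterion: least-squares fit with an intercept,
% returning the (unnormalized) sum of squared errors on the test set
sseFun = @(XTrain,yTrain,XTest,yTest) ...
    sum((yTest - [ones(size(XTest,1),1) XTest] * ...
    ([ones(size(XTrain,1),1) XTrain] \ yTrain)).^2);

% Made-up data in which y depends exactly on the first column only
XTrain = [1 5; 2 3; 3 8; 4 1];
yTrain = 2*XTrain(:,1) + 1;
XTest  = [5 2; 6 7];
yTest  = 2*XTest(:,1) + 1;

% The criterion receives only the columns of the current candidate feature set
critCol1 = sseFun(XTrain(:,1),yTrain,XTest(:,1),yTest);   % informative column
critCol2 = sseFun(XTrain(:,2),yTrain,XTest(:,2),yTest);   % uninformative column
```

Because sseFun returns a sum rather than a mean, passing it as fun would make the cross-validated criterion a mean squared error, in line with the scaling rule described above.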
If you specify X1,...,XN, sequentialfs selects features from X1 only, but otherwise imposes no interpretation on X1,...,XN. The function fun still must have this form:
criterion = fun(X1Train,...,XNTrain,X1Test,...,XNTest)
- The fun function accepts the training data (X1Train,...,XNTrain) and test data (X1Test,...,XNTest). X1Train and X1Test contain a subset of the columns of X1 that corresponds to the current candidate feature set.
- The fun function returns a scalar value criterion.
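To make the argument order of the multi-input form concrete, here is a minimal sketch with three inputs, under the assumption that X1 holds the features, X2 the response, and X3 per-observation weights. The handle name wsseFun, the roles of the inputs, and the data are all hypothetical; the sketch uses the base MATLAB function lscov for the weighted fit.

```matlab
% Hypothetical three-input criterion: X1 = features, X2 = response, X3 = weights.
% All training pieces come first, followed by the test pieces in the same order.
wsseFun = @(X1Train,X2Train,X3Train,X1Test,X2Test,X3Test) ...
    sum(X3Test .* (X2Test - X1Test * lscov(X1Train,X2Train,X3Train)).^2);

% Made-up data: the response is exactly 3 times the single feature
X1Train = [1; 2; 3; 4];   X2Train = 3*X1Train;   X3Train = [1; 1; 2; 2];
X1Test  = [5; 6];         X2Test  = 3*X1Test;    X3Test  = [1; 2];

% Weighted sum of squared test errors for this candidate feature set
criterion = wsseFun(X1Train,X2Train,X3Train,X1Test,X2Test,X3Test);
```

With a handle of this shape, a call of the form sequentialfs(wsseFun,X1,X2,X3) would search over the columns of X1 only, while X2 and X3 are split by rows and passed through unchanged.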
Data Types: function_handle
X — Feature data
numeric matrix
Feature data, specified as a numeric matrix. The rows of X correspond to observations, and the columns of X correspond to features. X and y must have the same number of rows.
The custom function defined by the fun argument must accept a group of training and test data sets defined by splitting X. For details, see the fun argument and CV name-value argument.
Data Types: single | double
y — Responses (labels)
column vector
Responses (labels), specified as a column vector. X and y must have the same number of rows.
The custom function defined by the fun argument must accept a group of training and test data sets defined by splitting y. For details, see the fun argument and CV name-value argument.
Data Types: single | double | logical | char | string | cell | categorical
X1,...,XN — Input data
matrices
Input data, specified as matrices. The matrices must have the same number of rows. sequentialfs selects features from X1 only, but otherwise imposes no interpretation on X1,...,XN.
The custom function defined by the fun argument must accept a group of training and test data sets defined by splitting X1,...,XN. For details, see the fun argument and CV name-value argument.
Data Types: single | double | logical | char | string | cell | categorical
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: KeepIn=[1 0 0 0],KeepOut=[0 0 0 1] always includes the first feature and excludes the last feature.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: "KeepIn",[1 0 0 0],"KeepOut",[0 0 0 1]
CV — Cross-validation option
10 (default) | positive integer | cvpartition object | "resubstitution" | "none"
Cross-validation option to compute the criterion for each candidate feature subset, specified as a positive integer, cvpartition object, "resubstitution", or "none".
For each candidate feature subset, sequentialfs uses the partition specified by this argument to cross-validate the criterion value returned by the fun function.
- Positive integer k — sequentialfs uses a random nonstratified partition for k-fold cross-validation.
- cvpartition object — sequentialfs uses a partition specified in the cvpartition object. You can specify a stratified partition, a partition for holdout validation, or a partition for leave-one-out cross-validation. For details, see cvpartition.
- "resubstitution" — sequentialfs does not partition the input data. Both the training set and the test set contain all of the original observations. For example, if you specify X and y, then sequentialfs calls fun as criterion = fun(X,y,X,y).
- "none" — sequentialfs does not validate the criterion value and calls fun as criterion = fun(X,y), without separating the training and test sets.
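The partition types listed above can be created with cvpartition and passed through the CV argument. The sketch below is one possible setup on the fisheriris data, assuming Statistics and Machine Learning Toolbox is available; the variable names are hypothetical, and the criterion reuses the count-scaled fitcecoc loss from the forward selection example.

```matlab
% Requires Statistics and Machine Learning Toolbox
load fisheriris                       % provides meas (150x4) and species (150x1 cell)
rng("default")                        % reproducible random partitions
n = size(meas,1);

cvStratified = cvpartition(species,"KFold",5);   % stratified 5-fold partition
cvHoldout    = cvpartition(n,"Holdout",0.3);     % single holdout split, 30% test
cvLeaveOut   = cvpartition(n,"LeaveOut");        % leave-one-out partition

% Criterion: average classification loss scaled by the number of test observations
critFun = @(XTrain,yTrain,XTest,yTest) ...
    size(XTest,1)*loss(fitcecoc(XTrain,yTrain),XTest,yTest);

% Select features using the stratified and the holdout partitions
tfStratified = sequentialfs(critFun,meas,species,"CV",cvStratified);
tfHoldout    = sequentialfs(critFun,meas,species,"CV",cvHoldout);
```

The leave-one-out partition is created only to show the syntax; using it with sequentialfs trains one model per observation for every candidate feature set, which can be slow.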
Example: "CV","none"
MCReps — Number of Monte Carlo repetitions for cross-validation
1 (default) | positive integer
Number of Monte Carlo repetitions for cross-validation, specified as a positive integer. If you specify a positive integer greater than 1, sequentialfs repeats the cross-validation computation for the specified number of repetitions for each candidate feature subset.
If CV is "none", "resubstitution", a cvpartition object of type "resubstitution", a cvpartition object of type "leaveout", or a custom cvpartition object (with the IsCustom property set to 1), then the software sets the MCReps value to 1.
Example: "MCReps",10
Data Types: single | double
Direction — Direction of sequential search
"forward" (default) | "backward"
Direction of the sequential search, specified as "forward" or "backward".
- "forward" — The initial feature set includes no features, and the sequentialfs function sequentially adds features to the set.
- "backward" — The initial feature set includes all features, and the sequentialfs function sequentially removes features from the set. That is, the sequentialfs function performs recursive feature elimination (RFE).
Example: "Direction","backward"
Data Types: char | string
KeepIn — Features to include
[] (default) | logical vector | vector of positive integers
Features to include, specified as [], a logical vector, or a vector of positive integers.
By default, sequentialfs examines all features for the feature selection process. If you specify features to include using this argument, sequentialfs always includes the features in the candidate feature sets. A true entry in a logical vector or an index value in a vector of positive integers indicates that the output argument tf must include the corresponding feature.
Example: "KeepIn",[1 0 0 0]
Data Types: logical
KeepOut — Features to exclude
[] (default) | logical vector | vector of positive integers
Features to exclude, specified as [], a logical vector, or a vector of positive integers.
By default, sequentialfs examines all features for the feature selection process. If you specify features to exclude using this argument, sequentialfs excludes the features from the candidate feature sets. A true entry in a logical vector or an index value in a vector of positive integers indicates that the output argument tf must exclude the corresponding feature.
Example: "KeepOut",[0 0 0 1]
Data Types: logical
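KeepIn and KeepOut accept either a logical mask or a vector of feature indices. The following sketch, which uses only base MATLAB and hypothetical variable names, converts index vectors to the equivalent logical masks for a four-feature problem and checks that the two constraints do not name the same feature (a sanity check assumed here for illustration, not a documented requirement).

```matlab
p = 4;                          % hypothetical number of features

% Index form: always keep feature 1, always exclude feature 4
keepInIdx  = 1;
keepOutIdx = 4;

% Equivalent logical form with one entry per feature
keepInTF  = false(1,p);   keepInTF(keepInIdx)   = true;
keepOutTF = false(1,p);   keepOutTF(keepOutIdx) = true;

% Sanity check for this sketch: no feature is both forced in and forced out
conflict = any(keepInTF & keepOutTF);
```

Either representation could then be supplied in a call such as sequentialfs(fun,X,y,"KeepIn",keepInTF,"KeepOut",keepOutTF), where fun, X, and y stand for your own criterion and data.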
NFeatures — Number of features to select
[] (default) | positive integer
Number of features to select, specified as [] or a positive integer.
By default, sequentialfs stops iterations when the function satisfies one of the stopping criteria (MaxIter or TolFun) specified by the Options name-value argument. If you specify the NFeatures name-value argument as a positive integer, sequentialfs stops iterations after selecting the specified number of features. This argument overrides other iteration options.
Example: "NFeatures",2
Data Types: single | double
NullModel — Flag to include null model
false or 0 (default) | true or 1
Flag to include the null model (model containing no features), specified as a logical 1 (true) or 0 (false).
If you specify true, the sequentialfs function includes the null model as a valid option for the output tf and computes the criterion value for the empty input data. Therefore, the fun function must be able to accept empty matrices as input argument values.
Example: "NullModel",true
Data Types: logical
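One way to write a criterion that tolerates an empty feature set is to always include an intercept, so that a feature matrix with zero columns reduces to an intercept-only model. The sketch below illustrates this with base MATLAB only. The handle name sseFun and the data are hypothetical, and the sketch assumes that the empty candidate set arrives as a matrix with the usual number of rows and zero columns.

```matlab
% Hypothetical criterion: least squares with an intercept, sum of squared test errors.
% Prepending a column of ones keeps the design matrix nonempty for zero-column input.
sseFun = @(XTrain,yTrain,XTest,yTest) ...
    sum((yTest - [ones(size(XTest,1),1) XTest] * ...
    ([ones(size(XTrain,1),1) XTrain] \ yTrain)).^2);

yTrain = [1; 2; 3; 6];          % training mean is 3
yTest  = [2; 4];

% Empty feature sets, assumed to be n-by-0 matrices
nullCriterion = sseFun(zeros(4,0),yTrain,zeros(2,0),yTest);
```

In this case the fitted model predicts the training mean for every test observation, so the criterion is the squared deviation of yTest from that mean.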
Options — Options for iterations and parallel computation
statset("sequentialfs") (default) | structure returned by statset
Options for the iterations and parallel computation, specified as a structure returned by statset.
This table lists the option fields and their values.
Field Name | Field Value | Default Value |
---|---|---|
Display | Level of display, specified as "off", "final", or "iter" | "off" |
MaxIter | Maximum number of iterations allowed, specified as a positive integer | Inf |
TolFun | Termination tolerance on the criterion value, specified as a positive scalar | 1e-6 if Direction is "forward"; 0 if Direction is "backward" |
TolTypeFun | Type of the termination tolerance for the criterion value, specified as "abs" (absolute tolerance) or "rel" (relative tolerance) | "rel" |
UseParallel | Flag to run in parallel, specified as logical 1 (true) or 0 (false) | false |
UseSubstreams | Flag to run computations in a reproducible manner, specified as logical 1 (true) or 0 (false). To compute reproducibly, set Streams to a type that allows substreams. | false |
Streams | Random number streams, specified as a RandStream object or a cell array of such objects | MATLAB® default random number stream |
To compute in parallel, you need Parallel Computing Toolbox™.
Example: "Options",statset("Display","iter")
Data Types: struct
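Several fields from the table can be set in a single statset call. The following sketch builds one such options structure, assuming Statistics and Machine Learning Toolbox is available; the particular values are arbitrary choices for illustration.

```matlab
% Requires Statistics and Machine Learning Toolbox
opts = statset("Display","iter", ...     % print progress at each iteration
               "MaxIter",5, ...          % allow at most 5 iterations
               "TolFun",1e-2, ...        % termination tolerance on the criterion value
               "TolTypeFun","abs");      % treat TolFun as an absolute tolerance
```

The structure is then passed as "Options",opts in the call to sequentialfs, as in the examples earlier on this page.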
Output Arguments
tf — Selected features
logical vector
Selected features, returned as a logical vector. A true (logical 1) entry indicates that the corresponding feature is selected.
history — History of feature selection process
structure
History of the feature selection process, returned as a structure array including the In and Crit fields.
- In is a logical matrix in which row i indicates the features selected at iteration i.
- Crit is a vector containing the criterion values computed at each iteration.
More About
Feature Selection
Feature selection reduces the dimensionality of data by selecting only a subset of measured features (predictor variables) to create a model. Feature selection algorithms search for a subset of predictors that optimally models measured responses, subject to constraints such as required or excluded features and the size of the subset.
You can categorize feature selection algorithms into three types:
Filter type — The filter type feature selection algorithm measures feature importance based on the characteristics of the features, such as feature variance and feature relevance to the response. You select important features as part of a data preprocessing step and then train a model using the selected features. Therefore, filter type feature selection is uncorrelated to the training algorithm.
Wrapper type — The wrapper type feature selection algorithm starts training using a subset of features and then adds or removes a feature using a selection criterion. The selection criterion directly measures the change in model performance that results from adding or removing a feature. The algorithm repeats training and improving a model until its stopping criteria are satisfied.
Embedded type — The embedded type feature selection algorithm learns feature importance as part of the model learning process. Once you train a model, you obtain the importance of the features in the trained model. This type of algorithm selects features that work well with a particular learning process.
For more details, see Introduction to Feature Selection.
Algorithms
sequentialfs sequentially selects features in X by performing these steps:
1. Define a random nonstratified partition for 10-fold cross-validation on n observations, where n is the number of observations in X.
2. Initialize the selected feature set S as an empty set.
3. For each feature x_i in X, compute the cross-validated criterion value using the fun function.
4. Add the feature with the smallest criterion value to S.
5. For each feature x_i in X\S, define a candidate feature set C_i as S ∪ {x_i}. Compute the cross-validated criterion value using fun for C_i.
6. Among the candidate sets C_i, select the set that reduces the criterion value the most, compared to the criterion value for S. Add the feature corresponding to the selected candidate set to S.
7. Repeat steps 5 and 6 until adding a feature does not decrease the criterion value by more than the termination tolerance value.
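The steps above can be sketched in base MATLAB as a greedy loop. This is a simplified illustration, not the implementation used by sequentialfs: it builds the folds by hand, uses an absolute tolerance, relies on a hypothetical least-squares criterion, and runs on synthetic data in which only columns 2 and 4 carry signal.

```matlab
rng(0)                                        % reproducible synthetic data
n = 60;  p = 5;
X = randn(n,p);
y = 3*X(:,2) - 2*X(:,4) + 0.1*randn(n,1);     % only columns 2 and 4 matter

% Hypothetical criterion: least-squares fit, sum of squared errors on the test rows
fun = @(XTr,yTr,XTe,yTe) sum((yTe - XTe*(XTr\yTr)).^2);

k = 10;
fold = mod(randperm(n),k)' + 1;               % step 1: random nonstratified folds
tol  = 1e-6;                                  % absolute tolerance (a simplification)

% Cross-validated criterion: sum over folds divided by the number of test observations
cvCrit = @(cols) sum(arrayfun(@(f) ...
    fun(X(fold~=f,cols),y(fold~=f),X(fold==f,cols),y(fold==f)),1:k)) / n;

S = [];                                       % step 2: start with no features
best = Inf;
while numel(S) < p
    candidates = setdiff(1:p,S);              % features not yet selected
    crit = arrayfun(@(j) cvCrit([S j]),candidates);   % steps 3 and 5
    [newBest,iBest] = min(crit);
    if best - newBest <= tol                  % step 7: improvement too small, stop
        break
    end
    S = [S candidates(iBest)];                % steps 4 and 6: add the best feature
    best = newBest;
end
tf = ismember(1:p,S);                         % logical vector of selected features
```

On this data the loop is expected to pick columns 2 and 4 first. Whether it stops there depends on the random folds, because a noise column can lower the cross-validated criterion slightly by chance.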
To customize the feature selection process, use the name-value arguments of sequentialfs.
- You can specify cross-validation options by using the CV and MCReps name-value arguments.
  - For wrapper type feature selection, specify the arguments to cross-validate the criterion value for each candidate feature set. You can define the fun function to train a model and return a criterion value for the trained model. For an example, see Forward Feature Selection.
  - For filter type feature selection, which does not involve cross-validation, specify CV as "none" and use the fun function to measure characteristics of the input data, such as correlation. For an example, see Filter Type Feature Selection.
- To perform backward feature selection, or recursive feature elimination (RFE), specify the Direction name-value argument as "backward". sequentialfs initializes the selected feature set S as a set with all features, and then removes one feature at a time from the set.
- You can specify which features to always include or exclude, the number of features in the final selected feature set, and whether to consider a model with no features as a valid option. For details, see the KeepIn, KeepOut, NFeatures, and NullModel name-value arguments.
- Use the Options name-value argument to specify options for the iterations and parallel computation. For example, "Options",statset("TolFun",1e-2) sets the iteration termination tolerance on the criterion value to 1e-2.
Extended Capabilities
Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.
To run in parallel, specify the Options name-value argument in the call to this function and set the UseParallel field of the options structure to true using statset:
Options=statset(UseParallel=true)
For more information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).
Version History
Introduced in R2008a