Framework for Ensemble Learning
Using various methods, you can meld results from many weak learners into one high-quality ensemble predictor. These methods closely follow the same syntax, so you can try different methods with minor changes in your commands.
You can create an ensemble for classification by using fitcensemble or for regression by using fitrensemble.
To train an ensemble for classification using fitcensemble, use this syntax:

ens = fitcensemble(X,Y,Name,Value)

- X is the matrix of data. Each row contains one observation, and each column contains one predictor variable.
- Y is the vector of responses, with the same number of observations as the rows in X.
- Name,Value specify additional options using one or more name-value pair arguments. For example, you can specify the ensemble aggregation method with the 'Method' argument, the number of ensemble learning cycles with the 'NumLearningCycles' argument, and the type of weak learners with the 'Learners' argument. For a complete list of name-value pair arguments, see the fitcensemble function page.
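For example, here is a minimal sketch that trains a bagged classification ensemble on the fisheriris sample data set; the method, number of learning cycles, and the query observation are illustrative choices, not required settings.

load fisheriris                              % meas: 150-by-4 predictors, species: class labels
ens = fitcensemble(meas,species, ...
    'Method','Bag','NumLearningCycles',100); % bagged ensemble of classification trees
label = predict(ens,[5.1 3.5 1.4 0.2])       % classify one new observation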
Similarly, you can train an ensemble for regression by using fitrensemble, which follows the same syntax as fitcensemble. For details on the input arguments and name-value pair arguments, see the fitrensemble function page.
For all classification or nonlinear regression problems, follow these steps to create an ensemble:
Prepare the Predictor Data
All supervised learning methods start with predictor data, usually called X in this documentation. X can be stored in a matrix or a table. Each row of X represents one observation, and each column of X represents one variable or predictor.
Prepare the Response Data
You can use a wide variety of data types for the response data.
- For regression ensembles, Y must be a numeric vector with the same number of elements as the number of rows of X.
- For classification ensembles, Y can be a numeric vector, categorical vector, character array, string array, cell array of character vectors, or logical vector.

For example, suppose your response data consists of three observations in the following order: true, false, true. You could express Y as:

- [1;0;1] (numeric vector)
- categorical({'true','false','true'}) (categorical vector)
- [true;false;true] (logical vector)
- ['true ';'false';'true '] (character array, padded with spaces so each row has the same length)
- ["true","false","true"] (string array)
- {'true','false','true'} (cell array of character vectors)

Use whichever data type is most convenient. Because you cannot represent missing values with logical entries, do not use logical entries when you have missing values in Y.
fitcensemble and fitrensemble ignore missing values in Y when creating an ensemble. This table shows how to represent a missing entry in each data type.
Data Type | Missing Entry |
---|---|
Numeric vector | NaN |
Categorical vector | <undefined> |
Character array | Row of spaces |
String array | <missing> or "" |
Cell array of character vectors | '' |
Logical vector | (not possible to represent) |
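As an illustrative sketch (the variable names are arbitrary), these statements create a response with a missing second element in several of the data types:

Ynum  = [1;NaN;1];                        % numeric vector: NaN marks the missing entry
Ycat  = categorical({'true';'';'true'});  % empty char becomes <undefined>
Ystr  = ["true";missing;"true"];          % string array: <missing>
Ycell = {'true';'';'true'};               % cell array of character vectors: ''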
Choose an Applicable Ensemble Aggregation Method
To create classification and regression ensembles with fitcensemble and fitrensemble, respectively, choose appropriate algorithms from this list.
For classification with two classes:

- 'AdaBoostM1' — Adaptive boosting
- 'LogitBoost' — Adaptive logistic regression
- 'GentleBoost' — Gentle adaptive boosting
- 'RobustBoost' — Robust boosting (requires Optimization Toolbox™)
- 'LPBoost' — Linear programming boosting (requires Optimization Toolbox)
- 'TotalBoost' — Totally corrective boosting (requires Optimization Toolbox)
- 'RUSBoost' — Random undersampling boosting
- 'Subspace' — Random subspace
- 'Bag' — Bootstrap aggregation (bagging)
For classification with three or more classes:

- 'AdaBoostM2' — Adaptive boosting
- 'LPBoost' — Linear programming boosting (requires Optimization Toolbox)
- 'TotalBoost' — Totally corrective boosting (requires Optimization Toolbox)
- 'RUSBoost' — Random undersampling boosting
- 'Subspace' — Random subspace
- 'Bag' — Bootstrap aggregation (bagging)
For regression:

- 'LSBoost' — Least-squares boosting
- 'Bag' — Bootstrap aggregation (bagging)
For descriptions of the various algorithms, see Ensemble Algorithms.
See Suggestions for Choosing an Appropriate Ensemble Algorithm.
This table lists characteristics of the various algorithms. In the table headings:

- Class Imbalance — Good for imbalanced data (one class has many more observations than the other)
- Stop — Algorithm self-terminates
- Sparse — Requires fewer weak learners than other ensemble algorithms
Algorithm | Regression | Binary Classification | Multiclass Classification | Class Imbalance | Stop | Sparse |
---|---|---|---|---|---|---|
Bag | × | × | × | | | |
AdaBoostM1 | | × | | | | |
AdaBoostM2 | | | × | | | |
LogitBoost | | × | | | | |
GentleBoost | | × | | | | |
RobustBoost | | × | | | | |
LPBoost | | × | × | | × | × |
TotalBoost | | × | × | | × | × |
RUSBoost | | × | × | × | | |
LSBoost | × | | | | | |
Subspace | | × | × | | | |
RobustBoost, LPBoost, and TotalBoost require an Optimization Toolbox license. Try TotalBoost before LPBoost, as TotalBoost can be more robust.
Suggestions for Choosing an Appropriate Ensemble Algorithm
- Regression — Your choices are LSBoost or Bag. See General Characteristics of Ensemble Algorithms for the main differences between boosting and bagging.

- Binary Classification — Try AdaBoostM1 first, with these modifications:

  Data Characteristic | Recommended Algorithm |
  ---|---|
  Many predictors | Subspace |
  Skewed data (many more observations of one class) | RUSBoost |
  Label noise (some training data has the wrong class) | RobustBoost |
  Many observations | Avoid LPBoost and TotalBoost |

- Multiclass Classification — Try AdaBoostM2 first, with these modifications:

  Data Characteristic | Recommended Algorithm |
  ---|---|
  Many predictors | Subspace |
  Skewed data (many more observations of one class) | RUSBoost |
  Many observations | Avoid LPBoost and TotalBoost |
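For example, for a skewed two-class problem, a minimal sketch might select 'RUSBoost' (assuming X and Y already hold the predictors and the imbalanced class labels; the number of learning cycles is illustrative):

ens = fitcensemble(X,Y,'Method','RUSBoost', ...
    'NumLearningCycles',300);             % undersamples the majority class each cycle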
For details of the algorithms, see Ensemble Algorithms.
General Characteristics of Ensemble Algorithms
- Boost algorithms generally use very shallow trees. This construction uses relatively little time or memory. However, for effective predictions, boosted trees might need more ensemble members than bagged trees. Therefore, it is not always clear which class of algorithms is superior.

- Bag generally constructs deep trees. This construction is both time consuming and memory intensive, and it leads to relatively slow predictions. Bag can estimate the generalization error without additional cross validation. See oobLoss, and the sketch after this list.

- Except for Subspace, all boosting and bagging algorithms are based on decision tree learners. Subspace can use either discriminant analysis or k-nearest neighbor learners.
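For example, a bagged classification ensemble can report an out-of-bag estimate of its generalization error without a separate cross-validation step (a sketch, assuming X and Y are in the workspace; the cycle count is illustrative):

ens = fitcensemble(X,Y,'Method','Bag','NumLearningCycles',200);
genError = oobLoss(ens)                   % out-of-bag classification error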
For details of the characteristics of individual ensemble members, see Characteristics of Classification Algorithms.
Set the Number of Ensemble Members
Choosing the size of an ensemble involves balancing speed and accuracy.
Larger ensembles take longer to train and to generate predictions.
Some ensemble algorithms can become overtrained (inaccurate) when too large.
To set an appropriate size, consider starting with several dozen to several hundred members in an ensemble, training the ensemble, and then checking the ensemble quality, as in Test Ensemble Quality. If it appears that you need more members, add them using the resume method (available for both classification and regression ensembles). Repeat until adding more members does not improve ensemble quality.
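A minimal sketch of this grow-and-check workflow, using the ionosphere sample data set; the 0.1 loss threshold and the cycle counts are arbitrary illustrations:

load ionosphere                                  % X: predictors, Y: class labels
ens = fitcensemble(X,Y,'Method','LogitBoost','NumLearningCycles',100);
cv = crossval(ens,'KFold',5);                    % estimate quality by cross validation
if kfoldLoss(cv) > 0.1                           % quality not yet acceptable (arbitrary threshold)
    ens = resume(ens,100);                       % add 100 more weak learners to the same ensemble
end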
Tip

For classification, the LPBoost and TotalBoost algorithms are self-terminating, meaning you do not have to investigate the appropriate ensemble size. Try setting NumLearningCycles to 500. The algorithms usually terminate with fewer members.
Prepare the Weak Learners
Currently the weak learner types are:

- 'Discriminant' (recommended for Subspace ensemble)
- 'KNN' (only for Subspace ensemble)
- 'Tree' (for any ensemble except Subspace)
There are two ways to set the weak learner type in an ensemble.
- To create an ensemble with default weak learner options, specify the value of the 'Learners' name-value pair argument as the character vector or string scalar of the weak learner name. For example:

  ens = fitcensemble(X,Y,'Method','Subspace', ...
      'NumLearningCycles',50,'Learners','KNN');
  % or
  ens = fitrensemble(X,Y,'Method','Bag', ...
      'NumLearningCycles',50,'Learners','Tree');

- To create an ensemble with nondefault weak learner options, create a nondefault weak learner using the appropriate template method.

  For example, if you have missing data, and want to use classification trees with surrogate splits for better accuracy:

  templ = templateTree('Surrogate','all');
  ens = fitcensemble(X,Y,'Method','AdaBoostM2', ...
      'NumLearningCycles',50,'Learners',templ);

  To grow trees with leaves containing a number of observations that is at least 10% of the sample size:

  templ = templateTree('MinLeafSize',size(X,1)/10);
  ens = fitcensemble(X,Y,'Method','AdaBoostM2', ...
      'NumLearningCycles',50,'Learners',templ);

  Alternatively, choose the maximal number of splits per tree:

  templ = templateTree('MaxNumSplits',4);
  ens = fitcensemble(X,Y,'Method','AdaBoostM2', ...
      'NumLearningCycles',50,'Learners',templ);

  You can also use nondefault weak learners in fitrensemble.
While you can give fitcensemble and fitrensemble a cell array of learner templates, the most common usage is to give just one weak learner template.
For examples using a template, see Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles and Surrogate Splits.
Decision trees can handle NaN values in X. Such values are called “missing”. If you have some missing values in a row of X, a decision tree finds optimal splits using nonmissing values only. If an entire row consists of NaN, fitcensemble and fitrensemble ignore that row. If you have data with a large fraction of missing values in X, use surrogate decision splits. For examples of surrogate splits, see Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles and Surrogate Splits.
Common Settings for Tree Weak Learners
The depth of a weak learner tree makes a difference for training time, memory usage, and predictive accuracy. You control the depth using these parameters:
- MaxNumSplits — The maximal number of branch node splits is MaxNumSplits per tree. Set large values of MaxNumSplits to get deep trees. The default for bagging is size(X,1) - 1. The default for boosting is 1.

- MinLeafSize — Each leaf has at least MinLeafSize observations. Set small values of MinLeafSize to get deep trees. The default is 1 for classification and 5 for regression.

- MinParentSize — Each branch node in the tree has at least MinParentSize observations. Set small values of MinParentSize to get deep trees. The default is 2 for classification and 10 for regression.

  If you supply both MinParentSize and MinLeafSize, the learner uses the setting that gives larger leaves (shallower trees):

  MinParent = max(MinParent,2*MinLeaf)

  If you additionally supply MaxNumSplits, then the software splits a tree until one of the three splitting criteria is satisfied.

- Surrogate — Grow decision trees with surrogate splits when Surrogate is 'on'. Use surrogate splits when your data has missing values.

  Note: Surrogate splits cause slower training and use more memory.

- PredictorSelection — fitcensemble, fitrensemble, and TreeBagger grow trees using the standard CART algorithm [1] by default. If the predictor variables are heterogeneous, or if there are predictors having many levels and others having few levels, then standard CART tends to select predictors having many levels as split predictors. For split-predictor selection that is robust to the number of levels that the predictors have, consider specifying 'curvature' or 'interaction-curvature'. These specifications conduct chi-square tests of association between each predictor and the response, or between each pair of predictors and the response, respectively. The predictor that yields the minimal p-value is the split predictor for a particular node. For more details, see Choose Split Predictor Selection Technique.

  Note: When boosting decision trees, selecting split predictors using the curvature or interaction tests is not recommended.
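For example, a hedged sketch that combines several of these settings for a bagged ensemble (the specific values are illustrative, and curvature-based selection suits bagging rather than boosting):

t = templateTree('MaxNumSplits',20, ...          % allow moderately deep trees
    'Surrogate','on', ...                        % surrogate splits for missing predictor values
    'PredictorSelection','curvature');           % level-robust split-predictor selection
ens = fitcensemble(X,Y,'Method','Bag', ...
    'NumLearningCycles',200,'Learners',t);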
Call fitcensemble or fitrensemble

The syntaxes for fitcensemble and fitrensemble are identical. For fitrensemble, the syntax is:
ens = fitrensemble(X,Y,Name,Value)
- X is the matrix of data. Each row contains one observation, and each column contains one predictor variable.
- Y is the vector of responses, with the same number of observations as the rows in X.
- Name,Value specify additional options using one or more name-value pair arguments. For example, you can specify the ensemble aggregation method with the 'Method' argument, the number of ensemble learning cycles with the 'NumLearningCycles' argument, and the type of weak learners with the 'Learners' argument. For a complete list of name-value pair arguments, see the fitrensemble function page.
The result of fitrensemble and fitcensemble is an ensemble object, suitable for making predictions on new data. For a basic example of creating a regression ensemble, see Train Regression Ensemble. For a basic example of creating a classification ensemble, see Train Classification Ensemble.
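As a quick sketch using the carsmall sample data set (the chosen predictors, method, and query point are illustrative):

load carsmall                                    % includes Horsepower, Weight, and MPG
X = [Horsepower Weight];
ens = fitrensemble(X,MPG,'Method','LSBoost','NumLearningCycles',100);
mpgHat = predict(ens,[150 3000])                 % predicted MPG for a 150-hp, 3000-lb car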
Where to Set Name-Value Pairs
There are several name-value pairs you can pass to fitcensemble or fitrensemble, and several that apply to the weak learners (templateDiscriminant, templateKNN, and templateTree). To determine whether a name-value pair argument belongs to the ensemble or to the weak learner:

- Use template name-value pairs to control the characteristics of the weak learners.
- Use fitcensemble or fitrensemble name-value pair arguments to control the ensemble as a whole, either for algorithms or for structure.
For example, for an ensemble of boosted classification trees with each tree deeper than the default, set the templateTree name-value pair arguments MinLeafSize and MinParentSize to smaller values than the defaults, or set MaxNumSplits to a larger value than the default. The trees are then leafier (deeper).
To name the predictors in a classification ensemble (part of the structure of the ensemble), use the PredictorNames name-value pair in fitcensemble.
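For example, a sketch that sets tree depth through the template and the ensemble options through fitcensemble (it assumes X has four predictor columns; the names and values are hypothetical):

t = templateTree('MinLeafSize',1,'MinParentSize',2);   % leafier (deeper) trees
ens = fitcensemble(X,Y,'Method','AdaBoostM1', ...      % ensemble-level settings
    'NumLearningCycles',300,'LearnRate',0.1, ...
    'Learners',t, ...
    'PredictorNames',{'x1','x2','x3','x4'});           % hypothetical predictor names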
References
[1] Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Boca Raton, FL: Chapman & Hall, 1984.
See Also

fitcensemble | fitrensemble | oobLoss | resume | templateDiscriminant | templateKNN | templateTree