Fit binary classification decision tree for multiclass classification
tree = fitctree(Tbl,ResponseVarName) returns a fitted binary classification decision tree based on the input variables (also known as predictors, features, or attributes) contained in the table Tbl and the output (response or labels) contained in ResponseVarName. The returned binary tree splits branching nodes based on the values of a column of Tbl.
tree = fitctree(___,Name,Value) fits a tree with additional options specified by one or more name-value pair arguments, using any of the previous syntaxes. For example, you can specify the algorithm used to find the best split on a categorical predictor, grow a cross-validated tree, or hold out a fraction of the input data for validation.
Grow a classification tree using the ionosphere data set.
load ionosphere
tc = fitctree(X,Y)
tc =
  ClassificationTree
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'b'  'g'}
           ScoreTransform: 'none'
          NumObservations: 351
You can control the depth of the trees using the MaxNumSplits, MinLeafSize, or MinParentSize name-value pair parameters. fitctree grows deep decision trees by default. You can grow shallower trees to reduce model complexity or computation time.
Load the ionosphere data set.
load ionosphere
The default values of the tree depth controllers for growing classification trees are:
n - 1 for MaxNumSplits. n is the training sample size.
1 for MinLeafSize.
10 for MinParentSize.
These default values tend to grow deep trees for large training sample sizes.
Train a classification tree using the default values for tree depth control. Cross-validate the model using 10-fold cross-validation.
rng(1); % For reproducibility
MdlDefault = fitctree(X,Y,'CrossVal','on');
Draw a histogram of the number of imposed splits on the trees. Also, view one of the trees.
numBranches = @(x)sum(x.IsBranch);
mdlDefaultNumSplits = cellfun(numBranches, MdlDefault.Trained);
figure;
histogram(mdlDefaultNumSplits)
view(MdlDefault.Trained{1},'Mode','graph')
The average number of splits is around 15.
Suppose that you want a classification tree that is not as complex (deep) as the ones trained using the default number of splits. Train another classification tree, but set the maximum number of splits at 7, which is about half the mean number of splits from the default classification tree. Cross-validate the model using 10-fold cross-validation.
Mdl7 = fitctree(X,Y,'MaxNumSplits',7,'CrossVal','on');
view(Mdl7.Trained{1},'Mode','graph')
Compare the cross-validation classification errors of the models.
classErrorDefault = kfoldLoss(MdlDefault)
classError7 = kfoldLoss(Mdl7)
classErrorDefault = 0.1140
classError7 = 0.1254
Mdl7 is much less complex and performs only slightly worse than MdlDefault.
This example shows how to optimize hyperparameters automatically using fitctree. The example uses Fisher's iris data.
Load Fisher's iris data.
load fisheriris
Optimize the cross-validation loss of the classifier, using the data in meas to predict the response in species.
X = meas;
Y = species;
Mdl = fitctree(X,Y,'OptimizeHyperparameters','auto')
|==================================================================================|
| Iter | Eval   | Objective | Objective | BestSoFar  | BestSoFar | MinLeafSize |
|      | result |           | runtime   | (observed) | (estim.)  |             |
|==================================================================================|
|    1 | Best   |   0.33333 |    4.6938 |    0.33333 |   0.33333 |          49 |
|    2 | Best   |  0.053333 |    1.0408 |   0.053333 |  0.070853 |           5 |
|    3 | Accept |      0.06 |    0.4202 |   0.053333 |   0.05335 |           1 |
|    4 | Accept |  0.053333 |   0.59433 |   0.053333 |  0.075203 |          17 |
|    5 | Accept |  0.053333 |   0.86431 |   0.053333 |  0.053317 |          19 |
|    6 | Accept |  0.053333 |   0.34218 |   0.053333 |  0.053307 |           2 |
|    7 | Best   |  0.046667 |   0.20661 |   0.046667 |  0.048442 |           3 |
|    8 | Accept |  0.053333 |   0.40946 |   0.046667 |  0.046965 |          12 |
|    9 | Accept |  0.046667 |   0.45098 |   0.046667 |  0.046672 |           3 |
|   10 | Accept |  0.053333 |   0.36073 |   0.046667 |  0.046675 |           8 |
|   11 | Accept |  0.053333 |   0.34443 |   0.046667 |  0.046709 |           4 |
|   12 | Accept |  0.046667 |   0.27124 |   0.046667 |  0.046693 |           3 |
|   13 | Accept |  0.046667 |   0.30381 |   0.046667 |  0.046685 |           3 |
|   14 | Accept |   0.66667 |   0.24246 |   0.046667 |  0.046829 |          75 |
|   15 | Accept |  0.053333 |    0.3977 |   0.046667 |  0.046708 |          28 |
|   16 | Accept |  0.053333 |   0.24388 |   0.046667 |  0.046692 |          24 |
|   17 | Accept |  0.053333 |   0.42266 |   0.046667 |  0.046696 |           6 |
|   18 | Accept |  0.053333 |   0.26421 |   0.046667 |  0.046697 |          10 |
|   19 | Accept |  0.053333 |   0.38414 |   0.046667 |  0.046671 |          35 |
|   20 | Accept |  0.053333 |   0.32551 |   0.046667 |  0.046669 |          32 |
|   21 | Accept |  0.053333 |   0.50884 |   0.046667 |   0.04667 |          14 |
|   22 | Accept |      0.06 |    0.2789 |   0.046667 |  0.046668 |           7 |
|   23 | Accept |   0.33333 |   0.27015 |   0.046667 |  0.034841 |          41 |
|   24 | Accept |  0.053333 |   0.44637 |   0.046667 |  0.038382 |          23 |
|   25 | Accept |  0.053333 |   0.37655 |   0.046667 |  0.036502 |          31 |
|   26 | Accept |  0.053333 |   0.40015 |   0.046667 |  0.036589 |           9 |
|   27 | Accept |  0.053333 |    0.2608 |   0.046667 |  0.038774 |          21 |
|   28 | Accept |  0.053333 |   0.36868 |   0.046667 |  0.038809 |           2 |
|   29 | Accept |   0.66667 |   0.31257 |   0.046667 |  0.040052 |          61 |
|   30 | Accept |  0.053333 |   0.38354 |   0.046667 |  0.040271 |          15 |
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 159.3222 seconds.
Total objective function evaluation time: 16.19
Best observed feasible point:
    MinLeafSize
    ___________
         3
Observed objective function value = 0.046667
Estimated objective function value = 0.040271
Function evaluation time = 0.20661
Best estimated feasible point (according to models):
    MinLeafSize
    ___________
        24
Estimated objective function value = 0.040271
Estimated function evaluation time = 0.40236
Mdl =
  ClassificationTree
                         ResponseName: 'Y'
                CategoricalPredictors: []
                           ClassNames: {'setosa'  'versicolor'  'virginica'}
                       ScoreTransform: 'none'
                      NumObservations: 150
    HyperparameterOptimizationResults: [1×1 BayesianOptimization]
Load the census1994 data set. Consider a model that predicts a person's salary category given their age, working class, education level, marital status, race, sex, capital gain and loss, and number of working hours per week.
load census1994
X = adultdata(:,{'age','workClass','education_num','marital_status','race',...
    'sex','capital_gain','capital_loss','hours_per_week','salary'});
Display the number of categories represented in the categorical variables using summary.
summary(X)
Variables:
    age: 32561×1 double
        Values:
            min       17
            median    37
            max       90
    workClass: 32561×1 categorical
        Values:
            Federal-gov          960
            Local-gov           2093
            Never-worked           7
            Private            22696
            Self-emp-inc        1116
            Self-emp-not-inc    2541
            State-gov           1298
            Without-pay           14
            <undefined>         1836
    education_num: 32561×1 double
        Values:
            min        1
            median    10
            max       16
    marital_status: 32561×1 categorical
        Values:
            Divorced                  4443
            Married-AF-spouse           23
            Married-civ-spouse       14976
            Married-spouse-absent      418
            Never-married            10683
            Separated                 1025
            Widowed                    993
    race: 32561×1 categorical
        Values:
            Amer-Indian-Eskimo      311
            Asian-Pac-Islander     1039
            Black                  3124
            Other                   271
            White                 27816
    sex: 32561×1 categorical
        Values:
            Female    10771
            Male      21790
    capital_gain: 32561×1 double
        Values:
            min           0
            median        0
            max       99999
    capital_loss: 32561×1 double
        Values:
            min          0
            median       0
            max       4356
    hours_per_week: 32561×1 double
        Values:
            min        1
            median    40
            max       99
    salary: 32561×1 categorical
        Values:
            <=50K    24720
            >50K      7841
Because the categorical variables have few categories compared to the number of levels in the continuous variables, the standard CART predictor-splitting algorithm prefers splitting a continuous predictor over the categorical variables.
Train a classification tree using the entire data set. To grow unbiased trees, specify usage of the curvature test for splitting predictors. Because there are missing observations in the data, specify usage of surrogate splits.
Mdl = fitctree(X,'salary','PredictorSelection','curvature',...
    'Surrogate','on');
Estimate predictor importance values by summing changes in the risk due to splits on every predictor and dividing the sum by the number of branch nodes. Compare the estimates using a bar graph.
imp = predictorImportance(Mdl);
figure;
bar(imp);
title('Predictor Importance Estimates');
ylabel('Estimates');
xlabel('Predictors');
h = gca;
h.XTickLabel = Mdl.PredictorNames;
h.XTickLabelRotation = 45;
h.TickLabelInterpreter = 'none';
In this case, capital_gain is the most important predictor, followed by education_num.
Tbl — Sample data
Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain one additional column for the response variable. Multi-column variables and cell arrays other than cell arrays of character vectors are not allowed.
If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable using ResponseVarName.
If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify a formula using formula.
If Tbl does not contain the response variable, then specify a response variable using Y. The length of the response variable and the number of rows of Tbl must be equal.
Data Types: table
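The three cases above can be sketched as follows. This is a minimal illustration, assuming Fisher's iris data; the short variable names SL, SW, PL, and PW are hypothetical and chosen only for this example.

```matlab
% Build a table whose last variable is the response.
load fisheriris
Tbl = array2table(meas,'VariableNames',{'SL','SW','PL','PW'}); % hypothetical names
Tbl.Species = species;

% 1) Response given by name: all remaining variables are predictors.
Mdl1 = fitctree(Tbl,'Species');

% 2) Formula: only PL and PW are used as predictors.
Mdl2 = fitctree(Tbl,'Species~PL+PW');

% 3) Response kept outside the table: pass it as a separate array.
Mdl3 = fitctree(Tbl(:,{'SL','SW','PL','PW'}),species);

% Number of predictors each model uses.
numPredictors = [numel(Mdl1.PredictorNames), ...
                 numel(Mdl2.PredictorNames), ...
                 numel(Mdl3.PredictorNames)]
```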
ResponseVarName — Response variable name
Response variable name, specified as the name of a variable in Tbl.
You must specify ResponseVarName as a character vector. For example, if the response variable Y is stored as Tbl.Y, then specify it as 'Y'. Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model.
The response variable must be a categorical or character array, logical or numeric vector, or cell array of character vectors. If Y is a character array, then each element must correspond to one row of the array.
It is good practice to specify the order of the classes using the ClassNames name-value pair argument.
Data Types: char
formula — Explanatory model of response and subset of predictor variables
Explanatory model of the response and a subset of the predictor variables, specified as a character vector in the form of 'Y~X1+X2+X3'. In this form, Y represents the response variable, and X1, X2, and X3 represent the predictor variables. The variables must be variable names in Tbl (Tbl.Properties.VariableNames).
To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula.
Data Types: char
Y — Class labels
Class labels, specified as a numeric vector, categorical vector, logical vector, character array, or cell array of character vectors. Each row of Y represents the classification of the corresponding row of X.
When fitting the tree, fitctree considers NaN, '' (empty character vector), and <undefined> values in Y to be missing values. fitctree does not use observations with missing values for Y in the fit.
For numeric Y, consider fitting a regression tree using fitrtree instead.
Data Types: single | double | char | logical | cell
X — Predictor data
Predictor data, specified as a numeric matrix.
fitctree considers NaN values in X as missing values. fitctree does not use observations with all missing values for X in the fit. fitctree uses observations with some missing values for X to find splits on variables for which these observations have valid values.
Data Types: single | double
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'CrossVal','on','MinLeafSize',40 specifies a cross-validated classification tree with a minimum of 40 observations per leaf.
Note:
You cannot use any cross-validation name-value pair along with
'AlgorithmForCategorical' — Algorithm for best categorical predictor split
'Exact' | 'PullLeft' | 'PCA' | 'OVAbyClass'
Algorithm to find the best split on a categorical predictor with C categories for data and K ≥ 3 classes, specified as the comma-separated pair consisting of 'AlgorithmForCategorical' and one of the following values.
Value | Description
'Exact' | Consider all 2^{C–1} – 1 combinations.
'PullLeft' | Start with all C categories on the right branch. Consider moving each category to the left branch as it achieves the minimum impurity for the K classes among the remaining categories. From this sequence, choose the split that has the lowest impurity.
'PCA' | Compute a score for each category using the inner product between the first principal component of a weighted covariance matrix (of the centered class probability matrix) and the vector of class probabilities for that category. Sort the scores in ascending order, and consider all C – 1 splits.
'OVAbyClass' | Start with all C categories on the right branch. For each class, order the categories based on their probability for that class. For the first class, consider moving each category to the left branch in order, recording the impurity criterion at each move. Repeat for the remaining classes. From this sequence, choose the split that has the minimum impurity.
fitctree automatically selects the optimal subset of algorithms for each split using the known number of classes and levels of a categorical predictor. For K = 2 classes, fitctree always performs the exact search. To specify a particular algorithm, use the 'AlgorithmForCategorical' name-value pair argument.
Example: 'AlgorithmForCategorical','PCA'
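To see why the exact search becomes expensive, the following sketch evaluates the 2^{C–1} – 1 count from the table for a few category counts. The chosen values of C are arbitrary and serve only as an illustration.

```matlab
% Number of candidate splits the 'Exact' search considers for a
% categorical predictor with C categories: 2^(C-1) - 1.
C = [3 5 10 20];                 % example category counts (arbitrary)
numExactSplits = 2.^(C-1) - 1;   % element-wise evaluation of the formula

% The inexact 'PCA' search considers only C - 1 splits.
numPCASplits = C - 1;

disp(table(C',numExactSplits',numPCASplits', ...
    'VariableNames',{'C','ExactSplits','PCASplits'}))
```

The count grows exponentially with C, which is the motivation for the inexact algorithms when a predictor has many levels.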
'CategoricalPredictors' — Categorical predictors list
'all'
Categorical predictors list, specified as the comma-separated pair consisting of 'CategoricalPredictors' and one of the following:
A numeric vector with indices from 1 through p, where p is the number of columns of X.
A logical vector of length p, where a true entry means that the corresponding column of X is a categorical variable.
A cell array of character vectors, where each element in the array is the name of a predictor variable. The names must match entries in PredictorNames values.
A character matrix, where each row of the matrix is a name of a predictor variable. The names must match entries in PredictorNames values. Pad the names with extra blanks so each row of the character matrix has the same length.
'all', meaning all predictors are categorical.
By default, if the predictor data is in a matrix (X), the software assumes that none of the predictors are categorical. If the predictor data is in a table (Tbl), the software assumes that a variable is categorical if it contains logical values, values of the unordered data type categorical, or a cell array of character vectors.
Example: 'CategoricalPredictors','all'
Data Types: single | double | char | logical | cell
'ClassNames' — Names of classes to use for training
Names of classes to use for training, specified as the comma-separated pair consisting of 'ClassNames' and a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames must be the same data type as Y.
If ClassNames is a character array, then each element must correspond to one row of the array.
Use ClassNames to:
Order the classes during training.
Specify the order of any input or output argument dimension that corresponds to the class order. For example, use ClassNames to specify the order of the dimensions of Cost or the column order of classification scores returned by predict.
Select a subset of classes for training. For example, suppose that the set of all distinct class names in Y is {'a','b','c'}. To train the model using observations from classes 'a' and 'c' only, specify 'ClassNames',{'a','c'}.
The default is the set of all distinct class names in Y.
Example: 'ClassNames',{'b','g'}
Data Types: categorical | char | logical | single | double | cell
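As a brief sketch of the ordering and subsetting uses listed above, assuming Fisher's iris data: the model below is trained on two of the three species, with the class order set explicitly.

```matlab
load fisheriris

% Train on two classes only; observations labeled 'setosa' are not used.
Mdl = fitctree(meas,species,'ClassNames',{'virginica','versicolor'});

% The stored class order follows the ClassNames argument.
Mdl.ClassNames

% predict returns one score column per class, in the same order.
[labels,score] = predict(Mdl,meas(1:3,:));
scoreSize = size(score)
```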
'Cost' — Cost of misclassification
Cost of misclassification of a point, specified as the comma-separated pair consisting of 'Cost' and one of the following:
Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (i.e., the rows correspond to the true class and the columns correspond to the predicted class). To specify the class order for the corresponding rows and columns of Cost, also specify the ClassNames name-value pair argument.
Structure S having two fields: S.ClassNames containing the group names as a variable of the same data type as Y, and S.ClassificationCosts containing the cost matrix.
The default is Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j.
Data Types: single | double | struct
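Both forms of the cost specification can be sketched as follows, assuming the ionosphere data. The cost values are invented for illustration and do not come from the data set.

```matlab
load ionosphere                 % X: predictors, Y: labels 'b' or 'g'
classNames = {'b','g'};

% Rows are true classes, columns are predicted classes, both in the
% order of classNames. Illustrative assumption: predicting 'g' when
% the truth is 'b' costs five times as much as the reverse error.
costMatrix = [0 5; 1 0];

% Matrix form: pair Cost with ClassNames to fix the row/column order.
MdlCost = fitctree(X,Y,'ClassNames',classNames,'Cost',costMatrix);

% Structure form: the class order travels with the cost matrix.
S.ClassNames = classNames;
S.ClassificationCosts = costMatrix;
MdlCostS = fitctree(X,Y,'Cost',S);

% The trained model stores the cost matrix in its Cost property.
MdlCost.Cost
```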
'MaxNumCategories' — Maximum category levels
10 (default) | nonnegative scalar value
Maximum category levels, specified as the comma-separated pair consisting of 'MaxNumCategories' and a nonnegative scalar value. fitctree splits a categorical predictor using the exact search algorithm if the predictor has at most MaxNumCategories levels in the split node. Otherwise, fitctree finds the best categorical split using one of the inexact algorithms.
Passing a small value can lead to loss of accuracy and passing a large value can increase computation time and memory overload.
Example: 'MaxNumCategories',8
'MergeLeaves' — Leaf merge flag
'on' (default) | 'off'
Leaf merge flag, specified as the comma-separated pair consisting of 'MergeLeaves' and 'on' or 'off'.
If MergeLeaves is 'on', then fitctree:
Merges leaves that originate from the same parent node, and that yield a sum of risk values greater than or equal to the risk associated with the parent node
Estimates the optimal sequence of pruned subtrees, but does not prune the classification tree
Otherwise, fitctree does not merge leaves.
Example: 'MergeLeaves','off'
'MinParentSize' — Minimum number of branch node observations
10 (default) | positive integer value
Minimum number of branch node observations, specified as the comma-separated pair consisting of 'MinParentSize' and a positive integer value. Each branch node in the tree has at least MinParentSize observations. If you supply both MinParentSize and MinLeafSize, fitctree uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).
Example: 'MinParentSize',8
Data Types: single | double
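The interaction between the two size limits can be checked on a trained tree. The sketch below assumes the ionosphere data and uses arbitrary settings of 10 and 8, so the effective parent size should be max(10,2*8) = 16.

```matlab
load ionosphere

minParentSize = 10;   % arbitrary illustrative settings
minLeafSize   = 8;
effectiveMinParent = max(minParentSize,2*minLeafSize);   % rule from the text

Mdl = fitctree(X,Y,'MinParentSize',minParentSize,'MinLeafSize',minLeafSize);

% NodeSize holds the number of observations in each node;
% IsBranchNode flags the nodes that were split.
smallestBranch = min(Mdl.NodeSize(Mdl.IsBranchNode))
smallestLeaf   = min(Mdl.NodeSize(~Mdl.IsBranchNode))
```

If the rule applies as described, smallestBranch is at least effectiveMinParent and smallestLeaf is at least minLeafSize.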
'PredictorNames' — Predictor variable names
Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a cell array of unique character vectors. The functionality of 'PredictorNames' depends on the way you supply the training data.
If you supply X and Y, then you can use 'PredictorNames' to give the predictor variables in X names.
The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.
By default, PredictorNames is {'x1','x2',...}.
If you supply Tbl, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitctree uses the predictor variables in PredictorNames and the response only in training.
PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable.
By default, PredictorNames contains the names of all predictor variables.
It is good practice to specify the predictors for training using one of 'PredictorNames' or formula only.
Example: 'PredictorNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'}
Data Types: cell
'PredictorSelection' — Algorithm used to select the best split predictor
'allsplits' (default) | 'curvature' | 'interaction-curvature'
Algorithm used to select the best split predictor at each node, specified as the comma-separated pair consisting of 'PredictorSelection' and a value in this table.
Value | Description
'allsplits' | Standard CART — Selects the split predictor that maximizes the split-criterion gain over all possible splits of all predictors [1].
'curvature' | Curvature test — Selects the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response [4]. Training speed is similar to standard CART.
'interaction-curvature' | Interaction test — Chooses the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response, and that minimizes the p-value of a chi-square test of independence between each pair of predictors and response [3]. Training speed can be slower than standard CART.
For 'curvature' and 'interaction-curvature', if all tests yield p-values greater than 0.05, then fitctree stops splitting nodes.
Tip
For details on how fitctree selects split predictors, see Node Splitting Rules.
Example: 'PredictorSelection','curvature'
Data Types: char
'Prior' — Prior probabilities
'empirical' (default) | 'uniform' | vector of scalar values | structure
Prior probabilities for each class, specified as the comma-separated pair consisting of 'Prior' and one of the following.
A character vector:
'empirical' determines class probabilities from class frequencies in Y. If you pass observation weights, fitctree uses the weights to compute the class probabilities.
'uniform' sets all class probabilities equal.
A vector (one scalar value for each class). To specify the class order for the corresponding elements of Prior, also specify the ClassNames name-value pair argument.
A structure S with two fields:
S.ClassNames containing the class names as a variable of the same type as Y
S.ClassProbs containing a vector of corresponding probabilities
If you set values for both Weights and Prior, the weights are renormalized to add up to the value of the prior probability in the respective class.
Example: 'Prior','uniform'
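The vector and structure forms can be sketched as follows, assuming Fisher's iris data. The probabilities 0.5, 0.25, and 0.25 are arbitrary values chosen for illustration.

```matlab
load fisheriris
classNames = {'setosa','versicolor','virginica'};
priorProbs = [0.5 0.25 0.25];   % illustrative values, one per class

% Vector form: pair Prior with ClassNames so the element order is explicit.
MdlV = fitctree(meas,species,'ClassNames',classNames,'Prior',priorProbs);

% Structure form: the class names travel with the probabilities.
S.ClassNames = classNames;
S.ClassProbs = priorProbs;
MdlS = fitctree(meas,species,'Prior',S);

% Both models store the priors in the Prior property.
priorV = MdlV.Prior
priorS = MdlS.Prior
```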
'Prune' — Flag to estimate optimal sequence of pruned subtrees
'on' (default) | 'off'
Flag to estimate the optimal sequence of pruned subtrees, specified as the comma-separated pair consisting of 'Prune' and 'on' or 'off'.
If Prune is 'on', then fitctree grows the classification tree without pruning it, but estimates the optimal sequence of pruned subtrees. Otherwise, fitctree grows the classification tree without estimating the optimal sequence of pruned subtrees.
To prune a trained ClassificationTree model, pass it to prune.
Example: 'Prune','off'
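A minimal sketch of pruning after training, assuming the ionosphere data. The pruning level of 2 is an arbitrary choice, capped at the deepest level available in the estimated sequence.

```matlab
load ionosphere

% With the default 'Prune','on', the full tree is grown and the
% optimal pruning sequence is stored alongside it.
Mdl = fitctree(X,Y);

% PruneList holds the pruning level of each node; its maximum is the
% deepest level to which the tree can be pruned.
maxLevel = max(Mdl.PruneList);

% Prune to level 2 (arbitrary), or less if the sequence is shorter.
MdlPruned = prune(Mdl,'Level',min(2,maxLevel));

nodeCounts = [Mdl.NumNodes MdlPruned.NumNodes]
```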
'PruneCriterion' — Pruning criterion
'error' (default) | 'impurity'
Pruning criterion, specified as the comma-separated pair consisting of 'PruneCriterion' and 'error' or 'impurity'.
Example: 'PruneCriterion','impurity'
'ResponseName' — Response variable name
'Y' (default) | character vector
Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector representing the name of the response variable.
This name-value pair is not valid when using the ResponseVarName or formula input arguments.
Example: 'ResponseName','IrisType'
'ScoreTransform' — Score transform function
'none' | character vector | function handle
Score transform function, specified as the comma-separated pair consisting of 'ScoreTransform' and a function handle for transforming scores. Your function must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).
Alternatively, you can specify one of the following character vectors representing a built-in transformation function.
Value | Formula
'doublelogit' | 1/(1 + e^{–2x})
'invlogit' | log(x / (1–x))
'ismax' | Set the score for the class with the largest score to 1, and scores for all other classes to 0.
'logit' | 1/(1 + e^{–x})
'none' or 'identity' | x (no transformation)
'sign' | –1 for x < 0; 0 for x = 0; 1 for x > 0
'symmetric' | 2x – 1
'symmetriclogit' | 2/(1 + e^{–x}) – 1
'symmetricismax' | Set the score for the class with the largest score to 1, and scores for all other classes to –1.
Example: 'ScoreTransform','logit'
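The function-handle form can be compared against a built-in transformation. The sketch below, assuming the ionosphere data, writes the 'logit' formula from the table as an anonymous function and checks that both models return the same scores.

```matlab
load ionosphere

% Built-in transformation, selected by name.
MdlLogit = fitctree(X,Y,'ScoreTransform','logit');

% The same formula, 1/(1 + e^(-x)), supplied as a function handle.
% The handle accepts a score matrix and returns a matrix of equal size.
logitFcn = @(s) 1./(1 + exp(-s));
MdlFcn = fitctree(X,Y,'ScoreTransform',logitFcn);

[~,scoreLogit] = predict(MdlLogit,X(1:5,:));
[~,scoreFcn]   = predict(MdlFcn,X(1:5,:));

% Largest difference between the two score matrices.
maxDiff = max(abs(scoreLogit(:) - scoreFcn(:)))
```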
'Surrogate' — Surrogate decision splits flag
'off' (default) | 'on' | 'all' | positive integer value
Surrogate decision splits flag, specified as the comma-separated pair consisting of 'Surrogate' and 'on', 'off', 'all', or a positive integer value.
When set to 'on', fitctree finds at most 10 surrogate splits at each branch node.
When set to 'all', fitctree finds all surrogate splits at each branch node. The 'all' setting can use considerable time and memory.
When set to a positive integer value, fitctree finds at most the specified number of surrogate splits at each branch node.
Use surrogate splits to improve the accuracy of predictions for data with missing values. The setting also lets you compute measures of predictive association between predictors. For more details, see Node Splitting Rules.
Example: 'Surrogate','on'
Data Types: single | double | char
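A rough sketch of using surrogate splits with incomplete data, assuming the ionosphere data. The missing values are injected artificially (about 10% of entries, an arbitrary choice) purely to illustrate the option; no particular accuracy difference is implied.

```matlab
load ionosphere
rng(1)                                  % for reproducibility

% Artificially blank out roughly 10% of the predictor entries.
Xmiss = X;
Xmiss(rand(size(X)) < 0.1) = NaN;

% Train with and without surrogate splits.
MdlNoSurr = fitctree(Xmiss,Y);
MdlSurr   = fitctree(Xmiss,Y,'Surrogate','on');

% Resubstitution error of each tree on the incomplete data.
errNoSurr = resubLoss(MdlNoSurr)
errSurr   = resubLoss(MdlSurr)

% Predictive measures of association between predictors, available
% when the tree is grown with surrogate splits.
assoc = surrogateAssociation(MdlSurr);
```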
'Weights' — Observation weights
ones(size(X,1),1) (default) | vector of scalar values
Observation weights, specified as the comma-separated pair consisting of 'Weights' and a vector of scalar values. The software weights the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows in X or Tbl.
If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors when training the model.
fitctree normalizes the weights in each class to add up to the value of the prior probability of the class.
Data Types: single | double
'CrossVal' — Flag to grow cross-validated decision tree
'off' (default) | 'on'
Flag to grow a cross-validated decision tree, specified as the comma-separated pair consisting of 'CrossVal' and 'on' or 'off'.
If 'on', fitctree grows a cross-validated decision tree with 10 folds. You can override this cross-validation setting using one of the 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' name-value pair arguments. You can only use one of these four arguments at a time when creating a cross-validated tree.
Alternatively, cross-validate tree later using the crossval method.
Example: 'CrossVal','on'
'CVPartition' — Partition for cross-validated tree
cvpartition object
Partition to use in a cross-validated tree, specified as the comma-separated pair consisting of 'CVPartition' and an object created using cvpartition.
If you use 'CVPartition', you cannot use any of the 'KFold', 'Holdout', or 'Leaveout' name-value pair arguments.
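A short sketch of supplying a partition object, assuming the ionosphere data. The choice of five folds is arbitrary.

```matlab
load ionosphere
rng(1)                               % for reproducibility

% Create a 5-fold partition, stratified by the class labels in Y.
c = cvpartition(Y,'KFold',5);

% Grow one tree per training fold of the partition.
CVMdl = fitctree(X,Y,'CVPartition',c);

numTrees = numel(CVMdl.Trained)      % one compact tree per fold
cvError  = kfoldLoss(CVMdl)          % cross-validated classification error
```

Reusing the same cvpartition object across several models keeps their folds identical, which makes their cross-validation errors directly comparable.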
'Holdout' — Fraction of data for holdout validation
0 (default) | scalar value in the range [0,1]
Fraction of data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range [0,1]. Holdout validation tests the specified fraction of the data, and uses the rest of the data for training.
If you use 'Holdout', you cannot use any of the 'CVPartition', 'KFold', or 'Leaveout' name-value pair arguments.
Example: 'Holdout',0.1
Data Types: single | double
'KFold' — Number of folds
10 (default) | positive integer value greater than 1
Number of folds to use in a cross-validated classifier, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. If you specify, e.g., 'KFold',k, then the software:
Randomly partitions the data into k sets
For each set, reserves the set as validation data, and trains the model using the other k – 1 sets
Stores the k compact, trained models in the cells of a k-by-1 cell vector in the Trained property of the cross-validated model.
To create a cross-validated model, you can use one of these four options only: CVPartition, Holdout, KFold, or Leaveout.
Example: 'KFold',8
Data Types: single | double
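The three steps above can be observed on a cross-validated model. The sketch below assumes the ionosphere data and uses k = 8 as in the example value.

```matlab
load ionosphere
rng(1)                                   % for reproducibility

k = 8;
CVMdl = fitctree(X,Y,'KFold',k);

% The Trained property is a k-by-1 cell vector of compact trees,
% each trained on the other k - 1 sets.
trainedSize = size(CVMdl.Trained)

% Classification error on each held-out set, then the average.
foldErrors = kfoldLoss(CVMdl,'Mode','individual');
meanError  = kfoldLoss(CVMdl)
```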
'Leaveout' — Leave-one-out cross-validation flag
'off' (default) | 'on'
Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and 'on' or 'off'. Specify 'on' to use leave-one-out cross-validation.
If you use 'Leaveout', you cannot use any of the 'CVPartition', 'Holdout', or 'KFold' name-value pair arguments.
Example: 'Leaveout','on'
'MaxNumSplits' — Maximal number of decision splits
size(X,1) - 1 (default) | positive integer
Maximal number of decision splits (or branch nodes), specified as the comma-separated pair consisting of 'MaxNumSplits' and a positive integer. fitctree splits MaxNumSplits or fewer branch nodes. For more details on splitting behavior, see Algorithms.
Example: 'MaxNumSplits',5
Data Types: single | double
'MinLeafSize' — Minimum number of leaf node observations
1 (default) | positive integer value
Minimum number of leaf node observations, specified as the comma-separated pair consisting of 'MinLeafSize' and a positive integer value. Each leaf has at least MinLeafSize observations. If you supply both MinParentSize and MinLeafSize, fitctree uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).
Example: 'MinLeafSize',3
Data Types: single | double
'NumVariablesToSample' — Number of predictors to select at random for each split
'all' | positive integer value
Number of predictors to select at random for each split, specified as the comma-separated pair consisting of 'NumVariablesToSample' and a positive integer value. You can also specify 'all' to use all available predictors.
Example: 'NumVariablesToSample',3
Data Types: single | double
'SplitCriterion' — Split criterion
'gdi' (default) | 'twoing' | 'deviance'
Split criterion, specified as the comma-separated pair consisting of 'SplitCriterion' and 'gdi' (Gini's diversity index), 'twoing' for the twoing rule, or 'deviance' for maximum deviance reduction (also known as cross entropy).
Example: 'SplitCriterion','deviance'
'OptimizeHyperparameters'
— Parameters to optimize'none'
(default)  'auto'
 'all'
 cell array of eligible parameter names  vector of optimizableVariable
objectsParameters to optimize, specified as:
'none'
— Do not optimize.
'auto'
— Use {'MinLeafSize'}
'all'
— Optimize all eligible
parameters.
Cell array of eligible parameter names
Vector of optimizableVariable
objects,
typically the output of hyperparameters
The optimization attempts to minimize the crossvalidation loss
(error) for fitctree
by varying the parameters.
For information about crossvalidation loss (albeit in a different
context), see Classification Loss.
To control the crossvalidation type and other aspects of the optimization,
use the HyperparameterOptimizationOptions
namevalue
pair.
The eligible parameters for fitctree
are:
MaxNumSplits
— fitctree
searches
among integers, by default logscaled in the range [1,max(2,NumObservations1)]
.
MinLeafSize
— fitctree
searches
among integers, by default logscaled in the range [1,max(2,floor(NumObservations/2))]
.
SplitCriterion
— For two
classes, fitctree
searches among 'gdi'
and 'deviance'
.
For three or more classes, fitctree
also searches
among 'twoing'
.
NumVariablesToSample
— fitctree
does
not optimize over this hyperparameter. If you pass NumVariablesToSample
as
a parameter name, fitctree
simply uses the full
number of predictors. However, fitcensemble
does
optimize over this hyperparameter.
Set nondefault parameters by passing a vector of optimizableVariable
objects
that have nondefault values. For example,
load fisheriris params = hyperparameters('fitctree',meas,species); params(1).Range = [1,30];
Pass params
as the value of OptimizeHyperparameters
.
By default, iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is log(1 + cross-validation loss) for regression, and the misclassification rate for classification. To control the iterative display, set the Verbose field of the HyperparameterOptimizationOptions name-value pair. To control the plots, set the ShowPlots field of the HyperparameterOptimizationOptions name-value pair.
For an example, see Optimize Classification Tree.
Example: 'auto'
Data Types: char | cell
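As a rough sketch of the cell-array form, the call below optimizes two of the eligible parameters by name. It assumes the fisheriris sample data set used elsewhere on this page; the variable name Mdl and the choice to suppress the plots and the iterative display are illustrative only.

```matlab
% Sketch: optimize two eligible hyperparameters selected by name.
load fisheriris                 % provides meas (predictors) and species (labels)
rng(1)                          % fix the random cross-validation partition

% Illustrative options: no plots, no iterative display, few evaluations.
opts = struct('ShowPlots',false,'Verbose',0,'MaxObjectiveEvaluations',10);

Mdl = fitctree(meas,species, ...
    'OptimizeHyperparameters',{'MinLeafSize','MaxNumSplits'}, ...
    'HyperparameterOptimizationOptions',opts);

disp(class(Mdl))                % class of the returned model
```

Lowering MaxObjectiveEvaluations only shortens the run for demonstration purposes; the default of 30 evaluations searches more thoroughly.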
'HyperparameterOptimizationOptions': Options for optimization
Options for optimization, specified as a structure. Modifies the effect of the OptimizeHyperparameters name-value pair. All fields in the structure are optional.
Field Name | Values | Default
Optimizer | | 'bayesopt'
AcquisitionFunctionName | See the bayesopt AcquisitionFunctionName name-value pair, or Acquisition Function Types. | 'expected-improvement-per-second-plus'
MaxObjectiveEvaluations | Maximum number of objective function evaluations. | 30 for 'bayesopt' or 'randomsearch', and the entire grid for 'gridsearch'
NumGridDivisions | For 'gridsearch', the number of values in each dimension. Can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. Ignored for categorical variables. | 10
ShowPlots | Logical value indicating whether to show plots. If true, plots the best objective function value against iteration number. If there are one or two optimization parameters, and if Optimizer is 'bayesopt', then ShowPlots also plots a model of the objective function against the parameters. | true
SaveIntermediateResults | Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object. | false
Verbose | Display to the command line. See the bayesopt Verbose name-value pair. | 1
Repartition | Logical value indicating whether to repartition the cross-validation at every iteration. | false
Use no more than one of the following three field names (default if none is specified: Kfold = 5).
CVPartition | A cvpartition object, as created by cvpartition |
Holdout | A scalar in the range (0,1) representing the holdout fraction. |
Kfold | An integer greater than 1. |
Example: struct('MaxObjectiveEvaluations',60)
Data Types: struct
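The fields above combine into a single structure. The following is a minimal sketch, again assuming the fisheriris sample data; the particular field values (a 5-point grid and a 30% holdout) are arbitrary choices for illustration, not recommendations.

```matlab
% Sketch: grid search over the 'auto' parameter set with holdout validation.
load fisheriris
rng(1)                              % fix the random holdout partition

opts = struct( ...
    'Optimizer','gridsearch', ...   % search a grid instead of using bayesopt
    'NumGridDivisions',5, ...       % five values per dimension
    'Holdout',0.3, ...              % use only one of CVPartition/Holdout/Kfold
    'ShowPlots',false, ...
    'Verbose',0);

Mdl = fitctree(meas,species, ...
    'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',opts);
```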
tree: Classification tree
Classification tree, returned as a classification tree object.
Using the 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' options results in a tree of class ClassificationPartitionedModel. You cannot use a partitioned tree for prediction, so this kind of tree does not have a predict method. Instead, use kfoldpredict to predict responses for observations not used for training.
Otherwise, tree is of class ClassificationTree, and you can use the predict method to make predictions.
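The difference between the two return types can be sketched as follows, assuming the fisheriris sample data. The variable names and the new measurement vector are hypothetical.

```matlab
% Sketch: partitioned tree (kfoldpredict) versus plain tree (predict).
load fisheriris
rng(1)                                      % fix the random fold assignment

% Cross-validated model: predictions come from kfoldpredict.
cvTree   = fitctree(meas,species,'KFold',5);
cvLabels = kfoldpredict(cvTree);            % one out-of-fold label per observation
cvError  = kfoldLoss(cvTree);               % cross-validated classification error

% Plain model: predictions come from predict.
tree     = fitctree(meas,species);
newLabel = predict(tree,[5.0 3.4 1.5 0.2]); % hypothetical new flower measurement
```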
The curvature test is a statistical test assessing the null hypothesis that two variables are unassociated.
The curvature test between predictor variable x and y is conducted using this process.
If x is continuous, then partition it into its quartiles. Create a nominal variable that bins observations according to which section of the partition they occupy. If there are missing values, then create an extra bin for them.
For each level in the partitioned predictor j = 1...J and class in the response k = 1,...,K, compute the weighted proportion of observations in class k
$${\widehat{\pi}}_{jk}={\displaystyle \sum _{i=1}^{n}I\{{y}_{i}=k\}}{w}_{i}.$$
w_{i} is the weight of observation i, $$\sum {w}_{i}=1$$, I is the indicator function, and n is the sample size. If all observations have the same weight, then $${\widehat{\pi}}_{jk}=\frac{{n}_{jk}}{n}$$, where n_{jk} is the number of observations in level j of the predictor that are in class k.
Compute the test statistic
$$t=n{\displaystyle \sum _{k=1}^{K}{\displaystyle \sum _{j=1}^{J}\frac{{\left({\widehat{\pi}}_{jk}-{\widehat{\pi}}_{j+}{\widehat{\pi}}_{+k}\right)}^{2}}{{\widehat{\pi}}_{j+}{\widehat{\pi}}_{+k}}}}$$
$${\widehat{\pi}}_{j+}={\displaystyle \sum _{k}{\widehat{\pi}}_{jk}}$$, that is, the marginal probability of observing the predictor at level j. $${\widehat{\pi}}_{+k}={\displaystyle \sum _{j}{\widehat{\pi}}_{jk}}$$, that is, the marginal probability of observing class k. If n is large enough, then t is distributed as a χ^{2} with (K – 1)(J – 1) degrees of freedom.
If the p-value for the test is less than 0.05, then reject the null hypothesis that there is no association between x and y.
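The process above can be sketched in base MATLAB on a small made-up data set. This is an illustration under stated assumptions, not the fitctree implementation: the data are hypothetical, the quartile binning is approximated by four equal-count rank groups, the proportion for cell (j,k) is accumulated over observations that fall in level j and class k (consistent with n_{jk}/n for equal weights), and the p-value uses the regularized upper incomplete gamma function as the chi-square tail.

```matlab
% Hypothetical toy data: one continuous predictor and a two-class response.
x = (1:12)';
y = [1 1 1 1 1 2 1 2 2 2 2 2]';   % class labels coded as 1..K
n = numel(x);
w = ones(n,1)/n;                   % equal weights that sum to 1

% Bin x into four equal-count groups (a simple stand-in for quartiles).
[~,order] = sort(x);
r = zeros(n,1);
r(order) = (1:n)';                 % rank of each observation
level = ceil(4*r/n);               % level index j = 1..4

% Weighted proportion of observations in level j and class k.
J = max(level);
K = max(y);
piHat = accumarray([level y],w,[J K]);

% Marginal proportions and the test statistic t.
piJ = sum(piHat,2);                % J-by-1 marginal over classes
piK = sum(piHat,1);                % 1-by-K marginal over levels
expected = piJ*piK;                % J-by-K outer product
t = n*sum(sum((piHat - expected).^2 ./ expected));

% Chi-square upper-tail p-value with (K-1)(J-1) degrees of freedom.
df = (K-1)*(J-1);
p = gammainc(t/2,df/2,'upper');
rejectNull = p < 0.05;             % true means x and y appear associated
```

For these twelve observations the statistic is 20/3 with 3 degrees of freedom, which is not significant at the 0.05 level.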
When determining the best split predictor at each node, the standard CART algorithm prefers to select continuous predictors that have many levels. Sometimes, such a selection can be spurious and can also mask more important predictors that have fewer levels, such as categorical predictors.
The curvature test can be applied instead of standard CART to determine the best split predictor at each node. In that case, the best split predictor variable is the one that minimizes the significant p-values (those less than 0.05) of curvature tests between each predictor and the response variable. Such a selection is robust to the number of levels in individual predictors.
Note:
If levels of a predictor are pure for a particular class, then 
For more details on how the curvature test applies to growing classification trees, see Node Splitting Rules and [4].
ClassificationTree splits nodes based on either impurity or node error.
Impurity means one of several things, depending on your choice of the SplitCriterion name-value pair argument:
Gini's Diversity Index ('gdi'): The Gini index of a node is
$$1-{\displaystyle \sum _{i}{p}^{2}(i)},$$
where the sum is over the classes i at the node, and p(i) is the observed fraction of observations with class i that reach the node. A node with just one class (a pure node) has Gini index 0; otherwise the Gini index is positive. So the Gini index is a measure of node impurity.
Deviance ('deviance'): With p(i) defined the same as for the Gini index, the deviance of a node is
$$-{\displaystyle \sum _{i}p(i)\mathrm{log}p(i)}.$$
A pure node has deviance 0; otherwise, the deviance is positive.
Twoing rule ('twoing'): Twoing is not a purity measure of a node, but is a different measure for deciding how to split a node. Let L(i) denote the fraction of members of class i in the left child node after a split, and R(i) denote the fraction of members of class i in the right child node after a split. Choose the split criterion to maximize
$$P(L)P(R){\left({\displaystyle \sum _{i}\left|L(i)-R(i)\right|}\right)}^{2},$$
where P(L) and P(R) are the fractions of observations that split to the left and right respectively. If the expression is large, the split made each child node purer. Similarly, if the expression is small, the split made the child nodes similar to each other, and hence similar to the parent node, and so the split did not increase node purity.
Node error: The node error is the fraction of misclassified observations at a node. If j is the class with the largest number of training samples at a node, the node error is
1 – p(j).
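The four measures can be evaluated by hand for a small example. The sketch below uses made-up class fractions for a three-class node and a hypothetical split of it into two equally sized children; all names and numbers are illustrative assumptions.

```matlab
% Hypothetical class fractions p(i) at a parent node with three classes.
p = [0.5 0.3 0.2];

giniIndex = 1 - sum(p.^2);         % Gini's diversity index
pp        = p(p > 0);              % skip empty classes so that 0*log(0) is avoided
deviance  = -sum(pp.*log(pp));     % deviance (cross entropy)
nodeError = 1 - max(p);            % node error, 1 - p(j) for the majority class j

% Hypothetical split: class fractions in the left and right children,
% each child receiving half of the observations.
L  = [0.8 0.1 0.1];
R  = [0.2 0.5 0.3];
PL = 0.5;
PR = 0.5;
twoing = PL*PR*sum(abs(L - R))^2;  % twoing criterion for this split

% A pure node has zero impurity under both measures.
pPure    = [1 0 0];
giniPure = 1 - sum(pPure.^2);
```

With these numbers the Gini index is 0.62, the node error is 0.5, and the twoing value is 0.36.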
The interaction test is a statistical test that assesses the null hypothesis that there is no interaction between a pair of predictor variables and the response variable.
The interaction test assessing the association between predictor variables x_{1} and x_{2} with respect to y is conducted using this process.
If x_{1} or x_{2} is continuous, then partition that variable into its quartiles. Create a nominal variable that bins observations according to which section of the partition they occupy. If there are missing values, then create an extra bin for them.
Create the nominal variable z with J = J_{1}J_{2} levels that assigns an index to observation i according to which levels of x_{1} and x_{2} it belongs. Remove any levels of z that do not correspond to any observations.
Conduct a curvature test between z and y.
When growing decision trees, if there are important interactions between pairs of predictors, but there are also many other less important predictors in the data, then standard CART tends to miss the important interactions. However, conducting curvature and interaction tests for predictor selection instead can improve detection of important interactions, which can yield more accurate decision trees.
For more details on how the interaction test applies to growing decision trees, see Curvature Test, Node Splitting Rules and [3].
The predictive measure of association is a value that indicates the similarity between decision rules that split observations. Among all possible decision splits that are compared to the optimal split (found by growing the tree), the best surrogate decision split yields the maximum predictive measure of association. The second-best surrogate split has the second-largest predictive measure of association.
Suppose x_{j} and x_{k} are predictor variables j and k, respectively, and j ≠ k. At node t, the predictive measure of association between the optimal split x_{j} < u and a surrogate split x_{k} < v is
$${\lambda}_{jk}=\frac{\text{min}\left({P}_{L},{P}_{R}\right)-\left(1-{P}_{{L}_{j}{L}_{k}}-{P}_{{R}_{j}{R}_{k}}\right)}{\text{min}\left({P}_{L},{P}_{R}\right)}.$$
P_{L} is the proportion of observations in node t, such that x_{j} < u. The subscript L stands for the left child of node t.
P_{R} is the proportion of observations in node t, such that x_{j} ≥ u. The subscript R stands for the right child of node t.
$${P}_{{L}_{j}{L}_{k}}$$ is the proportion of observations at node t, such that x_{j} < u and x_{k} < v.
$${P}_{{R}_{j}{R}_{k}}$$ is the proportion of observations at node t, such that x_{j} ≥ u and x_{k} ≥ v.
Observations with missing values for x_{j} or x_{k} do not contribute to the proportion calculations.
λ_{jk} is a value in (–∞,1]. If λ_{jk} > 0, then x_{k} < v is a worthwhile surrogate split for x_{j} < u.
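A small numeric sketch of this measure, using two made-up predictors with no missing values and hypothetical cut points u and v:

```matlab
% Hypothetical predictors at node t and candidate cut points.
xj = [1 2 3 4 5 6 7 8]';           % predictor of the optimal split xj < u
xk = [2 1 4 5 6 3 9 7]';           % predictor of the candidate surrogate xk < v
u  = 4.5;
v  = 4.5;

leftJ = xj < u;                    % observations sent left by the optimal split
leftK = xk < v;                    % observations sent left by the surrogate

PL   = mean(leftJ);                % proportion with xj <  u
PR   = mean(~leftJ);               % proportion with xj >= u
PLL  = mean(leftJ & leftK);        % both splits send the observation left
PRR  = mean(~leftJ & ~leftK);      % both splits send the observation right

lambda = (min(PL,PR) - (1 - PLL - PRR))/min(PL,PR);
```

Here the two splits agree on six of eight observations, so lambda is 0.5 and the surrogate would count as worthwhile.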
A surrogate decision split is an alternative to the optimal decision split at a given node in a decision tree. The optimal split is found by growing the tree; the surrogate split uses a similar or correlated predictor variable and split criterion.
When the value of the optimal split predictor for an observation is missing, the observation is sent to the left or right child node using the best surrogate predictor. When the value of the best surrogate split predictor for the observation is also missing, the observation is sent to the left or right child node using the second-best surrogate predictor, and so on. Candidate splits are sorted in descending order by their predictive measure of association.
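As a hedged usage sketch, surrogate splits can be requested with the 'Surrogate' name-value pair. The example assumes the fisheriris sample data and artificially blanks out one predictor for ten observations; the variable names are illustrative.

```matlab
% Sketch: train with surrogate splits and predict with missing values.
load fisheriris
X = meas;
X(1:10,3) = NaN;                   % hypothetical: petal length missing for ten rows

tree   = fitctree(X,species,'Surrogate','on');
labels = predict(tree,X(1:10,:));  % rows with a missing value are still classified
```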
fitctree uses these processes to determine how to split node t.
For standard CART (that is, if PredictorSelection is 'allsplits') and for all predictors x_{i}, i = 1,...,p:
fitctree
computes the weighted
impurity of node t, i_{t}.
For supported impurity measures, see SplitCriterion
.
fitctree
estimates the probability
that an observation is in node t using
$$P\left(T\right)={\displaystyle \sum _{j\in T}{w}_{j}}.$$
w_{j} is
the weight of observation j, and T is
the set of all observation indices in node t. If
you do not specify Prior
or Weights
, then w_{j} =
1/n, where n is the sample size.
fitctree
sorts x_{i} in
ascending order. Each element of the sorted predictor is a splitting
candidate or cut point. fitctree
stores any
indices corresponding to missing values in the set T_{U},
which is the unsplit set.
fitctree
determines the best
way to split node t using x_{i} by
maximizing the impurity gain (ΔI) over all
splitting candidates. That is, for all splitting candidates in x_{i}:
fitctree
splits the observations
in node t into left and right child nodes (t_{L} and t_{R},
respectively).
fitctree
computes ΔI.
Suppose that for a particular splitting candidate, t_{L} and t_{R} contain
observation indices in the sets T_{L} and T_{R},
respectively.
If x_{i} does not contain any missing values, then the impurity gain for the current splitting candidate is
$$\Delta I=P\left(T\right){i}_{t}-P\left({T}_{L}\right){i}_{{t}_{L}}-P\left({T}_{R}\right){i}_{{t}_{R}}.$$
If x_{i} contains missing values then, assuming that the observations are missing at random, the impurity gain is
$$\Delta {I}_{U}=P\left(T-{T}_{U}\right){i}_{t}-P\left({T}_{L}\right){i}_{{t}_{L}}-P\left({T}_{R}\right){i}_{{t}_{R}}.$$
T – T_{U} is the set of all observation indices in node t that are not missing.
If you use surrogate decision splits, then:
fitctree
computes the predictive
measures of association between the decision split x_{j} < u and
all possible decision splits x_{k} < v, j ≠ k.
fitctree
sorts the possible
alternative decision splits in descending order by their predictive
measure of association with the optimal split. The surrogate split
is the decision split yielding the largest measure.
fitctree decides the child node assignments for observations with a missing value for x_{i} using the surrogate split. If the surrogate predictor also contains a missing value, then fitctree uses the decision split with the second largest measure, and so on, until there are no other surrogates. It is possible for fitctree to split two different observations at node t using two different surrogate splits. For example, suppose the predictors x_{1} and x_{2} are the best and second best surrogates, respectively, for the predictor x_{i}, i ∉ {1,2}, at node t. If observation m of predictor x_{i} is missing (i.e., x_{mi} is missing), but x_{m1} is not missing, then x_{1} is the surrogate predictor for observation x_{mi}. If observations x_{(m + 1),i} and x_{(m + 1),1} are missing, but x_{(m + 1),2} is not missing, then x_{2} is the surrogate predictor for observation m + 1.
fitctree
uses the appropriate
impurity gain formula. That is, if fitctree
fails
to assign all missing observations in node t to
children nodes using surrogate splits, then the impurity gain is ΔI_{U}.
Otherwise, fitctree
uses ΔI for
the impurity gain.
fitctree
chooses the candidate
that yields the largest impurity gain.
fitctree
splits the predictor
variable at the cut point that maximizes the impurity gain.
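The standard CART search over one predictor can be sketched in base MATLAB. This is a simplified illustration, not the fitctree implementation: the data are made up, there are no missing values (so the gain is ΔI rather than ΔI_{U}), the impurity is the Gini index, and the rule "x < candidate" with the sorted predictor values as candidates is an assumption about how a cut point is applied.

```matlab
% Hypothetical node data: one predictor and a two-class response.
x = [1 2 3 4 5 6]';
y = [1 1 1 2 2 2]';                % class labels coded as 1..K
n = numel(x);
w = ones(n,1)/n;                   % equal observation weights, w_j = 1/n
K = max(y);

% Weighted Gini index of a set of observations with labels yy and weights ww.
gini = @(yy,ww) 1 - sum((accumarray(yy,ww,[K 1])/sum(ww)).^2);

PT = sum(w);                       % P(T), probability of being in node t
it = gini(y,w);                    % impurity of node t

% Splitting candidates: sorted predictor values, skipping the smallest one
% because x < min(x) would leave the left child empty.
candidates = unique(x);
candidates = candidates(2:end);

gain = zeros(size(candidates));
for c = 1:numel(candidates)
    inL = x < candidates(c);       % left child membership
    PL  = sum(w(inL));
    PR  = sum(w(~inL));
    gain(c) = PT*it - PL*gini(y(inL),w(inL)) - PR*gini(y(~inL),w(~inL));
end

[bestGain,idx] = max(gain);        % largest impurity gain
bestCut = candidates(idx);         % corresponding cut point
```

For this data the split x < 4 separates the classes perfectly, so the gain equals the parent impurity of 0.5.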
For the curvature test (that is, if PredictorSelection is 'curvature'):
fitctree conducts curvature tests between each predictor and the response for observations in node t.
If all p-values are at least 0.05, then fitctree does not split node t.
If there is a minimal p-value, then fitctree chooses the corresponding predictor to split node t.
If more than one p-value is zero due to underflow, then fitctree applies standard CART to the corresponding predictors to choose the split predictor.
If fitctree chooses a split predictor, then it uses standard CART to choose the cut point (see step 4 in the standard CART process).
For the interaction test (that is, if PredictorSelection is 'interaction-curvature'):
For observations in node t, fitctree conducts curvature tests between each predictor and the response and interaction tests between each pair of predictors and the response.
If all p-values are at least 0.05, then fitctree does not split node t.
If there is a minimal p-value and it is the result of a curvature test, then fitctree chooses the corresponding predictor to split node t.
If there is a minimal p-value and it is the result of an interaction test, then fitctree chooses the split predictor using standard CART on the corresponding pair of predictors.
If more than one p-value is zero due to underflow, then fitctree applies standard CART to the corresponding predictors to choose the split predictor.
If fitctree chooses a split predictor, then it uses standard CART to choose the cut point (see step 4 in the standard CART process).
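The three predictor-selection techniques described above are chosen through the PredictorSelection name-value pair. A minimal sketch, assuming the ionosphere sample data set; the variable names are illustrative, and the resulting tree sizes depend on the data.

```matlab
% Sketch: grow trees with each predictor-selection technique.
load ionosphere                    % provides X (predictors) and Y (labels)

treeCART  = fitctree(X,Y);                                   % standard CART (default)
treeCurv  = fitctree(X,Y,'PredictorSelection','curvature');
treeInter = fitctree(X,Y,'PredictorSelection','interaction-curvature');

% Compare the number of nodes in each fitted tree.
nodeCounts = [treeCART.NumNodes treeCurv.NumNodes treeInter.NumNodes];
disp(nodeCounts)
```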
If MergeLeaves is 'on' and PruneCriterion is 'error' (which are the default values for these name-value pair arguments), then the software applies pruning only to the leaves and by using classification error. This specification amounts to merging leaves that share the most popular class per leaf.
To accommodate MaxNumSplits
, fitctree
splits
all nodes in the current layer, and then counts
the number of branch nodes. A layer is the set of nodes that are equidistant
from the root node. If the number of branch nodes exceeds MaxNumSplits
, fitctree
follows
this procedure:
Determine how many branch nodes in the current layer
must be unsplit so that there are at most MaxNumSplits
branch
nodes.
Sort the branch nodes by their impurity gains.
Unsplit the number of least successful branches.
Return the decision tree grown so far.
This procedure produces maximally balanced trees.
The software splits branch nodes layer by layer until at least one of these events occurs:
There are MaxNumSplits
branch
nodes.
A proposed split causes the number of observations
in at least one branch node to be fewer than MinParentSize
.
A proposed split causes the number of observations
in at least one leaf node to be fewer than MinLeafSize
.
The algorithm cannot find a good split within a layer
(i.e., the pruning criterion (see PruneCriterion
),
does not improve for all proposed splits in a layer). A special case
is when all nodes are pure (i.e., all observations in the node have
the same class).
For values 'curvature' or 'interaction-curvature' of PredictorSelection, all tests yield p-values greater than 0.05.
MaxNumSplits
and MinLeafSize
do
not affect splitting at their default values. Therefore, if you set 'MaxNumSplits'
,
splitting might stop due to the value of MinParentSize
,
before MaxNumSplits
splits occur.
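A short sketch of capping the number of splits, assuming the ionosphere sample data set; the cap of 7 is an arbitrary illustrative value.

```matlab
% Sketch: limit tree growth with MaxNumSplits and count the branch nodes.
load ionosphere                    % provides X (predictors) and Y (labels)

tree = fitctree(X,Y,'MaxNumSplits',7);
numBranches = sum(tree.IsBranchNode);   % branch nodes in the fitted tree
disp(numBranches)
```

The count can be lower than the cap, for example when MinParentSize stops splitting first or when leaves are merged.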
For dual-core systems and above, fitctree parallelizes training decision trees using Intel^{®} Threading Building Blocks (TBB). For details on Intel TBB, see https://software.intel.com/enus/inteltbb.
[1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.
[2] Coppersmith, D., S. J. Hong, and J. R. M. Hosking. "Partitioning Nominal Attributes in Decision Trees." Data Mining and Knowledge Discovery, Vol. 3, 1999, pp. 197–217.
[3] Loh, W.-Y. "Regression Trees with Unbiased Variable Selection and Interaction Detection." Statistica Sinica, Vol. 12, 2002, pp. 361–386.
[4] Loh, W.-Y., and Y.-S. Shih. "Split Selection Methods for Classification Trees." Statistica Sinica, Vol. 7, 1997, pp. 815–840.
ClassificationPartitionedModel
 ClassificationTree
 kfoldpredict
 predict
 prune