kfoldEdge
Classification edge for cross-validated classification model
Description
E = kfoldEdge(CVMdl) returns the classification edge obtained by the cross-validated
classification model CVMdl. For every fold,
kfoldEdge computes the classification edge for validation-fold
observations using a classifier trained on training-fold observations.
CVMdl.X and CVMdl.Y contain both sets of
observations.
E = kfoldEdge(CVMdl,Name,Value) returns the classification edge with additional options specified by one or more name-value
arguments. For example, you can specify the folds to use or specify to compute the classification
edge for each individual fold.
Examples
Compute the k-fold edge for a model trained on Fisher's iris data.
Load Fisher's iris data set.
load fisheriris
Train a classification tree classifier.
tree = fitctree(meas,species);
Cross-validate the classifier using 10-fold cross-validation.
cvtree = crossval(tree);
Compute the k-fold edge.
edge = kfoldEdge(cvtree)
edge = 0.8578
Compute the k-fold edge for an ensemble trained on the Fisher iris data.
Load the sample data set.
load fisheriris
Train an ensemble of 100 boosted classification trees.
t = templateTree('MaxNumSplits',1); % Weak learner template tree object
ens = fitcensemble(meas,species,'Learners',t);
Create a cross-validated ensemble from ens and find the classification edge.
rng(10,'twister') % For reproducibility
cvens = crossval(ens);
E = kfoldEdge(cvens)
E = 3.2033
Input Arguments
Cross-validated partitioned classifier, specified as a ClassificationPartitionedModel, ClassificationPartitionedEnsemble, or ClassificationPartitionedGAM object. You can create the object in two ways:
- Pass a trained classification model, such as one returned by fitctree or fitcensemble, to its crossval object function.
- Train a classification model and specify one of the function's cross-validation name-value arguments (for example, 'CrossVal','on').
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN, where Name is
the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose
Name in quotes.
Example: kfoldEdge(CVMdl,'Folds',[1 2 3 5]) specifies to use the
first, second, third, and fifth folds to compute the classification edge, but to exclude the
fourth fold.
Fold indices to use, specified as a positive integer vector. The elements of Folds must be within the range from 1 to CVMdl.KFold.
The software uses only the folds specified in Folds.
Example: 'Folds',[1 4 10]
Data Types: single | double
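For instance, the following minimal sketch reuses the cross-validated tree cvtree from the first example and evaluates the edge over three folds only:
e = kfoldEdge(cvtree,'Folds',[1 4 10]) % average edge over folds 1, 4, and 10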
Flag to include interaction terms of the model, specified as true or
false. This argument is valid only for a generalized
additive model (GAM). That is, you can specify this argument only when
CVMdl is ClassificationPartitionedGAM.
The default value is true if the models in
CVMdl (CVMdl.Trained) contain
interaction terms. The value must be false if the models do not
contain interaction terms.
Example: 'IncludeInteractions',false
Data Types: logical
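Because this argument applies only to GAMs, here is a minimal sketch, assuming the ionosphere sample data set (a binary classification problem) that ships with the toolbox:
load ionosphere % X: predictors, Y: labels ('b' or 'g')
Mdl = fitcgam(X,Y,'Interactions',5); % GAM with five interaction terms
CVMdl = crossval(Mdl); % 10-fold cross-validation
eLinear = kfoldEdge(CVMdl,'IncludeInteractions',false) % linear terms only
eFull = kfoldEdge(CVMdl) % interaction terms included by default here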
Aggregation level for the output, specified as 'average', 'individual', or 'cumulative'.
| Value | Description |
|---|---|
| 'average' | The output is a scalar average over all folds. |
| 'individual' | The output is a vector of length k containing one value per fold, where k is the number of folds. |
| 'cumulative' | The output is a vector in which element j is the average edge over all folds obtained using the first j weak learners (see the description of E for details). If you want to specify this value, CVMdl must be a ClassificationPartitionedEnsemble or ClassificationPartitionedGAM object. |
Example: 'Mode','individual'
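For example, a minimal sketch that reuses cvtree from the first example and returns one edge per fold:
edges = kfoldEdge(cvtree,'Mode','individual') % 10-by-1 vector, one edge per fold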
Output Arguments
Classification edge, returned as a numeric scalar or numeric column vector.
- If Mode is 'average', then E is the average classification edge over all folds.
- If Mode is 'individual', then E is a k-by-1 numeric column vector containing the classification edge for each fold, where k is the number of folds.
- If Mode is 'cumulative' and CVMdl is ClassificationPartitionedEnsemble, then E is a min(CVMdl.NumTrainedPerFold)-by-1 numeric column vector. Each element j is the average classification edge over all folds that the function obtains by using ensembles trained with weak learners 1:j.
- If Mode is 'cumulative' and CVMdl is ClassificationPartitionedGAM, then the output value depends on the IncludeInteractions value (a sketch of cumulative mode follows this list):
  - If IncludeInteractions is false, then E is a (1 + min(NumTrainedPerFold.PredictorTrees))-by-1 numeric column vector. The first element of E is the average classification edge over all folds that is obtained using only the intercept (constant) term. The (j + 1)th element of E is the average edge obtained using the intercept term and the first j predictor trees per linear term.
  - If IncludeInteractions is true, then E is a (1 + min(NumTrainedPerFold.InteractionTrees))-by-1 numeric column vector. The first element of E is the average classification edge over all folds that is obtained using the intercept (constant) term and all predictor trees per linear term. The (j + 1)th element of E is the average edge obtained using the intercept term, all predictor trees per linear term, and the first j interaction trees per interaction term.
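As an illustration of the cumulative mode, this minimal sketch reuses the cross-validated ensemble cvens from the second example and plots how the edge evolves as weak learners accumulate:
eCum = kfoldEdge(cvens,'Mode','cumulative'); % min(cvens.NumTrainedPerFold)-by-1 vector
plot(eCum) % element j uses weak learners 1:j
xlabel('Number of weak learners')
ylabel('k-fold classification edge')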
More About
The classification edge is the weighted mean of the classification margins.
One way to choose among multiple classifiers, for example when performing feature selection, is to choose the classifier that yields the greatest edge.
The classification margin for binary classification is, for each observation, the difference between the classification score for the true class and the classification score for the false class. The classification margin for multiclass classification is the difference between the classification score for the true class and the maximal score for the false classes.
If the margins are on the same scale (that is, the score values are based on the same score transformation), then they serve as a classification confidence measure. Among multiple classifiers, those that yield greater margins are better.
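Equivalently, you can recover the edge from the per-observation margins. A minimal check, reusing cvtree from the first example and assuming the default uniform observation weights:
m = kfoldMargin(cvtree); % cross-validated margin for each observation
e = mean(m) % matches kfoldEdge(cvtree) when weights are uniform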
Algorithms
kfoldEdge computes the classification edge as described in the
corresponding edge object function. For a model-specific description, see
the appropriate edge function reference page in the following
table.
| Model Type | edge Function |
|---|---|
| Discriminant analysis classifier | edge |
| Ensemble classifier | edge |
| Generalized additive model classifier | edge |
| k-nearest neighbor classifier | edge |
| Naive Bayes classifier | edge |
| Neural network classifier | edge |
| Support vector machine classifier | edge |
| Binary decision tree for multiclass classification | edge |
Extended Capabilities
Usage notes and limitations:
This function fully supports GPU arrays for the following cross-validated model objects:
- Ensemble classifier trained with fitcensemble
- k-nearest neighbor classifier trained with fitcknn
- Support vector machine classifier trained with fitcsvm
- Binary decision tree for multiclass classification trained with fitctree
- Neural network classifier trained with fitcnet
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
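For example, a minimal sketch, assuming a supported GPU device and Parallel Computing Toolbox are available:
gpuMeas = gpuArray(meas); % move predictors to the GPU
tree = fitctree(gpuMeas,species); % fitctree is among the supported models
cvtree = crossval(tree);
e = kfoldEdge(cvtree) % edge computed using GPU arrays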
Version History
Introduced in R2011a

kfoldEdge fully supports GPU arrays for ClassificationPartitionedModel models trained using fitcnet.
Starting in R2023b, classification model object functions, including kfoldEdge, use observations with missing predictor values as part of resubstitution ("resub") and cross-validation ("kfold") computations for classification edges, losses, margins, and predictions.
In previous releases, the software omitted observations with missing predictor values from the resubstitution and cross-validation computations.
If you specify a nondefault cost matrix when you cross-validate the input model object for an SVM or ensemble classification model, the kfoldEdge function returns a different value compared to previous releases.
The kfoldEdge function uses the
observation weights stored in the W property. The way the function uses the
W property value has not changed. However, the property value stored in the input model object has changed for
cross-validated SVM and ensemble model objects with a nondefault cost matrix, so the
function can return a different value.
For details about the property value change, see Cost property stores the user-specified cost matrix (cross-validated SVM classifier) or Cost property stores the user-specified cost matrix (cross-validated ensemble classifier).
If you want the software to handle the cost matrix, prior
probabilities, and observation weights in the same way as in previous releases, adjust the prior
probabilities and observation weights for the nondefault cost matrix, as described in Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix. Then, when you train a
classification model, specify the adjusted prior probabilities and observation weights by using
the Prior and Weights name-value arguments, respectively,
and use the default cost matrix.
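A minimal sketch of such a training call for an SVM model, where adjustedPrior and adjustedWeights are hypothetical placeholders for the outputs of that adjustment procedure:
Mdl = fitcsvm(X,Y,'Prior',adjustedPrior,'Weights',adjustedWeights); % hypothetical precomputed values; default cost matrix
CVMdl = crossval(Mdl);
e = kfoldEdge(CVMdl)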
See Also
kfoldPredict | kfoldMargin | kfoldLoss | kfoldfun | ClassificationPartitionedModel