# ClassificationPartitionedLinear

Cross-validated linear model for binary classification of high-dimensional data

## Description

`ClassificationPartitionedLinear` is a set of linear classification models trained on cross-validated folds. You can estimate the quality of classification, or how well the linear classification model generalizes, using one or more *kfold* functions: `kfoldPredict`, `kfoldLoss`, `kfoldMargin`, and `kfoldEdge`.

Every *kfold* object function uses models trained on
training-fold (in-fold) observations to predict the response for validation-fold
(out-of-fold) observations. For example, suppose that you cross-validate using five
folds. The software randomly assigns each observation to one of five groups of roughly
equal size. The *training fold* contains four of the groups (roughly
4/5 of the data), and the *validation fold* contains the remaining
group (roughly 1/5 of the data). In this case, cross-validation proceeds as follows:

1. The software trains the first model (stored in `CVMdl.Trained{1}`) by using the observations in the last four groups, and reserves the observations in the first group for validation.
2. The software trains the second model (stored in `CVMdl.Trained{2}`) by using the observations in the first group and the last three groups. The software reserves the observations in the second group for validation.
3. The software proceeds in a similar manner for the third, fourth, and fifth models.

If you validate by using `kfoldPredict`, the software computes
predictions for the observations in group *i* by using model
*i*. In short, the software estimates a response for every
observation by using the model trained without that observation.
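The out-of-fold procedure described above can be sketched by hand. The following is a minimal illustration, not the internal implementation of `fitclinear`: the synthetic data, the variable names, and the use of a stratified `cvpartition` are assumptions made for this example.

```matlab
% Minimal sketch of 5-fold out-of-fold prediction on synthetic data.
% The data and variable names are hypothetical, chosen for illustration only.
rng(0);                                   % for reproducibility
X = randn(200,10);                        % synthetic predictor matrix
y = X(:,1) + 0.5*randn(200,1) > 0;        % synthetic binary (logical) labels

c = cvpartition(y,'KFold',5);             % assign observations to five folds
oofLabels = false(size(y));               % one out-of-fold prediction per observation

for i = 1:c.NumTestSets
    trainIdx = training(c,i);             % in-fold observations for model i
    testIdx  = test(c,i);                 % out-of-fold observations for model i
    Mdl = fitclinear(X(trainIdx,:),y(trainIdx));
    % Each observation is predicted by the model trained without it.
    oofLabels(testIdx) = predict(Mdl,X(testIdx,:));
end

oofError = mean(oofLabels ~= y);          % fraction of misclassified observations
```

In practice, `kfoldPredict` and `kfoldLoss` perform these steps for you on a `ClassificationPartitionedLinear` object.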

**Note**

`ClassificationPartitionedLinear` model objects do not store the predictor data set.

## Creation

You can create a `ClassificationPartitionedLinear` object by using the `fitclinear` function and specifying one of the name-value arguments `CrossVal`, `CVPartition`, `Holdout`, `KFold`, or `Leaveout`.
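As a brief sketch of these options, the following calls each request a cross-validated model in a different way. The synthetic data and the variable names are assumptions for illustration only.

```matlab
% Hypothetical synthetic data for illustration.
rng(1);                                   % for reproducibility
X = randn(100,5);
y = X(:,1) > 0;

% Three ways to request cross-validation from fitclinear.
CVMdlK = fitclinear(X,y,'KFold',5);       % 5-fold cross-validation
CVMdlH = fitclinear(X,y,'Holdout',0.2);   % hold out 20% of the data for validation
c = cvpartition(y,'KFold',3);             % user-defined partition
CVMdlP = fitclinear(X,y,'CVPartition',c); % cross-validate using that partition

% Each result is a ClassificationPartitionedLinear object.
class(CVMdlK)
numel(CVMdlK.Trained)                     % number of trained fold models
```

Specify only one of these cross-validation arguments in a given call.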

## Properties

### Cross-Validation Properties

`CrossValidatedModel`: Cross-validated model name

character vector

Cross-validated model name, specified as a character vector.

For example, `'Linear'` specifies a cross-validated linear model for binary classification or regression.

**Data Types:** `char`

`KFold`: Number of cross-validated folds

positive integer

Number of cross-validated folds, specified as a positive integer.

**Data Types:** `double`

`ModelParameters`: Cross-validation parameter values

object

Cross-validation parameter values, e.g., the name-value pair argument values used to cross-validate the linear model, specified as an object. `ModelParameters` does not contain estimated parameters.

Access properties of `ModelParameters` using dot notation.

`NumObservations`: Number of observations

positive numeric scalar

Number of observations in the training data, specified as a positive numeric scalar.

**Data Types:** `double`

`Partition`: Data partition

`cvpartition` model

Data partition indicating how the software splits the data into cross-validation folds, specified as a `cvpartition` model.

`Trained`: Linear classification models trained on cross-validation folds

cell array of `ClassificationLinear` model objects

Linear classification models trained on cross-validation folds, specified as a cell array of `ClassificationLinear` models. `Trained` has *k* cells, where *k* is the number of folds.

**Data Types:** `cell`

`W`: Observation weights

numeric vector

Observation weights used to cross-validate the model, specified as a numeric vector. `W` has `NumObservations` elements.

The software normalizes `W` so that the weights for observations within a particular class sum up to the prior probability of that class.

**Data Types:** `single` | `double`

`Y`: Observed class labels

categorical array | character array | logical vector | vector of numeric values | cell array of character vectors

Observed class labels used to cross-validate the model, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. `Y` has `NumObservations` elements, and is the same data type as the input argument `Y` that you passed to `fitclinear` to cross-validate the model. (The software treats string arrays as cell arrays of character vectors.)

Each row of `Y` represents the observed classification of the corresponding observation in the predictor data.

**Data Types:** `categorical` | `char` | `logical` | `single` | `double` | `cell`

### Other Classification Properties

`CategoricalPredictors`: Categorical predictor indices

vector of positive integers | `[]`

Categorical predictor indices, specified as a vector of positive integers. `CategoricalPredictors` contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and `p`, where `p` is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty (`[]`).

**Data Types:** `single` | `double`

`ClassNames`: Unique class labels

categorical array | character array | logical vector | numeric vector | cell array of character vectors

This property is read-only.

Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. `ClassNames` has the same data type as the class labels `Y`. (The software treats string arrays as cell arrays of character vectors.) `ClassNames` also determines the class order.

**Data Types:** `categorical` | `char` | `logical` | `single` | `double` | `cell`

`Cost`: Misclassification costs

square numeric matrix

This property is read-only.

Misclassification costs, specified as a square numeric matrix. `Cost` has *K* rows and columns, where *K* is the number of classes.

`Cost(i,j)` is the cost of classifying a point into class `j` if its true class is `i`. The order of the rows and columns of `Cost` corresponds to the order of the classes in `ClassNames`.

**Data Types:** `double`

`PredictorNames`: Predictor names

cell array of character vectors

Predictor names in order of their appearance in the predictor data, specified as a cell array of character vectors. The length of `PredictorNames` is equal to the number of variables in the training data `X` or `Tbl` used as predictor variables.

**Data Types:** `cell`

`Prior`: Prior class probabilities

numeric vector

This property is read-only.

Prior class probabilities, specified as a numeric vector. `Prior` has as many elements as classes in `ClassNames`, and the order of the elements corresponds to the elements of `ClassNames`.

**Data Types:** `double`

`ResponseName`: Response variable name

character vector

Response variable name, specified as a character vector.

**Data Types:** `char`

`ScoreTransform`: Score transformation function

`'doublelogit'` | `'invlogit'` | `'ismax'` | `'logit'` | `'none'` | function handle | ...

Score transformation function to apply to predicted scores, specified as a function name or function handle.

For linear classification models and before transformation, the predicted classification score for the observation *x* (row vector) is *f*(*x*) = *x**β* + *b*, where *β* and *b* correspond to `Mdl.Beta` and `Mdl.Bias`, respectively.

To change the score transformation function to, for example, *function*, use dot notation.

- For a built-in function, enter this code and replace *function* with a value in the table.

  `Mdl.ScoreTransform = 'function';`

  | Value | Description |
  | --- | --- |
  | `"doublelogit"` | 1/(1 + *e*^(-2*x*)) |
  | `"invlogit"` | log(*x* / (1 - *x*)) |
  | `"ismax"` | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0 |
  | `"logit"` | 1/(1 + *e*^(-*x*)) |
  | `"none"` or `"identity"` | *x* (no transformation) |
  | `"sign"` | -1 for *x* < 0; 0 for *x* = 0; 1 for *x* > 0 |
  | `"symmetric"` | 2*x* - 1 |
  | `"symmetricismax"` | Sets the score for the class with the largest score to 1, and sets the scores for all other classes to -1 |
  | `"symmetriclogit"` | 2/(1 + *e*^(-*x*)) - 1 |

- For a MATLAB® function, or a function that you define, enter its function handle.

  `Mdl.ScoreTransform = @function;`

  *function* must accept a matrix of the original scores for each class, and then return a matrix of the same size representing the transformed scores for each class.

**Data Types:** `char` | `function_handle`
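To make the formulas in the table concrete, the following sketch writes several of them as anonymous functions. The function-handle names are hypothetical and are not part of the toolbox. Each handle operates elementwise, so it accepts a matrix of scores and returns a matrix of the same size, as a custom `ScoreTransform` function must.

```matlab
% Elementwise versions of some built-in score transformations.
% The handle names below are hypothetical, chosen for this illustration.
logitFcn          = @(x) 1 ./ (1 + exp(-x));        % "logit"
doublelogitFcn    = @(x) 1 ./ (1 + exp(-2*x));      % "doublelogit"
invlogitFcn       = @(x) log(x ./ (1 - x));         % "invlogit"
symmetricFcn      = @(x) 2*x - 1;                   % "symmetric"
symmetriclogitFcn = @(x) 2 ./ (1 + exp(-x)) - 1;    % "symmetriclogit"

rawScores = [-2 0 2];                 % example untransformed scores
p = logitFcn(rawScores);              % maps scores into the interval (0,1)
recovered = invlogitFcn(p);           % "invlogit" undoes "logit"
```

Note that, as defined in the table, `"logit"` maps a raw score to the interval (0,1), and `"invlogit"` maps a value in (0,1) back to a raw score.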

## Object Functions

| Function | Description |
| --- | --- |
| `kfoldEdge` | Classification edge for cross-validated linear classification model |
| `kfoldLoss` | Classification loss for cross-validated linear classification model |
| `kfoldMargin` | Classification margins for cross-validated linear classification model |
| `kfoldPredict` | Classify observations in cross-validated linear classification model |
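The following sketch calls all four object functions on one cross-validated model. The synthetic data and the variable names are assumptions for illustration only.

```matlab
% Hypothetical synthetic data for illustration.
rng(1);                                   % for reproducibility
X = randn(200,5);
y = X(:,1) + 0.5*randn(200,1) > 0;

CVMdl = fitclinear(X,y,'KFold',5);        % cross-validated linear classifier

labels = kfoldPredict(CVMdl);             % out-of-fold predicted labels, one per observation
err    = kfoldLoss(CVMdl);                % out-of-fold classification loss
m      = kfoldMargin(CVMdl);              % out-of-fold classification margins, one per observation
e      = kfoldEdge(CVMdl);                % classification edge (weighted mean of the margins)
```

Because the model uses a single regularization strength here, `err` and `e` are scalars, and `labels` and `m` are column vectors with one row per observation.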

## Examples

### Create Cross-Validated Binary Linear Classification Model

Load the NLP data set.

`load nlpdata`

`X` is a sparse matrix of predictor data, and `Y` is a categorical vector of class labels. There are more than two classes in the data.

Identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

`Ystats = Y == 'stats';`

Cross-validate a binary, linear classification model that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation.

```matlab
rng(1); % For reproducibility
CVMdl = fitclinear(X,Ystats,'CrossVal','on')
```

```
CVMdl = 
  ClassificationPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: [0 1]
         ScoreTransform: 'none'
```

`CVMdl` is a `ClassificationPartitionedLinear` cross-validated model. Because `fitclinear` implements 10-fold cross-validation by default, `CVMdl.Trained` contains ten `ClassificationLinear` models that contain the results of training linear classification models for each of the folds.

Estimate labels for out-of-fold observations and estimate the generalization error by passing `CVMdl` to `kfoldPredict` and `kfoldLoss`, respectively.

```matlab
oofLabels = kfoldPredict(CVMdl);
ge = kfoldLoss(CVMdl)
```

```
ge = 7.6017e-04
```

The estimated generalization error is less than 0.1%; that is, fewer than 0.1% of the out-of-fold observations are misclassified.

### Find Good Lasso Penalty Using Cross-Validation

To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, implement 5-fold cross-validation.

Load the NLP data set.

`load nlpdata`

`X` is a sparse matrix of predictor data, and `Y` is a categorical vector of class labels. There are more than two classes in the data.

The models should identify whether the word counts in a web page are from the Statistics and Machine Learning Toolbox™ documentation. So, identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

`Ystats = Y == 'stats';`

Create a set of 11 logarithmically spaced regularization strengths from $$10^{-6}$$ through $$10^{-0.5}$$.

```matlab
Lambda = logspace(-6,-0.5,11);
```

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Estimate the coefficients using SpaRSA. Lower the tolerance on the gradient of the objective function to `1e-8`.

```matlab
X = X';
rng(10); % For reproducibility
CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns','KFold',5,...
    'Learner','logistic','Solver','sparsa','Regularization','lasso',...
    'Lambda',Lambda,'GradientTolerance',1e-8)
```

```
CVMdl = 
  ClassificationPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 5
              Partition: [1x1 cvpartition]
             ClassNames: [0 1]
         ScoreTransform: 'none'
```

```matlab
numCLModels = numel(CVMdl.Trained)
```

```
numCLModels = 5
```

`CVMdl` is a `ClassificationPartitionedLinear` model. Because `fitclinear` implements 5-fold cross-validation, `CVMdl` contains 5 `ClassificationLinear` models that the software trains on each fold.

Display the first trained linear classification model.

```matlab
Mdl1 = CVMdl.Trained{1}
```

```
Mdl1 = 
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [0 1]
    ScoreTransform: 'logit'
              Beta: [34023x11 double]
              Bias: [-13.2936 -13.2936 -13.2936 -13.2936 -13.2936 -6.8954 -5.4359 -4.7170 -3.4108 -3.1566 -2.9792]
            Lambda: [1.0000e-06 3.5481e-06 1.2589e-05 4.4668e-05 1.5849e-04 5.6234e-04 0.0020 0.0071 0.0251 0.0891 0.3162]
           Learner: 'logistic'
```

`Mdl1` is a `ClassificationLinear` model object. `fitclinear` constructed `Mdl1` by training on four of the five folds and reserving the remaining fold for validation. Because `Lambda` is a sequence of regularization strengths, you can think of `Mdl1` as 11 models, one for each regularization strength in `Lambda`.

Estimate the cross-validated classification error.

```matlab
ce = kfoldLoss(CVMdl);
```

Because there are 11 regularization strengths, `ce` is a 1-by-11 vector of classification error rates.

Higher values of `Lambda` lead to predictor variable sparsity, which is a good quality of a classifier. For each regularization strength, train a linear classification model using the entire data set and the same options as when you cross-validated the models. Determine the number of nonzero coefficients per model.

```matlab
Mdl = fitclinear(X,Ystats,'ObservationsIn','columns',...
    'Learner','logistic','Solver','sparsa','Regularization','lasso',...
    'Lambda',Lambda,'GradientTolerance',1e-8);
numNZCoeff = sum(Mdl.Beta~=0);
```

In the same figure, plot the cross-validated, classification error rates and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.

```matlab
figure;
[h,hL1,hL2] = plotyy(log10(Lambda),log10(ce),...
    log10(Lambda),log10(numNZCoeff));
hL1.Marker = 'o';
hL2.Marker = 'o';
ylabel(h(1),'log_{10} classification error')
ylabel(h(2),'log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')
title('Test-Sample Statistics')
hold off
```

Choose the index of the regularization strength that balances predictor variable sparsity and low classification error. In this case, a value between $$10^{-4}$$ and $$10^{-1}$$ should suffice.

```matlab
idxFinal = 7;
```

Select the model from `Mdl` with the chosen regularization strength.

```matlab
MdlFinal = selectModels(Mdl,idxFinal);
```

`MdlFinal` is a `ClassificationLinear` model containing one regularization strength. To estimate labels for new observations, pass `MdlFinal` and the new data to `predict`.
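The example above chooses the index by inspecting the plot. As one possible alternative, the following sketch picks the largest regularization strength whose cross-validated error is within a tolerance of the minimum. This selection rule, the tolerance value `tol`, and the synthetic data are assumptions for illustration, not a method prescribed by the toolbox.

```matlab
% Hypothetical synthetic data for illustration.
rng(10);                                  % for reproducibility
n = 300; p = 20;
X = randn(n,p);
y = X(:,1) - X(:,2) + 0.5*randn(n,1) > 0;

Lambda = logspace(-4,0,5);                % ascending regularization strengths
opts = {'Learner','logistic','Solver','sparsa', ...
    'Regularization','lasso','Lambda',Lambda};

CVMdl = fitclinear(X,y,'KFold',5,opts{:});
ce = kfoldLoss(CVMdl);                    % one error rate per regularization strength

tol = 0.01;                               % hypothetical tolerance on the error rate
% Lambda is ascending, so 'last' returns the strongest qualifying penalty.
idxFinal = find(ce <= min(ce) + tol,1,'last');

Mdl = fitclinear(X,y,opts{:});            % refit on all of the data
MdlFinal = selectModels(Mdl,idxFinal);    % keep only the chosen strength
```

A larger tolerance favors sparser models at the expense of classification error.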

## Extended Capabilities

### GPU Arrays

Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

The object functions of the `ClassificationPartitionedLinear` model fully support GPU arrays.

## Version History

**Introduced in R2016a**

### R2024a: Specify GPU arrays for `ClassificationPartitionedLinear` object functions (requires Parallel Computing Toolbox)

You can fit a `ClassificationPartitionedLinear` object with GPU arrays by using `fitclinear`. `ClassificationPartitionedLinear` object functions now support GPU array input arguments so that they can execute on a GPU.

### R2022a: `Cost` property stores the user-specified cost matrix

Starting in R2022a, the `Cost` property stores the user-specified cost matrix, so that you can compute the observed misclassification cost using the specified cost value. The software stores normalized prior probabilities (`Prior`) and observation weights (`W`) that do not reflect the penalties described in the cost matrix. To compute the observed misclassification cost, specify the `LossFun` name-value argument as `"classifcost"` when you call the `kfoldLoss` function.
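A minimal sketch of this workflow follows, assuming R2022a or later. The synthetic data, the cost values, and the variable names are assumptions for illustration only.

```matlab
% Hypothetical synthetic data for illustration.
rng(1);                                   % for reproducibility
X = randn(200,5);
y = X(:,1) + 0.5*randn(200,1) > 0;

% Cost(i,j) is the cost of classifying a point into class j if its true
% class is i. Here, misclassifying a true-class-1 observation costs 5.
C = [0 1; 5 0];

CVMdl = fitclinear(X,y,'KFold',5, ...
    'ClassNames',[false true],'Cost',C);

costLoss = kfoldLoss(CVMdl,'LossFun','classifcost');   % observed misclassification cost
errLoss  = kfoldLoss(CVMdl,'LossFun','classiferror');  % misclassification rate
```

The `ClassNames` argument fixes the class order so that the rows and columns of `C` refer to the intended classes.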

Note that model training has not changed and, therefore, the decision boundaries between classes have not changed.

For training, the fitting function updates the specified prior probabilities by
incorporating the penalties described in the specified cost matrix, and then normalizes the
prior probabilities and observation weights. This behavior has not changed. In previous
releases, the software stored the default cost matrix in the `Cost` property and stored the prior probabilities and observation weights used for training in the `Prior` and `W` properties, respectively. Starting
in R2022a, the software stores the user-specified cost matrix without modification, and stores normalized
prior probabilities and observation weights that do not reflect the cost penalties. For more
details, see Misclassification Cost Matrix, Prior Probabilities, and Observation Weights.

Some object functions use the `Cost` and `W` properties:

- The `kfoldLoss` function uses the cost matrix stored in the `Cost` property if you specify the `LossFun` name-value argument as `"classifcost"` or `"mincost"`.
- The `kfoldLoss` and `kfoldEdge` functions use the observation weights stored in the `W` property.

If you specify a nondefault cost matrix when you train a classification model, the object functions return a different value compared to previous releases.

If you want the software to handle the cost matrix, prior
probabilities, and observation weights in the same way as in previous releases, adjust the prior
probabilities and observation weights for the nondefault cost matrix, as described in Adjust Prior Probabilities and Observation Weights for Misclassification Cost Matrix. Then, when you train a
classification model, specify the adjusted prior probabilities and observation weights by using
the `Prior` and `Weights` name-value arguments, respectively,
and use the default cost matrix.

## See Also

`ClassificationLinear` | `fitclinear` | `kfoldPredict` | `kfoldLoss`
