# kfoldPredict

Predict responses for observations not used for training

## Description

`YHat = kfoldPredict(CVMdl)` returns cross-validated predicted responses by the cross-validated linear regression model `CVMdl`. For every fold, `kfoldPredict` predicts the responses for validation-fold observations using a model trained on training-fold observations. `YHat` contains predicted responses for each regularization strength in the linear regression models that compose `CVMdl`.
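The fold-by-fold behavior can be reproduced by hand as a rough check. The following is a minimal sketch, not the internal implementation: the small dense data set and the variable names `yManual` and `maxDiff` are assumptions made for illustration. It uses the `Partition`, `KFold`, and `Trained` properties of the cross-validated model and the `test` function of `cvpartition`.

```matlab
rng(1) % For reproducibility
n = 200;
X = randn(n,5);                          % illustrative predictor data (assumption)
Y = X(:,1) + 2*X(:,2) + 0.3*randn(n,1);

CVMdl = fitrlinear(X,Y,'KFold',5);

% Rebuild the cross-validated predictions one fold at a time.
yManual = zeros(n,1);
for k = 1:CVMdl.KFold
    idx = test(CVMdl.Partition,k);       % logical index of validation-fold rows
    % CVMdl.Trained{k} was trained on the remaining folds only.
    yManual(idx) = predict(CVMdl.Trained{k},X(idx,:));
end

yHat = kfoldPredict(CVMdl);
maxDiff = max(abs(yHat - yManual));      % expected to be zero or negligible
```

Each observation is predicted exactly once, by the one model that did not see it during training, which is why `yHat` has one row per observation.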

`YHat = kfoldPredict(CVMdl,PredictionForMissingValue=prediction)` uses the `prediction` value as the predicted response for observations with missing values in the predictor data. By default, `kfoldPredict` uses the median of the observed response values in the training-fold data. *(since R2023b)*

## Input Arguments

`CVMdl` - Cross-validated, linear regression model

`RegressionPartitionedLinear` model object

Cross-validated, linear regression model, specified as a `RegressionPartitionedLinear` model object. You can create a `RegressionPartitionedLinear` model using `fitrlinear` and specifying any one of the cross-validation name-value arguments, for example, `CrossVal`.

To obtain estimates, `kfoldPredict` applies the same data used to cross-validate the linear regression model (`X` and `Y`).

`prediction` - Predicted response value to use for observations with missing predictor values

`"median"` (default) | `"mean"` | numeric scalar

*Since R2023b*

Predicted response value to use for observations with missing predictor values, specified as `"median"`, `"mean"`, or a numeric scalar.

Value | Description |
---|---|
`"median"` | `kfoldPredict` uses the median of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values. |
`"mean"` | `kfoldPredict` uses the mean of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values. |
Numeric scalar | `kfoldPredict` uses this value as the predicted response value for observations with missing predictor values. |

**Example:** `"mean"`

**Example:** `NaN`

**Data Types:** `single` | `double` | `char` | `string`
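As a rough illustration of the three kinds of value, the sketch below calls `kfoldPredict` with each one on a small data set in which one predictor entry is set to `NaN`. The data, the variable names, and the position of the missing entry are assumptions for illustration. How `fitrlinear` treats that row during training is not covered here, so the code only compares the returned arrays and makes no claim about specific values. The name-value syntax requires R2023b or later.

```matlab
rng(1) % For reproducibility
n = 200;
X = randn(n,5);                          % illustrative predictor data (assumption)
Y = X(:,1) + 2*X(:,2) + 0.3*randn(n,1);
X(3,2) = NaN;                            % introduce one missing predictor value

CVMdl = fitrlinear(X,Y,'KFold',5);

yMedian = kfoldPredict(CVMdl);                                  % default, "median"
yMean   = kfoldPredict(CVMdl,PredictionForMissingValue="mean");
yNaN    = kfoldPredict(CVMdl,PredictionForMissingValue=NaN);

% Rows where the choice of option changes the prediction
changedRows = find(~isequaln(yMedian,yNaN) & (yMedian ~= yNaN | isnan(yNaN)));
```

Passing `NaN` leaves affected predictions as `NaN`, which can be useful when downstream code should detect and handle those observations explicitly.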

## Output Arguments

`YHat` - Cross-validated predicted responses

numeric array

Cross-validated predicted responses, returned as an *n*-by-*L* numeric array. *n* is the number of observations in the predictor data that created `CVMdl` (see `X`) and *L* is the number of regularization strengths in `CVMdl.Trained{1}.Lambda`. `YHat(i,j)` is the predicted response for observation `i` using the linear regression model that has regularization strength `CVMdl.Trained{1}.Lambda(j)`.

The predicted response using the model with regularization strength *j* is $${\widehat{y}}_{j}=x{\beta}_{j}+{b}_{j}.$$

- *x* is an observation from the predictor data matrix `X`, and is a row vector.
- $${\beta}_{j}$$ is the estimated column vector of coefficients. The software stores this vector in `Mdl.Beta(:,j)`.
- $${b}_{j}$$ is the estimated, scalar bias, which the software stores in `Mdl.Bias(j)`.
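The formula above can be checked against `predict` for a single validation-fold observation. This is a minimal sketch under stated assumptions: the data set, the three `Lambda` values, and the names `yFormula`, `yPredict`, and `i` are illustrative, and the model uses the default response transformation (`'none'`).

```matlab
rng(1) % For reproducibility
n = 200;
X = randn(n,5);                          % illustrative predictor data (assumption)
Y = X(:,1) + 2*X(:,2) + 0.3*randn(n,1);
Lambda = [1e-3 1e-2 1e-1];

CVMdl = fitrlinear(X,Y,'KFold',5,'Lambda',Lambda);

Mdl = CVMdl.Trained{1};                  % model trained without fold 1
i = find(test(CVMdl.Partition,1),1);     % first observation in validation fold 1
x = X(i,:);                              % one observation as a row vector

j = 2;                                   % index of the regularization strength
yFormula = x*Mdl.Beta(:,j) + Mdl.Bias(j);   % x*beta_j + b_j
yPredict = predict(Mdl,x);                  % 1-by-3, one column per Lambda value

YHat = kfoldPredict(CVMdl);
gap = abs(YHat(i,j) - yFormula);         % expected to be zero or negligible
```

Because observation `i` belongs to validation fold 1, row `i` of `YHat` comes from `CVMdl.Trained{1}`, so all three quantities should agree up to floating-point error.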

## Examples

### Predict Cross-Validated Responses

Simulate 10000 observations from this model

$$y={x}_{100}+2{x}_{200}+e.$$

- $$X={x}_{1},...,{x}_{1000}$$ is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.
- *e* is random normal error with mean 0 and standard deviation 0.3.

```
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
```

Cross-validate a linear regression model.

```
CVMdl = fitrlinear(X,Y,'CrossVal','on')
```

```
CVMdl = 
  RegressionPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 10000
                  KFold: 10
              Partition: [1x1 cvpartition]
      ResponseTransform: 'none'
```

```
Mdl1 = CVMdl.Trained{1}
```

```
Mdl1 = 
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: 0.0107
               Lambda: 1.1111e-04
              Learner: 'svm'
```

By default, `fitrlinear` implements 10-fold cross-validation. `CVMdl` is a `RegressionPartitionedLinear` model. It contains the property `Trained`, which is a 10-by-1 cell array holding 10 `RegressionLinear` models that the software trained using the training set.

Predict responses for observations that `fitrlinear` did not use in training the folds.

```
yHat = kfoldPredict(CVMdl);
```

Because there is one regularization strength in `Mdl1`, `yHat` is a numeric vector.

### Predict for Models Containing Several Regularization Strengths

Simulate 10000 observations as in Predict Cross-Validated Responses.

```
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
```

Create a set of 15 logarithmically spaced regularization strengths from $$10^{-5}$$ through $$10^{-1}$$.

```
Lambda = logspace(-5,-1,15);
```

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Specify using least squares with a lasso penalty and optimizing the objective function using SpaRSA.

```
X = X';
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
```

`CVMdl` is a `RegressionPartitionedLinear` model. Its `Trained` property contains a 5-by-1 cell array of trained `RegressionLinear` models, each of which holds out a different fold during training. Because `fitrlinear` trained using 15 regularization strengths, you can think of each `RegressionLinear` model as 15 models.

Predict cross-validated responses.

```
YHat = kfoldPredict(CVMdl);
size(YHat)
```

```
ans = 1×2

       10000          15
```

```
YHat(2,:)
```

```
ans = 1×15

   -1.7338   -1.7332   -1.7319   -1.7299   -1.7266   -1.7239   -1.7135   -1.7210   -1.7324   -1.7063   -1.6397   -1.5112   -1.2631   -0.7841   -0.0096
```

`YHat` is a 10000-by-15 matrix. `YHat(2,:)` contains the cross-validated responses for observation 2, one for each of the 15 regularization strengths.
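One common use of the per-strength columns of `YHat` is to compare regularization strengths. The following is a minimal sketch on a smaller, dense data set chosen for illustration; the names `mseByLambda`, `best`, and `bestLambda` are assumptions, and the pooled mean squared error computed here is a simple stand-in rather than a replacement for `kfoldLoss`.

```matlab
rng(1) % For reproducibility
n = 500;
X = randn(n,10);                         % illustrative predictor data (assumption)
Y = X(:,1) + 2*X(:,2) + 0.3*randn(n,1);
Lambda = logspace(-5,-1,15);

CVMdl = fitrlinear(X,Y,'KFold',5,'Lambda',Lambda, ...
    'Learner','leastsquares','Regularization','lasso');
YHat = kfoldPredict(CVMdl);              % n-by-15, one column per regularization strength

% Pooled cross-validated mean squared error for each column of YHat
mseByLambda = mean((Y - YHat).^2,1);
[~,best] = min(mseByLambda);
bestLambda = Lambda(best);
```

Working from `YHat` directly is convenient when you need a custom error measure or want to inspect individual residuals.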

## Extended Capabilities

### GPU Arrays

Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

## Version History

**Introduced in R2016a**

### R2024a: Specify GPU arrays (requires Parallel Computing Toolbox)

`kfoldPredict` fully supports GPU arrays.

### R2023b: Specify predicted response value to use for observations with missing predictor values

Starting in R2023b, when you predict or compute the loss, some regression models allow you to specify the predicted response value for observations with missing predictor values. Specify the `PredictionForMissingValue` name-value argument to use a numeric scalar, the training set median, or the training set mean as the predicted value. When computing the loss, you can also specify to omit observations with missing predictor values.

This table lists the object functions that support the `PredictionForMissingValue` name-value argument. By default, the functions use the training set median as the predicted response value for observations with missing predictor values.

Model Type | Model Objects | Object Functions |
---|---|---|
Gaussian process regression (GPR) model | `RegressionGP`, `CompactRegressionGP` | `loss`, `predict`, `resubLoss`, `resubPredict` |
Gaussian process regression (GPR) model | `RegressionPartitionedGP` | `kfoldLoss`, `kfoldPredict` |
Gaussian kernel regression model | `RegressionKernel` | `loss`, `predict` |
Gaussian kernel regression model | `RegressionPartitionedKernel` | `kfoldLoss`, `kfoldPredict` |
Linear regression model | `RegressionLinear` | `loss`, `predict` |
Linear regression model | `RegressionPartitionedLinear` | `kfoldLoss`, `kfoldPredict` |
Neural network regression model | `RegressionNeuralNetwork`, `CompactRegressionNeuralNetwork` | `loss`, `predict`, `resubLoss`, `resubPredict` |
Neural network regression model | `RegressionPartitionedNeuralNetwork` | `kfoldLoss`, `kfoldPredict` |
Support vector machine (SVM) regression model | `RegressionSVM`, `CompactRegressionSVM` | `loss`, `predict`, `resubLoss`, `resubPredict` |
Support vector machine (SVM) regression model | `RegressionPartitionedSVM` | `kfoldLoss`, `kfoldPredict` |

In previous releases, the regression model `loss` and `predict` functions listed above used `NaN` predicted response values for observations with missing predictor values. The software omitted observations with missing predictor values from the resubstitution ("resub") and cross-validation ("kfold") computations for prediction and loss.

## See Also

`RegressionPartitionedLinear` | `predict` | `RegressionLinear` | `fitrlinear`
