# predict

Predict labels using *k*-nearest neighbor classification model

## Description

`label = predict(mdl,X)` returns a vector of predicted class labels for the predictor data in the table or matrix `X`, based on the trained *k*-nearest neighbor classification model `mdl`. See Predicted Class Label.

`[label,score,cost] = predict(mdl,X)` also returns:

- A matrix of classification scores (`score`) indicating the likelihood that a label comes from a particular class. For *k*-nearest neighbor, scores are posterior probabilities. See Posterior Probability.
- A matrix of expected classification cost (`cost`). For each observation in `X`, the predicted class label corresponds to the minimum expected classification costs among all classes. See Expected Cost.

## Examples

### *k*-Nearest Neighbor Classification Predictions

Create a *k*-nearest neighbor classifier for Fisher's iris data, where *k* = 5. Evaluate some model predictions on new data.

Load the Fisher iris data set.

```
load fisheriris
X = meas;
Y = species;
```

Create a classifier for five nearest neighbors. Standardize the noncategorical predictor data.

```
mdl = fitcknn(X,Y,'NumNeighbors',5,'Standardize',1);
```

Predict the classifications for flowers with minimum, mean, and maximum characteristics.

```
Xnew = [min(X);mean(X);max(X)];
[label,score,cost] = predict(mdl,Xnew)
```

```
label = 3x1 cell
    {'versicolor'}
    {'versicolor'}
    {'virginica' }

score = 3×3
    0.4000    0.6000         0
         0    1.0000         0
         0         0    1.0000

cost = 3×3
    0.6000    0.4000    1.0000
    1.0000         0    1.0000
    1.0000    1.0000         0
```

The second and third rows of the score and cost matrices have binary values, which means all five nearest neighbors of the mean and maximum flower measurements have identical classifications.

## Input Arguments

`mdl`: *k*-nearest neighbor classifier model

`ClassificationKNN` object

*k*-nearest neighbor classifier model, specified as a `ClassificationKNN` object.

`X`: Predictor data to be classified

numeric matrix | table

Predictor data to be classified, specified as a numeric matrix or table.

Each row of `X` corresponds to one observation, and each column corresponds to one variable.

For a numeric matrix:

- The variables that make up the columns of `X` must have the same order as the predictor variables used to train `mdl`.
- If you train `mdl` using a table (for example, `Tbl`), then `X` can be a numeric matrix if `Tbl` contains all numeric predictor variables. *k*-nearest neighbor classification requires homogeneous predictors. Therefore, to treat all numeric predictors in `Tbl` as categorical during training, set `'CategoricalPredictors','all'` when you train using `fitcknn`. If `Tbl` contains heterogeneous predictors (for example, numeric and categorical data types) and `X` is a numeric matrix, then `predict` throws an error.

For a table:

- `predict` does not support multicolumn variables and cell arrays other than cell arrays of character vectors.
- If you train `mdl` using a table (for example, `Tbl`), then all predictor variables in `X` must have the same variable names and data types as those used to train `mdl` (stored in `mdl.PredictorNames`). However, the column order of `X` does not need to correspond to the column order of `Tbl`. Both `Tbl` and `X` can contain additional variables (response variables, observation weights, and so on), but `predict` ignores them.
- If you train `mdl` using a numeric matrix, then the predictor names in `mdl.PredictorNames` and corresponding predictor variable names in `X` must be the same. To specify predictor names during training, see the `PredictorNames` name-value pair argument of `fitcknn`. All predictor variables in `X` must be numeric vectors. `X` can contain additional variables (response variables, observation weights, and so on), but `predict` ignores them.

If you set `'Standardize',true` in `fitcknn` to train `mdl`, then the software standardizes the columns of `X` using the corresponding means in `mdl.Mu` and standard deviations in `mdl.Sigma`.

**Data Types:** `double` | `single` | `table`
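As a rough illustration of the standardization step described for `X`, the sketch below centers and scales new observations column by column. The vectors `mu` and `sigma` are hypothetical stand-ins for `mdl.Mu` and `mdl.Sigma`, with made-up values; the sketch shows only the arithmetic, not the internal implementation of `predict`.

```matlab
% Hypothetical stand-ins for mdl.Mu and mdl.Sigma (values made up for illustration).
mu    = [5.8 3.0 3.7 1.2];
sigma = [0.8 0.4 1.8 0.8];

% Two new observations, one per row, with columns in the training order.
Xnew = [5.0 3.4 1.5 0.2;
        6.6 3.0 5.5 2.0];

% Center and scale each column with the training statistics.
Z = (Xnew - mu) ./ sigma;
```

Because the training means and standard deviations are reused, you pass raw (unstandardized) data to `predict`.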

## Output Arguments

`label`: Predicted class labels

categorical array | character array | logical vector | vector of numeric values | cell array of character vectors

Predicted class labels for the observations (rows) in `X`, returned as a categorical array, character array, logical vector, vector of numeric values, or cell array of character vectors. `label` has length equal to the number of rows in `X`. The label is the class with minimal expected cost. See Predicted Class Label.

`score`: Predicted class scores or posterior probabilities

numeric matrix

Predicted class scores or posterior probabilities, returned as a numeric matrix of size *n*-by-*K*. *n* is the number of observations (rows) in `X`, and *K* is the number of classes (in `mdl.ClassNames`). `score(i,j)` is the posterior probability that observation `i` in `X` is of class `j` in `mdl.ClassNames`. See Posterior Probability.

**Data Types:** `single` | `double`

`cost`: Expected classification costs

numeric matrix

Expected classification costs, returned as a numeric matrix of size *n*-by-*K*. *n* is the number of observations (rows) in `X`, and *K* is the number of classes (in `mdl.ClassNames`). `cost(i,j)` is the cost of classifying row `i` of `X` as class `j` in `mdl.ClassNames`. See Expected Cost.

**Data Types:** `single` | `double`

## Algorithms

### Predicted Class Label

`predict` classifies by minimizing the expected misclassification cost:

$$\widehat{y}=\underset{y=1,\ldots,K}{\arg\min}\sum_{j=1}^{K}\widehat{P}\left(j|x\right)C\left(y|j\right),$$

where:

- $$\widehat{y}$$ is the predicted classification.
- *K* is the number of classes.
- $$\widehat{P}\left(j|x\right)$$ is the posterior probability of class *j* for observation *x*.
- $$C\left(y|j\right)$$ is the cost of classifying an observation as *y* when its true class is *j*.
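The minimization above can be sketched with plain matrix arithmetic. The posterior matrix `P` and cost matrix `C` below are made-up values chosen for illustration; `C(i,j)` follows the `Cost(i,j)` convention (true class `i`, predicted class `j`), so the expected cost of every candidate label is `P*C`.

```matlab
% Posterior probabilities for two observations over K = 3 classes (made-up values).
P = [0.9 0.1 0.0;
     0.5 0.3 0.2];

% Hypothetical cost matrix: C(i,j) is the cost of predicting class j
% when the true class is i. Mistaking class 2 for class 1 is expensive.
C = [0 1 1;
     5 0 1;
     1 1 0];

% Expected cost of each candidate label, then the minimizing class per row.
expectedCost = P * C;
[~, yhat] = min(expectedCost, [], 2);
```

For the second observation, class 1 has the largest posterior probability, yet the minimizing label is class 2 because of the asymmetric cost. With the default 0-1 cost, minimizing expected cost is equivalent to choosing the class with the largest posterior probability.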

### Posterior Probability

Consider a vector (single query point) `xnew` and a model `mdl`.

- *k* is the number of nearest neighbors used in prediction, `mdl.NumNeighbors`.
- `nbd(mdl,xnew)` specifies the *k* nearest neighbors to `xnew` in `mdl.X`.
- `Y(nbd)` specifies the classifications of the points in `nbd(mdl,xnew)`, namely `mdl.Y(nbd)`.
- `W(nbd)` specifies the weights of the points in `nbd(mdl,xnew)`.
- `prior` specifies the priors of the classes in `mdl.Y`.

If the model contains a vector of prior probabilities, then the observation weights `W` are normalized by class to sum to the priors. This process might involve a calculation for the point `xnew`, because weights can depend on the distance from `xnew` to the points in `mdl.X`.

The posterior probability *p*(*j*|`xnew`) is

$$p\left(j|x\text{new}\right)=\frac{\sum_{i\in \text{nbd}}W(i){1}_{Y(X(i))=j}}{\sum_{i\in \text{nbd}}W(i)}.$$

Here, $${1}_{Y(X(i))=j}$$ is `1` when `mdl.Y(i) = j`, and `0` otherwise.
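The posterior formula amounts to a weighted vote among the neighbors. The sketch below assumes the neighbor classes and weights are already known; `Ynbd` and `Wnbd` are hypothetical variables holding made-up values, and the prior-based normalization of the weights is left out.

```matlab
% Class indices of the k = 5 nearest neighbors of a query point (made-up values).
Ynbd = [1; 2; 2; 1; 2];

% Matching neighbor weights; equal weights reduce to a simple vote.
Wnbd = [1; 1; 1; 1; 1];

K = 3;  % number of classes

% Sum the weights of the neighbors in each class, then normalize.
p = accumarray(Ynbd, Wnbd, [K 1])' / sum(Wnbd);
```

Here two of the five neighbors belong to class 1 and three to class 2, so `p` is `[0.4 0.6 0]`.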

### True Misclassification Cost

Two costs are associated with KNN classification: the true misclassification cost per class and the expected misclassification cost per observation.

You can set the true misclassification cost per class by using the `'Cost'` name-value pair argument when you run `fitcknn`. The value `Cost(i,j)` is the cost of classifying an observation into class `j` if its true class is `i`. By default, `Cost(i,j) = 1` if `i ~= j`, and `Cost(i,j) = 0` if `i = j`. In other words, the cost is `0` for correct classification and `1` for incorrect classification.

### Expected Cost

Two costs are associated with KNN classification: the true misclassification cost per class and the expected misclassification cost per observation. The third output of `predict` is the expected misclassification cost per observation.

Suppose you have `Nobs` observations that you want to classify with a trained classifier `mdl`, and you have `K` classes. You place the observations into a matrix `Xnew` with one observation per row. The command

```
[label,score,cost] = predict(mdl,Xnew)
```

returns a matrix `cost` of size `Nobs`-by-`K`, among other outputs. Each row of the `cost` matrix contains the expected (average) cost of classifying the observation into each of the `K` classes. `cost(n,j)` is

$$\sum_{i=1}^{K}\widehat{P}\left(i|Xnew(n)\right)C\left(j|i\right),$$

where

- *K* is the number of classes.
- $$\widehat{P}\left(i|Xnew(n)\right)$$ is the posterior probability of class *i* for observation *Xnew*(*n*).
- $$C\left(j|i\right)$$ is the true misclassification cost of classifying an observation as *j* when its true class is *i*.
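As a check on the formula, the sketch below recomputes the `cost` output of the iris example in the Examples section from its `score` output, assuming the default 0-1 misclassification cost. The variable `Cost` here is built by hand for illustration rather than read from a model.

```matlab
% Scores reported in the Examples section for the three query flowers.
score = [0.4 0.6 0;
         0   1   0;
         0   0   1];

% Default misclassification cost: 0 on the diagonal, 1 elsewhere.
K = size(score, 2);
Cost = ones(K) - eye(K);

% cost(n,j) = sum over i of score(n,i) * Cost(i,j)
cost = score * Cost;
```

The result matches the `cost` matrix shown in the example. With the default cost, each entry is one minus the corresponding score.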

## Extended Capabilities

### Tall Arrays

Calculate with arrays that have more rows than fit in memory.

This function fully supports tall arrays. For more information, see Tall Arrays.

### C/C++ Code Generation

Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

- Use `saveLearnerForCoder`, `loadLearnerForCoder`, and `codegen` (MATLAB Coder) to generate code for the `predict` function. Save a trained model by using `saveLearnerForCoder`. Define an entry-point function that loads the saved model by using `loadLearnerForCoder` and calls the `predict` function. Then use `codegen` to generate code for the entry-point function.
- To generate single-precision C/C++ code for `predict`, specify the name-value argument `"DataType","single"` when you call the `loadLearnerForCoder` function.
- The following list contains notes about the arguments of `predict`. Arguments not included in this list are fully supported.

Argument `mdl`:

- A `ClassificationKNN` model object is a full object that does not have a corresponding compact object. For this model, `saveLearnerForCoder` saves a compact version that does not include the hyperparameter optimization properties.
- If `mdl` is a model trained using the *k*d-tree search algorithm, and the code generation build type is a MEX function, then `codegen` (MATLAB Coder) generates a MEX function using Intel® Threading Building Blocks (TBB) for parallel computation. Otherwise, `codegen` generates code using `parfor` (MATLAB Coder).
  - MEX function for the *k*d-tree search algorithm: `codegen` generates an optimized MEX function using Intel TBB for parallel computation on multicore platforms. You can use the MEX function to accelerate MATLAB® algorithms. For details on Intel TBB, see https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onetbb.html. If you generate the MEX function to test the generated code of the `parfor` version, you can disable the usage of Intel TBB. Set the `ExtrinsicCalls` property of the MEX configuration object to `false`. For details, see `coder.MexCodeConfig` (MATLAB Coder).
  - MEX function for the exhaustive search algorithm and standalone C/C++ code for both algorithms: The generated code of `predict` uses `parfor` (MATLAB Coder) to create loops that run in parallel on supported shared-memory multicore platforms in the generated code. If your compiler does not support the Open Multiprocessing (OpenMP) application interface or you disable OpenMP library, MATLAB Coder™ treats the `parfor`-loops as `for`-loops. To find supported compilers, see Supported Compilers. To disable OpenMP library, set the `EnableOpenMP` property of the configuration object to `false`. For details, see `coder.CodeConfig` (MATLAB Coder).
- For the usage notes and limitations of the model object, see Code Generation of the `ClassificationKNN` object.

Argument `X`:

- `X` must be a single-precision or double-precision matrix or a table containing numeric variables, categorical variables, or both.
- The number of rows, or observations, in `X` can be a variable size, but the number of columns in `X` must be fixed.
- If you want to specify `X` as a table, then your model must be trained using a table, and your entry-point function for prediction must do the following:
  - Accept data as arrays.
  - Create a table from the data input arguments and specify the variable names in the table.
  - Pass the table to `predict`.

  For an example of this table workflow, see Generate Code to Classify Data in Table. For more information on using tables in code generation, see Code Generation for Tables (MATLAB Coder) and Table Limitations for Code Generation (MATLAB Coder).

For more information, see Introduction to Code Generation.

### GPU Arrays

Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

`predict` does not support GPU arrays for `ClassificationKNN` models with the following specifications:

- The `'NSMethod'` property is specified as `'kdtree'`.
- The `'Distance'` property is specified as a function handle.
- The `'IncludeTies'` property is specified as `true`.

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

## Version History

**Introduced in R2012a**
