# FeatureSelectionNCARegression class

Feature selection for regression using neighborhood component analysis (NCA)

## Description

`FeatureSelectionNCARegression` contains the data, fitting information, feature weights, and other model parameters of a neighborhood component analysis (NCA) model. `fsrnca` learns the feature weights using a diagonal adaptation of NCA and returns a `FeatureSelectionNCARegression` object. The function achieves feature selection by regularizing the feature weights.

## Construction

Create a `FeatureSelectionNCARegression` object using `fsrnca`.

## Properties

`NumObservations`

— Number of observations in the training data

scalar

Number of observations in the training data (`X` and `Y`) after removing `NaN` or `Inf` values, stored as a scalar.

**Data Types:** `double`

`ModelParameters`

— Model parameters

structure

Model parameters used for training the model, stored as a structure.

You can access the fields of `ModelParameters` using dot notation.

For example, for a `FeatureSelectionNCARegression` object named `mdl`, you can access the `LossFunction` value using `mdl.ModelParameters.LossFunction`.

**Data Types:** `struct`

`Lambda`

— Regularization parameter

scalar

Regularization parameter used for training this model, stored as a scalar. For *n* observations, the best `Lambda` value that minimizes the generalization error of the NCA model is expected to be a multiple of 1/*n*.

**Data Types:** `double`
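
Because the best `Lambda` is expected to be a multiple of 1/*n*, one way to choose it is to search over a small grid of such multiples and compare held-out loss. The following is a minimal sketch under stated assumptions, not a prescribed procedure: the synthetic data, the grid `(0:0.5:2)/n`, and the 5-fold partition are all illustrative choices.

```matlab
% Hypothetical synthetic data: only the first two predictors drive the response.
rng(1);                                  % for reproducibility
n = 100; p = 5;
X = randn(n,p);
y = 2*X(:,1) - 3*X(:,2) + 0.1*randn(n,1);

cvp = cvpartition(n,'KFold',5);          % 5-fold partition of the observations
lambdaVals = (0:0.5:2)/n;                % candidate values as multiples of 1/n
lossVals = zeros(numel(lambdaVals),cvp.NumTestSets);

for i = 1:numel(lambdaVals)
    for k = 1:cvp.NumTestSets
        trIdx = training(cvp,k);         % logical index of training rows
        teIdx = test(cvp,k);             % logical index of held-out rows
        mdl = fsrnca(X(trIdx,:),y(trIdx),'FitMethod','exact', ...
            'Solver','lbfgs','Lambda',lambdaVals(i),'Verbose',0);
        lossVals(i,k) = loss(mdl,X(teIdx,:),y(teIdx),'LossFunction','mse');
    end
end

[~,idx] = min(mean(lossVals,2));         % lowest average held-out loss
bestLambda = lambdaVals(idx);
```

You can then pass `bestLambda` as the `'Lambda'` value when fitting on the full training set.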

`FitMethod`

— Name of the fitting method used to fit this model

`'exact'`

| `'none'`

| `'average'`

Name of the fitting method used to fit this model, stored as one of the following:

- `'exact'`: Perform fitting using all of the data.
- `'none'`: No fitting. Use this option to evaluate the generalization error of the NCA model using the initial feature weights supplied in the call to `fsrnca`.
- `'average'`: The software divides the data into partitions (subsets), fits each partition using the `exact` method, and returns the average of the feature weights. You can specify the number of partitions using the `NumPartitions` name-value pair argument.
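
As a rough illustration of the `'none'` option, the sketch below compares the held-out loss of an unfitted model against a model fitted with `'exact'`. The synthetic data and the 80/40 train/test split are assumptions made only for this example.

```matlab
% Hypothetical synthetic data with two relevant predictors.
rng(2);
n = 120; p = 4;
X = randn(n,p);
y = X(:,1) + 0.5*X(:,3) + 0.1*randn(n,1);
Xtrain = X(1:80,:);    ytrain = y(1:80);
Xtest  = X(81:end,:);  ytest  = y(81:end);

% 'none': no optimization; the loss reflects the initial feature weights.
mdlNone  = fsrnca(Xtrain,ytrain,'FitMethod','none');
lossNone = loss(mdlNone,Xtest,ytest);

% 'exact': fit the feature weights using all of the training data.
mdlExact  = fsrnca(Xtrain,ytrain,'FitMethod','exact');
lossExact = loss(mdlExact,Xtest,ytest);

fprintf('none: %.4f, exact: %.4f\n',lossNone,lossExact);
```

Comparing the two numbers gives a baseline for how much the fitted feature weights change the generalization error.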

`Solver`

— Name of the solver used to fit this model

`'lbfgs'`

| `'sgd'`

| `'minibatch-lbfgs'`

Name of the solver used to fit this model, stored as one of the following:

- `'lbfgs'`: Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm
- `'sgd'`: Stochastic gradient descent (SGD) algorithm
- `'minibatch-lbfgs'`: Stochastic gradient descent with LBFGS algorithm applied to mini-batches

`GradientTolerance`

— Relative convergence tolerance on gradient norm

positive scalar

Relative convergence tolerance on the gradient norm for the `'lbfgs'` and `'minibatch-lbfgs'` solvers, stored as a positive scalar value.

**Data Types:** `double`

`IterationLimit`

— Maximum number of iterations for optimization

positive integer

Maximum number of iterations for optimization, stored as a positive integer value.

**Data Types:** `double`

`PassLimit`

— Maximum number of passes

positive integer

Maximum number of passes for the `'sgd'` and `'minibatch-lbfgs'` solvers. Every pass processes all of the observations in the data.

**Data Types:** `double`

`InitialLearningRate`

— Initial learning rate

positive real scalar

Initial learning rate for the `'sgd'` and `'minibatch-lbfgs'` solvers. The learning rate decays over iterations, starting at the value specified for `InitialLearningRate`.

Use the `NumTuningIterations` and `TuningSubsetSize` name-value pair arguments to control the automatic tuning of the initial learning rate in the call to `fsrnca`.

**Data Types:** `double`
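
The sketch below shows two ways the initial learning rate might be set for the `'sgd'` solver: automatic tuning, controlled by `NumTuningIterations` and `TuningSubsetSize`, or a fixed value. The data and every numeric setting here (10 tuning iterations, a 50-observation subset, 3 passes, a rate of 0.05) are illustrative assumptions, not recommended values.

```matlab
% Hypothetical synthetic data with two relevant predictors.
rng(3);
n = 500; p = 6;
X = randn(n,p);
y = X(:,2) - 2*X(:,5) + 0.1*randn(n,1);

% Let fsrnca tune the initial learning rate ('auto'), using a
% 50-observation subset and 10 tuning iterations.
mdlAuto = fsrnca(X,y,'Solver','sgd','InitialLearningRate','auto', ...
    'NumTuningIterations',10,'TuningSubsetSize',50, ...
    'PassLimit',3,'Verbose',0);

% Alternatively, fix the initial learning rate yourself.
mdlFixed = fsrnca(X,y,'Solver','sgd','InitialLearningRate',0.05, ...
    'PassLimit',3,'Verbose',0);

% Inspect what each model stored.
disp(mdlAuto.InitialLearningRate)
disp(mdlFixed.InitialLearningRate)
```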

`Verbose`

— Verbosity level indicator

nonnegative integer

Verbosity level indicator, stored as a nonnegative integer. Possible values are:

- 0: No convergence summary
- 1: Convergence summary, including norm of gradient and objective function value
- Greater than 1: More convergence information, depending on the fitting algorithm. When you use the `'minibatch-lbfgs'` solver and verbosity level > 1, the convergence information includes the iteration log from intermediate mini-batch LBFGS fits.

**Data Types:** `double`

`InitialFeatureWeights`

— Initial feature weights

*p*-by-1 vector of positive real scalars

Initial feature weights, stored as a *p*-by-1 vector of positive real scalars, where *p* is the number of predictors in `X`.

**Data Types:** `double`

`FeatureWeights`

— Feature weights

*p*-by-1 vector of real scalar values

Feature weights, stored as a *p*-by-1 vector of real scalar values, where *p* is the number of predictors in `X`.

For `'FitMethod'` equal to `'average'`, `FeatureWeights` is a *p*-by-*m* matrix, where *m* is the number of partitions specified via the `'NumPartitions'` name-value pair argument in the call to `fsrnca`.

The absolute value of `FeatureWeights(k)` is a measure of the importance of predictor `k`. If `FeatureWeights(k)` is close to 0, then this indicates that predictor `k` does not influence the response in `Y`.

**Data Types:** `double`
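
One possible way to turn the weights into a feature selection is to rank predictors by absolute weight and keep those above a relative cutoff. This is only a sketch: the synthetic data, the `Lambda` value of 1/*n*, and the cutoff `tol` are hypothetical choices, and `tol` is not a documented default.

```matlab
% Hypothetical synthetic data: predictors 1 and 4 drive the response.
rng(4);
n = 300; p = 8;
X = randn(n,p);
y = 3*X(:,1) - 2*X(:,4) + 0.1*randn(n,1);

mdl = fsrnca(X,y,'Lambda',1/n,'Verbose',0);

% Rank predictors by the absolute value of their weights.
[~,order] = sort(abs(mdl.FeatureWeights),'descend');

% Keep predictors whose weight exceeds a relative threshold.
tol = 0.02;   % illustrative cutoff, not a documented default
selected = find(abs(mdl.FeatureWeights) > tol*max(abs(mdl.FeatureWeights)));

% With 'average', FeatureWeights holds one set of weights per partition.
m = 3;
mdlAvg = fsrnca(X,y,'FitMethod','average','NumPartitions',m,'Verbose',0);
size(mdlAvg.FeatureWeights)
```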

`FitInfo`

— Fit information

structure

Fit information, stored as a structure with the following fields.

Field Name | Meaning |
---|---|
`Iteration` | Iteration index |
`Objective` | Regularized objective function for minimization |
`UnregularizedObjective` | Unregularized objective function for minimization |
`Gradient` | Gradient of regularized objective function for minimization |

- For classification, `UnregularizedObjective` represents the negative of the leave-one-out accuracy of the NCA classifier on the training data.
- For regression, `UnregularizedObjective` represents the leave-one-out loss between the true response and the predicted response when using the NCA regression model.
- For the `'lbfgs'` solver, `Gradient` is the final gradient. For the `'sgd'` and `'minibatch-lbfgs'` solvers, `Gradient` is the final mini-batch gradient.
- If `FitMethod` is `'average'`, then `FitInfo` is an *m*-by-1 structure array, where *m* is the number of partitions specified via the `'NumPartitions'` name-value pair argument.

You can access the fields of `FitInfo` using dot notation. For example, for a `FeatureSelectionNCARegression` object named `mdl`, you can access the `Objective` field using `mdl.FitInfo.Objective`.

**Data Types:** `struct`

`Mu`

— Predictor means

*p*-by-1 vector | `[]`

Predictor means, stored as a *p*-by-1 vector for standardized training data. In this case, the `predict` method centers predictor matrix `X` by subtracting the respective element of `Mu` from every column.

If data is not standardized during training, then `Mu` is empty.

**Data Types:** `double`

`Sigma`

— Predictor standard deviations

*p*-by-1 vector | `[]`

Predictor standard deviations, stored as a *p*-by-1 vector for standardized training data. In this case, the `predict` method scales predictor matrix `X` by dividing every column by the respective element of `Sigma` after centering the data using `Mu`.

If data is not standardized during training, then `Sigma` is empty.

**Data Types:** `double`
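
The centering and scaling described for `Mu` and `Sigma` can be reproduced by hand, which may help when checking what `predict` does to new data. The sketch below assumes synthetic predictors on deliberately different scales; the variable names `Xs` and `mdlRaw` are introduced here only for illustration.

```matlab
% Hypothetical predictors on very different scales.
rng(5);
n = 150;
X = [randn(n,1), 100*randn(n,1) + 50, 0.01*randn(n,1)];
y = X(:,1) + 0.01*X(:,2) + 0.1*randn(n,1);

mdl = fsrnca(X,y,'Standardize',true,'Verbose',0);

% Center each column with Mu, then scale it with Sigma. Both are p-by-1,
% so transpose them to rows to operate column by column.
Xs = (X - mdl.Mu') ./ mdl.Sigma';

% Without standardization, both properties are empty.
mdlRaw = fsrnca(X,y,'Verbose',0);
isempty(mdlRaw.Mu) && isempty(mdlRaw.Sigma)
```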

`X`

— Predictor values

*n*-by-*p* matrix

Predictor values used to train this model, stored as an *n*-by-*p* matrix. *n* is
the number of observations and *p* is the number
of predictor variables in the training data.

**Data Types:** `double`

`Y`

— Response values

numeric vector of size *n*

Response values used to train this model, stored as a numeric vector of size *n*, where *n* is the number of observations.

**Data Types:** `double`

`W`

— Observation weights

numeric vector of size *n*

Observation weights used to train this model, stored as a numeric
vector of size *n*. The sum of observation weights
is *n*.

**Data Types:** `double`
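
To see how supplied observation weights relate to the stored `W`, you might pass raw weights through the `'Weights'` name-value pair argument of `fsrnca` and compare the sums. This is a sketch with a hypothetical weighting scheme; the statement that the stored weights sum to *n* comes from the property description, not from this code.

```matlab
% Hypothetical synthetic data.
rng(6);
n = 100; p = 4;
X = randn(n,p);
y = X(:,1) - X(:,2) + 0.1*randn(n,1);

% Hypothetical weighting: count the first half of the observations twice.
w = [2*ones(n/2,1); ones(n/2,1)];

mdl = fsrnca(X,y,'Weights',w,'Verbose',0);

sum(w)       % sum of the raw weights supplied to fsrnca
sum(mdl.W)   % compare against n, per the property description
```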

## Methods

Method | Description |
---|---|
`loss` | Evaluate accuracy of learned feature weights on test data |
`predict` | Predict responses using neighborhood component analysis (NCA) regression model |
`refit` | Refit neighborhood component analysis (NCA) model for regression |
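
A short sketch of how the three methods might be used together follows. The synthetic data, the 150/50 train/test split, and the new `Lambda` value passed to `refit` are assumptions chosen only for illustration.

```matlab
% Hypothetical synthetic data with two relevant predictors.
rng(7);
n = 200; p = 5;
X = randn(n,p);
y = 2*X(:,1) + X(:,3) + 0.1*randn(n,1);
Xtrain = X(1:150,:);    ytrain = y(1:150);
Xtest  = X(151:end,:);  ytest  = y(151:end);

mdl = fsrnca(Xtrain,ytrain,'Verbose',0);

ypred = predict(mdl,Xtest);        % predicted responses for new observations
L     = loss(mdl,Xtest,ytest);     % loss on held-out data

% Refit the same model with a different regularization strength.
newLambda = 0.5/numel(ytrain);
mdl2 = refit(mdl,'Lambda',newLambda);
L2   = loss(mdl2,Xtest,ytest);
```

Comparing `L` and `L2` indicates whether the new regularization strength helps on the held-out data.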

## Examples

### Explore `FeatureSelectionNCARegression` Object

Load the sample data.

`load imports-85`

The first 15 columns contain the continuous predictor variables, whereas the 16th column contains the response variable, which is the price of a car. Define the variables for the neighborhood component analysis model.

    Predictors = X(:,1:15);
    Y = X(:,16);

Fit a neighborhood component analysis (NCA) model for regression to detect the relevant features.

    mdl = fsrnca(Predictors,Y,'Verbose',1);

The returned NCA model, `mdl`, is a `FeatureSelectionNCARegression` object. This object stores information about the training data, model, and optimization. You can access the object properties, such as the feature weights, using dot notation.

Plot the feature weights.

    figure()
    plot(mdl.FeatureWeights,'ro')
    xlabel('Feature Index')
    ylabel('Feature Weight')
    grid on

The weights of the irrelevant features are zero. The `'Verbose',1` option in the call to `fsrnca` displays the optimization information on the command line. You can also visualize the optimization process by plotting the objective function versus the iteration number.

    figure()
    plot(mdl.FitInfo.Iteration,mdl.FitInfo.Objective,'ro-')
    grid on
    xlabel('Iteration Number')
    ylabel('Objective')

The `ModelParameters` property is a `struct` that contains more information about the model. You can access the fields of this property using dot notation. For example, see if the data was standardized or not.

    mdl.ModelParameters.Standardize

    ans = logical
       0

`0` means that the data was not standardized before fitting the NCA model. You can standardize the predictors when they are on very different scales using the `'Standardize',1` name-value pair argument in the call to `fsrnca`.

## Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects.

## Version History

**Introduced in R2016b**
