templateKernel

Kernel learner template

Syntax

t = templateKernel

t = templateKernel(Name,Value)

Description

t = templateKernel returns a kernel learner template suitable for training a Gaussian kernel model for nonlinear classification or regression.

example

t = templateKernel(Name,Value) returns a template with additional options specified by one or more name-value arguments.

For example, you can specify the learner or the number of dimensions of the expanded space.

If you specify the type of model by using the Type name-value argument, then the display of t in the Command Window shows all options as empty ([]), except those that you specify using name-value arguments. If you do not specify the type of model, then the display suppresses the empty options. During training, the software uses default values for empty options.

example

Examples

collapse all

Create Default Kernel Model Template

Open Live Script

Create a default kernel model template and use it to train an error-correcting output codes (ECOC) multiclass model.

Load Fisher's iris data set.

load fisheriris

Create a default kernel model template.

t = templateKernel

t = 
Fit template for Kernel.
    Learner: 'svm'

During training, the software fills in the empty properties with their respective default values.

Specify t as a binary learner for an ECOC multiclass model.

Mdl = fitcecoc(meas,species,'Learners',t)

Mdl = 
  CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: {'setosa'  'versicolor'  'virginica'}
    ScoreTransform: 'none'
    BinaryLearners: {3x1 cell}
      CodingMatrix: [3x3 double]

Mdl is a CompactClassificationECOC multiclass classifier.

Specify Kernel Model Template Options

Open Live Script

Create a kernel model template with additional options to implement logistic regression with a kernel scale parameter selected by a heuristic procedure.

t = templateKernel('Learner','logistic','KernelScale','auto')

t = 
Fit template for classification Kernel.

             BetaTolerance: []
                 BlockSize: []
             BoxConstraint: []
                   Epsilon: []
    NumExpansionDimensions: []
         GradientTolerance: []
        HessianHistorySize: []
            IterationLimit: []
               KernelScale: 'auto'
                    Lambda: []
                   Learner: 'logistic'
              LossFunction: []
                    Stream: []
            VerbosityLevel: []
           StandardizeData: []
                   Version: 1
                    Method: 'Kernel'
                      Type: 'classification'

Input Arguments

collapse all

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Learner','logistic','NumExpansionDimensions',2^15,'KernelScale','auto' specifies to implement logistic regression after mapping the predictor data to the 2^15 dimensional space using feature expansion with a kernel scale parameter selected by a heuristic procedure.

For Classification Models and Regression Models

collapse all

`Learner` — Kernel learner type
`"svm"` (default) | `"logistic"` | `"leastsquares"`

Kernel learner type, specified as "svm", "logistic", or "leastsquares".

In the following table, $f (x) = T (x) β + b .$

x is an observation (row vector) from p predictor variables.
$T (\cdot)$ is a transformation of an observation (row vector) for feature expansion. T(x) maps x in $ℝ^{p}$ to a high-dimensional space ( $ℝ^{m}$ ).
β is a vector of coefficients.
b is the scalar bias.

Value Algorithm Response Range Loss Function

Value	Algorithm	Response Range	Loss Function
`"svm"`	Support vector machine (classification or regression)	Classification: y ∊ {–1,1}; 1 for the positive class and –1 otherwise Regression: y ∊ (-∞,∞)	Classification: Hinge $ℓ [y, f (x)] = \max [0, 1 - y f (x)]$ Regression: Epsilon-insensitive $ℓ [y, f (x)] = \max [0, \| y - f (x) \| - ε]$
`"logistic"`	Logistic regression (classification only)	y ∊ {–1,1}; 1 for the positive class and –1 otherwise	Deviance (logistic) $ℓ [y, f (x)] = \log {1 + \exp [- y f (x)]}$
`"leastsquares"`	Linear regression via ordinary least squares (regression only)	y ∊ (-∞,∞)	Mean squared error (MSE) $ℓ [y, f (x)] = \frac{1}{2} {[y - f (x)]}^{2}$

"svm"

Support vector machine (classification or regression)

Classification: y ∊ {–1,1}; 1 for the positive class and –1 otherwise

Regression: y ∊ (-∞,∞)

Classification: Hinge $ℓ [y, f (x)] = \max [0, 1 - y f (x)]$

Regression: Epsilon-insensitive $ℓ [y, f (x)] = \max [0, | y - f (x) | - ε]$

"logistic" Logistic regression (classification only) y ∊ {–1,1}; 1 for the positive class and –1 otherwise Deviance (logistic) $ℓ [y, f (x)] = \log {1 + \exp [- y f (x)]}$

"leastsquares" Linear regression via ordinary least squares (regression only) y ∊ (-∞,∞) Mean squared error (MSE) $ℓ [y, f (x)] = \frac{1}{2} {[y - f (x)]}^{2}$

Example: "Learner","logistic"

`NumExpansionDimensions` — Number of dimensions of expanded space
`'auto'` (default) | positive integer

Number of dimensions of the expanded space, specified as the comma-separated pair consisting of 'NumExpansionDimensions' and 'auto' or a positive integer. For 'auto', the templateKernel function selects the number of dimensions using 2.^ceil(min(log2(p)+5,15)), where p is the number of predictors.

For details, see Random Feature Expansion.

Example: 'NumExpansionDimensions',2^15

Data Types: char | string | single | double

`KernelScale` — Kernel scale parameter
`1` (default) | `"auto"` | positive scalar

Kernel scale parameter, specified as "auto" or a positive scalar. The software obtains a random basis for random feature expansion by using the kernel scale parameter. For details, see Random Feature Expansion.

If you specify "auto", then the software selects an appropriate kernel scale parameter using a heuristic procedure. This heuristic procedure uses subsampling, so estimates can vary from one call to another. Therefore, to reproduce results, set a random number seed by using rng before training.

Example: KernelScale="auto"

Data Types: char | string | single | double

`BoxConstraint` — Box constraint
1 (default) | positive scalar

Box constraint, specified as the comma-separated pair consisting of 'BoxConstraint' and a positive scalar.

This argument is valid only when 'Learner' is 'svm'(default) and you do not specify a value for the regularization term strength 'Lambda'. You can specify either 'BoxConstraint' or 'Lambda' because the box constraint (C) and the regularization term strength (λ) are related by C = 1/(λn), where n is the number of observations.

Example: 'BoxConstraint',100

Data Types: single | double

`Lambda` — Regularization term strength
`'auto'` (default) | nonnegative scalar

Regularization term strength, specified as the comma-separated pair consisting of 'Lambda' and 'auto' or a nonnegative scalar.

For 'auto', the value of Lambda is 1/n, where n is the number of observations.

When Learner is 'svm', you can specify either BoxConstraint or Lambda because the box constraint (C) and the regularization term strength (λ) are related by C = 1/(λn).

Example: 'Lambda',0.01

Data Types: char | string | single | double

`Standardize` — Flag to standardize predictor data
`false` or `0` (default) | `true` or `1`

Since R2023b

Flag to standardize the predictor data, specified as a numeric or logical 0 (false) or 1 (true). If you set Standardize to true, then the software centers and scales each numeric predictor variable by the corresponding column mean and standard deviation. The software does not standardize the categorical predictors.

Example: "Standardize",true

Data Types: single | double | logical

`Type` — Kernel model type
`"classification"` | `"regression"`

Since R2023b

Kernel model type, specified as "classification" or "regression".

Value	Description
`"classification"`	Create a classification kernel learner template. If you do not specify `Type` as `"classification"`, the fitting functions `fitcecoc`, `testckfold`, and `fitsemigraph` set this value when you pass `t` to them.
`"regression"`	Create a regression kernel learner template. If you do not specify `Type` as `"regression"`, the fitting function `directforecaster` sets this value when you pass `t` to it.

Example: "Type","classification"

Data Types: char | string

For Regression Models Only

collapse all

`Epsilon` — Half the width of epsilon-insensitive band
`iqr(Y)/13.49` (default) | nonnegative scalar value

Half the width of the epsilon-insensitive band, specified as a nonnegative scalar value. This argument applies to support vector machine learners only.

The default Epsilon value is iqr(Y)/13.49, which is an estimate of standard deviation using the interquartile range of the response variable Y. If iqr(Y) is equal to zero, then the default Epsilon value is 0.1.

Example: "Epsilon",0.3

Data Types: single | double

Convergence Controls

collapse all

`BetaTolerance` — Relative tolerance on linear coefficients and bias term
`1e–4` (default) | nonnegative scalar

Relative tolerance on the linear coefficients and the bias term (intercept), specified as a nonnegative scalar.

Let $B_{t} = [β_{t}^{'} b_{t}]$ , that is, the vector of the coefficients and the bias term at optimization iteration t. If ${‖ \frac{B_{t} - B_{t - 1}}{B_{t}} ‖}_{2} < BetaTolerance$ , then optimization terminates.

If you also specify GradientTolerance, then optimization terminates when the software satisfies either stopping criterion.

Example: BetaTolerance=1e–6

Data Types: single | double

`GradientTolerance` — Absolute gradient tolerance
`1e–6` (default) | nonnegative scalar

Absolute gradient tolerance, specified as a nonnegative scalar.

Let $\nabla ℒ_{t}$ be the gradient vector of the objective function with respect to the coefficients and bias term at optimization iteration t. If ${‖ \nabla ℒ_{t} ‖}_{\infty} = \max | \nabla ℒ_{t} | < GradientTolerance$ , then optimization terminates.

If you also specify BetaTolerance, then optimization terminates when the software satisfies either stopping criterion.

Example: GradientTolerance=1e–5

Data Types: single | double

`IterationLimit` — Maximum number of optimization iterations
positive integer

Maximum number of optimization iterations, specified as a positive integer.

The default value is 1000 if the transformed data fits in memory, as specified by the BlockSize name-value argument. Otherwise, the default value is 100.

Example: IterationLimit=500

Data Types: single | double

Other Kernel Options

collapse all

`BlockSize` — Maximum amount of allocated memory
`4e^3` (4GB) (default) | positive scalar

Maximum amount of allocated memory (in megabytes), specified as the comma-separated pair consisting of 'BlockSize' and a positive scalar.

If templateKernel requires more memory than the value of 'BlockSize' to hold the transformed predictor data, then the software uses a block-wise strategy. For details about the block-wise strategy, see Algorithms.

Example: 'BlockSize',1e4

Data Types: single | double

`RandomStream` — Random number stream
global stream (default) | random stream object

Random number stream for reproducibility of data transformation, specified as a random stream object. For details, see Random Feature Expansion.

Use RandomStream to reproduce the random basis functions used by templateKernel to transform the predictor data to a high-dimensional space. For details, see Managing the Global Stream Using RandStream and Creating and Controlling a Random Number Stream.

Example: RandomStream=RandStream("mlfg6331_64")

`HessianHistorySize` — Size of history buffer for Hessian approximation
`15` (default) | positive integer

Size of the history buffer for Hessian approximation, specified as the comma-separated pair consisting of 'HessianHistorySize' and a positive integer. At each iteration, templateKernel composes the Hessian approximation by using statistics from the latest HessianHistorySize iterations.

Example: 'HessianHistorySize',10

Data Types: single | double

`Verbose` — Verbosity level
`0` (default) | `1`

Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and either 0 or 1. Verbose controls the display of diagnostic information at the command line.

Value	Description
`0`	`templateKernel` does not display diagnostic information.
`1`	`templateKernel` displays the value of the objective function, gradient magnitude, and other diagnostic information.

Example: 'Verbose',1

Data Types: single | double

Output Arguments

collapse all

`t` — Kernel learner template
template object

Kernel learner template suitable for training a Gaussian kernel model for nonlinear classification or regression, returned as a template object. During training, the software uses default values for empty options.

More About

collapse all

Random Feature Expansion

Random feature expansion, such as Random Kitchen Sinks [1] or Fastfood [2], is a scheme to approximate Gaussian kernels of the kernel classification algorithm to use for big data in a computationally efficient way. Random feature expansion is more practical for big data applications that have large training sets, but can also be applied to smaller data sets that fit in memory.

The kernel classification algorithm searches for an optimal hyperplane that separates the data into two classes after mapping features into a high-dimensional space. Nonlinear features that are not linearly separable in a low-dimensional space can be separable in the expanded high-dimensional space. All the calculations for hyperplane classification use only dot products. You can obtain a nonlinear classification model by replacing the dot product x₁x₂' with the nonlinear kernel function $G (x_{1}, x_{2}) = 〈 φ (x_{1}), φ (x_{2}) 〉$ , where x_i is the ith observation (row vector) and φ(x_i) is a transformation that maps x_i to a high-dimensional space (called the “kernel trick”). However, evaluating G(x₁,x₂) (Gram matrix) for each pair of observations is computationally expensive for a large data set (large n).

The random feature expansion scheme finds a random transformation so that its dot product approximates the Gaussian kernel. That is,

$G (x_{1}, x_{2}) = 〈 φ (x_{1}), φ (x_{2}) 〉 \approx T (x_{1}) T (x_{2})',$

where T(x) maps x in $ℝ^{p}$ to a high-dimensional space ( $ℝ^{m}$ ). The Random Kitchen Sinks scheme uses the random transformation

$T (x) = m^{- 1 / 2} \exp (i Z x')',$

where $Z \in ℝ^{m \times p}$ is a sample drawn from $N (0, σ^{- 2})$ and σ is a kernel scale. This scheme requires O(mp) computation and storage.

The Fastfood scheme introduces another random basis V instead of Z using Hadamard matrices combined with Gaussian scaling matrices. This random basis reduces the computation cost to O(mlogp) and reduces storage to O(m).

You can specify values for m and σ by setting NumExpansionDimensions and KernelScale, respectively, of templateKernel.

The templateKernel function uses the Fastfood scheme for random feature expansion, and uses linear classification to train a Gaussian kernel classification model. Unlike solvers in the templateSVM function, which require computation of the n-by-n Gram matrix, the solver in templateKernel only needs to form a matrix of size n-by-m, with m typically much less than n for big data.

Box Constraint

A box constraint is a parameter that controls the maximum penalty imposed on margin-violating observations, and aids in preventing overfitting (regularization). Increasing the box constraint can lead to longer training times.

The box constraint (C) and the regularization term strength (λ) are related by C = 1/(λn), where n is the number of observations.

Algorithms

templateKernel minimizes the regularized objective function using a Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) solver with ridge (L₂) regularization. To find the type of LBFGS solver used for training, type FitInfo.Solver in the Command Window.

'LBFGS-fast' — LBFGS solver.
'LBFGS-blockwise' — LBFGS solver with a block-wise strategy. If templateKernel requires more memory than the value of BlockSize to hold the transformed predictor data, then the function uses a block-wise strategy.
'LBFGS-tall' — LBFGS solver with a block-wise strategy for tall arrays.

When templateKernel uses a block-wise strategy, it implements LBFGS by distributing the calculation of the loss and gradient among different parts of the data at each iteration. Also, templateKernel refines the initial estimates of the linear coefficients and the bias term by fitting the model locally to parts of the data and combining the coefficients by averaging. If you specify 'Verbose',1, then templateKernel displays diagnostic information for each data pass and stores the information in the History field of FitInfo.

When templateKernel does not use a block-wise strategy, the initial estimates are zeros. If you specify 'Verbose',1, then templateKernel displays diagnostic information for each iteration and stores the information in the History field of FitInfo.

References

[1] Rahimi, A., and B. Recht. “Random Features for Large-Scale Kernel Machines.” Advances in Neural Information Processing Systems. Vol. 20, 2008, pp. 1177–1184.

[2] Le, Q., T. Sarlós, and A. Smola. “Fastfood — Approximating Kernel Expansions in Loglinear Time.” Proceedings of the 30th International Conference on Machine Learning. Vol. 28, No. 3, 2013, pp. 244–252.

[3] Huang, P. S., H. Avron, T. N. Sainath, V. Sindhwani, and B. Ramabhadran. “Kernel methods match Deep Neural Networks on TIMIT.” 2014 IEEE International Conference on Acoustics, Speech and Signal Processing. 2014, pp. 205–209.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

Usage notes and limitations when you train a model by passing a kernel model template and tall arrays to fitcecoc:

The default values for these name-value pair arguments are different when you work with tall arrays.
- 'Verbose' — Default value is 1.
- 'BetaTolerance' — Default value is relaxed to 1e–3.
- 'GradientTolerance' — Default value is relaxed to 1e–5.
- 'IterationLimit' — Default value is relaxed to 20.
If 'KernelScale' is 'auto', then templateKernel uses the random stream controlled by tallrng for subsampling. For reproducibility, you must set a random number seed for both the global stream and the random stream controlled by tallrng.
If 'Lambda' is 'auto', then templateKernel might take an extra pass through the data to calculate the number of observations.
templateKernel uses a block-wise strategy. For details, see Algorithms.

For more information, see Tall Arrays.

Version History

Introduced in R2018b

expand all

R2023b: Kernel models support standardization of predictors

templateKernel supports the standardization of numeric predictors. That is, you can specify the Standardize value as true to center and scale each numeric predictor variable by the corresponding column mean and standard deviation. The software does not standardize the categorical predictors.

R2023b: Support for regression learner templates

templateKernel supports the creation of regression learner templates. Specify the Type name-value argument as "regression" in the call to the function. When creating a regression learner template, you can additionally specify the Epsilon name-value argument for support vector machine learners.

templateKernel

Syntax

Description

Examples

Create Default Kernel Model Template

Specify Kernel Model Template Options

Input Arguments

Name-Value Arguments

Learner — Kernel learner type "svm" (default) | "logistic" | "leastsquares"

NumExpansionDimensions — Number of dimensions of expanded space 'auto' (default) | positive integer

KernelScale — Kernel scale parameter 1 (default) | "auto" | positive scalar

BoxConstraint — Box constraint 1 (default) | positive scalar

Lambda — Regularization term strength 'auto' (default) | nonnegative scalar

Standardize — Flag to standardize predictor data false or 0 (default) | true or 1

Type — Kernel model type "classification" | "regression"

Epsilon — Half the width of epsilon-insensitive band iqr(Y)/13.49 (default) | nonnegative scalar value

BetaTolerance — Relative tolerance on linear coefficients and bias term 1e–4 (default) | nonnegative scalar

GradientTolerance — Absolute gradient tolerance 1e–6 (default) | nonnegative scalar

IterationLimit — Maximum number of optimization iterations positive integer

BlockSize — Maximum amount of allocated memory 4e^3 (4GB) (default) | positive scalar

RandomStream — Random number stream global stream (default) | random stream object

HessianHistorySize — Size of history buffer for Hessian approximation 15 (default) | positive integer

Verbose — Verbosity level 0 (default) | 1

Output Arguments

t — Kernel learner template template object

More About

Random Feature Expansion

Box Constraint

Algorithms

References

Extended Capabilities

Tall Arrays Calculate with arrays that have more rows than fit in memory.

Version History

R2023b: Kernel models support standardization of predictors

R2023b: Support for regression learner templates

See Also

`Learner` — Kernel learner type
`"svm"` (default) | `"logistic"` | `"leastsquares"`

`NumExpansionDimensions` — Number of dimensions of expanded space
`'auto'` (default) | positive integer

`KernelScale` — Kernel scale parameter
`1` (default) | `"auto"` | positive scalar

`BoxConstraint` — Box constraint
1 (default) | positive scalar

`Lambda` — Regularization term strength
`'auto'` (default) | nonnegative scalar

`Standardize` — Flag to standardize predictor data
`false` or `0` (default) | `true` or `1`

`Type` — Kernel model type
`"classification"` | `"regression"`

`Epsilon` — Half the width of epsilon-insensitive band
`iqr(Y)/13.49` (default) | nonnegative scalar value

`BetaTolerance` — Relative tolerance on linear coefficients and bias term
`1e–4` (default) | nonnegative scalar

`GradientTolerance` — Absolute gradient tolerance
`1e–6` (default) | nonnegative scalar

`IterationLimit` — Maximum number of optimization iterations
positive integer

`BlockSize` — Maximum amount of allocated memory
`4e^3` (4GB) (default) | positive scalar

`RandomStream` — Random number stream
global stream (default) | random stream object

`HessianHistorySize` — Size of history buffer for Hessian approximation
`15` (default) | positive integer

`Verbose` — Verbosity level
`0` (default) | `1`

`t` — Kernel learner template
template object

Tall Arrays
Calculate with arrays that have more rows than fit in memory.