templateKernel
Kernel learner template
Description
t = templateKernel returns a kernel learner template suitable for training a Gaussian kernel model for nonlinear classification or regression.
t = templateKernel(Name,Value) returns a template with additional options specified by one or more name-value arguments.
For example, you can specify the learner or the number of dimensions of the expanded space.
If you specify the type of model by using the Type name-value argument, then the display of t in the Command Window shows all options as empty ([]), except those that you specify using name-value arguments.
If you do not specify the type of model, then the display suppresses the empty options.
During training, the software uses default values for empty options.
Examples
Create Default Kernel Model Template
Create a default kernel model template and use it to train an error-correcting output codes (ECOC) multiclass model.
Load Fisher's iris data set.
load fisheriris
Create a default kernel model template.
t = templateKernel
t = 
Fit template for Kernel.
    Learner: 'svm'
During training, the software fills in the empty properties with their respective default values.
Specify t as a binary learner for an ECOC multiclass model.
Mdl = fitcecoc(meas,species,'Learners',t)
Mdl = 
  CompactClassificationECOC
    ResponseName: 'Y'
    ClassNames: {'setosa'  'versicolor'  'virginica'}
    ScoreTransform: 'none'
    BinaryLearners: {3x1 cell}
    CodingMatrix: [3x3 double]
Mdl is a CompactClassificationECOC multiclass classifier.
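To gauge how well the default template performs, one option is to cross-validate the ECOC model. The following is a minimal sketch, assuming that fitcecoc accepts the 'CrossVal' option together with kernel binary learners. The seed value is arbitrary, and the resulting error depends on the random feature expansion.

```matlab
% Sketch: estimate the generalization error of an ECOC model built from
% the default kernel template. The seed value is an arbitrary choice.
load fisheriris
rng(1)                              % fix the random basis and the partition
t = templateKernel;                 % default kernel learner template
CVMdl = fitcecoc(meas,species,'Learners',t,'CrossVal','on');
err = kfoldLoss(CVMdl)              % cross-validated classification error
```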
Specify Kernel Model Template Options
Create a kernel model template with additional options to implement logistic regression with a kernel scale parameter selected by a heuristic procedure.
t = templateKernel('Learner','logistic','KernelScale','auto')
t = 
Fit template for classification Kernel.
    BetaTolerance: []
    BlockSize: []
    BoxConstraint: []
    Epsilon: []
    NumExpansionDimensions: []
    GradientTolerance: []
    HessianHistorySize: []
    IterationLimit: []
    KernelScale: 'auto'
    Lambda: []
    Learner: 'logistic'
    LossFunction: []
    Stream: []
    VerbosityLevel: []
    StandardizeData: []
    Version: 1
    Method: 'Kernel'
    Type: 'classification'
Input Arguments
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'Learner','logistic','NumExpansionDimensions',2^15,'KernelScale','auto' specifies to implement logistic regression after mapping the predictor data to the 2^15-dimensional space using feature expansion with a kernel scale parameter selected by a heuristic procedure.
Learner — Kernel learner type
"svm" (default) | "logistic" | "leastsquares"
Kernel learner type, specified as "svm", "logistic", or "leastsquares".
In the following table, $f(x) = T(x)\beta + b$.
x is an observation (row vector) from p predictor variables.
$T(\cdot)$ is a transformation of an observation (row vector) for feature expansion. T(x) maps x in $\mathbb{R}^p$ to a high-dimensional space ($\mathbb{R}^m$).
β is a vector of coefficients.
b is the scalar bias.
Value | Algorithm | Response Range | Loss Function |
---|---|---|---|
"svm" | Support vector machine (classification or regression) | Classification: y ∊ {–1,1}; 1 for the positive class and –1 otherwise. Regression: y ∊ (–∞,∞) | Classification: Hinge. Regression: Epsilon-insensitive |
"logistic" | Logistic regression (classification only) | y ∊ {–1,1}; 1 for the positive class and –1 otherwise | Deviance (logistic) |
"leastsquares" | Linear regression via ordinary least squares (regression only) | y ∊ (–∞,∞) | Mean squared error (MSE) |
Example: "Learner","logistic"
NumExpansionDimensions — Number of dimensions of expanded space
'auto' (default) | positive integer
Number of dimensions of the expanded space, specified as the comma-separated pair consisting of 'NumExpansionDimensions' and 'auto' or a positive integer. For 'auto', the templateKernel function selects the number of dimensions using 2.^ceil(min(log2(p)+5,15)), where p is the number of predictors.
For details, see Random Feature Expansion.
Example: 'NumExpansionDimensions',2^15
Data Types: char | string | single | double
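As a quick illustration of the 'auto' rule, the expression can be evaluated directly for a few predictor counts. The values of p below are hypothetical. The rule grows with log2(p) and is capped at 2^15.

```matlab
% Evaluate the 'auto' rule for several hypothetical predictor counts p.
p = [4 34 1000 5000];
m = 2.^ceil(min(log2(p)+5,15));     % number of expansion dimensions
disp([p; m])                        % first row: p, second row: m
```

For these inputs, m is 128, 2048, 32768, and 32768, so the last two cases both reach the cap.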
KernelScale — Kernel scale parameter
1 (default) | "auto" | positive scalar
Kernel scale parameter, specified as "auto" or a positive scalar. The software obtains a random basis for random feature expansion by using the kernel scale parameter. For details, see Random Feature Expansion.
If you specify "auto", then the software selects an appropriate kernel scale parameter using a heuristic procedure. This heuristic procedure uses subsampling, so estimates can vary from one call to another. Therefore, to reproduce results, set a random number seed by using rng before training.
Example: KernelScale="auto"
Data Types: char | string | single | double
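The reproducibility advice can be sketched as follows. This example assumes the ionosphere sample data set and reads the selected scale from the KernelScale property of the trained binary learner. The seed value is arbitrary.

```matlab
% Sketch: make the 'auto' kernel scale heuristic repeatable.
load ionosphere                      % X: predictors, Y: 'b'/'g' labels
t = templateKernel('KernelScale','auto');

rng(10)                              % arbitrary seed, set before training
Mdl1 = fitcecoc(X,Y,'Learners',t);
rng(10)                              % reset to the same seed
Mdl2 = fitcecoc(X,Y,'Learners',t);

% With identical seeds, both fits are expected to select the same scale.
s1 = Mdl1.BinaryLearners{1}.KernelScale;
s2 = Mdl2.BinaryLearners{1}.KernelScale;
```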
BoxConstraint — Box constraint
1 (default) | positive scalar
Box constraint, specified as the comma-separated pair consisting of 'BoxConstraint' and a positive scalar.
This argument is valid only when 'Learner' is 'svm' (default) and you do not specify a value for the regularization term strength 'Lambda'. You can specify either 'BoxConstraint' or 'Lambda' because the box constraint (C) and the regularization term strength (λ) are related by C = 1/(λn), where n is the number of observations.
Example: 'BoxConstraint',100
Data Types: single | double
Lambda — Regularization term strength
'auto' (default) | nonnegative scalar
Regularization term strength, specified as the comma-separated pair consisting of 'Lambda' and 'auto' or a nonnegative scalar.
For 'auto', the value of Lambda is 1/n, where n is the number of observations.
When Learner is 'svm', you can specify either BoxConstraint or Lambda because the box constraint (C) and the regularization term strength (λ) are related by C = 1/(λn).
Example: 'Lambda',0.01
Data Types: char | string | single | double
Standardize — Flag to standardize predictor data
false or 0 (default) | true or 1
Since R2023b
Flag to standardize the predictor data, specified as a numeric or logical 0 (false) or 1 (true). If you set Standardize to true, then the software centers and scales each numeric predictor variable by the corresponding column mean and standard deviation. The software does not standardize the categorical predictors.
Example: "Standardize",true
Data Types: single | double | logical
Type — Kernel model type
"classification" | "regression"
Since R2023b
Kernel model type, specified as "classification" or "regression".
Value | Description |
---|---|
"classification" | Create a classification kernel learner template. If you do not specify Type as "classification", the fitting functions fitcecoc, testckfold, and fitsemigraph set this value when you pass t to them. |
"regression" | Create a regression kernel learner template. If you do not specify Type as "regression", the fitting function directforecaster sets this value when you pass t to it. |
Example: "Type","classification"
Data Types: char | string
Epsilon — Half the width of epsilon-insensitive band
iqr(Y)/13.49 (default) | nonnegative scalar value
Half the width of the epsilon-insensitive band, specified as a nonnegative scalar value. This argument applies to support vector machine learners only.
The default Epsilon value is iqr(Y)/13.49, which is an estimate of a tenth of the standard deviation using the interquartile range of the response variable Y. If iqr(Y) is equal to zero, then the default Epsilon value is 0.1.
Example: "Epsilon",0.3
Data Types: single | double
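The default rule can be reproduced by hand. The sketch below uses a synthetic, normally distributed response (a hypothetical Y, not data from this page). For normal data, iqr(Y)/1.349 estimates the standard deviation, so iqr(Y)/13.49 is roughly one tenth of it. The last line assumes R2023b or later, where Type and Epsilon are available.

```matlab
% Sketch: compute the default Epsilon for a synthetic response.
rng(0)                               % arbitrary seed
Y = 5 + 2*randn(1000,1);             % hypothetical response, std near 2

epsDefault = iqr(Y)/13.49;           % default rule from the text
if iqr(Y) == 0
    epsDefault = 0.1;                % fallback when the IQR is zero
end

ratio = epsDefault/(std(Y)/10);      % expected to be close to 1

% Pass an explicit value to a regression template (R2023b or later).
t = templateKernel('Type','regression','Learner','svm','Epsilon',epsDefault);
```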
BetaTolerance — Relative tolerance on linear coefficients and bias term
1e-4 (default) | nonnegative scalar
Relative tolerance on the linear coefficients and the bias term (intercept), specified as a nonnegative scalar.
Let $B_t = [\beta_t' \; b_t]$, that is, the vector of the coefficients and the bias term at optimization iteration t. If $\left\| \frac{B_t - B_{t-1}}{B_t} \right\|_2 < \text{BetaTolerance}$, then optimization terminates.
If you also specify GradientTolerance, then optimization terminates when the software satisfies either stopping criterion.
Example: BetaTolerance=1e-6
Data Types: single | double
GradientTolerance — Absolute gradient tolerance
1e-6 (default) | nonnegative scalar
Absolute gradient tolerance, specified as a nonnegative scalar.
Let $\nabla \mathcal{L}_t$ be the gradient vector of the objective function with respect to the coefficients and bias term at optimization iteration t. If $\left\| \nabla \mathcal{L}_t \right\|_\infty = \max \left| \nabla \mathcal{L}_t \right| < \text{GradientTolerance}$, then optimization terminates.
If you also specify BetaTolerance, then optimization terminates when the software satisfies either stopping criterion.
Example: GradientTolerance=1e-5
Data Types: single | double
IterationLimit — Maximum number of optimization iterations
positive integer
Maximum number of optimization iterations, specified as a positive integer.
The default value is 1000 if the transformed data fits in memory, as specified by the BlockSize name-value argument. Otherwise, the default value is 100.
Example: IterationLimit=500
Data Types: single | double
BlockSize — Maximum amount of allocated memory
4e3 (4 GB) (default) | positive scalar
Maximum amount of allocated memory (in megabytes), specified as the comma-separated pair consisting of 'BlockSize' and a positive scalar.
If templateKernel requires more memory than the value of 'BlockSize' to hold the transformed predictor data, then the software uses a block-wise strategy. For details about the block-wise strategy, see Algorithms.
Example: 'BlockSize',1e4
Data Types: single | double
RandomStream — Random number stream
global stream (default) | random stream object
Random number stream for reproducibility of data transformation, specified as a random stream object. For details, see Random Feature Expansion.
Use RandomStream to reproduce the random basis functions used by templateKernel to transform the predictor data to a high-dimensional space. For details, see Managing the Global Stream Using RandStream and Creating and Controlling a Random Number Stream.
Example: RandomStream=RandStream("mlfg6331_64")
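A minimal sketch of using a dedicated stream follows. It assumes that the template keeps a reference to the stream object, so rewinding the stream with reset before a second fit is expected to reproduce the same random basis. The seed is arbitrary.

```matlab
% Sketch: reproduce the random basis with a dedicated random stream.
load fisheriris
s = RandStream('mlfg6331_64','Seed',1);   % dedicated stream, arbitrary seed
t = templateKernel('RandomStream',s);

Mdl1 = fitcecoc(meas,species,'Learners',t);
reset(s)                                   % rewind the stream to its start
Mdl2 = fitcecoc(meas,species,'Learners',t);

% With the same stream state, both models are expected to agree.
sameLabels = isequal(predict(Mdl1,meas),predict(Mdl2,meas));
```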
HessianHistorySize — Size of history buffer for Hessian approximation
15 (default) | positive integer
Size of the history buffer for Hessian approximation, specified as the comma-separated pair consisting of 'HessianHistorySize' and a positive integer. At each iteration, templateKernel composes the Hessian approximation by using statistics from the latest HessianHistorySize iterations.
Example: 'HessianHistorySize',10
Data Types: single | double
Verbose — Verbosity level
0 (default) | 1
Verbosity level, specified as the comma-separated pair consisting of 'Verbose' and either 0 or 1. Verbose controls the display of diagnostic information at the command line.
Value | Description |
---|---|
0 | templateKernel does not display diagnostic information. |
1 | templateKernel displays the value of the objective function, gradient magnitude, and other diagnostic information. |
Example: 'Verbose',1
Data Types: single | double
Output Arguments
t — Kernel learner template
template object
Kernel learner template suitable for training a Gaussian kernel model for nonlinear classification or regression, returned as a template object. During training, the software uses default values for empty options.
More About
Random Feature Expansion
Random feature expansion, such as Random Kitchen Sinks[1] or Fastfood[2], is a scheme to approximate Gaussian kernels of the kernel classification algorithm to use for big data in a computationally efficient way. Random feature expansion is more practical for big data applications that have large training sets, but can also be applied to smaller data sets that fit in memory.
The kernel classification algorithm searches for an optimal hyperplane that separates the data into two classes after mapping features into a high-dimensional space. Nonlinear features that are not linearly separable in a low-dimensional space can be separable in the expanded high-dimensional space. All the calculations for hyperplane classification use only dot products. You can obtain a nonlinear classification model by replacing the dot product x1x2' with the nonlinear kernel function $G(x_1,x_2) = \langle \varphi(x_1), \varphi(x_2) \rangle$, where xi is the ith observation (row vector) and φ(xi) is a transformation that maps xi to a high-dimensional space (called the “kernel trick”). However, evaluating G(x1,x2) (Gram matrix) for each pair of observations is computationally expensive for a large data set (large n).
The random feature expansion scheme finds a random transformation so that its dot product approximates the Gaussian kernel. That is,
$$G(x_1,x_2) = \langle \varphi(x_1), \varphi(x_2) \rangle \approx T(x_1)T(x_2)',$$
where T(x) maps x in $\mathbb{R}^p$ to a high-dimensional space ($\mathbb{R}^m$). The Random Kitchen Sinks scheme uses the random transformation
$$T(x) = m^{-1/2} \exp(iZx')',$$
where $Z \in \mathbb{R}^{m \times p}$ is a sample drawn from $N(0,\sigma^{-2})$ and σ is a kernel scale. This scheme requires O(mp) computation and storage.
The Fastfood scheme introduces another random basis V instead of Z using Hadamard matrices combined with Gaussian scaling matrices. This random basis reduces the computation cost to O(m log p) and reduces storage to O(m).
You can specify values for m and σ by setting NumExpansionDimensions and KernelScale, respectively, of templateKernel.
The templateKernel function uses the Fastfood scheme for random feature expansion, and uses linear classification to train a Gaussian kernel classification model. Unlike solvers in the templateSVM function, which require computation of the n-by-n Gram matrix, the solver in templateKernel only needs to form a matrix of size n-by-m, with m typically much less than n for big data.
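The idea behind random feature expansion can be checked numerically. The following is a sketch of the Random Kitchen Sinks transformation only, not the Fastfood variant that templateKernel uses. The sizes n, p, m and the scale sigma are hypothetical, and the reference kernel exp(-||x1-x2||^2/(2*sigma^2)) is the Gaussian kernel that matches a basis with N(0, sigma^-2) entries; it is an assumption of this sketch rather than a statement about the toolbox internals.

```matlab
% Sketch: approximate a Gaussian Gram matrix with random features.
rng(0)                                   % arbitrary seed
n = 50; p = 4; m = 4096; sigma = 2;      % hypothetical sizes and kernel scale
X = randn(n,p);                          % synthetic observations (rows)

Z = randn(m,p)/sigma;                    % entries drawn from N(0, sigma^-2)
T = exp(1i*X*Z.')/sqrt(m);               % n-by-m expanded features
Gapprox = real(T*T');                    % inner products (conjugate transpose)

sq = sum(X.^2,2);                        % squared norms of the rows
D2 = max(sq + sq.' - 2*(X*X.'),0);       % squared Euclidean distances
Gexact = exp(-D2/(2*sigma^2));           % Gaussian kernel matched to Z

maxErr = max(abs(Gapprox(:) - Gexact(:)))   % shrinks as m grows
```

Only the n-by-m matrix T is needed for a linear solver, which is the memory advantage described above when m is much smaller than n.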
Box Constraint
A box constraint is a parameter that controls the maximum penalty imposed on margin-violating observations, and aids in preventing overfitting (regularization). Increasing the box constraint can lead to longer training times.
The box constraint (C) and the regularization term strength (λ) are related by C = 1/(λn), where n is the number of observations.
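The relationship between C and λ is simple arithmetic, as this short sketch shows. The values of n and C are hypothetical.

```matlab
% Sketch: convert between box constraint C and regularization strength lambda.
n = 150;                       % hypothetical number of observations
C = 100;                       % hypothetical box constraint
lambda = 1/(C*n);              % equivalent regularization term strength
Cback = 1/(lambda*n);          % converting back recovers C

lambdaAuto = 1/n;              % the 'auto' value of Lambda
Cauto = 1/(lambdaAuto*n);      % corresponds to the default box constraint of 1
```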
Algorithms
templateKernel minimizes the regularized objective function using a Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) solver with ridge (L2) regularization. To find the type of LBFGS solver used for training, type FitInfo.Solver in the Command Window.
- 'LBFGS-fast' — LBFGS solver.
- 'LBFGS-blockwise' — LBFGS solver with a block-wise strategy. If templateKernel requires more memory than the value of BlockSize to hold the transformed predictor data, then the function uses a block-wise strategy.
- 'LBFGS-tall' — LBFGS solver with a block-wise strategy for tall arrays.
When templateKernel uses a block-wise strategy, it implements LBFGS by distributing the calculation of the loss and gradient among different parts of the data at each iteration. Also, templateKernel refines the initial estimates of the linear coefficients and the bias term by fitting the model locally to parts of the data and combining the coefficients by averaging. If you specify 'Verbose',1, then templateKernel displays diagnostic information for each data pass and stores the information in the History field of FitInfo.
When templateKernel does not use a block-wise strategy, the initial estimates are zeros. If you specify 'Verbose',1, then templateKernel displays diagnostic information for each iteration and stores the information in the History field of FitInfo.
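FitInfo is not part of the template itself. One way to obtain it, sketched here under the assumption that a direct binary fit is acceptable, is the second output of fitckernel. The data set and seed are illustrative choices.

```matlab
% Sketch: inspect which LBFGS variant was used for a kernel fit.
load ionosphere                      % binary classification sample data
rng(1)                               % arbitrary seed
[Mdl,FitInfo] = fitckernel(X,Y);     % second output holds fit details
solverName = FitInfo.Solver          % name of the solver that was used
```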
References
[1] Rahimi, A., and B. Recht. “Random Features for Large-Scale Kernel Machines.” Advances in Neural Information Processing Systems. Vol. 20, 2008, pp. 1177–1184.
[2] Le, Q., T. Sarlós, and A. Smola. “Fastfood — Approximating Kernel Expansions in Loglinear Time.” Proceedings of the 30th International Conference on Machine Learning. Vol. 28, No. 3, 2013, pp. 244–252.
[3] Huang, P. S., H. Avron, T. N. Sainath, V. Sindhwani, and B. Ramabhadran. “Kernel methods match Deep Neural Networks on TIMIT.” 2014 IEEE International Conference on Acoustics, Speech and Signal Processing. 2014, pp. 205–209.
Extended Capabilities
Tall Arrays
Calculate with arrays that have more rows than fit in memory.
Usage notes and limitations when you train a model by passing a kernel model template and tall arrays to fitcecoc:
- The default values for these name-value pair arguments are different when you work with tall arrays.
  - 'Verbose' — Default value is 1.
  - 'BetaTolerance' — Default value is relaxed to 1e-3.
  - 'GradientTolerance' — Default value is relaxed to 1e-5.
  - 'IterationLimit' — Default value is relaxed to 20.
- If 'KernelScale' is 'auto', then templateKernel uses the random stream controlled by tallrng for subsampling. For reproducibility, you must set a random number seed for both the global stream and the random stream controlled by tallrng.
- If 'Lambda' is 'auto', then templateKernel might take an extra pass through the data to calculate the number of observations.
- templateKernel uses a block-wise strategy. For details, see Algorithms.
For more information, see Tall Arrays.
Version History
Introduced in R2018b
R2023b: Kernel models support standardization of predictors
templateKernel supports the standardization of numeric predictors. That is, you can specify the Standardize value as true to center and scale each numeric predictor variable by the corresponding column mean and standard deviation. The software does not standardize the categorical predictors.
R2023b: Support for regression learner templates
templateKernel supports the creation of regression learner templates. Specify the Type name-value argument as "regression" in the call to the function. When creating a regression learner template, you can additionally specify the Epsilon name-value argument for support vector machine learners.
See Also
ClassificationKernel | RegressionKernel | fitckernel | fitrkernel | fitcecoc