Linear classification learner template
templateLinear
creates a template suitable for fitting a linear classification model to high-dimensional data for multiclass problems. The template specifies the binary learner model, regularization type and strength, and solver, among other things. After creating the template, train the model by passing the template and the data to fitcecoc.
t = templateLinear() returns a linear classification learner template.

If you specify a default template, then the software uses default values for all input arguments during training.
t = templateLinear(Name,Value) returns a template with additional options specified by one or more name-value pair arguments. For example, you can implement logistic regression, specify the regularization type or strength, or specify the solver to use for objective-function minimization.
If you display t in the Command Window, then all options appear empty ([]), except those that you specify using name-value pair arguments. During training, the software uses default values for empty options.
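For example, this sketch creates a logistic regression template with lasso regularization and passes it to fitcecoc. Here X (a predictor matrix) and Y (a vector of class labels) are placeholder variables, not data defined on this page.

t = templateLinear('Learner','logistic','Regularization','lasso');
% Unspecified options display as [] and receive default values during training.
Mdl = fitcecoc(X,Y,'Learners',t);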
It is a best practice to orient your predictor matrix so that observations correspond to columns and to specify 'ObservationsIn','columns'. As a result, you can experience a significant reduction in optimization-execution time.
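A minimal sketch of this orientation, assuming Xt is a matrix whose columns are observations and Y is the corresponding label vector (both placeholders):

t = templateLinear();
% Declare that observations are the columns of Xt.
Mdl = fitcecoc(Xt,Y,'Learners',t,'ObservationsIn','columns');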
For better optimization accuracy if the predictor data is high-dimensional and Regularization is 'ridge', set any of these combinations for Solver (a sketch follows this list):

'sgd'
'asgd'
'dual' if Learner is 'svm'
{'sgd','lbfgs'}
{'asgd','lbfgs'}
{'dual','lbfgs'} if Learner is 'svm'

Other combinations can result in poor optimization accuracy.
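For instance, one of the recommended combinations, sketched under the assumption that you want an SVM learner with SGD followed by LBFGS refinement:

t = templateLinear('Learner','svm','Regularization','ridge','Solver',{'sgd','lbfgs'});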
For better optimization accuracy if the predictor data is moderate- through low-dimensional and Regularization is 'ridge', set Solver to 'bfgs'.
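For example, a template implementing this recommendation:

t = templateLinear('Regularization','ridge','Solver','bfgs');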
If Regularization is 'lasso', set any of these combinations for Solver (a sketch follows this list):

'sgd'
'asgd'
'sparsa'
{'sgd','sparsa'}
{'asgd','sparsa'}
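For instance, a sketch that runs SGD and then refines the solution with SpaRSA:

t = templateLinear('Regularization','lasso','Solver',{'sgd','sparsa'});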
When choosing between SGD and ASGD, consider that:
SGD takes less time per iteration, but requires more iterations to converge.
ASGD requires fewer iterations to converge, but takes more time per iteration.
If the predictor data has few observations but many predictor variables, then do the following (a sketch appears after this list):

Specify 'PostFitBias',true.
For SGD or ASGD solvers, set PassLimit to a positive integer that is greater than 1, for example, 5 or 10. This setting often results in better accuracy.
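A sketch combining both suggestions; the PassLimit value of 10 is illustrative:

t = templateLinear('Solver','sgd','PostFitBias',true,'PassLimit',10);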
For SGD and ASGD solvers, BatchSize affects the rate of convergence (a sketch appears after this list):

If BatchSize is too small, then the software achieves the minimum in many iterations, but computes the gradient per iteration quickly.
If BatchSize is too large, then the software achieves the minimum in fewer iterations, but computes the gradient per iteration slowly.
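For example, a sketch that enlarges the mini-batch; the value 50 is illustrative, and a suitable value depends on your data:

t = templateLinear('Solver','asgd','BatchSize',50);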
A large learning rate (see LearnRate) speeds up convergence to the minimum, but can lead to divergence (that is, overstepping the minimum). Small learning rates ensure convergence to the minimum, but can lead to slow termination.
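As a sketch, you might lower the learning rate if SGD diverges; the value 0.1 is illustrative:

t = templateLinear('Solver','sgd','LearnRate',0.1);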
If Regularization is 'lasso', then experiment with various values of TruncationPeriod. For example, set TruncationPeriod to 1, 10, and then 100.
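One way to run this experiment is to compare cross-validated classification error across the suggested values, sketched here with placeholder data X and Y:

for tp = [1 10 100]
    t = templateLinear('Regularization','lasso','TruncationPeriod',tp);
    CVMdl = fitcecoc(X,Y,'Learners',t,'KFold',5);
    fprintf('TruncationPeriod = %d, loss = %.4f\n',tp,kfoldLoss(CVMdl))
end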
For efficiency, the software does not standardize predictor data. To standardize the predictor data (X), enter
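% Center each predictor (row of X) and divide by its standard deviation.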
X = bsxfun(@rdivide,bsxfun(@minus,X,mean(X,2)),std(X,0,2));
The code requires that you orient the predictors and observations as the rows and columns of X, respectively. Also, for memory-usage economy, the code replaces the original predictor data with the standardized data.