Lasso is a regularization technique. Use `lasso` to:

- Reduce the number of predictors in a regression model.
- Identify important predictors.
- Select among redundant predictors.
- Produce shrinkage estimates with potentially lower predictive errors than ordinary least squares.

Elastic net is a related technique. Use elastic net when you have several highly correlated variables. `lasso` provides elastic net regularization when you set the `Alpha` name-value pair to a number strictly between `0` and `1`.

See Lasso and Elastic Net Details.

For lasso regularization of regression ensembles, see `regularize`.

Lasso is a regularization technique for performing linear regression. Lasso includes a penalty term that constrains the size of the estimated coefficients. Therefore, it resembles ridge regression. Lasso is a *shrinkage estimator*: it generates coefficient estimates that are biased to be small. Nevertheless, a lasso estimator can have smaller mean squared error than an ordinary least-squares estimator when you apply it to new data.

Unlike ridge regression, as the penalty term increases, lasso sets more coefficients to zero. This means that the lasso estimator is a smaller model, with fewer predictors. As such, lasso is an alternative to stepwise regression and other model selection and dimensionality reduction techniques.
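The contrast with ridge regression can be seen in the simplest possible case. The following Python sketch is an illustration only and is not part of `lasso`. It assumes a single predictor that has been centered and scaled to unit mean square, so that the lasso solution is a soft-thresholded copy of the least-squares coefficient `z`, while the ridge solution (with penalty `lam / 2 * beta**2`) is `z / (1 + lam)`. The function names and the value of `z` are hypothetical.

```python
def lasso_coef(z, lam):
    """Lasso coefficient for one standardized predictor (soft thresholding)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # exactly zero once the penalty reaches |z|


def ridge_coef(z, lam):
    """Ridge coefficient for one standardized predictor (penalty lam/2 * beta**2)."""
    return z / (1.0 + lam)  # shrinks toward zero but never reaches it for z != 0


z = 0.5  # hypothetical least-squares coefficient
for lam in (0.0, 0.25, 0.5, 1.0, 4.0):
    print(f"lambda={lam}: lasso={lasso_coef(z, lam):.3f}, ridge={ridge_coef(z, lam):.3f}")
```

Under these assumptions, the lasso coefficient is exactly zero for every `lam` of at least `0.5`, whereas the ridge coefficient only becomes smaller.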

Elastic net is a related technique. Elastic net is a hybrid of ridge regression and lasso regularization. Like lasso, elastic net can generate reduced models by generating zero-valued coefficients. Empirical studies have suggested that the elastic net technique can outperform lasso on data with highly correlated predictors.

The *lasso* technique solves the following regularization problem. For a given value of *λ*, a nonnegative parameter, `lasso` solves the problem

$$\underset{{\beta}_{0},\beta}{\mathrm{min}}\left(\frac{1}{2N}{\displaystyle \sum _{i=1}^{N}{\left({y}_{i}-{\beta}_{0}-{x}_{i}^{T}\beta \right)}^{2}}+\lambda {\displaystyle \sum _{j=1}^{p}\left|{\beta}_{j}\right|}\right).$$

- *N* is the number of observations.
- *y*_{i} is the response at observation *i*.
- *x*_{i} is data, a vector of *p* values at observation *i*.
- *λ* is a nonnegative regularization parameter corresponding to one value of `Lambda`.
- The parameters *β*_{0} and *β* are a scalar and a *p*-vector, respectively.
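As a minimal sketch of how the objective above is evaluated, the following Python function computes the mean squared residual term and the *L*^{1} penalty for a given intercept, coefficient vector, and *λ*. The function name `lasso_objective` and the four-observation data set are hypothetical and chosen only so that the arithmetic is easy to check by hand.

```python
def lasso_objective(X, y, beta0, beta, lam):
    """Return (1/(2N)) * sum((y_i - beta0 - x_i'beta)^2) + lam * sum(|beta_j|)."""
    n = len(y)
    rss = 0.0
    for x_i, y_i in zip(X, y):
        fit = beta0 + sum(b * x for b, x in zip(beta, x_i))
        rss += (y_i - fit) ** 2
    l1_norm = sum(abs(b) for b in beta)
    return rss / (2 * n) + lam * l1_norm


# Hypothetical data generated exactly by y = 2 + 3*x1 + 0.5*x2.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [5.5, -0.5, 4.5, -1.5]

exact_fit = lasso_objective(X, y, 2.0, [3.0, 0.5], lam=1.0)  # zero residual, full penalty
shrunken = lasso_objective(X, y, 2.0, [2.0, 0.0], lam=1.0)   # some residual, smaller penalty
print(exact_fit, shrunken)  # prints 3.5 2.625
```

With `lam = 1.0`, the shrunken coefficient vector has a lower objective value than the exact least-squares fit, because the reduction in the penalty outweighs the added residual error.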

As *λ* increases, the number of nonzero components of *β* decreases.
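The effect of *λ* on the number of nonzero coefficients can be sketched with cyclic coordinate descent and soft thresholding, one standard way to solve the lasso problem. This Python code is an illustration under stated assumptions; it is not claimed to be how `lasso` is implemented. The function names, tolerances, and data are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma, returning exactly 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, max_sweeps=1000, tol=1e-12):
    """Minimize (1/(2N)) * sum((y_i - b0 - x_i'b)^2) + lam * sum(|b_j|)."""
    n, p = len(y), len(X[0])
    # Center predictors and response so the intercept can be recovered at the end.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    mean_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    resid = [y_i - y_mean for y_i in y]  # residual when all coefficients are zero
    for _ in range(max_sweeps):
        largest_change = 0.0
        for j in range(p):
            if mean_sq[j] == 0.0:
                continue  # constant predictor: coefficient stays at zero
            # (1/N) * inner product of predictor j with the residual that
            # excludes predictor j's own contribution.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + mean_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / mean_sq[j]
            change = new_beta - beta[j]
            if change != 0.0:
                for i in range(n):
                    resid[i] -= change * Xc[i][j]
                beta[j] = new_beta
            largest_change = max(largest_change, abs(change))
        if largest_change < tol:
            break
    beta0 = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return beta0, beta


# Hypothetical data: y = 2 + 3*x1 + 0.5*x2 with orthogonal, centered predictors.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [5.5, -0.5, 4.5, -1.5]

for lam in (0.1, 1.0, 3.5):
    beta0, beta = lasso_coordinate_descent(X, y, lam)
    nonzero = sum(1 for b in beta if b != 0.0)
    print(f"lambda={lam}: {nonzero} nonzero coefficient(s)")
```

For this orthogonal design, each coefficient is the soft-thresholded value of its least-squares estimate (3 and 0.5), so the three values of `lam` give 2, 1, and 0 nonzero coefficients.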

The lasso penalty involves only the *L*^{1} norm of *β*, in contrast with the elastic net penalty, which also includes the squared *L*^{2} norm of *β*.

The *elastic net* technique solves the following regularization problem. For an *α* strictly between 0 and 1, and a nonnegative *λ*, elastic net solves the problem

$$\underset{{\beta}_{0},\beta}{\mathrm{min}}\left(\frac{1}{2N}{\displaystyle \sum _{i=1}^{N}{\left({y}_{i}-{\beta}_{0}-{x}_{i}^{T}\beta \right)}^{2}}+\lambda {P}_{\alpha}\left(\beta \right)\right),$$

where

$${P}_{\alpha}\left(\beta \right)=\frac{(1-\alpha )}{2}{\Vert \beta \Vert}_{2}^{2}+\alpha {\Vert \beta \Vert}_{1}={\displaystyle \sum _{j=1}^{p}\left(\frac{(1-\alpha )}{2}{\beta}_{j}^{2}+\alpha \left|{\beta}_{j}\right|\right)}.$$

Elastic net is the same as lasso when *α* = 1. As *α* shrinks toward 0, elastic net approaches `ridge` regression. For other values of *α*, the penalty term *P*_{α}(*β*) interpolates between the *L*^{1} norm of *β* and the squared *L*^{2} norm of *β*.
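The behavior of the penalty term at different values of *α* can be checked numerically. The following Python sketch evaluates the penalty in both of the forms given in its definition, using norms and using a sum over coordinates. The function names and the coefficient vector are hypothetical.

```python
def elastic_net_penalty(beta, alpha):
    """P_alpha(beta) = (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1."""
    l1_norm = sum(abs(b) for b in beta)
    l2_norm_sq = sum(b * b for b in beta)
    return (1.0 - alpha) / 2.0 * l2_norm_sq + alpha * l1_norm


def elastic_net_penalty_by_coordinate(beta, alpha):
    """The same penalty written as a sum over the coordinates of beta."""
    return sum((1.0 - alpha) / 2.0 * b * b + alpha * abs(b) for b in beta)


beta = [3.0, -0.5, 0.0]  # hypothetical coefficients: L1 norm 3.5, squared L2 norm 9.25

for alpha in (1.0, 0.5, 0.0):
    print(alpha, elastic_net_penalty(beta, alpha), elastic_net_penalty_by_coordinate(beta, alpha))
```

At `alpha = 1.0` the penalty equals the *L*^{1} norm (3.5), and at `alpha = 0.0` it equals half the squared *L*^{2} norm (4.625). The value `alpha = 0.0` is included only to show the ridge limit of the formula; the problem statement above restricts *α* to values strictly between 0 and 1 for elastic net.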

See also: `lasso` | `lassoglm` | `fitrlinear` | `lassoPlot` | `ridge`