Regularization and Variable Selection via the Elastic Net on ShortScience.org

Regularization and Variable Selection via the Elastic Net
Zou, H. and Hastie, T.
Journal of the Royal Statistical Society: Series B (Statistical Methodology) - 2003 via Local Bibsonomy

Summary by Shagun Sodhani 8 years ago

## Introduction to elastic net

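* A regularization and variable selection method.
* Produces a sparse representation.
* Exhibits a grouping effect: strongly correlated predictors tend to enter or leave the model together.
* Particularly useful when the number of predictors (*p*) is much larger than the number of observations (*n*).
* The LARS-EN algorithm computes the whole elastic net regularization path efficiently.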

## Lasso

* Least squares with an L1 penalty on the regression coefficients.
* Performs continuous shrinkage and automatic variable selection simultaneously.

### Limitations

* If *p >> n*, the lasso selects at most *n* variables before it saturates.
* If a group of variables has high pairwise correlations, the lasso tends to select only one variable from the group and does not care which one is selected.
* If *n > p* and the predictors are highly correlated, ridge regression has been observed empirically to outperform the lasso in prediction.
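To make the lasso criterion and its grouping limitation concrete, here is a minimal Python sketch. It minimizes the paper-style criterion ||y − Xβ||<sup>2</sup> + λ<sub>1</sub>|β|<sub>1</sub> by cyclic coordinate descent with soft-thresholding. Coordinate descent is a swapped-in solver chosen for brevity (the paper builds on LARS), and the names `soft_threshold` and `lasso_cd`, the column-list layout, and the toy data are all hypothetical.

```python
def soft_threshold(z, t):
    """Return sign(z) * max(|z| - t, 0), the lasso's thresholding rule."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(columns, y, lam1, n_iter=200):
    """Cyclic coordinate descent for ||y - X b||^2 + lam1 * ||b||_1.

    `columns` holds the p columns of X, each a list of n floats (hypothetical layout).
    """
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)  # invariant: resid == y - X beta (beta starts at zero)
    sq_norms = [sum(v * v for v in xj) for xj in columns]
    for _ in range(n_iter):
        for j, xj in enumerate(columns):
            # x_j^T (partial residual with predictor j's contribution added back)
            rho = sum(xj[i] * resid[i] for i in range(n)) + beta[j] * sq_norms[j]
            # One-dimensional minimizer: threshold at lam1 / 2, divide by ||x_j||^2
            new = soft_threshold(rho, lam1 / 2) / sq_norms[j]
            delta = new - beta[j]
            for i in range(n):
                resid[i] -= delta * xj[i]  # keep the residual invariant
            beta[j] = new
    return beta


# Hypothetical toy data: two identical centred, unit-norm predictors and y = 2x
x = [0.5, -0.5, 0.5, -0.5]
y = [1.0, -1.0, 1.0, -1.0]
beta = lasso_cd([x, list(x)], y, lam1=1.0)
print(beta)  # → [1.5, 0.0]
```

With identical predictors the criterion depends only on β<sub>1</sub> + β<sub>2</sub> (for coefficients of the same sign), so every such split is equally optimal; coordinate descent started at zero simply puts all the weight on the column it visits first.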

## Naive elastic net

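* Penalized least squares method.
* The penalty on the regression coefficients is a convex combination of the lasso and ridge penalties.
* *penalty = (1−α)\*|β|<sub>1</sub> + α\*|β|<sub>2</sub><sup>2</sup>*, where *β* is the vector of regression coefficients, |β|<sub>1</sub> is its L1 norm, |β|<sub>2</sub><sup>2</sup> is its squared L2 norm, and *α = λ<sub>2</sub>/(λ<sub>1</sub> + λ<sub>2</sub>)* for the L1 and L2 penalty weights λ<sub>1</sub> and λ<sub>2</sub>.
* *α = 0* => lasso penalty
* *α = 1* => ridge penalty
* The naive elastic net can be solved by transforming it into a lasso problem on augmented data.
* It can be viewed as ridge-type shrinkage followed by lasso-type thresholding.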
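The two formulations above can be checked numerically. This minimal sketch evaluates the naive elastic net criterion ||y − Xβ||<sup>2</sup> + λ<sub>2</sub>|β|<sub>2</sub><sup>2</sup> + λ<sub>1</sub>|β|<sub>1</sub> three ways: directly, through the α-form penalty with α = λ<sub>2</sub>/(λ<sub>1</sub> + λ<sub>2</sub>), and as a lasso criterion on augmented data X* = (1 + λ<sub>2</sub>)<sup>−1/2</sup>[X; √λ<sub>2</sub> I], y* = [y; 0] with penalty weight γ = λ<sub>1</sub>/√(1 + λ<sub>2</sub>), evaluated at β* = √(1 + λ<sub>2</sub>) β. The augmentation follows the paper's construction as we read it; the helper names and data are hypothetical.

```python
import math


def rss(X, y, beta):
    """Residual sum of squares ||y - X beta||^2, with X given as a list of rows."""
    return sum((yi - sum(x * b for x, b in zip(row, beta))) ** 2 for row, yi in zip(X, y))


def naive_en_criterion(X, y, beta, lam1, lam2):
    """||y - X beta||^2 + lam2 * ||beta||_2^2 + lam1 * ||beta||_1."""
    return rss(X, y, beta) + lam2 * sum(b * b for b in beta) + lam1 * sum(abs(b) for b in beta)


def alpha_penalty(beta, alpha):
    """(1 - alpha) * ||beta||_1 + alpha * ||beta||_2^2."""
    return (1 - alpha) * sum(abs(b) for b in beta) + alpha * sum(b * b for b in beta)


def augment(X, y, lam2):
    """Augmented lasso data: X* = [X; sqrt(lam2) I] / sqrt(1 + lam2), y* = [y; 0]."""
    p = len(X[0])
    scale = 1.0 / math.sqrt(1.0 + lam2)
    X_aug = [[scale * v for v in row] for row in X]
    # p extra rows: a scaled identity matrix
    X_aug += [[scale * math.sqrt(lam2) if k == j else 0.0 for k in range(p)] for j in range(p)]
    return X_aug, list(y) + [0.0] * p


X = [[1.0, 0.5], [0.2, -1.0], [-0.7, 0.3]]  # hypothetical data, n = 3, p = 2
y = [1.0, -0.5, 0.4]
beta = [0.8, -0.3]
lam1, lam2 = 0.6, 1.5
alpha = lam2 / (lam1 + lam2)

direct = naive_en_criterion(X, y, beta, lam1, lam2)
# (lam1 + lam2) * (1 - alpha) == lam1 and (lam1 + lam2) * alpha == lam2
via_alpha = rss(X, y, beta) + (lam1 + lam2) * alpha_penalty(beta, alpha)

# Lasso criterion on the augmented data at beta* = sqrt(1 + lam2) * beta
X_aug, y_aug = augment(X, y, lam2)
beta_star = [math.sqrt(1.0 + lam2) * b for b in beta]
gamma = lam1 / math.sqrt(1.0 + lam2)
via_lasso = rss(X_aug, y_aug, beta_star) + gamma * sum(abs(b) for b in beta_star)

print(direct, via_alpha, via_lasso)
```

The three values agree up to floating-point rounding. A lasso solver run on the augmented data returns β*, and dividing it by √(1 + λ<sub>2</sub>) recovers the naive elastic net estimate.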

### Limitations

* The two-stage procedure incurs a double amount of shrinkage, which introduces extra bias without substantially reducing variance.
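For an orthogonal design (X<sup>T</sup>X = I) each estimator acts coordinate-wise on z = x<sub>j</sub><sup>T</sup>y, which makes the double shrinkage easy to see. The closed forms below come from minimizing ||y − Xβ||<sup>2</sup> + λ<sub>2</sub>|β|<sub>2</sub><sup>2</sup> + λ<sub>1</sub>|β|<sub>1</sub> one coordinate at a time (setting λ<sub>1</sub> = 0 gives ridge, λ<sub>2</sub> = 0 gives the lasso). This is a minimal sketch; the function names are hypothetical.

```python
def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0)."""
    return max(abs(z) - t, 0.0) * (1.0 if z >= 0 else -1.0)


def orthogonal_estimates(z, lam1, lam2):
    """Ridge, lasso and naive elastic net estimates of one coefficient when X^T X = I."""
    ridge = z / (1.0 + lam2)                                 # shrinkage only
    lasso = soft_threshold(z, lam1 / 2.0)                    # thresholding only
    naive_en = soft_threshold(z, lam1 / 2.0) / (1.0 + lam2)  # thresholding, then shrinkage again
    return ridge, lasso, naive_en


for z in (3.0, 0.4, -2.0):
    print(z, orthogonal_estimates(z, lam1=1.0, lam2=1.0))
```

With λ<sub>1</sub> = λ<sub>2</sub> = 1, z = 3 gives 2.5 for the lasso but only 1.25 for the naive elastic net: the surviving coefficient is thresholded and then halved by the ridge factor.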

## Bridge Regression

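* Generalization of lasso and ridge regression: the penalty is Σ|β<sub>j</sub>|<sup>q</sup>, with *q = 1* giving the lasso and *q = 2* giving ridge.
* For *1 < q < 2* the penalty is differentiable at zero, so bridge regression cannot produce sparse solutions, unlike the elastic net.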

## Elastic net

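* The elastic net coefficients are the naive elastic net coefficients rescaled by a factor of (1 + λ<sub>2</sub>), where λ<sub>2</sub> is the weight of the ridge (L2) penalty, to undo the extra shrinkage.
* Retains the variable selection and grouping properties of the naive elastic net.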
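A minimal sketch of the rescaling step, again using cyclic coordinate descent (a swapped-in solver, not the paper's LARS-EN) on the naive criterion ||y − Xβ||<sup>2</sup> + λ<sub>2</sub>|β|<sub>2</sub><sup>2</sup> + λ<sub>1</sub>|β|<sub>1</sub>. Its one-dimensional update is S(x<sub>j</sub><sup>T</sup>r<sub>j</sub>, λ<sub>1</sub>/2)/(|x<sub>j</sub>|<sup>2</sup> + λ<sub>2</sub>), where S is soft-thresholding and r<sub>j</sub> the partial residual; the elastic net estimate is then (1 + λ<sub>2</sub>) times the naive one. Function names and data are hypothetical; the toy data repeat one centred, unit-norm predictor to show the grouping effect.

```python
def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def naive_elastic_net_cd(columns, y, lam1, lam2, n_iter=500):
    """Cyclic coordinate descent for ||y - X b||^2 + lam2 * ||b||_2^2 + lam1 * ||b||_1."""
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)  # invariant: resid == y - X beta
    sq_norms = [sum(v * v for v in xj) for xj in columns]
    for _ in range(n_iter):
        for j, xj in enumerate(columns):
            rho = sum(xj[i] * resid[i] for i in range(n)) + beta[j] * sq_norms[j]
            # The ridge term adds lam2 to the denominator: the extra shrinkage
            new = soft_threshold(rho, lam1 / 2) / (sq_norms[j] + lam2)
            delta = new - beta[j]
            for i in range(n):
                resid[i] -= delta * xj[i]
            beta[j] = new
    return beta


def elastic_net(columns, y, lam1, lam2, n_iter=500):
    """Elastic net estimate: the naive estimate rescaled by (1 + lam2)."""
    naive = naive_elastic_net_cd(columns, y, lam1, lam2, n_iter)
    return [(1.0 + lam2) * b for b in naive]


x = [0.5, -0.5, 0.5, -0.5]  # hypothetical centred, unit-norm predictor
y = [1.0, -1.0, 1.0, -1.0]
naive = naive_elastic_net_cd([x, list(x)], y, lam1=1.0, lam2=1.0)
enet = elastic_net([x, list(x)], y, lam1=1.0, lam2=1.0)
print(naive, enet)
```

Because the ridge term makes the criterion strictly convex, the two identical predictors receive equal coefficients (about 0.5 each in the naive fit), and rescaling by 1 + λ<sub>2</sub> = 2 gives about 1.0 each. The lasso, by contrast, is indifferent to how the weight is split between identical predictors.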

## Justification for scaling

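* With an orthogonal design, the rescaled elastic net reduces to the lasso's soft-thresholding rule, which is minimax optimal, so the elastic net inherits this property.
* The rescaling undoes the extra shrinkage introduced by the ridge penalty while keeping the ridge penalty's decorrelating effect.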

## LARS-EN

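* Based on LARS, the algorithm used to compute the lasso solution path.
* Since the elastic net can be transformed into a lasso problem on augmented data, LARS-EN reuses the LARS machinery on that augmented problem.
* Exploits the sparse structure of the augmented design matrix to save computation.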
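The saving can be illustrated on a small example: because the extra p rows of the augmented design are a scaled identity, X*<sup>T</sup>X* = (X<sup>T</sup>X + λ<sub>2</sub>I)/(1 + λ<sub>2</sub>), so, as we read it, the Gram quantities a LARS-type solver needs can be formed with p × p work instead of building the (n + p) × p matrix. The sketch below only verifies this identity; it does not implement LARS-EN, and the helper names are hypothetical.

```python
import math


def gram(X):
    """X^T X for X given as a list of rows."""
    p = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]


def augmented_gram_cheap(X, lam2):
    """(X^T X + lam2 I) / (1 + lam2), computed without forming the augmented matrix."""
    G = gram(X)
    p = len(G)
    return [[(G[a][b] + (lam2 if a == b else 0.0)) / (1.0 + lam2) for b in range(p)]
            for a in range(p)]


def augmented_gram_explicit(X, lam2):
    """Build X* = [X; sqrt(lam2) I] / sqrt(1 + lam2) explicitly, then form X*^T X*."""
    p = len(X[0])
    s = 1.0 / math.sqrt(1.0 + lam2)
    X_aug = [[s * v for v in row] for row in X]
    X_aug += [[s * math.sqrt(lam2) if k == j else 0.0 for k in range(p)] for j in range(p)]
    return gram(X_aug)


X = [[1.0, 0.5, 0.0], [0.2, -1.0, 2.0], [-0.7, 0.3, 1.0]]  # hypothetical data
lam2 = 0.8
cheap = augmented_gram_cheap(X, lam2)
explicit = augmented_gram_explicit(X, lam2)
print(cheap)
print(explicit)
```

Both matrices agree up to rounding; only the cheap version avoids materializing the p extra rows.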

## Conclusion

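The elastic net often outperforms the lasso in the paper's simulations and real-data examples, while producing a similarly sparse representation.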
