An introduction to plasso

Michael Knaus

Stefan Glaisner

December 03, 2023

This notebook provides a detailed overview of the plasso package and its two main functions, plasso and cv.plasso, which were developed in the course of Knaus (2022). The package is closely modelled on the glmnet package and builds directly on its core function glmnet. The related theory and algorithms are described in Friedman, Hastie, and Tibshirani (2010).

Getting started

The latest development version of the package can be installed from its GitHub page, which requires the devtools package. The latest ‘official’ release can be installed from CRAN using install.packages(). We recommend the latter.

General dependencies are: glmnet, Matrix, methods, parallel, doParallel, foreach and iterators.

Code
# Development version from GitHub (requires devtools)
library(devtools)
devtools::install_github("stefan-1997/plasso")

# Released version from CRAN (recommended)
install.packages("plasso")

Load plasso using library().

Code
library(plasso)

The package provides two main functions, plasso and cv.plasso, both of which are built on top of glmnet. Specifically, both functions fit a glmnet model internally and return the fitted glmnet object as part of their output (list element lasso_full).

The term plasso refers to a Post-Lasso model, which re-estimates the model by ordinary least squares using only the active (i.e. non-zero) coefficients of a previously estimated Lasso model. The idea is to use the Lasso for variable selection but to avoid its shrinkage.
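To make the two steps concrete, here is a minimal sketch of a Post-Lasso fit at a single lambda value, written directly against glmnet. It is not the package's internal code: the simulated data (unrelated to the packaged toeplitz data), the choice of the tenth lambda on the path and all object names are illustrative assumptions.

```r
library(glmnet)

# Illustrative simulated data (not the toeplitz data shipped with plasso)
set.seed(1)
n <- 200
X_sim <- matrix(rnorm(n * 10), n, 10)
y_sim <- drop(X_sim[, 1:3] %*% c(1, -0.5, 0.25)) + rnorm(n)

# Step 1: Lasso path
fit <- glmnet(X_sim, y_sim)

# Step 2: pick one lambda (here the 10th on the path) and find the active set
lam <- fit$lambda[10]
b_lasso <- as.matrix(coef(fit, s = lam))[, 1]   # intercept first
active <- which(b_lasso[-1] != 0)

# Step 3: OLS on an intercept and the active covariates only (no penalty)
ols <- lm.fit(cbind(1, X_sim[, active, drop = FALSE]), y_sim)
b_post <- numeric(length(b_lasso))
b_post[c(1, active + 1)] <- ols$coefficients
```

The Lasso step only decides which columns enter; the least squares step then estimates their coefficients without shrinkage, and inactive covariates keep a coefficient of exactly zero.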

The package comes with simulated data generated by the following data-generating process (DGP):

The covariate matrix \(X\) consists of 10 variables whose effects on the target \(Y\) are given by the vector \(\boldsymbol{\pi} = [1, -0.83, 0.67, -0.5, 0.33, -0.17, 0, ..., 0]'\). The first six effects decrease linearly in absolute value (1, 5/6, 4/6, ..., 1/6) and alternate in sign; the true effect of the remaining four covariates is 0. The rows of \(X\) are drawn from a multivariate normal distribution with mean zero and a Toeplitz covariance matrix, i.e. a matrix with constant diagonals, here with entries \(\boldsymbol{\Sigma}_{jk} = 0.7^{|j-k|}\): \[ \boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0.7 & 0.7^2 & ... & 0.7^{9} \\ 0.7 & 1 & 0.7 & ... & 0.7^{8} \\ 0.7^2 & 0.7 & 1 & ... & 0.7^{7} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0.7^{9} & 0.7^{8} & 0.7^{7} & ... & 1 \end{bmatrix} \]

The target \(\boldsymbol{y}\) is then a linear function of \(\boldsymbol{X}\) plus a vector of normally distributed errors. Each element of \(\boldsymbol{y}\) is given by: \[ y_i = \boldsymbol{X}_i \boldsymbol{\pi} + \varepsilon_i \] where \(\varepsilon_i \sim \mathcal{N}(0,4)\).
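The DGP can be simulated in a few lines of base R. This is a sketch, not the code that generated the packaged data: the function name simulate_toeplitz_dgp and the sample size n are hypothetical, and reading the second argument of \(\mathcal{N}(0,4)\) as the variance (standard deviation 2) is an assumption; change sd_eps if the other convention is meant.

```r
# Hypothetical sketch of the DGP; n and sd_eps are assumptions, not package defaults
simulate_toeplitz_dgp <- function(n = 500, rho = 0.7, sd_eps = 2) {
  p <- 10
  # Effects 1, -5/6, 4/6, -3/6, 2/6, -1/6 followed by four zeros
  pi_true <- c((-1)^(0:5) * (6:1) / 6, rep(0, p - 6))
  # Toeplitz covariance with entries rho^|j - k|
  Sigma <- stats::toeplitz(rho^(0:(p - 1)))
  # Mean-zero multivariate normal rows: Z %*% chol(Sigma) has covariance Sigma
  X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
  colnames(X) <- paste0("X", 1:p)
  # Outcome: linear index plus normal noise
  y <- drop(X %*% pi_true) + rnorm(n, sd = sd_eps)
  list(y = y, X = X, pi = pi_true, Sigma = Sigma)
}

set.seed(123)
sim <- simulate_toeplitz_dgp(n = 1000)
```

Using the Cholesky factor keeps the sketch free of extra dependencies; a package such as MASS could draw the multivariate normal rows directly instead.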

Code
data(toeplitz)

# First column: outcome y; remaining columns: the ten covariates
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]

plasso

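plasso fits a standard glmnet Lasso path and, for every lambda value on that path, additionally computes the Post-Lasso estimates, i.e. least squares estimates on the corresponding active set. The output therefore contains coefficients for both the Lasso and the Post-Lasso model at each lambda value.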

Code
p = plasso::plasso(X,y)

You can plot the coefficient paths of both the Post-Lasso model and the underlying ‘original’ Lasso model. The plots illustrate the difference between the two: the Lasso paths change continuously with lambda, whereas the Post-Lasso paths jump every time a variable enters (or leaves) the active set.

Code
plot(p, lasso=FALSE, xvar="lambda")

Code
plot(p, lasso=TRUE, xvar="lambda")
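The jumps follow from how a Post-Lasso path is built: at each lambda the active set is refitted by least squares, so the coefficients stay constant as long as the active set does not change and jump when it does. The sketch below computes such a path on simulated data with glmnet and checks this property; the helper post_lasso_path is hypothetical and not part of plasso.

```r
library(glmnet)

set.seed(2)
n <- 200
X_sim <- matrix(rnorm(n * 10), n, 10)
y_sim <- drop(X_sim[, 1:3] %*% c(1, -0.5, 0.25)) + rnorm(n)

# Hypothetical helper: Post-Lasso coefficients (intercept in row 1)
# for every lambda of a glmnet fit, via OLS on the active covariates
post_lasso_path <- function(fit, x, y) {
  B <- as.matrix(coef(fit))                  # (p + 1) x nlambda Lasso path
  out <- matrix(0, nrow(B), ncol(B))
  for (k in seq_len(ncol(B))) {
    active <- which(B[-1, k] != 0)
    b <- lm.fit(cbind(1, x[, active, drop = FALSE]), y)$coefficients
    out[c(1, active + 1), k] <- ifelse(is.na(b), 0, b)  # guard against aliased columns
  }
  out
}

fit <- glmnet(X_sim, y_sim)
B_lasso <- as.matrix(coef(fit))
B_post <- post_lasso_path(fit, X_sim, y_sim)

# Active set at each lambda, encoded as a string such as "1,2,3"
sets <- apply(B_lasso[-1, , drop = FALSE] != 0, 2,
              function(a) paste(which(a), collapse = ","))
same_set <- sets[-1] == sets[-length(sets)]

# Change in Post-Lasso coefficients between neighbouring lambdas
step <- abs(B_post[, -1, drop = FALSE] - B_post[, -ncol(B_post), drop = FALSE])
max_within <- if (any(same_set)) max(step[, same_set]) else 0  # zero if no change in set
n_jumps <- sum(!same_set)                                       # changes of the active set
```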

We can also look at the coefficients for a chosen lambda value. Here, the difference between Post-Lasso and Lasso becomes clearly visible: the Lasso does not only select features but also shrinks them, so its active slope coefficients are smaller in absolute value than those of the Post-Lasso model (the first element is the intercept):

Code
# Coefficients of both models at lambda = 0.01
coef_p = coef(p, s=0.01)

as.vector(coef_p$plasso)
##  [1]  0.1438137  1.0187628 -0.6214926  0.4673645 -0.2300834 -0.3575276
##  [7]  0.2180390  0.1180676 -0.2138268  0.1975462 -0.1047983
Code
as.vector(coef_p$lasso)
##  [1]  0.14498611  0.98729386 -0.56374511  0.40656768 -0.20023679 -0.33156564
##  [7]  0.18985685  0.08930237 -0.16087044  0.13798825 -0.06639638
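The pattern in these two coefficient vectors has a simple closed form in the idealized case of an orthonormal design, which is not the correlated design used here. Assuming centered data with \(\boldsymbol{X}'\boldsymbol{X}/n = \boldsymbol{I}\) and the glmnet objective \(\frac{1}{2n}\lVert \boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta}\rVert_2^2 + \lambda \lVert \boldsymbol{\beta} \rVert_1\), the Lasso soft-thresholds the least squares estimates \(\hat\beta_j^{\text{OLS}}\), while the Post-Lasso hard-thresholds them:

\[ \hat\beta_j^{\text{Lasso}} = \operatorname{sign}\big(\hat\beta_j^{\text{OLS}}\big)\max\big(|\hat\beta_j^{\text{OLS}}| - \lambda,\, 0\big), \qquad \hat\beta_j^{\text{Post-Lasso}} = \hat\beta_j^{\text{OLS}}\,\mathbb{1}\big\{|\hat\beta_j^{\text{OLS}}| > \lambda\big\} \]

Both select the same variables, but each selected Lasso coefficient is pulled towards zero by exactly \(\lambda\). With correlated covariates, as in the toeplitz data, this relation no longer holds exactly, so the Lasso estimates are typically, but not necessarily, smaller in absolute value.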

cv.plasso

The cv.plasso function uses k-fold cross-validation to evaluate the out-of-sample performance of each lambda value for both models (Post-Lasso and Lasso); the number of folds is set with the argument kf. The returned object of class cv.plasso includes the cross-validated mean squared errors (MSEs).

Applying the summary method with default=FALSE yields more informative output on the optimal choice of lambda for each model.

Code
# 5-fold cross-validation for both models
p.cv = plasso::cv.plasso(X,y,kf=5)
summary(p.cv, default=FALSE)
## 
## Call:
##  plasso::cv.plasso(x = X, y = y, kf = 5)
## 
## Lasso:
##  Minimum CV MSE Lasso:  15.22
##  Lambda at minimum:  0.01858
##  Active variables at minimum:  (Intercept) X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
## Post-Lasso:
##  Minimum CV MSE Post-Lasso:  15.2
##  Lambda at minimum:  0.2087
##  Active variables at minimum:  (Intercept) X1 X5
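The cross-validation behind these numbers can be sketched as follows for the Post-Lasso model: split the data into kf folds, fit the Lasso path and the Post-Lasso refits on the training folds, and average the held-out MSE per lambda. This is an illustration on simulated data, not necessarily how cv.plasso is implemented; the common lambda grid taken from a full-sample fit and the helper post_lasso_path are assumptions.

```r
library(glmnet)

set.seed(3)
n <- 200
X_sim <- matrix(rnorm(n * 10), n, 10)
y_sim <- drop(X_sim[, 1:3] %*% c(1, -0.5, 0.25)) + rnorm(n)

# Hypothetical helper: Post-Lasso coefficients (intercept in row 1) for
# every lambda of a glmnet fit, via OLS on the active covariates
post_lasso_path <- function(fit, x, y) {
  B <- as.matrix(coef(fit))
  out <- matrix(0, nrow(B), ncol(B))
  for (k in seq_len(ncol(B))) {
    active <- which(B[-1, k] != 0)
    b <- lm.fit(cbind(1, x[, active, drop = FALSE]), y)$coefficients
    out[c(1, active + 1), k] <- ifelse(is.na(b), 0, b)
  }
  out
}

kf <- 5
lambda <- glmnet(X_sim, y_sim)$lambda              # common grid from the full sample
folds <- sample(rep(seq_len(kf), length.out = n)) # random fold labels 1..kf
mse <- matrix(NA_real_, kf, length(lambda))

for (f in seq_len(kf)) {
  train <- folds != f
  fit <- glmnet(X_sim[train, ], y_sim[train], lambda = lambda)
  B_pl <- post_lasso_path(fit, X_sim[train, ], y_sim[train])
  pred <- cbind(1, X_sim[!train, ]) %*% B_pl        # one column per lambda
  # If glmnet returns a shorter path, the remaining lambdas stay NA
  mse[f, seq_len(ncol(pred))] <- colMeans((y_sim[!train] - pred)^2)
}

cv_mse <- colMeans(mse, na.rm = TRUE)               # CV MSE per lambda
lambda_min_pl <- lambda[which.min(cv_mse)]          # Post-Lasso optimum
```

Replacing B_pl by the Lasso coefficients as.matrix(coef(fit)) in the same loop gives the corresponding Lasso CV MSEs.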

The plot method extends the standard glmnet cross-validation plot by adding the cross-validated MSEs of the Post-Lasso model.

Code
plot(p.cv, legend_pos="left", legend_size=0.5)

The following code extracts the optimal lambda value (here for the Post-Lasso model) and the associated coefficients at that value of \(\lambda\).

Code
p.cv$lambda_min_pl
## [1] 0.2087288
Code
coef_pcv = coef(p.cv, s="optimal")
as.vector(coef_pcv$plasso)
##  [1]  0.1410181  0.7663423  0.0000000  0.0000000  0.0000000 -0.3000942
##  [7]  0.0000000  0.0000000  0.0000000  0.0000000  0.0000000

References

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2010. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” Journal of Statistical Software 33 (1): 1–22. https://doi.org/10.18637/jss.v033.i01.
Knaus, Michael C. 2022. “Double machine learning-based programme evaluation under unconfoundedness.” The Econometrics Journal 25 (3): 602–27. https://doi.org/10.1093/ectj/utac015.