Speeding up computations

Marjolein Fokkema

The computational load of function pre can be heavy. This vignette shows some ways to reduce it.

There are two main steps in fitting a rule ensemble: 1) Rule generation and 2) Estimation of the final ensemble. Both can be adjusted to reduce computational load.

Faster rule generation

Tree fitting algorithm

By default, pre uses the conditional inference tree algorithm Hothorn, Hornik, & Zeileis (2006) as implemented in function ctree of R package partykit Hothorn & Zeileis (2015) for rule induction. The main reason is that it does not present with a selection bias towards variables with a greater number of possible cut points. Yet, use of ctree brings a relatively heavy computational load:

airq <- airquality[complete.cases(airquality), ]
airq$Month <- factor(airq$Month)
library("pre")
set.seed(42)
system.time(airq.ens <- pre(Ozone ~ ., data = airq))
##    user  system elapsed 
##    3.83    0.03    4.10
summary(airq.ens)
## 
## Final ensemble with cv error within 1se of minimum: 
## 
##   lambda =  3.229132
##   number of terms = 13
##   mean cv error (se) = 336.2188 (95.24635)
## 
##   cv error type : Mean-Squared Error

Computational load can be substantially reduced by employing the CART algorithm of Breiman, Friedman, Olshen, & Stone (1984) as implemented in function rpart from the package of the same name by Therneau & Atkinson (2022). This can be specified through the tree.unbiased argument:

set.seed(42)
system.time(airq.ens.cart <- pre(Ozone ~ ., data = airq, tree.unbiased = FALSE))
##    user  system elapsed 
##    2.40    0.03    2.51
summary(airq.ens.cart)
## 
## Final ensemble with cv error within 1se of minimum: 
## 
##   lambda =  1.935814
##   number of terms = 23
##   mean cv error (se) = 287.7377 (86.43107)
## 
##   cv error type : Mean-Squared Error

Alternatively, rules can be generated using the random-forest approach originally proposed by Breiman (2001) as implemented in function randomForest from the package of the same name by Liaw & Wiener (2002):

set.seed(42)
system.time(airq.ens.rf <- pre(Ozone ~ ., data = airq, randomForest = TRUE))
## Loading required namespace: randomForest
##    user  system elapsed 
##    1.22    0.08    1.32
summary(airq.ens.rf)
## 
## Final ensemble with cv error within 1se of minimum: 
## 
##   lambda =  1.763841
##   number of terms = 25
##   mean cv error (se) = 288.6653 (62.81364)
## 
##   cv error type : Mean-Squared Error

Note, however, that the resulting ensembles will likely be more complex and also present with a selection bias towards variables with a greater number of possible cutpoints. The higher complexity is also observed above, where CART and random forest resulted in a substantially larger number of terms. This is due to the default stopping criteria for rpart and randomForest being considerably less conservative than that of ctree. This will result in the generation of more and longer rules, which will also tend to increase complexity of the final ensemble. Furthermore, those algorithms prefer to split using variables with a larger number of cutpoints, and this bias may propagate to the final rule ensemble.

Maximum rule depth

Reducing tree (and thereby rule) depth will reduce computational load of both rule fitting and estimation of the final ensemble:

set.seed(42)
system.time(airq.ens.md <- pre(Ozone ~ ., data = airq, maxdepth = 1L))
##    user  system elapsed 
##    3.26    0.06    3.41
summary(airq.ens.md)
## 
## Final ensemble with cv error within 1se of minimum: 
## 
##   lambda =  1.676698
##   number of terms = 14
##   mean cv error (se) = 424.1761 (126.9922)
## 
##   cv error type : Mean-Squared Error

Likely, reducing the maximum depth will improve interpretability, but may decrease predictive accuracy for the final ensemble.

Number of trees

By default, 500 trees are generated. Computation time can be reduced substantially by reducing the number of trees. This may of course negatively impact predictive accuracy. When using a smaller number of trees, it is likely beneficial to increase the learning rate (learnrate = .01, by default) accordingly:

set.seed(42)
system.time(airq.ens.nt <- pre(Ozone ~ ., data = airq, ntrees = 100L, learnrate = .05))
##    user  system elapsed 
##    0.90    0.00    0.93
summary(airq.ens.nt)
## 
## Final ensemble with cv error within 1se of minimum: 
## 
##   lambda =  1.935814
##   number of terms = 11
##   mean cv error (se) = 276.9282 (69.1872)
## 
##   cv error type : Mean-Squared Error

Faster estimation of the final ensemble

Function cv.glmnet from package glmnet is used for fitting the final model. The relevant arguments of function cv.glmnet can be passed directly to function pre.

Parallel computation

For example, parallel computation can be employed by specifying par.final = TRUE in the call to pre (and registering parallel beforehand, e.g., using doMC). Parallel computation will not affect performance of the final ensemble. Note that parallel computation will only reduce computation time for datasets with a (very) large number of observations. For smaller datasets, the use of parallel computation may even increase computation time.

Number of cross-validation folds

The number of cross-validation repetitions can be also be reduced, but this will probably not reduce computation time much and may negatively affect predictive performance of the final ensemble.

set.seed(42)
system.time(airq.ens.nf <- pre(Ozone ~ ., data = airq, nfolds = 5L))
##    user  system elapsed 
##    4.00    0.08    4.33
summary(airq.ens.nf)
## 
## Final ensemble with cv error within 1se of minimum: 
## 
##   lambda =  2.559032
##   number of terms = 13
##   mean cv error (se) = 307.2755 (105.2568)
## 
##   cv error type : Mean-Squared Error

References

Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32. https://doi.org/10.1023/A:1010933404324
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. New York: Routledge. https://doi.org/10.1201/9781315139470
Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651–674. https://doi.org/10.1198/106186006X133933
Hothorn, T., & Zeileis, A. (2015). Partykit: A modular toolkit for recursive partytioning in R. The Journal of Machine Learning Research, 16(1), 3905–3909. Retrieved from https://www.jmlr.org/papers/v16/hothorn15a.html
Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22. Retrieved from https://CRAN.R-project.org/doc/Rnews/
Therneau, T., & Atkinson, B. (2022). Rpart: Recursive partitioning and regression trees. Retrieved from https://CRAN.R-project.org/package=rpart