The computational load of function pre can be heavy. This vignette shows some ways to reduce it.
There are two main steps in fitting a rule ensemble: 1) Rule generation and 2) Estimation of the final ensemble. Both can be adjusted to reduce computational load.
By default, pre uses the conditional inference tree algorithm of Hothorn, Hornik, & Zeileis (2006), as implemented in function ctree of R package partykit (Hothorn & Zeileis, 2015), for rule induction. The main reason is that it does not suffer from a selection bias towards variables with a greater number of possible cut points. However, the use of ctree brings a relatively heavy computational load:
airq <- airquality[complete.cases(airquality), ]
airq$Month <- factor(airq$Month)
library("pre")
set.seed(42)
system.time(airq.ens <- pre(Ozone ~ ., data = airq))
## user system elapsed
## 3.83 0.03 4.10
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 3.229132
## number of terms = 13
## mean cv error (se) = 336.2188 (95.24635)
##
## cv error type : Mean-Squared Error
Computational load can be substantially reduced by employing the CART algorithm of Breiman, Friedman, Olshen, & Stone (1984), as implemented in function rpart from the package of the same name by Therneau & Atkinson (2022). This can be specified through the tree.unbiased argument:
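For instance, a minimal sketch (assuming that setting tree.unbiased = FALSE selects the CART algorithm as described above; the object name airq.ens.cart is illustrative):

set.seed(42)
# Generate rules with CART (rpart) instead of ctree
system.time(airq.ens.cart <- pre(Ozone ~ ., data = airq, tree.unbiased = FALSE))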
## user system elapsed
## 2.40 0.03 2.51
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 1.935814
## number of terms = 23
## mean cv error (se) = 287.7377 (86.43107)
##
## cv error type : Mean-Squared Error
Alternatively, rules can be generated using the random-forest approach originally proposed by Breiman (2001), as implemented in function randomForest from the package of the same name by Liaw & Wiener (2002):
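A sketch of such a call; exactly which argument combination triggers randomForest-based rule generation should be checked in the pre documentation, and it is assumed here that disabling gradient boosting (learnrate = 0) together with tree.unbiased = FALSE does so (the object name airq.ens.rf is illustrative):

set.seed(42)
# Assumed: learnrate = 0 combined with tree.unbiased = FALSE invokes randomForest for rule generation
system.time(airq.ens.rf <- pre(Ozone ~ ., data = airq, tree.unbiased = FALSE, learnrate = 0))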
## Loading required namespace: randomForest
## user system elapsed
## 1.22 0.08 1.32
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 1.763841
## number of terms = 25
## mean cv error (se) = 288.6653 (62.81364)
##
## cv error type : Mean-Squared Error
Note, however, that the resulting ensembles will likely be more complex and will also show a selection bias towards variables with a greater number of possible cut points. The higher complexity is also visible above, where CART and random forests resulted in a substantially larger number of terms. This is because the default stopping criteria of rpart and randomForest are considerably less conservative than those of ctree, which results in the generation of more and longer rules and thereby tends to increase the complexity of the final ensemble. Furthermore, these algorithms prefer to split on variables with a larger number of cut points, and this bias may propagate to the final rule ensemble.
Reducing tree (and thereby rule) depth will reduce the computational load of both rule generation and estimation of the final ensemble:
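For example, a sketch (assuming that tree depth is controlled through the maxdepth argument; the depth value and the object name airq.ens.md are illustrative):

set.seed(42)
# Grow shallower trees, yielding shorter rules
system.time(airq.ens.md <- pre(Ozone ~ ., data = airq, maxdepth = 1L))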
## user system elapsed
## 3.26 0.06 3.41
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 1.676698
## number of terms = 14
## mean cv error (se) = 424.1761 (126.9922)
##
## cv error type : Mean-Squared Error
Reducing the maximum depth will likely improve interpretability, but may decrease predictive accuracy of the final ensemble.
By default, 500 trees are generated. Computation time can be reduced substantially by reducing the number of trees, although this may of course negatively impact predictive accuracy. When using a smaller number of trees, it is likely beneficial to increase the learning rate (learnrate = .01, by default) accordingly:
set.seed(42)
system.time(airq.ens.nt <- pre(Ozone ~ ., data = airq, ntrees = 100L, learnrate = .05))
## user system elapsed
## 0.90 0.00 0.93
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 1.935814
## number of terms = 11
## mean cv error (se) = 276.9282 (69.1872)
##
## cv error type : Mean-Squared Error
Function cv.glmnet from package glmnet is used for fitting the final model. The relevant arguments of cv.glmnet can be passed directly to function pre. For example, parallel computation can be employed by specifying par.final = TRUE in the call to pre (and registering a parallel backend beforehand, e.g., using doMC).

Parallel computation will not affect the performance of the final ensemble. Note that it will only reduce computation time for datasets with a (very) large number of observations; for smaller datasets, the use of parallel computation may even increase computation time.
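A minimal sketch of such a call (assuming the doMC backend is available; doMC does not run on Windows, and the number of cores is illustrative):

library("doMC")
registerDoMC(cores = 2)  # register a parallel backend
set.seed(42)
# Fit the final ensemble with cv.glmnet running in parallel
airq.ens.par <- pre(Ozone ~ ., data = airq, par.final = TRUE)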
The number of cross-validation repetitions can also be reduced, but this will probably not reduce computation time by much and may negatively affect the predictive performance of the final ensemble:
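For example, a sketch (assuming the number of cross-validation folds passed on to cv.glmnet is controlled through the nfolds argument; the value of 5 and the object name airq.ens.cv are illustrative):

set.seed(42)
system.time(airq.ens.cv <- pre(Ozone ~ ., data = airq, nfolds = 5L))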
## user system elapsed
## 4.00 0.08 4.33
##
## Final ensemble with cv error within 1se of minimum:
##
## lambda = 2.559032
## number of terms = 13
## mean cv error (se) = 307.2755 (105.2568)
##
## cv error type : Mean-Squared Error