| Type: | Package |
| Title: | Fast Calculation of Feature Contributions in Boosting Trees |
| Version: | 1.0 |
| Date: | 2026-03-02 |
| Description: | Computes feature-specific R-squared (R2) contributions for boosting tree models using a Shapley-value-based decomposition of the total R-squared in polynomial time. Supports models fitted with 'XGBoost' and 'LightGBM', and provides efficient parallel implementations suitable for large-scale problems. Multiple visualization tools are included for interpreting and communicating feature contributions. The methodology is described in Jiang, Zhang, and Zhang (2025) <doi:10.48550/arXiv.2407.03515>. |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| URL: | https://github.com/catstats/Q-SHAP_R |
| BugReports: | https://github.com/catstats/Q-SHAP_R/issues |
| Imports: | Rcpp (≥ 1.0.14), xgboost (≥ 3.1.3.1), parallel, lightgbm, viridisLite, ggplot2, jsonlite, methods, progress |
| Suggests: | shiny |
| LinkingTo: | Rcpp, RcppEigen |
| RoxygenNote: | 7.3.2 |
| Encoding: | UTF-8 |
| NeedsCompilation: | yes |
| Packaged: | 2026-03-09 19:50:27 UTC; jiangzhongli |
| Author: | Steven He [aut], Zhongli Jiang [aut, cre], Dabao Zhang [aut] |
| Maintainer: | Zhongli Jiang <zhongli.jiang.stats@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-16 16:00:07 UTC |
Calculating Feature-Specific R-Squared Values for Boosting Trees
Description
The qshap package computes feature-specific R-squared values using Shapley decomposition of the total R-squared for boosting trees built in xgboost and lightgbm. It supports parallel computing.
Details
The package provides fast computation of feature importance through Shapley values for tree ensemble models. Main functions include:
- gazer(): Create a Q-SHAP explainer from a trained model
- rsq(): Calculate feature-specific R-squared values
- loss(): Calculate feature-specific loss contributions
- plot(): Visualize R-squared values
The method computes Shapley values in polynomial time and includes built-in support for multi-core processing.
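To make the idea of a Shapley decomposition of R-squared concrete, the following base R sketch enumerates every feature subset for a tiny linear model. It is an illustration of the general Shapley principle only, not the package's tree algorithm: the value function (R-squared of a least-squares fit on a subset) and the names r2_subset and shapley_r2 are assumptions introduced here, and the enumeration is exponential in the number of features, which is the cost a polynomial-time method avoids.

```r
# Brute-force Shapley decomposition of R-squared (illustration only).
set.seed(1)
n <- 200
p <- 3
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)

# Hypothetical value function: R-squared of a linear fit on feature subset S.
r2_subset <- function(S) {
  if (length(S) == 0) return(0)
  fit <- lm(y ~ X[, S, drop = FALSE])
  summary(fit)$r.squared
}

shapley_r2 <- numeric(p)
for (j in seq_len(p)) {
  others <- setdiff(seq_len(p), j)
  k <- length(others)
  # Enumerate all 2^k subsets of the remaining features.
  for (mask in 0:(2^k - 1)) {
    keep <- (mask %/% 2^(seq_len(k) - 1)) %% 2 == 1
    S <- others[keep]
    # Standard Shapley weight for a coalition of size |S|.
    w <- factorial(length(S)) * factorial(p - length(S) - 1) / factorial(p)
    shapley_r2[j] <- shapley_r2[j] + w * (r2_subset(c(S, j)) - r2_subset(S))
  }
}

total_r2 <- r2_subset(seq_len(p))
# The contributions add up to the total R-squared (efficiency property).
all.equal(sum(shapley_r2), total_r2)
```

The final check reflects the property that makes the decomposition useful: the per-feature values sum to the total R-squared, so each one can be read as a share of explained variance.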
Author(s)
Steven He, Zhongli Jiang, Min Zhang, Dabao Zhang
References
Zhongli Jiang, Min Zhang, and Dabao Zhang. 2025. Fast calculation of feature contributions in boosting trees. In Proceedings of the Forty-First Conference on Uncertainty in Artificial Intelligence (UAI '25), Vol. 286. JMLR.org, Article 82, 1859–1875.
See Also
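Useful links:
- https://github.com/catstats/Q-SHAP_R
- Report bugs at https://github.com/catstats/Q-SHAP_R/issues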
Examples
library(xgboost)
set.seed(42)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
model <- xgboost(X, y, nrounds = 15, max_depth = 2, verbose = 0)
explainer <- gazer(model)
phi_rsq <- rsq(explainer, X, y)
Coercion method to data.frame for qshap_result
Description
Coercion method to data.frame for qshap_result
Usage
## S3 method for class 'qshap_result'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
Arguments
x |
A qshap_result object |
row.names |
Not used |
optional |
Not used |
... |
Additional arguments (currently unused) |
Value
A data.frame with columns feature (character) and
rsq (numeric), sorted by rsq in decreasing order.
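As a rough picture of the shape described above, the following base R sketch builds a data frame with the same two columns from a named numeric vector. The values are made up for illustration, and the code is an assumption about the layout only, not the method's actual implementation.

```r
# Made-up feature-specific R-squared values (illustration only).
rsq_values <- c(x3 = 0.02, x1 = 0.41, x2 = 0.38)

df <- data.frame(
  feature = names(rsq_values),
  rsq = unname(rsq_values),
  stringsAsFactors = FALSE
)

# Sort by rsq in decreasing order and reset the row names.
df <- df[order(df$rsq, decreasing = TRUE), ]
rownames(df) <- NULL
df
```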
Create a QSHAP Tree Explainer
Description
Creates an explainer object for computing feature-specific Shapley values from a trained tree ensemble model. Supports XGBoost and LightGBM models.
Usage
gazer(model, max_depth = NULL, base_score = NULL, ...)
Arguments
model |
A model object of class |
max_depth |
Maximum depth of trees, extracted from |
base_score |
Base score for predictions, extracted from |
... |
Additional arguments, for future use |
Value
An object of class qshap_tree_explainer containing the model information and
preprocessed tree structures for fast Shapley value computation.
Examples
library(xgboost)
set.seed(42)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
model <- xgboost(X, y, nrounds = 15, max_depth = 2, verbose = 0)
explainer <- gazer(model)
Alias for qshap_loss
Description
This is a convenience alias for qshap_loss() that provides a shorter
function name for calculating feature-specific loss contributions.
Usage
loss(explainer, x, y, y_mean_ori = NULL)
Arguments
explainer |
A qshap_tree_explainer object created by gazer() |
x |
Feature matrix or data frame |
y |
Response vector |
y_mean_ori |
Optional pre-computed mean of y (for efficiency) |
Value
A matrix of loss contributions with dimensions (n_samples, n_features)
Examples
library(xgboost)
set.seed(42)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
model <- xgboost(X, y, nrounds = 15, max_depth = 2, verbose = 0)
explainer <- gazer(model)
loss_matrix <- loss(explainer, X, y)
dim(loss_matrix)
Constructor for qshap_result class
Description
Creates a qshap_result object to store Q-SHAP R-squared results
Usage
new_qshap_result(
rsq,
feature_names = NULL,
total_rsq = NULL,
n_samples = NULL,
n_features = NULL,
loss = NULL
)
Arguments
rsq |
Numeric vector of feature-specific R-squared values |
feature_names |
Character vector of feature names (optional) |
total_rsq |
Numeric total R-squared (sum of feature-specific values) |
n_samples |
Integer number of samples used |
n_features |
Integer number of features |
loss |
Optional loss matrix (n_samples x n_features) |
Value
An object of class qshap_result
Constructor for qshap_tree_explainer class
Description
Creates a qshap_tree_explainer object
Usage
new_qshap_tree_explainer(
model,
model_type,
max_depth,
base_score = NULL,
trees,
store_v_invc,
store_z
)
Arguments
model |
The original tree model object |
model_type |
Character string indicating model type ("xgboost" or "lightgbm") |
max_depth |
Integer maximum tree depth |
base_score |
Numeric base score (for XGBoost) |
trees |
List of tree objects |
store_v_invc |
Precomputed complex values for SHAP computation |
store_z |
Precomputed root values for SHAP computation |
Value
An object of class qshap_tree_explainer
Constructor for simple_tree class
Description
Creates a simple_tree object with validation
Usage
new_simple_tree(
children_left,
children_right,
feature,
threshold,
max_depth,
n_node_samples,
value,
node_count
)
Arguments
children_left |
Integer vector of left child indices (-1 for leaf nodes) |
children_right |
Integer vector of right child indices (-1 for leaf nodes) |
feature |
Integer vector of feature indices used for splitting (-1 for leaf nodes) |
threshold |
Numeric vector of threshold values for splits |
max_depth |
Integer maximum depth of the tree |
n_node_samples |
Integer vector of sample counts at each node |
value |
Numeric vector of node values |
node_count |
Integer total number of nodes in the tree |
Value
An object of class simple_tree
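The flat-array layout described by these arguments can be hard to picture, so here is a small base R sketch that stores a one-split tree in parallel vectors and walks it for a single observation. Several details are assumptions made for the illustration and are not confirmed by this page: node and feature indices are treated as 0-based, an observation goes left when its value is less than or equal to the threshold, and a plain list is used instead of calling new_simple_tree(). The helper predict_one is hypothetical.

```r
# A one-split tree: node 0 splits on feature 0 at 0.5; nodes 1 and 2 are leaves.
# Assumption: node ids and feature ids are 0-based, -1 marks a leaf.
tree <- list(
  children_left  = c(1L, -1L, -1L),
  children_right = c(2L, -1L, -1L),
  feature        = c(0L, -1L, -1L),
  threshold      = c(0.5, NA, NA),
  value          = c(0, -1, 1)
)

# Hypothetical helper: follow splits from the root until a leaf is reached.
predict_one <- function(tree, x) {
  node <- 0L
  while (tree$children_left[node + 1L] != -1L) {
    f <- tree$feature[node + 1L] + 1L          # shift to R's 1-based indexing
    go_left <- x[f] <= tree$threshold[node + 1L]  # assumed split direction
    node <- if (go_left) {
      tree$children_left[node + 1L]
    } else {
      tree$children_right[node + 1L]
    }
  }
  tree$value[node + 1L]
}

predict_one(tree, c(0.2))  # reaches the left leaf
predict_one(tree, c(0.9))  # reaches the right leaf
```

Keeping every per-node attribute in its own vector, indexed by node id, avoids nested list structures and makes the tree easy to pass to compiled code.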
Constructor for tree_summary class
Description
Creates a tree_summary object with validation
Usage
new_tree_summary(
children_left,
children_right,
feature,
feature_uniq,
threshold,
max_depth,
sample_weight,
init_prediction,
node_count
)
Arguments
children_left |
Integer vector of left child indices |
children_right |
Integer vector of right child indices |
feature |
Integer vector of feature indices |
feature_uniq |
Integer vector of unique feature indices used in tree |
threshold |
Numeric vector of threshold values |
max_depth |
Integer maximum depth |
sample_weight |
Numeric vector of sample weights per node |
init_prediction |
Numeric vector of initial predictions per node |
node_count |
Integer total number of nodes |
Value
An object of class tree_summary
Plot method for qshap_rsq objects
Description
This S3 method enables 'plot(x, ...)' where 'x' is a 'qshap_rsq' object. It dispatches to the visualization functions in 'vis'.
Usage
## S3 method for class 'qshap_rsq'
plot(
x,
y = NULL,
type = c("rsq", "elbow", "cumu", "gcorr", "hist", "density", "loss"),
...
)
Arguments
x |
A 'qshap_rsq' object. |
y |
Not used. |
type |
Plot type: one of "rsq", "elbow", "cumu", "gcorr", "hist", "density", or "loss". |
... |
Passed to the underlying visualization function. |
Value
A ggplot2 object (invisibly).
Plot Q-SHAP R-squared contributions
Description
Convenience wrapper that works for both a 'qshap_rsq' object and a plain numeric vector of contributions. Use this if you have a numeric vector and still want to pass arguments like 'color_map_name'.
Usage
plot_qshap(
x,
type = c("rsq", "elbow", "cumu", "gcorr", "hist", "density", "loss"),
...
)
Arguments
x |
A 'qshap_rsq' object (recommended) or a numeric vector. |
type |
Plot type; see 'plot.qshap_rsq'. Use '"loss"' to launch the interactive explorer (requires a loss matrix). |
... |
Additional arguments passed to the underlying visualization function (e.g., 'label', 'rotation', 'color_map_name', 'max_feature'). |
Value
The ggplot2 plot object (invisibly)
Examples
library(xgboost)
set.seed(42)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
model <- xgboost(X, y, nrounds = 15L, max_depth = 2L, verbosity = 0L, nthreads = 1L)
explainer <- gazer(model)
phi_rsq <- rsq(explainer, X, y)
plot(phi_rsq)
Print method for qshap_result
Description
Print method for qshap_result
Usage
## S3 method for class 'qshap_result'
print(x, n = 10, ...)
Arguments
x |
A qshap_result object |
n |
Integer number of top features to display (default: 10) |
... |
Additional arguments (currently unused) |
Value
The input x is returned invisibly. Called primarily for its
side effect of printing a summary of the qshap_result object to the
console.
Print method for qshap_tree_explainer
Description
Print method for qshap_tree_explainer
Usage
## S3 method for class 'qshap_tree_explainer'
print(x, ...)
Arguments
x |
A qshap_tree_explainer object |
... |
Additional arguments (currently unused) |
Value
The input x is returned invisibly. Called primarily for its
side effect of printing a summary of the qshap_tree_explainer object
to the console.
Print method for simple_tree
Description
Print method for simple_tree
Usage
## S3 method for class 'simple_tree'
print(x, ...)
Arguments
x |
A simple_tree object |
... |
Additional arguments (currently unused) |
Value
The input x is returned invisibly. Called primarily for its
side effect of printing a summary of the simple_tree object to the
console.
Print method for tree_summary
Description
Print method for tree_summary
Usage
## S3 method for class 'tree_summary'
print(x, ...)
Arguments
x |
A tree_summary object |
... |
Additional arguments (currently unused) |
Value
The input x is returned invisibly. Called primarily for its
side effect of printing a summary of the tree_summary object to the
console.
Alias for rsq
Description
This is a convenience alias for rsq() for calculating
feature-specific R-squared values.
Usage
qshap(
explainer,
x,
y,
feature_names = NULL,
local = FALSE,
nsample = NULL,
sd_out = TRUE,
ci_out = TRUE,
level = 0.95,
nfrac = NULL,
random_state = 42,
ncore = 1L
)
Arguments
explainer |
A qshap_tree_explainer object created by gazer() |
x |
Feature matrix or data frame with n samples and p features |
y |
Response vector of length n |
feature_names |
Character vector of feature names. If NULL, uses column names from x. |
local |
Logical; if TRUE, returns both R-squared values and loss matrix |
nsample |
Optional integer; number of samples to use (random subsample if less than nrow(x)) |
sd_out |
Logical; if TRUE, returns standard deviations of R-squared estimates |
ci_out |
Logical; if TRUE, returns Wald-style confidence intervals for each feature's R-squared (normal approximation using sd_rsq) |
level |
Confidence level for the intervals (default 0.95) |
nfrac |
Optional numeric in (0,1); fraction of samples to use (alternative to nsample) |
random_state |
Integer seed for reproducible sampling |
ncore |
Number of cores for parallel processing. Use -1 for all available cores, or a positive integer. Default is 1 (no parallelization) |
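The ncore convention above (-1 for all cores, otherwise a positive integer) could be resolved along the following lines. This is a minimal sketch, not the package's code: the helper name resolve_ncore is hypothetical, and capping the request at the number of detected cores is an assumption made here.

```r
# Hypothetical helper: turn an ncore request into a usable core count.
resolve_ncore <- function(ncore = 1L) {
  available <- parallel::detectCores()
  if (is.na(available)) available <- 1L   # detectCores() may return NA
  if (ncore == -1L) {
    return(as.integer(available))         # -1 means use every available core
  }
  stopifnot(ncore >= 1)
  # Assumption: never request more cores than the machine reports.
  as.integer(min(ncore, available))
}

resolve_ncore(1L)   # sequential execution
resolve_ncore(-1L)  # all detected cores
```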
Value
A qshap_result object; see rsq for details.
Examples
library(xgboost)
set.seed(42)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
model <- xgboost(X, y, nrounds = 15, max_depth = 2, verbose = 0)
explainer <- gazer(model)
phi_rsq <- qshap(explainer, X, y)
print(phi_rsq)
S3 Class Constructors and Methods for qshap
Description
This file contains formal S3 class definitions, constructors, validators, and methods for the qshap package objects.
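The constructor, validator, and user-friendly constructor that recur throughout this manual follow a common S3 layering. The sketch below shows that layering for a hypothetical class toy_result; the class, its fields, and its checks are invented for illustration and do not reproduce the package's own definitions.

```r
# Low-level constructor: builds the object without checking it.
new_toy_result <- function(rsq, feature_names = NULL) {
  structure(
    list(rsq = rsq, feature_names = feature_names),
    class = "toy_result"
  )
}

# Validator: returns the object invisibly or stops with an error.
validate_toy_result <- function(x) {
  if (!is.numeric(x$rsq)) {
    stop("`rsq` must be numeric", call. = FALSE)
  }
  if (!is.null(x$feature_names) && length(x$feature_names) != length(x$rsq)) {
    stop("`feature_names` must match the length of `rsq`", call. = FALSE)
  }
  invisible(x)
}

# User-friendly constructor: construct, then validate.
toy_result <- function(rsq, feature_names = NULL) {
  x <- validate_toy_result(new_toy_result(rsq, feature_names))
  x
}

# Print method: prints a short summary and returns its input invisibly.
print.toy_result <- function(x, ...) {
  cat("<toy_result> with", length(x$rsq), "features\n")
  invisible(x)
}

res <- toy_result(c(0.4, 0.1), c("a", "b"))
print(res)
```

Separating the unchecked constructor from the validator keeps internal code fast while still giving users a single entry point that always returns a checked object.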
Calculate Q-SHAP Loss Contributions
Description
Computes the feature-specific loss contributions using Q-SHAP decomposition.
This is an internal function typically called by rsq().
Usage
qshap_loss(explainer, x, y, y_mean_ori = NULL)
Arguments
explainer |
A qshap_tree_explainer object created by gazer() |
x |
Feature matrix or data frame |
y |
Response vector |
y_mean_ori |
Optional pre-computed mean of y (for efficiency) |
Value
A matrix of loss contributions with dimensions (n_samples, n_features)
User-friendly constructor for qshap_result
Description
User-friendly constructor for qshap_result
Usage
qshap_result(
rsq,
feature_names = NULL,
total_rsq = NULL,
n_samples = NULL,
n_features = NULL,
loss = NULL
)
Arguments
rsq |
Numeric vector of feature-specific R-squared values |
feature_names |
Character vector of feature names (optional) |
total_rsq |
Numeric total R-squared (sum of feature-specific values) |
n_samples |
Integer number of samples used |
n_features |
Integer number of features |
loss |
Optional loss matrix (n_samples x n_features) |
Value
A validated qshap_result object
Calculate Feature-Specific R-Squared Values
Description
Computes feature-specific R-squared values using Q-SHAP decomposition. Supports parallel processing and sampling for large datasets.
Usage
qshap_rsq(
explainer,
x,
y,
local = FALSE,
nsample = NULL,
sd_out = TRUE,
ci_out = TRUE,
level = 0.95,
nfrac = NULL,
random_state = 42,
ncore = 1L
)
Arguments
explainer |
A qshap_tree_explainer object created by gazer() |
x |
Feature matrix or data frame with n samples and p features |
y |
Response vector of length n |
local |
Logical; if TRUE, returns both R-squared values and loss matrix |
nsample |
Optional integer; number of samples to use (random subsample if less than nrow(x)) |
sd_out |
Logical; if TRUE, returns standard deviations of R-squared estimates |
ci_out |
Logical; if TRUE, returns Wald-style confidence intervals for each feature's R-squared (normal approximation using sd_rsq) |
level |
Confidence level for the intervals (default 0.95) |
nfrac |
Optional numeric in (0,1); fraction of samples to use (alternative to nsample) |
random_state |
Integer seed for reproducible sampling |
ncore |
Number of cores for parallel processing. Use -1 for all available cores, or a positive integer. Default is 1 (no parallelization) |
Value
If local=FALSE (default), returns a numeric vector of length p
containing feature-specific R-squared values. If local=TRUE, returns
a list with components rsq (the R-squared vector) and loss
(an n x p matrix of loss contributions). When ci_out=TRUE, the returned list
also contains ci_lower and ci_upper vectors representing Wald-style confidence intervals.
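For readers unfamiliar with the term, a Wald-style interval is the estimate plus or minus a normal quantile times its standard deviation. The base R sketch below shows that arithmetic on made-up numbers. It assumes the supplied standard deviation is already on the scale of the estimate (that is, it plays the role of a standard error); the helper name wald_ci is hypothetical and the code is not taken from the package.

```r
# Hypothetical helper: normal-approximation (Wald) confidence interval.
wald_ci <- function(estimate, sd, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # two-sided normal quantile
  list(
    ci_lower = estimate - z * sd,
    ci_upper = estimate + z * sd
  )
}

# Made-up estimates and standard deviations (illustration only).
rsq_hat <- c(x1 = 0.40, x2 = 0.35)
sd_rsq  <- c(x1 = 0.05, x2 = 0.04)

ci <- wald_ci(rsq_hat, sd_rsq, level = 0.95)
ci$ci_lower
ci$ci_upper
```

Because the interval relies on a normal approximation, it is symmetric around the estimate and can extend below zero for features with small contributions.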
Examples
library(xgboost)
set.seed(42)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
model <- xgboost(X, y, nrounds = 15, max_depth = 2, verbose = 0)
explainer <- gazer(model)
phi_rsq <- qshap(explainer, X, y)
print(phi_rsq)
Calculate Feature-Specific R-Squared Values
Description
Computes feature-specific R-squared values using Q-SHAP decomposition,
returning a qshap_result object with better formatting and additional metadata.
The qshap_result object includes feature names, total R², sample counts,
and provides enhanced print(), summary(), and as.data.frame()
methods for easier analysis.
Usage
rsq(
explainer,
x,
y,
feature_names = NULL,
local = FALSE,
nsample = NULL,
sd_out = TRUE,
ci_out = TRUE,
level = 0.95,
nfrac = NULL,
random_state = 42,
ncore = 1L
)
Arguments
explainer |
A qshap_tree_explainer object created by gazer() |
x |
Feature matrix or data frame with n samples and p features |
y |
Response vector of length n |
feature_names |
Character vector of feature names. If NULL, uses column names from x. |
local |
Logical; if TRUE, returns both R-squared values and loss matrix |
nsample |
Optional integer; number of samples to use (random subsample if less than nrow(x)) |
sd_out |
Logical; if TRUE, returns standard deviations of R-squared estimates |
ci_out |
Logical; if TRUE, returns Wald-style confidence intervals for each feature's R-squared (normal approximation using sd_rsq) |
level |
Confidence level for the intervals (default 0.95) |
nfrac |
Optional numeric in (0,1); fraction of samples to use (alternative to nsample) |
random_state |
Integer seed for reproducible sampling |
ncore |
Number of cores for parallel processing. Use -1 for all available cores, or a positive integer. Default is 1 (no parallelization) |
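The interplay of nsample, nfrac, and random_state can be pictured with the following base R sketch of reproducible row subsampling. It is an assumption-laden illustration, not the package's logic: the helper resolve_sample_index is hypothetical, nsample is given precedence when both arguments are supplied, and the fraction is rounded to the nearest whole number of rows.

```r
# Hypothetical helper: choose which rows to use for the computation.
resolve_sample_index <- function(n, nsample = NULL, nfrac = NULL,
                                 random_state = 42) {
  # Assumption: nfrac is only consulted when nsample is not given.
  if (is.null(nsample) && !is.null(nfrac)) {
    stopifnot(nfrac > 0, nfrac < 1)
    nsample <- max(1L, as.integer(round(nfrac * n)))
  }
  # No subsampling requested, or request covers all rows: use everything.
  if (is.null(nsample) || nsample >= n) {
    return(seq_len(n))
  }
  set.seed(random_state)            # note: this resets the global RNG state
  sort(sample.int(n, size = nsample))
}

idx <- resolve_sample_index(100, nfrac = 0.25)
length(idx)
```

Fixing the seed makes repeated calls select the same rows, so results computed on a subsample are reproducible across runs.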
Details
This function provides a user-friendly interface for Q-SHAP R² computation:
- Automatically extracts feature names from the input data
- Returns a structured object with metadata
- Provides enhanced printing with top features displayed by default
- Includes a comprehensive summary() method
- Can be easily converted to a data frame with as.data.frame()
Value
A qshap_result object containing:
- rsq: Numeric vector of feature-specific R² values
- feature_names: Character vector of feature names
- total_rsq: Total R² (sum of feature-specific values)
- n_samples: Number of samples
- n_features: Number of features
- loss: Loss matrix (if local=TRUE)
Examples
library(xgboost)
set.seed(42)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
model <- xgboost(X, y, nrounds = 15, max_depth = 2, verbose = 0)
explainer <- gazer(model)
result <- rsq(explainer, X, y)
print(result)
User-friendly constructor for simple_tree
Description
User-friendly constructor for simple_tree
Usage
simple_tree(
children_left,
children_right,
feature,
threshold,
max_depth,
n_node_samples,
value,
node_count
)
Arguments
children_left |
Integer vector of left child indices (-1 for leaf nodes) |
children_right |
Integer vector of right child indices (-1 for leaf nodes) |
feature |
Integer vector of feature indices used for splitting (-1 for leaf nodes) |
threshold |
Numeric vector of threshold values for splits |
max_depth |
Integer maximum depth of the tree |
n_node_samples |
Integer vector of sample counts at each node |
value |
Numeric vector of node values |
node_count |
Integer total number of nodes in the tree |
Value
A validated simple_tree object
Summary method for qshap_result
Description
Summary method for qshap_result
Usage
## S3 method for class 'qshap_result'
summary(object, ...)
Arguments
object |
A qshap_result object |
... |
Additional arguments (currently unused) |
Value
The input object is returned invisibly. Called primarily for
its side effect of printing a detailed summary of the qshap_result
object to the console.
Summary method for qshap_rsq objects
Description
Provides a summary of the qshap_rsq object, showing the top features by R-squared contribution
Usage
## S3 method for class 'qshap_rsq'
summary(object, n = 10, ...)
Arguments
object |
A qshap_rsq object |
n |
Integer number of top features to display (default: 10) |
... |
Additional arguments (currently unused) |
Value
The input object is returned invisibly. Called primarily for
its side effect of printing a summary of the qshap_rsq object to
the console.
Summary method for qshap_tree_explainer
Description
Provides detailed summary information about the explainer
Usage
## S3 method for class 'qshap_tree_explainer'
summary(object, ...)
Arguments
object |
A qshap_tree_explainer object |
... |
Additional arguments (currently unused) |
Value
The input object is returned invisibly. Called primarily for
its side effect of printing a detailed summary of the
qshap_tree_explainer object to the console.
User-friendly constructor for tree_summary
Description
User-friendly constructor for tree_summary
Usage
tree_summary(
children_left,
children_right,
feature,
feature_uniq,
threshold,
max_depth,
sample_weight,
init_prediction,
node_count
)
Arguments
children_left |
Integer vector of left child indices |
children_right |
Integer vector of right child indices |
feature |
Integer vector of feature indices |
feature_uniq |
Integer vector of unique feature indices used in tree |
threshold |
Numeric vector of threshold values |
max_depth |
Integer maximum depth |
sample_weight |
Numeric vector of sample weights per node |
init_prediction |
Numeric vector of initial predictions per node |
node_count |
Integer total number of nodes |
Value
A validated tree_summary object
Validator for qshap_result
Description
Validator for qshap_result
Usage
validate_qshap_result(x)
Arguments
x |
A qshap_result object |
Value
The validated object (invisibly) or stops with an error
Validator for qshap_tree_explainer
Description
Validator for qshap_tree_explainer
Usage
validate_qshap_tree_explainer(x)
Arguments
x |
A qshap_tree_explainer object |
Value
The validated object (invisibly) or stops with an error
Validator for simple_tree class
Description
Validator for simple_tree class
Usage
validate_simple_tree(x)
Arguments
x |
A simple_tree object |
Value
The validated object (invisibly) or stops with an error
Validator for tree_summary class
Description
Validator for tree_summary class
Usage
validate_tree_summary(x)
Arguments
x |
A tree_summary object |
Value
The validated object (invisibly) or stops with an error
Visualization Module for Q-SHAP Results
Description
An environment containing visualization functions for Q-SHAP results.
Access functions using vis$rsq(), vis$elbow(), etc.
Usage
vis
Format
An environment with visualization functions:
- rsq: Bar plot of feature-specific R-squared values
- elbow: Elbow plot showing top contributing features
- cumu: Cumulative explained variance plot
- gcorr: Generalized correlation plot (square root of R-squared)
- hist: Histogram of feature-specific R-squared contributions
- density: Density plot of feature-specific R-squared contributions
- loss: Interactive loss explorer (requires shiny)
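To give a sense of the quantities behind two of these plot types, the base R sketch below derives a cumulative curve and a generalized correlation from a vector of contributions. The input values are made up, and clipping negative contributions at zero before taking the square root is an assumption made for this sketch; how the package itself handles such cases is not stated here.

```r
# Made-up feature-specific R-squared values (illustration only).
rsq_values <- c(x1 = 0.41, x2 = 0.38, x3 = 0.02, x4 = 0.01)

# Order features from largest to smallest contribution.
sorted <- sort(rsq_values, decreasing = TRUE)

# "cumu": running total of explained variance over the ranked features.
cumulative <- cumsum(sorted)

# "gcorr": square root of each contribution (negative values clipped to 0,
# an assumption of this sketch).
gcorr <- sqrt(pmax(sorted, 0))

cumulative
gcorr
```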