Analyst Tools for betaregscale

Overview

This vignette presents analyst-oriented tools added to betaregscale for post-estimation workflows: model comparison (brs_table()), average marginal effects (brs_marginaleffects()), score-scale prediction (brs_predict_scoreprob()), ggplot2 diagnostics (autoplot.brs()), and repeated k-fold cross-validation (brs_cv()).

The examples use bounded-scale data under the interval-censored beta regression framework implemented in the package.

library(betaregscale)

Mathematical foundations

Complete likelihood by censoring type

For each observation \(i\), let \(\delta_i\in\{0,1,2,3\}\) indicate exact, left-censored, right-censored, or interval-censored status. With beta CDF \(F(\cdot)\), beta density \(f(\cdot)\), and interval endpoints \([l_i,u_i]\), the contribution is:

\[ L_i(\theta)= \begin{cases} f(y_i; a_i, b_i), & \delta_i=0,\\ F(u_i; a_i, b_i), & \delta_i=1,\\ 1 - F(l_i; a_i, b_i), & \delta_i=2,\\ F(u_i; a_i, b_i)-F(l_i; a_i, b_i), & \delta_i=3. \end{cases} \]

This is the basis for fitting, prediction, and validation metrics.
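The piecewise rule above can be sketched directly with the beta density and CDF from base R. This is an illustrative helper, not a package function; it assumes the shape parameters a and b for observation i are already available (the package computes them internally during fitting).

```r
# One observation's likelihood contribution, by censoring indicator delta:
# 0 = exact, 1 = left-censored, 2 = right-censored, 3 = interval-censored.
lik_contrib <- function(y, delta, l, u, a, b) {
  switch(as.character(delta),
    "0" = dbeta(y, a, b),                  # exact: density at y
    "1" = pbeta(u, a, b),                  # left-censored: F(u)
    "2" = 1 - pbeta(l, a, b),              # right-censored: 1 - F(l)
    "3" = pbeta(u, a, b) - pbeta(l, a, b)  # interval: F(u) - F(l)
  )
}

# Interval-censored example: P(0.2 < Y < 0.4) under Beta(2, 5)
lik_contrib(NA, 3, l = 0.2, u = 0.4, a = 2, b = 5)
```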

Model-comparison metrics

brs_table() reports, for each model, the number of observations, the number of estimated parameters, the maximized log-likelihood, \(\mathrm{AIC}=2k-2\log L\), \(\mathrm{BIC}=k\log n-2\log L\), a pseudo-\(R^2\), and the counts and proportions of each censoring type,

where \(k\) is the number of estimated parameters and \(n\) is the sample size.
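As a hand check, the information criteria can be recomputed from the reported log-likelihood, assuming the usual definitions \(\mathrm{AIC}=2k-2\log L\) and \(\mathrm{BIC}=k\log n-2\log L\); the numbers below are taken from the brs_table() output in the workflow section.

```r
# Recompute AIC/BIC for the fixed-precision model by hand
logL <- -939.3243  # maximized log-likelihood of the fixed model
k <- 4             # number of estimated parameters
n <- 220           # sample size

aic <- 2 * k - 2 * logL
bic <- k * log(n) - 2 * logL
round(c(AIC = aic, BIC = bic), 3)
# AIC = 1886.649, BIC = 1900.223, matching the table
```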

Average marginal effects (AME)

brs_marginaleffects() computes AME by finite differences:

\[ \mathrm{AME}_j=\frac{1}{n}\sum_{i=1}^n\frac{\hat g_i(x_{ij}+h)-\hat g_i(x_{ij})}{h}, \]

with \(\hat g_i\) on the requested prediction scale (response or link). For binary covariates \(x_j\in\{0,1\}\), it uses the discrete contrast \(\hat g(x_j=1)-\hat g(x_j=0)\).
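The finite-difference step can be written generically; the sketch below assumes `pred` is any function returning fitted values on the requested scale for a data frame of covariates (a stand-in for the package's internal prediction machinery).

```r
# Forward finite-difference AME for a continuous covariate `var`
ame_fd <- function(pred, data, var, h = 1e-5) {
  shifted <- data
  shifted[[var]] <- shifted[[var]] + h
  mean((pred(shifted) - pred(data)) / h)
}

# Toy check against a linear predictor: the AME equals the slope
d <- data.frame(x = rnorm(50))
pred_lin <- function(dd) 0.3 + 2 * dd$x
ame_fd(pred_lin, d, "x")  # ~ 2
```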

Score-scale probabilities

For integer scores \(s\in\{0,\dots,K\}\), brs_predict_scoreprob() computes:

\[ P(Y=s)= \begin{cases} F(\mathrm{lim}/K), & s=0,\\ 1-F((K-\mathrm{lim})/K), & s=K,\\ F((s+\mathrm{lim})/K)-F((s-\mathrm{lim})/K), & 1\le s\le K-1. \end{cases} \]

These probabilities are directly aligned with interval geometry on the original instrument scale.
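The rule can be sketched for a single observation with a Beta(a, b) predictive distribution. Here `lim` is the half-width constant from the formula above (treated as a tuning constant; 0.5 makes the intervals tile \([0,1]\) exactly, so the probabilities sum to 1 over the full score range).

```r
# Score probabilities P(Y = s) for s = 0, ..., K under Beta(a, b)
score_probs <- function(K, a, b, lim = 0.5) {
  p <- numeric(K + 1)
  p[1] <- pbeta(lim / K, a, b)                # s = 0
  p[K + 1] <- 1 - pbeta((K - lim) / K, a, b)  # s = K
  mid <- 1:(K - 1)                            # interior scores
  p[mid + 1] <- pbeta((mid + lim) / K, a, b) - pbeta((mid - lim) / K, a, b)
  p
}

sum(score_probs(10, a = 2, b = 5))  # intervals tile [0, 1], so this is 1
```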

Cross-validation log score

In brs_cv(), fold-level predictive quality includes:

\[ \mathrm{log\_score}=\frac{1}{n_{test}}\sum_{i \in test}\log(p_i), \]

where \(p_i\) is the predictive contribution from the same censoring-rule piecewise definition shown above.

It also reports:

\[ \mathrm{RMSE}_{yt}=\sqrt{\frac{1}{n_{test}}\sum_{i \in test}(y_{t,i}-\hat\mu_i)^2}, \qquad \mathrm{MAE}_{yt}=\frac{1}{n_{test}}\sum_{i \in test}|y_{t,i}-\hat\mu_i|. \]
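The fold-level metrics are simple averages once the pieces are in hand; the sketch below assumes `p` holds the predictive contributions \(p_i\) and `yt`, `mu_hat` the observed and fitted values on the original scale (illustrative only, not the package's internal code).

```r
# Held-out fold metrics: mean log predictive density, RMSE, and MAE
fold_metrics <- function(p, yt, mu_hat) {
  c(
    log_score = mean(log(p)),
    rmse_yt   = sqrt(mean((yt - mu_hat)^2)),
    mae_yt    = mean(abs(yt - mu_hat))
  )
}

fold_metrics(p = c(0.8, 0.6), yt = c(1, 2), mu_hat = c(1.5, 2.5))
# rmse_yt and mae_yt are both 0.5 in this toy example
```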

Reproducible workflow

1) Simulate data and fit candidate models

set.seed(2026)
n <- 220
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  z1 = rnorm(n)
)

sim <- brs_sim(
  formula = ~ x1 + x2 | z1,
  data = d,
  beta = c(0.20, -0.45, 0.25),
  zeta = c(0.15, -0.30),
  ncuts = 100,
  repar = 2
)

fit_fixed <- brs(y ~ x1 + x2, data = sim, repar = 2)
fit_var <- brs(y ~ x1 + x2 | z1, data = sim, repar = 2)

2) Compare models in one table

tab <- brs_table(
  fixed = fit_fixed,
  variable = fit_var,
  sort_by = "AIC"
)
kbl10(tab)
model     nobs  npar  logLik     AIC       BIC       pseudo_r2  exact  left  right  interval  prop_exact  prop_left  prop_right  prop_interval
variable  220   5     -931.8646  1873.729  1890.697  0.1381     0      8     22     190       0           0.0364     0.1         0.8636
fixed     220   4     -939.3243  1886.649  1900.223  0.1381     0      8     22     190       0           0.0364     0.1         0.8636

3) Estimate average marginal effects

set.seed(2026) # For marginal effects simulation
me_mean <- brs_marginaleffects(
  fit_var,
  model = "mean",
  type = "response",
  interval = TRUE,
  n_sim = 120
)
kbl10(me_mean)
variable  ame      std.error  ci.lower  ci.upper  model  type      n
x1        -0.1035  0.0178     -0.1339   -0.0675   mean   response  220
x2        0.0734   0.0188     0.0360    0.1071    mean   response  220

set.seed(2026) # Reset seed for reproducibility
me_precision <- brs_marginaleffects(
  fit_var,
  model = "precision",
  type = "link",
  interval = TRUE,
  n_sim = 120
)
kbl10(me_precision)
variable  ame      std.error  ci.lower  ci.upper  model      type  n
z1        -0.3361  0.0835     -0.4781   -0.1413   precision  link  220

4) Predict score probabilities

P <- brs_predict_scoreprob(fit_var, scores = 0:10)
dim(P)
#> [1] 220  11
kbl10(P[1:6, 1:5])
score_0  score_1  score_2  score_3  score_4
0.0146   0.0178   0.0146   0.0130   0.0121
0.0215   0.0228   0.0178   0.0155   0.0141
0.0465   0.0318   0.0216   0.0175   0.0151
0.0731   0.0371   0.0233   0.0181   0.0152
0.0076   0.0105   0.0090   0.0083   0.0078
0.0050   0.0050   0.0039   0.0034   0.0030
kbl10(
  data.frame(row_sum = rowSums(P)),
  digits = 4
)
row_sum
0.1344
0.1614
0.2001
0.2313
0.0856
0.0354
0.0295
0.0994
0.1024
0.1312

rowSums(P) should be close to 1 (up to numerical tolerance) when the full score range is requested. Here scores = 0:10 covers only part of the instrument scale (the data were simulated with ncuts = 100), so each row sum equals the probability mass on that score subset, which is why the sums above are well below 1.
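When only a score subset is of interest, the rows can be renormalized to give a conditional distribution over the requested scores; this is a post-hoc step on the returned matrix, not a package function.

```r
# Toy 2 x 2 probability matrix standing in for a subset of P's columns
P_sub <- matrix(c(0.05, 0.15, 0.05, 0.25), nrow = 2)

# Divide each row by its sum: conditional probabilities given the subset
P_cond <- P_sub / rowSums(P_sub)
rowSums(P_cond)  # each row now sums to 1
```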

5) ggplot2 diagnostics

autoplot.brs(fit_var, type = "calibration")

autoplot.brs(fit_var, type = "score_dist", scores = 0:20)

autoplot.brs(fit_var, type = "cdf", max_curves = 4)

autoplot.brs(fit_var, type = "residuals_by_delta", residual_type = "rqr")

6) Repeated k-fold cross-validation

set.seed(2026) # For cross-validation reproducibility
cv_res <- brs_cv(
  y ~ x1 + x2 | z1,
  data = sim,
  k = 3,
  repeats = 1,
  repar = 2
)

kbl10(cv_res)
repeat  fold  n_train  n_test  log_score  rmse_yt  mae_yt  converged  error
1       1     146      74      -4.3077    0.3215   0.2862  TRUE       NA
1       2     147      73      -4.3171    0.3449   0.3032  TRUE       NA
1       3     147      73      -4.1676    0.3454   0.3022  TRUE       NA
kbl10(
  data.frame(
    metric = c("log_score", "rmse_yt", "mae_yt"),
    mean = colMeans(cv_res[, c("log_score", "rmse_yt", "mae_yt")], na.rm = TRUE)
  ),
  digits = 4
)
metric     mean
log_score  -4.2641
rmse_yt    0.3373
mae_yt     0.2972

Practical interpretation

References