Help for package aamatch

Type: Package
Title: Artless Automatic or Artful Multivariate Matching for Observational Studies
Version: 0.4.5
Maintainer: Paul Rosenbaum <rosenbaum@wharton.upenn.edu>
Description: Implements a simple version of multivariate matching using a propensity score, near-exact matching, near-fine balance, and robust Mahalanobis distance matching (Rosenbaum 2020 <doi:10.1146/annurev-statistics-031219-041058>). You specify the variables, and the program does everything else.
License: GPL-2
Encoding: UTF-8
Imports: iTOS, stats
Suggests: DOS2, sensitivity2x2xk, sensitivitymv, weightedRank, xtable
Depends: R (≥ 3.5.0)
NeedsCompilation:
Packaged: 2026-02-01 18:04:26 UTC; rosenbap
Author: Paul Rosenbaum [aut, cre]
Repository: CRAN
Date/Publication: 2026-02-01 20:00:02 UTC

Artless Automatic or Artful Multivariate Matching for Observational Studies

Description

Details

Package aamatch implements a simple version of multivariate matching in observational studies, using propensity scores, minimum distance matching, near-exact matching and fine balance. The main function in the package is artlessV2(). artlessV2() calls the function alittleArt(), and the latter gives the user finer control over the match.

Author(s)

Paul Rosenbaum [aut, cre]

Maintainer: Paul Rosenbaum <rosenbaum@wharton.upenn.edu>

References

Rosenbaum, P. R. (2020a) <doi:10.1007/978-3-030-46405-9> Design of Observational Studies (2nd Edition). New York: Springer.

Rosenbaum, P. R. (2020b). <doi:10.1146/annurev-statistics-031219-041058> Modern algorithms for matching in observational studies. Annual Review of Statistics and Its Application, 7(1), 143-176.

Rosenbaum, P. R. (2025) <doi:10.1007/978-3-031-90494-3> Introduction to the Theory of Observational Studies. New York: Springer.

Zhang, B., D. S. Small, K. B. Lasater, M. McHugh, J. H. Silber, and P. R. Rosenbaum (2023) <doi:10.1080/01621459.2021.1981337> Matching one sample according to two criteria in observational studies. Journal of the American Statistical Association, 118, 1140-1151.

Matched Periodontal Disease Data

Description

Matched data from NHANES 2009-2010, 2011-2012, 2013-2014 concerning smoking and periodontal disease. The matched data were built from the unmatched data in PeriUnmatched in this package.

Usage

data("PeriMatched")

Format

A data frame with 3489 observations on the following 18 variables.

SEQN: NHANES ID number
female: 1=female, 0=male
age: Age in years, capped at 80 for confidentiality
ageFloor: Age decade = floor(age/10)
educ: Education as 1 to 5. 1 is less than 9th grade, 2 at least 9th grade with no high school degree, 3 is a high school degree, 4 is some college, such as a 2-year associates degree, 5 is at least a 4-year college degree.
noHS: No high school degree. 1 if educ is 1 or 2, 0 if educ is 3 or more
income: Ratio of family income to the poverty level, capped at 5 for confidentiality
nh: The specific NHANES survey. A factor nh0910 < nh1112 < nh1314
cigsperday: Number of cigarettes smoked per day. 0 for nonsmokers.
z: Daily smoker. 1 indicates someone who smokes every day. 0 indicates a never-smoker who smoked fewer than 100 cigarettes in their life.
pd: A percent indicating periodontal disease. See details.
prop: A propensity score created in the example for PeriUnmatched. This propensity score decided which smokers would have 1 control and which would have 4 controls.
mset: Indicator of the matched set, 1, 2, ..., 1425
treated: The SEQN for the smoker in this matched set. Contains the same information as mset, but in a different form.
pair: 1 for a matched pair, 0 for a 1-to-4 matched set
grp2: An ordered factor with the same information as z: S=daily smoker, N=never smoker. S < N
grp3: A factor with the joint information in pair and grp2. 1-1:S 1-1:N 1-4:S 1-4:N

Details

Measurements were made for up to 28 teeth, 14 upper, 14 lower, excluding 4 wisdom teeth. Pocket depth and loss of attachment are two complementary measures of the degree to which the gums have separated from the teeth; see Wei, Barker and Eke (2013). Pocket depth and loss of attachment are measured at six locations on each tooth, provided the tooth is present. A measurement at a location was taken to exhibit disease if it had either a loss of attachment >=4mm or a pocket depth >=4mm, so each tooth contributes six binary scores, up to 6x28=168 binary scores. The variable pd is the percent of these binary scores indicating periodontal disease, 0 to 100 percent.
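The definition of pd can be illustrated with a minimal sketch. The site-level measurements below are simulated and hypothetical, and the object names (pocket, attachLoss, disease) are illustrative assumptions, not variables in the package data.

```r
# Hypothetical site-level measurements for one person: 28 teeth x 6 sites.
set.seed(1)
pocket     <- matrix(sample(1:6, 28 * 6, replace = TRUE), 28, 6)  # pocket depth, mm
attachLoss <- matrix(sample(0:6, 28 * 6, replace = TRUE), 28, 6)  # loss of attachment, mm

# Suppose teeth 3 and 17 are absent, so they contribute no scores.
pocket[c(3, 17), ]     <- NA
attachLoss[c(3, 17), ] <- NA

# A site exhibits disease if either measure is >= 4mm.
disease <- (pocket >= 4) | (attachLoss >= 4)

# pd is the percent of the available binary scores that indicate disease.
nScores <- sum(!is.na(disease))
pd <- 100 * mean(disease, na.rm = TRUE)
nScores
pd
```

With two teeth absent, 26 teeth remain, so the percent is taken over 26x6=156 binary scores rather than 168.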

Note

All analyses below distinguish the 1-to-1 pairs and the 1-to-4 sets, even though the information they provide is often combined. Alternatively, one can combine analyses of pairs and 1-to-4 sets using methods that take account of the matched blocks of variable sizes. For instance, for continuous responses, one can use the methods in Rosenbaum (2007) as implemented in the R package sensitivitymv; see also Rosenbaum (2015). For binary responses, one can use the methods in Rosenbaum and Small (2017) as implemented in the R package sensitivity2x2xk.

In contrast, some care is required in plots and descriptive statistics. One can straightforwardly plot the pairs, then separately plot the 1-to-4 sets, and one can do the same with descriptive statistics. Suppose, however, that one merges the two treated groups from pairs and 1-to-4 sets, and merges the two control groups from pairs and 1-to-4 sets; then marginal distributions of outcomes from the pooled treated and control groups are no longer comparable. See Pimentel, Yoon and Keele (2015). For instance, in the example, there is exact matching for sex; however, most pairs are men and most 1-to-4 sets are women. Pool the pairs and the 1-to-4 sets and the pooled control group has proportionately more women than the pooled treated group. To see this, type:

data("PeriMatched")

tapply(PeriMatched$female,PeriMatched$grp3,mean)

tapply(PeriMatched$female,PeriMatched$grp2,mean)

The simple, often enlightening, solution is to plot pairs and 1-to-4 sets in parallel but separately, and to do the same with descriptive statistics.
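Why pooling misleads can be seen in a small sketch with made-up counts. The counts below are hypothetical and chosen only so that sex is exactly matched within each kind of matched set; they are not the counts in PeriMatched.

```r
# Hypothetical counts: S = smoker, N = never-smoker.
toy <- data.frame(
  kind    = c("1-1", "1-1", "1-4", "1-4"),
  group   = c("S", "N", "S", "N"),
  n       = c(100, 100, 50, 200),
  nFemale = c(20, 20, 40, 160)
)

# Within each kind of matched set, the proportion female is the same
# for smokers and never-smokers, because sex is exactly matched.
within <- toy$nFemale / toy$n
names(within) <- paste(toy$kind, toy$group, sep = ":")
within

# Pooling counts each 1-to-4 set four times among controls but once among
# the treated, so the pooled proportions no longer agree.
pooled <- tapply(toy$nFemale, toy$group, sum) / tapply(toy$n, toy$group, sum)
pooled
```

In this toy example the pooled never-smokers are 60 percent female and the pooled smokers are 40 percent female, even though no matched set contains a mismatch for sex.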

Source

US National Health and Nutrition Examination Survey (NHANES). https://www.cdc.gov/nchs/nhanes/

References

Pimentel, S. D., Yoon, F., & Keele, L. (2015) <doi:10.1002/sim.6593> Variable‐ratio matching with fine balance in a study of the Peer Health Exchange. Statistics in Medicine, 34(30), 4070-4082.

Rosenbaum, P. R. (2007) <doi:10.1111/j.1541-0420.2006.00717.x> Sensitivity analysis for m-estimates, tests, and confidence intervals in matched observational studies. Biometrics, 63(2), 456-464.

Rosenbaum, P. R. (2015) <doi:10.1353/obs.2015.0000> Two R packages for sensitivity analysis in observational studies. Observational Studies, 1(2), 1-17. Available on-line at: muse.jhu.edu/article/793399/summary

Rosenbaum, P. R. (2016) <doi:10.1214/16-AOAS942> Using Scheffe projections for multiple outcomes in an observational study of smoking and periodontal disease. Annals of Applied Statistics, 10, 1447-1471.

Rosenbaum, P. R., & Small, D. S. (2017) <doi:10.1111/biom.12591> An adaptive Mantel–Haenszel test for sensitivity analysis in observational studies. Biometrics, 73(2), 422-430.

Rosenbaum, Paul R. (2026) <doi:10.1080/00031305.2026.2623909> A design for observational studies in which some people avoid treatment. American Statistician, to appear.

Tomar, S. L. and Asma, S. (2000). Smoking attributable periodontitis in the United States: Findings from NHANES III. J. Periodont. 71, 743-751.

Wei, L., Barker, L. and Eke, P. (2013). Array applications in determining periodontal disease measurement. SouthEast SAS User's Group. (SESUG2013) Paper CC-15, analytics.ncsu.edu/sesug/2013/CC-15.pdf.

Examples

data(PeriMatched)

# The analysis in Rosenbaum (2025) is replicated below
#
dm2<-PeriMatched
dm<-PeriMatched[PeriMatched$pair==1,]
dm1<-PeriMatched[PeriMatched$pair==0,]
pd1<-t(matrix(dm$pd,2,dim(dm)[1]/2))
pd4<-t(matrix(dm1$pd,5,dim(dm1)[1]/5))
dm2$mset<-as.integer(dm2$mset)

#
#  Make Figure 1
#
old.par <- par(no.readonly = TRUE)
par(mfrow=c(1,3))

boxplot(dm2$prop~dm2$grp3,names=c(expression(S[1]),expression(N[1]),
                                  expression(S[4]),expression(N[4])),
        las=1,sub="Left is 1-1,  Right is 1-4",cex.sub=.9,cex.axis=1,
        ylab="Propensity Score",xlab="(i) Propensity Score")
#axis(3,at=1:4,lab=round(tapply(dm2$prop,dm2$grp3,mean),2),cex.axis=1)
axis(3,at=1:4,lab=c("0.36","0.34","0.10","0.10"),cex.axis=1) # don't round 0.1

boxplot(dm2$educ~dm2$grp3,names=c(expression(S[1]),expression(N[1]),
                                  expression(S[4]),expression(N[4])),
        las=1,sub="Left is 1-1,  Right is 1-4",cex.sub=.9,cex.axis=1,
        ylab="Education: 1 is <9th, 3 is HS, 5 is BA",xlab="(ii) Education")
#axis(3,at=1:4,lab=round(tapply(dm2$educ,dm2$grp3,mean),1),cex.axis=1)
axis(3,at=1:4,lab=c("3.0","3.1","4.0","4.0"),cex.axis=1)

boxplot(dm2$income~dm2$grp3,names=c(expression(S[1]),expression(N[1]),
                                    expression(S[4]),expression(N[4])),
        las=1,sub="Left is 1-1,  Right is 1-4",cex.sub=.9,cex.axis=1,
        ylab="Income / (Poverty Level)",xlab="(iii) Income")
axis(3,at=1:4,lab=round(tapply(dm2$income,dm2$grp3,mean),1),cex.axis=1)

#
# Make Figure 2
#
par(mfrow=c(1,2))

boxplot(dm2$cigsperday~dm2$grp3,names=c(expression(S[1]),expression(N[1]),
                                        expression(S[4]),expression(N[4])),
        las=1,sub="Left is 1-1,  Right is 1-4",cex.sub=.9,cex.axis=1,
        ylab="Cigarettes Per Day",xlab="(i) Cigarettes Per Day")
axis(3,at=1:4,lab=round(tapply(dm2$cigsperday,dm2$grp3,mean),0),cex.axis=1)


boxplot(dm2$pd~dm2$grp3,names=c(expression(S[1]),expression(N[1]),
                                expression(S[4]),expression(N[4])),
        las=1,sub="Left is 1-1,  Right is 1-4",cex.sub=.9,cex.axis=1,
        ylab="Periodontal Disease",xlab="(ii) Periodontal Disease")
axis(3,at=1:4,lab=round(tapply(dm2$pd,dm2$grp3,mean),0),cex.axis=1)

#
# Make Table 1
#
tb<-NULL
N<-tapply(dm2$female,dm2$grp3,length)
tb<-cbind(tb,N)
rm(N)
Female<-tapply(dm2$female,dm2$grp3,mean)*100
tb<-cbind(tb,Female)
rm(Female)
Age<-tapply(dm2$age,dm2$grp3,mean)
tb<-cbind(tb,Age)
rm(Age)
Income<-tapply(dm2$income,dm2$grp3,mean)
tb<-cbind(tb,Income)
rm(Income)
Income10<-tapply(dm2$income,dm2$grp3,quantile,c(.1))
tb<-cbind(tb,Income10)
rm(Income10)
Income90<-tapply(dm2$income,dm2$grp3,quantile,c(.9))
tb<-cbind(tb,Income90)
rm(Income90)

Education25<-tapply(dm2$educ,dm2$grp3,quantile,c(.25))
tb<-cbind(tb,Education25)
rm(Education25)
Education50<-tapply(dm2$educ,dm2$grp3,quantile,c(.5))
tb<-cbind(tb,Education50)
rm(Education50)
Education75<-tapply(dm2$educ,dm2$grp3,quantile,c(.75))
tb<-cbind(tb,Education75)
rm(Education75)
PropensityMin<-tapply(dm2$prop,dm2$grp3,min)
tb<-cbind(tb,PropensityMin)
rm(PropensityMin)
Propensity<-tapply(dm2$prop,dm2$grp3,median)
tb<-cbind(tb,Propensity)
rm(Propensity)
PropensityMax<-tapply(dm2$prop,dm2$grp3,max)
tb<-cbind(tb,PropensityMax)
rm(PropensityMax)
xtable::xtable(tb,digits=c(NA,0,1,1,1,1,1,0,0,0,2,2,2))

addmargins(table(dm2$z,dm2$prop>.15))
#
# Make Table 2 regarding sensitivity analysis
#
gammas<-c(1:5,5.5,6)
ngamma<-length(gammas)
tabSen<-matrix(NA,ngamma,4)
colnames(tabSen)<-c("Pairs 1-1","Sets 1-4","Fisher","Truncated")
rownames(tabSen)<-gammas
for (i in 1:ngamma) tabSen[i,1]<-weightedRank::wgtRank(pd1,phi="u878",gamma=gammas[i])$pval
for (i in 1:ngamma) tabSen[i,2]<-weightedRank::wgtRank(pd4,phi="u878",gamma=gammas[i])$pval
for (i in 1:ngamma) {
  if (min(tabSen[i,1:2])==0) tabSen[i,3:4]<-0  # a zero P-value makes the combined P-value zero
  else{
    tabSen[i,3]<-sensitivitymv::truncatedP(tabSen[i,1:2],trunc=1)
    tabSen[i,4]<-sensitivitymv::truncatedP(tabSen[i,1:2],trunc=0.2)
  }
}
# Table 2
xtable::xtable(t(tabSen),digits=4)

# Compare Table 2 to a sensitivity analysis for 1425 pairs-only
# by randomly selecting 1 of 4 controls from the 1-to-4 sets
set.seed(12345)
a<-sample(2:5,(dim(pd4)[1]),replace=TRUE)
pd4r<-rep(NA,(dim(pd4)[1]))
for (i in 1:(dim(pd4)[1])) pd4r[i] <- pd4[i,a[i]]
pd4r<-cbind(pd4[,1],pd4r)
rm(a)

weightedRank::wgtRank(rbind(pd1,pd4r),phi="u878",gamma=4.2)
weightedRank::wgtRank(rbind(pd1,pd4r),phi="quade",gamma=4)
weightedRank::wgtRank(rbind(pd1,pd4r),phi="quade",gamma=3)

#
# Make Table 3 regarding counterfactual risk
#
ctab<-table(dm2$pd>=20,dm2$grp3)
ctab<-ctab[2:1,]
ctab<-rbind(ctab,prop.table(ctab,2)[1,]*100)
ctab<-rbind(ctab,c(ctab[1,1]*ctab[2,2]/(ctab[1,2]*ctab[2,1]),
                   mantelhaen.test(table(dm$pd>=20,dm$z,dm$mset))$estimate,
                   ctab[1,3]*ctab[2,4]/(ctab[1,4]*ctab[2,3]),
                   mantelhaen.test(table(dm1$pd>=20,dm1$z,dm1$mset))$estimate))
xtable::xtable(ctab,digits=1)

#
#  Evidence factors analysis -- cigarettes per day
#
crosscutplot<-function (x, y, ct = 0.25, xlab = "", ylab = "", main = "",
                        ylim = NULL)
{
  stopifnot(is.vector(x))
  stopifnot(is.vector(y))
  stopifnot(length(x) == length(y))
  stopifnot((ct > 0) & (ct <= 0.5))
  qx1 <- stats::quantile(x, ct)
  qx2 <- stats::quantile(x, 1 - ct)
  qy1 <- stats::quantile(y, ct)
  qy2 <- stats::quantile(y, 1 - ct)
  use <- ((x <= qx1) | (x >= qx2)) & ((y <= qy1) | (y >= qy2))
  if (is.null(ylim))
    graphics::plot(x, y, xlab = xlab, ylab = ylab, main = main,
                   type = "n",las=1,cex.lab=.9,cex.axis=.9,cex.main=.9)
  else graphics::plot(x, y, xlab = xlab, ylab = ylab, ylim = ylim,cex.main=.9,
                      main = main, type = "n",las=1,cex.lab=.9,cex.axis=.9)
  graphics::points(x[use], y[use], pch = 16,cex=.6)
  graphics::points(x[!use], y[!use], col = "gray", pch = 16,cex=.6)
  graphics::abline(h = c(qy1, qy2))
  graphics::abline(v = c(qx1, qx2))
}

dCigs1<-dm$cigsperday[dm$z==1]
dCigs4<-dm1$cigsperday[dm1$z==1]
dif1<-pd1[, 1] - pd1[, 2]
dif4<-pd4[,1]-apply(pd4[,2:5],1,median)
par(mfrow=c(1,2))
crosscutplot(dCigs1,dif1,xlab="Cigarettes per Day",ylim=c(-100,100),
             ylab="Periodontal Disease",main="1212 Pairs")
text(70,-80,paste("Odds Ratio =",round(89*135/(84*72),2)),cex=.7)
crosscutplot(dCigs4,dif4,xlab="Cigarettes per Day",ylim=c(-100,100),
             ylab="Periodontal Disease",
             main="213 Matched 1-to-4 Sets")
text(31,-80,paste("Odds Ratio =",round(28*18/(12*9),2)),cex=.7)
DOS2::crosscut(dCigs1,dif1)
DOS2::crosscut(dCigs4,dif4)
tb<-c(as.vector(DOS2::crosscut(dCigs1,dif1)$table),
      as.vector(DOS2::crosscut(dCigs4,dif4)$table))
tb<-array(tb,c(2,2,2))
sensitivity2x2xk::mh(tb,Gamma=1.6)
sensitivity2x2xk::mh(tb[,,1],Gamma=1.375)
sensitivity2x2xk::mh(tb[,,2],Gamma=1.7)


par(old.par)
rm(gammas,ngamma,crosscutplot,tb,i,tabSen,pd4r,old.par,ctab)

Unmatched Periodontal Disease Data

Description

Unmatched data from NHANES 2009-2010, 2011-2012, 2013-2014 concerning smoking and periodontal disease.

Usage

data("PeriUnmatched")

Format

A data frame with 6255 observations on the following 11 variables.

SEQN: NHANES ID number
female: 1=female, 0=male
age: Age in years, capped at 80 for confidentiality
ageFloor: Age decade = floor(age/10)
educ: Education as 1 to 5. 1 is less than 9th grade, 2 at least 9th grade with no high school degree, 3 is a high school degree, 4 is some college, such as a 2-year associates degree, 5 is at least a 4-year college degree.
noHS: No high school degree. 1 if educ is 1 or 2, 0 if educ is 3 or more
income: Ratio of family income to the poverty level, capped at 5 for confidentiality
nh: The specific NHANES survey. A factor nh0910 < nh1112 < nh1314
cigsperday: Number of cigarettes smoked per day. 0 for nonsmokers.
z: Daily smoker. 1 indicates someone who smokes every day. 0 indicates a never-smoker who smoked fewer than 100 cigarettes in their life.
pd: A percent indicating periodontal disease. See details.

Details

The data from three NHANES surveys (specifically 2009-2010, 2011-2012, and 2013-2014) contain periodontal data and are used as an example in Rosenbaum (2025). The data from one survey, 2011-2012, were used in Rosenbaum (2016). The example uses these unmatched data twice in artless() to create the fused match in Rosenbaum (2025). The fused match combines some 1-to-1 matched pairs and some 1-to-4 matched sets based on the values of the propensity score. The data are useful in learning about fused matching, but the example in the documentation for artless() should be used as the main example illustrating artless().

Note

An analysis of outcomes should take appropriate account of the matching; see the note in the documentation for PeriMatched. Often, covariate balance is assessed by comparing the marginal distributions of covariates in treated and control groups after matching; however, some care is required when there are both 1-to-1 pairs and 1-to-4 sets. One can assess covariate balance for the pairs, and separately assess covariate balance for the 1-to-4 sets. Alternatively, one can measure covariate balance in the pairs and the 1-to-4 sets separately, perhaps taking the difference in means, and then take a weighted combination of the two differences in means for pairs and 1-to-4 sets, along the lines indicated by Pimentel, Yoon and Keele (2015). However, one cannot assess covariate balance by pooling the two treated groups from pairs and 1-to-4 sets, pooling the two control groups from pairs and 1-to-4 sets, and comparing the two pooled groups. In the example, there is exact matching for sex; however, most pairs are men and most 1-to-4 sets are women. Pool the pairs and the 1-to-4 sets and the pooled control group has proportionately more women than the pooled treated group. To see this, type:

data("PeriMatched")

tapply(PeriMatched$female,PeriMatched$grp3,mean)

tapply(PeriMatched$female,PeriMatched$grp2,mean)
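The weighted combination mentioned above can be sketched as follows. The set counts 1212 and 213 come from the example, but the covariate means are hypothetical, and weighting by the number of treated individuals is an assumed choice in the spirit of Pimentel, Yoon and Keele (2015), not necessarily the weighting they recommend in every setting.

```r
# Hypothetical covariate means by kind of matched set.
strata <- data.frame(
  kind        = c("1-1", "1-4"),
  nTreated    = c(1212, 213),
  meanTreated = c(45.0, 52.0),
  meanControl = c(44.6, 51.8)
)

# Difference in means within each kind of matched set.
strata$diff <- strata$meanTreated - strata$meanControl

# Weights proportional to the number of treated individuals (assumed choice).
w <- strata$nTreated / sum(strata$nTreated)
combined <- sum(w * strata$diff)

# The naive pooled comparison counts 1 control per treated individual in
# pairs but 4 controls per treated individual in the 1-to-4 sets.
nControl <- strata$nTreated * c(1, 4)
naive <- weighted.mean(strata$meanTreated, strata$nTreated) -
  weighted.mean(strata$meanControl, nControl)

round(c(combined = combined, naive = naive), 2)
```

In this sketch both kinds of matched set are well balanced, and the weighted combination reflects that, while the naive pooled difference has the opposite sign and a much larger magnitude.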

Source

US National Health and Nutrition Examination Survey (NHANES). https://www.cdc.gov/nchs/nhanes/

References

Pimentel, S. D., Yoon, F., & Keele, L. (2015) <doi:10.1002/sim.6593> Variable‐ratio matching with fine balance in a study of the Peer Health Exchange. Statistics in Medicine, 34(30), 4070-4082.

Rosenbaum, Paul R. (2026) <doi:10.1080/00031305.2026.2623909> A design for observational studies in which some people avoid treatment. American Statistician, to appear.

Tomar, S. L. and Asma, S. (2000). Smoking attributable periodontitis in the United States: Findings from NHANES III. J. Periodont. 71, 743-751.

Examples

# The code below creates the matched data, PeriMatched, from the unmatched
# data PeriUnmatched using the function artless() twice. Individuals
# with prop above 0.15 were matched in pairs.  Individuals with prop of at
# most 0.15 were matched in a 1-to-4 ratio.
data(PeriUnmatched)

# Controls matched for female, age, education, income
d0<-PeriUnmatched
prop<-stats::glm(d0$z~d0$female+d0$age+d0$educ+d0$income,family=binomial)$fitted
d0<-cbind(d0,prop)
rm(prop)

# Pair match for higher propensity individuals
d1<-d0[d0$prop>0.15,]
attach(d1)
ageFloor<-floor(age/10)
lowInc<-1*(income<2)
highInc<-1*(income>=4)
x<-cbind(female,age,educ,income)
xm<-cbind(age,educ,income)
near<-cbind(female,ageFloor)
age60<-1*(age>=60)
fine<-cbind(age60,noHS,lowInc,highInc,female)
# Match does the following: estimates a new propensity score in
# this subpopulation using the covariates in x, uses a
# Mahalanobis distance for the covariates in xm, performs near-exact
# matching for the covariates in near, and performs near-fine balancing
# of the covariates in fine.  The solver rlemon is used because it is
# available in R, but rrelaxiv may be a better choice, though it
# requires a separate installation.
m<-artless(d1,z,x,xm=xm,near=near,fine=fine,solver="rlemon")
detach(d1)
# Some clean-up follows
rm(age60)
dm<-m$match
dm<-dm[!is.na(dm$mset),]
rm(x,xm,fine,near,d1,ageFloor,lowInc,highInc)
treated<-as.vector(rbind(dm$SEQN[dm$z==1],dm$SEQN[dm$z==1]))
dm<-cbind(dm,treated)
rm(treated)

# Now match 1-to-4 for low propensity individuals
d1<-d0[d0$prop<=0.15,]
attach(d1)
ageFloor<-floor(age/10)
lowInc<-1*(income<2)
highInc<-1*(income>=4)
x<-cbind(female,age,educ,income)
xm<-cbind(age,educ,income)
near<-cbind(female,ageFloor)
age60<-1*(age>=60)
fine<-cbind(age60,noHS,lowInc,highInc,female)
ncontrols<-4
# Match does the following: estimates a new propensity score in
# this subpopulation using the covariates in x, uses a
# Mahalanobis distance for the covariates in xm, performs near-exact
# matching for the covariates in near, and performs near-fine balancing
# of the covariates in fine.  The solver rlemon is used because it is
# available in R, but rrelaxiv may be a better choice, though it
# requires a separate installation.
m1<-artless(d1,z,x,xm=xm,near=near,fine=fine,solver="rlemon",
                     ncontrols=ncontrols)
detach(d1)
# Some clean-up follows
rm(age60)
dm1<-m1$match
dm1<-dm1[!is.na(dm1$mset),]
rm(x,xm,fine,near,d1,ageFloor,lowInc,highInc)
treated1<-dm1$SEQN[dm1$z==1]
treated<-treated1
for (i in 1:(ncontrols)) treated<-rbind(treated,treated1)
treated<-as.vector(treated)
dm1<-cbind(dm1,treated)
rm(treated,treated1,i,ncontrols)

# Pool the two matched samples into one data.frame dm2
pair<-rep(1,dim(dm)[1])
dm<-cbind(dm,pair)
dm$mset<-as.integer(dm$mset)
pair<-rep(0,dim(dm1)[1])
dm1<-cbind(dm1,pair)
dm1$mset<-as.integer(dm1$mset)+max(dm$mset)
dm2<-rbind(dm1,dm)
rm(pair)
grp2<-factor(dm2$z,levels=1:0,labels=c("S","N"),ordered=TRUE)
grp3<-factor(dm2$pair,levels=c(1,0),labels=c("1-1","1-4"),ordered=TRUE):grp2
dm2<-cbind(dm2,grp2,grp3)
rm(grp2,grp3)

# There are 1212 pairs and 213 1-to-4 sets
table(table(dm2$mset))
# Check the balance tables separately for pairs and sets
# Pairs
m$balance
# 1-to-4 sets
m1$balance

Artful Optimal Matching

Description

Implements a simple version of multivariate matching using a propensity score, near-exact matching, near-fine balance, and robust Mahalanobis distance matching. Provides fine control of the penalties used in matching.

Usage

alittleArt(dat, z, x = NULL, pr = NULL, xm = NULL, near = NULL,
  fine = NULL, xinteger = NULL, xbalance = NULL, ncontrols = 1,
  rnd = 2, solver = "rlemon", min.penalty = c(10, 1, 0.05),
  pr.penalty = c(2, 5, 25, 250), near.penalty = 1000,
  fine.penalty = 50, integer.penalty = 20)

Arguments

dat

A dataframe containing the data set that will be matched. Let N be the number of rows of dat.

z

A binary vector with N coordinates where z[i]=1 if the ith row of dat describes a treated individual and z[i]=0 if the ith row of dat describes a control.

x

x is a numeric matrix with N rows. If pr is NULL, then the covariates in x are used to estimate a propensity score using a linear logit model that predicts z from x. An error will stop the program if pr and x are both NULL. If neither pr nor x is NULL, then a harmless warning message will remind you that your propensity score pr was used in matching and x was not used to estimate the propensity score. If xbalance is NULL, then the balance table will describe the covariates in x; so, those covariates should be continuous variables or binary variables that can be described by a mean or a proportion, not nominal categories.

pr

A vector with N coordinates containing an estimated propensity or similar quantity. If pr is NULL, then the program estimates the propensity score; see the discussion of x above.

xm

xm is a numeric matrix with N rows. The covariates in xm are used to define a robust Mahalanobis distance between treated and control individuals. The covariates in xm may be continuous variables like weight, integer covariates like number of rooms in a home, or binary variables; however, they should not be unordered nominal covariates like 1=New York, 2=Chicago, 3=London, 4=Tokyo.

near

A numeric vector of length N or a numeric matrix with N rows. Each column of near should represent levels of a nominal covariate with two or a few levels. The variables in near are used in near-exact matching.

fine

A numeric vector of length N or a numeric matrix with N rows. Each column of fine should represent levels of a nominal covariate with two or a few levels. The variables in fine are used in near-fine balancing.

xinteger

A numeric vector of length N or a numeric matrix with N rows. Each column of xinteger should represent levels of an integer covariate with three or a few levels. The variables in xinteger are used in near-fine balancing that prefers an imbalance from an adjacent category to an imbalance from a distant category. See the notes.

xbalance

If not NULL, xbalance is a numeric vector of length N or a numeric matrix with N rows. If xbalance is not NULL, then the balance table will describe the covariates in xbalance; so, those covariates should be continuous variables or binary variables that can be described by a mean or a proportion, not nominal categories. See also the discussion of x above and the notes.

ncontrols

A positive integer. ncontrols is the number of controls to be matched to each treated individual.

rnd

A nonnegative integer. The balance table is rounded for display to rnd digits.

solver

Either "rlemon" or "rrelaxiv". The rlemon solver is automatically available without special installation. The rrelaxiv solver requires a special installation. See the note.

min.penalty

A vector of three nonnegative coordinates. The third coordinate must be strictly greater than zero and strictly less than one. See the notes.

pr.penalty

A vector with four nonnegative coordinates that determine aspects of matching for the propensity score. See the notes.

near.penalty

Either one nonnegative number or a vector of nonnegative numbers with one coordinate for each column of near. See the notes.

fine.penalty

Either one nonnegative number or a vector of nonnegative numbers with one coordinate for each column of fine. See the notes.

integer.penalty

Either one nonnegative number or a vector of nonnegative numbers with one coordinate for each column of xinteger. See the notes.

Details

This function builds a matched treated-control sample from an unmatched data set. It asks you to designate roles for specific covariates, and it does the rest. Unlike artlessV2(), the function alittleArt() gives you control over the penalties used in matching. In particular, if in an initial match one covariate, say age, remains out of balance, then you can adjust a penalty specific to age to attempt to improve its balance.

Value

match

A dataframe containing the matched data set. match contains the rows of dat in a different order. match adds two columns to dat, called mset and matched, which identify matched pairs or matched sets. Specifically, matched is TRUE if a row is in the matched sample and is FALSE otherwise. Rows of dat that are in the same matched set have the same value of mset. The rows of match are sorted by mset with the treated individual before the matched controls. The unmatched controls with matched=FALSE appear as the last rows of match. When you analyze the matched data, you will want to remove rows of match with matched==FALSE.

balance

A matrix called the balance table. The matrix has one row for each covariate in x. It also has a first row for the propensity score. There are five columns. Column 1 is the mean of the covariate in the treated group. Column 2 is the mean of the covariate in the matched control group. Column 3 is the mean of the covariate among all controls prior to matching. Column 4 is the difference between columns 1 and 2 divided by a pooled estimate of the standard deviation of the covariate before matching. Column 5 is the difference between columns 1 and 3 divided by a pooled estimate of the standard deviation of the covariate before matching. Notice that columns 4 and 5 have the same denominator, but different numerators. Tom Love (2002) suggests a graphical display of this information.
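The five columns of the balance table can be sketched for a single covariate. The data are simulated, the "match" is a crude stand-in rather than an optimal match, and the pooled standard deviation is assumed here to be the square root of the average of the treated and control variances before matching; the package may compute it differently.

```r
set.seed(2)
# Hypothetical data: one covariate, 50 treated and 200 controls.
z   <- c(rep(1, 50), rep(0, 200))
age <- c(rnorm(50, 55, 10), rnorm(200, 45, 10))

# Stand-in for a match: the 50 controls closest in age to the treated mean.
ctrl        <- which(z == 0)
matchedCtrl <- ctrl[order(abs(age[ctrl] - mean(age[z == 1])))[1:50]]

# Pooled standard deviation before matching (assumed form).
sdPooled <- sqrt((var(age[z == 1]) + var(age[z == 0])) / 2)

balance <- c(
  treated        = mean(age[z == 1]),
  matchedControl = mean(age[matchedCtrl]),
  allControl     = mean(age[z == 0]),
  stdDifMatched  = (mean(age[z == 1]) - mean(age[matchedCtrl])) / sdPooled,
  stdDifBefore   = (mean(age[z == 1]) - mean(age[z == 0])) / sdPooled
)
round(balance, 2)
```

Because the last two entries share the denominator sdPooled, any change between them reflects only the change in the control mean produced by matching.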

Note

The mathematical structure of alittleArt() is a very special implementation of the method in Zhang et al. (2023). The method is also described in Chapters 5 and 6 of Rosenbaum (2025). alittleArt() calls functions in the iTOS package, where more detail may be found.

Note

Penalty Structure: near.penalty, fine.penalty and integer.penalty relate to matrices near, fine and xinteger, respectively, and they have a similar structure. If any of these penalties is a scalar, that scalar is repeated to form a vector with one coordinate for each column of its matrix. For example, near has two columns in the example, so near.penalty=1000 becomes c(1000,1000). The penalties apply to the corresponding columns, so you can apply different penalties to different covariates; by default, all columns have the same penalty.
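The recycling of a scalar penalty can be sketched in a few lines. The near matrix below is a hypothetical four-row stand-in, and the code imitates the rule described in this note rather than reproducing the internal code of alittleArt().

```r
# Hypothetical near matrix with two columns.
near <- cbind(female = c(1, 0, 1, 0), dontSmoke = c(1, 1, 0, 0))
near.penalty <- 1000

# A scalar penalty is repeated once per column of its matrix.
if (length(near.penalty) == 1) near.penalty <- rep(near.penalty, ncol(near))
names(near.penalty) <- colnames(near)
near.penalty
```

Supplying near.penalty=c(1000,500) instead would skip the recycling and give the two columns different penalties.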

Note

The near Matrix: An attempt is made to exactly match for covariates in near. In the example, near contains two binary covariates, namely female and dontSmoke. This means that the match will try to match women to women and men to men, nonsmokers to nonsmokers, and smokers to smokers. If near.penalty=c(1000,500), then a mismatch for female increases cost by 1000, a mismatch for dontSmoke increases cost by 500, a mismatch for both costs 1500, and mismatching two people for dontSmoke costs the same as a single mismatch for female. A small penalty, say near.penalty=c(2,1), will increase the number of exact matches, but will often be overridden by other considerations.
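The arithmetic of the near-exact penalties can be checked with a small sketch. The function nearCost() is a hypothetical helper written for this illustration; it is not part of the package.

```r
# Cost added by near-exact mismatches between one treated individual and
# one potential control (hypothetical helper, not a package function).
nearCost <- function(treated, control, penalty) {
  sum(penalty * (treated != control))
}

penalty <- c(female = 1000, dontSmoke = 500)
trt <- c(female = 1, dontSmoke = 0)

nearCost(trt, c(female = 1, dontSmoke = 0), penalty)  # 0, exact match
nearCost(trt, c(female = 1, dontSmoke = 1), penalty)  # 500, dontSmoke mismatch
nearCost(trt, c(female = 0, dontSmoke = 0), penalty)  # 1000, female mismatch
nearCost(trt, c(female = 0, dontSmoke = 1), penalty)  # 1500, both mismatched
```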

Note

The fine Matrix: Fine balance refers to the marginal distributions of a covariate in treated and control groups, not to who is paired with whom. An attempt is made to balance covariates in fine. In the example, fine includes a covariate expressing four broad age categories, one low education category (less than high school), and a binary covariate distinguishing daily-smokers from everyone else. This means that the match will work hard to have the same proportion of people with less-than-high-school education in treated and control groups, but it will not prioritize pairing two people with less-than-high-school education. As with near.penalty above, fine.penalty can be adjusted to increase or decrease the emphasis on fine balancing, or to increase or decrease the emphasis on one column of fine rather than another column.

Note

The xinteger Matrix: An attempt is made to balance covariates in xinteger, in a manner similar to the covariates in fine. The difference is that the covariates in fine are viewed as nominal, but the covariates in xinteger are viewed as integers. Take ageC in the example. ageC is an ordered factor that is made into an integer using as.integer(ageC). ageC cuts age into 4 categories at 30, 45, and 60. If used in fine, the categories <30 and 60+ are nominal categories. If used in xinteger, <30 is far from 60+, and <30 is closer to 30-45 than to 60+. If ageC cannot be perfectly balanced, the penalty is smaller for imbalances in nearby categories than for distant categories. If integer.penalty=20, then there is 0 penalty for the same category, 20 for a one-category difference, 40 for a two-category difference, etc. If used as a nominal covariate in fine, every imbalance for ageC would cost the same, ignoring the fact that <30 is closer to 30-45 than to 60+.
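The contrast between integer and nominal treatment of ageC can be tabulated in a brief sketch. The ages are hypothetical, and the same penalty of 20 is used for both tables purely to make them comparable; how the package aggregates these costs into the balance penalty is not shown here.

```r
# Hypothetical ages, one in each of the four categories cut at 30, 45, 60.
age  <- c(25, 40, 50, 70)
ageC <- cut(age, breaks = c(-Inf, 30, 45, 60, Inf), right = FALSE,
            labels = c("<30", "30-45", "45-60", "60+"), ordered_result = TRUE)
k <- as.integer(ageC)

integer.penalty <- 20
# Integer treatment: cost grows with the number of categories apart.
integerCost <- integer.penalty * abs(outer(k, k, "-"))
# Nominal treatment: every difference in category costs the same.
nominalCost <- integer.penalty * outer(k, k, "!=")
dimnames(integerCost) <- dimnames(nominalCost) <- list(levels(ageC), levels(ageC))

integerCost
nominalCost
```

In integerCost, the entry for <30 versus 60+ is 60, three times the entry for <30 versus 30-45; in nominalCost the two entries are equal.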

Note

The Propensity Score: Three separate attempts are made: first, to balance the propensity score in the sense of fine balance; second, to pair closely for the propensity score; and third, to avoid controls with propensity scores below all treated individuals. These attempts are controlled by two parameters, min.penalty and pr.penalty.

With the default min.penalty = c(10, 1, 0.05), there is a penalty of 10 for each control that is used in the match whose propensity score is below the minimum propensity score in the treated group. Also, there is an additional penalty of 1 for each control that is used whose propensity score is below the 0.05 quantile of the propensity scores in the treated group. Taking min.penalty = c(0, 0, 0.05) removes this feature. Taking min.penalty = c(10, 0, 0.05) uses the minimum penalty but not the 0.05 quantile penalty, etc. This is a simple directional penalty of the type in Yu and Rosenbaum (2019). By construction, the propensity score tends to be low in the control group – so the needed direction is clear – but the magnitudes of the penalties – defaulting to 10 and 1 – may need adjustment based on boxplots of the propensity scores in matched samples.
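The directional penalty can be sketched as a function of a control's propensity score. The helper minPenalty() is hypothetical, the propensity scores are made up, and the sketch assumes the two penalties add, so that a control below the treated minimum incurs both; the text's word "additional" suggests this, but the package code is the authority.

```r
# Hypothetical helper: penalty for using each control, given the
# propensity scores of the treated group.
minPenalty <- function(prControl, prTreated, min.penalty = c(10, 1, 0.05)) {
  belowMin <- prControl < min(prTreated)
  belowQ   <- prControl < unname(stats::quantile(prTreated, min.penalty[3]))
  min.penalty[1] * belowMin + min.penalty[2] * belowQ
}

prTreated <- seq(0.20, 0.80, by = 0.05)  # made-up treated propensity scores
prControl <- c(0.10, 0.22, 0.50)         # made-up control propensity scores

minPenalty(prControl, prTreated)                              # 11 1 0
minPenalty(prControl, prTreated, min.penalty = c(0, 0, 0.05)) # 0 0 0, feature off
```

Here the treated minimum is 0.20 and the 0.05 quantile is 0.23, so the first control is below both, the second is below only the quantile, and the third incurs no penalty.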

The propensity score is made into two integer variables, pr6 and pr3, where pr6 is 1 to 6 and cuts pr at its 1/6, 1/3, 1/2, 2/3, and 5/6 quantiles, while pr3 is 1, 2 or 3 and cuts pr at its 1/3 and 2/3 quantiles. Note that pr3=1 when pr6=1 or pr6=2. An attempt is made to pair for pr6 and pr3 and to balance for pr6 and pr3. At the default pr.penalty=c(2,5,25,250), there is a penalty of 2 for a one-category mismatch for pr6, and an additional penalty of 5 for a one-category mismatch for pr3; moreover, as in xinteger, these are doubled for a two-category mismatch, etc. Also at the default, there is a penalty of 25 for a one-category imbalance in pr6 and a penalty of 250 for a one-category imbalance in pr3.

Changing pr.penalty to c(0,0,25,250) will make no attempt to pair for the propensity score, while trying to balance it. Changing pr.penalty to c(2,5,0,0) will make no attempt to finely balance the propensity score, while trying to pair for it. Changing pr.penalty to c(2000,0,0,0) will try very hard to match exactly for pr6.
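
The construction of pr6 and pr3, and the pairing part of pr.penalty, can be sketched as follows. This is a minimal illustration under stated assumptions: the propensity scores are simulated, the quantiles are taken over all individuals, and pair.penalty() is a hypothetical helper, not a function in this package or in iTOS.

```r
# Illustration only: cut a simulated propensity score into pr6 and pr3.
set.seed(1)
pr <- stats::plogis(stats::rnorm(600))      # simulated scores in (0, 1)

q6 <- stats::quantile(pr, (0:6) / 6, names = FALSE)
pr6 <- as.integer(cut(pr, breaks = q6, include.lowest = TRUE))
# pr3 reuses the 1/3 and 2/3 quantiles, so it is nested within pr6
pr3 <- as.integer(cut(pr, breaks = q6[c(1, 3, 5, 7)], include.lowest = TRUE))
table(pr6, pr3)

# Pairing penalty between individuals i and j from the first two
# entries of pr.penalty (hypothetical helper)
pr.penalty <- c(2, 5, 25, 250)
pair.penalty <- function(i, j) {
  pr.penalty[1] * abs(pr6[i] - pr6[j]) + pr.penalty[2] * abs(pr3[i] - pr3[j])
}

i <- which(pr6 == 1)[1]
j <- which(pr6 == 3)[1]
pair.penalty(i, j)   # two pr6 categories and one pr3 category apart
```

The table shows the nesting noted above: pr3=1 exactly when pr6 is 1 or 2, pr3=2 when pr6 is 3 or 4, and pr3=3 when pr6 is 5 or 6.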

Note

The xm Matrix: The variables in xm are used to construct a robust rank based covariate distance similar to the Mahalanobis distance; see section 9.3 of Rosenbaum (2020a). Robustness refers to two problems with the usual Mahalanobis distance when used in matching. First, in the usual Mahalanobis distance, outliers in a covariate can increase its sample variance, thereby decreasing the importance of a 1-unit difference in the covariate. Second, in the usual Mahalanobis distance, a rare binary covariate has a small variance even though a mismatch is always of size |1-0|=1; so, in a US sample, a mismatch for lives-in-Wyoming is much more important than a mismatch for lives-in-California. The robust rank based covariate distance fixes both problems: an outlier cannot make a covariate less important, and binary variables, rare or common, are equally important.
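
The two problems can be seen in a small numerical sketch. The data and proportions below are made up for illustration, and the code shows only the problems with variance scaling, not the full construction of the robust distance, for which see section 9.3 of Rosenbaum (2020a).

```r
# Problem 1: one outlier inflates the sample variance of a covariate,
# but leaves the variance of its ranks unchanged.
x <- 1:20
x.out <- c(1:19, 1000)                 # same data with one extreme value
c(raw = stats::var(x), raw.outlier = stats::var(x.out))
c(rank = stats::var(rank(x)), rank.outlier = stats::var(rank(x.out)))

# Problem 2: a rare binary covariate has a small variance p * (1 - p),
# so dividing a 0-versus-1 mismatch by the variance makes it count heavily.
p <- c(rare = 0.002, common = 0.12)    # hypothetical proportions
mismatch.weight <- 1 / (p * (1 - p))
mismatch.weight["rare"] / mismatch.weight["common"]
```

With these made-up proportions, a mismatch on the rare covariate is weighted roughly 50 times more heavily than a mismatch on the common one when each is scaled by its own variance.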

Note

SOLVER: The package uses by default the solver rlemon; it is available in R. The alternative, rrelaxiv, requires a special installation that will now be described.

With solver="rrelaxiv", the package indirectly uses the callrelax() function in Samuel Pimentel's rcbalance package. This function was originally intended to call the excellent RELAXIV Fortran code of Bertsekas and Tseng (1988,1994). Unfortunately, that code has an academic license and is not available from CRAN; so, by default the package calls the rlemon function instead, which is available on CRAN. If you qualify as an academic, then you may be able to download the RELAXIV code from GitHub at <https://github.com/josherrickson/rrelaxiv/> and use it in artlessV2 by setting solver="rrelaxiv".
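
One cautious way to choose the solver, sketched here under the assumption that the special installation makes a package named rrelaxiv loadable, is to test for it and otherwise fall back to the default:

```r
# Use "rrelaxiv" only if that package is installed; otherwise keep "rlemon".
solver <- if (requireNamespace("rrelaxiv", quietly = TRUE)) "rrelaxiv" else "rlemon"
solver
```

The resulting string can then be passed as the solver argument.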

Note

– The following are some practical tips on how to use alittleArt().

– Perhaps do a first match with the default settings. Examine the balance table and parallel boxplots of covariates in matched treated and control groups. Adjust the various penalties, if needed, to fix any covariate imbalances you find.

– It is harder to match 1 treated individual to 3 controls (with ncontrols = 3) than to match one control to each treated individual. If you are having difficulty finding a well-balanced 1-to-5 match, try 1-to-3 or 1-to-1. If it was easy to find a balanced 1-to-1 match, try 1-to-3. With a better choice of penalties, it may be possible to match 1-to-3, while a worse choice of penalties produces adequate covariate balance only for 1-to-1.

– Most covariates that you want to balance should be included in the propensity score, either in pr or in x.

– The covariates in x could include, say: (i) a quadratic in age, (age-mean(age))^2, (ii) an interaction, (age-mean(age))*(bmi-mean(bmi)), or (iii) spline terms computed from age. Alternatively, you can build your own propensity score in pr or substitute a different kind of score, rather than automatically using a linear logit model fitted by maximum likelihood.

– It is sometimes important to look for effect-modification, meaning that the treatment effect varies systematically with one or more covariates. If you want to look for effect modification by female-vs-male, then it is useful to include a binary female covariate in the matrix near with a large penalty. This will ensure that all or almost all pairs will be exactly matched for female; so, the pairs can be split into female pairs and male pairs for separate or comparative analysis. Various re-pairing methods can be used with finely balanced covariates; see Lee et al. (2018).

– The matrix xbalance allows you to control the ways covariates are expressed in the balance table separately from the way covariates are expressed in the match.

– An attempt is made to pair closely for covariates in xm. A continuous covariate, like age or bmi, might be placed in x and in xm. A binary covariate like female can also be used. Covariates in xm are given roughly equal importance; so, do not put unimportant covariates in xm.

– The match should be finalized before any outcome information is examined; see Rubin (2008).

– There can exist treated and control groups that cannot be matched. If all of the treated individuals are under age 20 and all of the controls are over age 50, then there is no way you can match for age. You could do regression or covariance adjustment for age, but of course it would be silly. Matching will often stop you from doing silly things, while regression will let you do silly things.
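
The tip about centered quadratic and interaction terms in x, and about supplying your own propensity score in pr, can be sketched as follows. The data are simulated and the variable names ageQ and ageBmi are illustrative; nothing here is taken from the binge data.

```r
# Illustration only: build x with a centered quadratic and an interaction,
# then fit a logit propensity score that could be supplied as pr.
set.seed(2)
n <- 200
age <- round(stats::runif(n, 20, 80))          # simulated covariates
bmi <- stats::rnorm(n, 27, 4)
z <- stats::rbinom(n, 1, stats::plogis(-1 + 0.03 * (age - 50)))

ageQ <- (age - mean(age))^2                    # (i) quadratic in age
ageBmi <- (age - mean(age)) * (bmi - mean(bmi))  # (ii) interaction
x <- cbind(age, bmi, ageQ, ageBmi)

# A user-built propensity score from a linear logit model in x
pr <- stats::glm(z ~ x, family = stats::binomial)$fitted.values
summary(pr)
```

Centering before squaring or multiplying reduces the correlation between the new terms and the original covariates, which is why the tip writes the terms in centered form.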

Note

TECHNICAL DETAILS: The following details refer to Figure 1 in Zhang et al. (2023) or Figure 5.5 in Rosenbaum (2025). In particular LEFT refers to treated-control edges on the left side of the network, RIGHT refers to control-treated edges on the right side of the network, and CC refers to the control-control edges in the center of the network. Various functions from the iTOS package are mentioned; they have detailed documentation in the iTOS package.

xm adds a penalty on the LEFT using addMahal() from the iTOS package.

near adds a penalty on the LEFT using addNearExact() from the iTOS package.

fine adds a penalty on the RIGHT using addNearExact() from the iTOS package.

xinteger adds a penalty on the RIGHT using addinteger() from the iTOS package.

pr does three things:

1. the first two penalties in pr.penalty are added on the LEFT using addinteger() from the iTOS package.

2. the last two penalties in pr.penalty are added on the RIGHT using addinteger() from the iTOS package.

3. min.penalty adds two penalties to CC central edges.

Author(s)

Paul R. Rosenbaum

References

Bertsekas, D. P., Tseng, P. (1988) <doi:10.1007/BF02288322> The Relax codes for linear minimum cost network flow problems. Annals of Operations Research, 13, 125-190.

Bertsekas, D. P. (1990) <doi:10.1287/inte.20.4.133> The auction algorithm for assignment and other network flow problems: A tutorial. Interfaces, 20(4), 133-149.

Bertsekas, D. P., Tseng, P. (1994) <http://web.mit.edu/dimitrib/www/Bertsekas_Tseng_RELAX4_!994.pdf> RELAX-IV: A Faster Version of the RELAX Code for Solving Minimum Cost Flow Problems.

Greifer, N. and Stuart, E.A., (2021). <doi:10.1093/epirev/mxab003> Matching methods for confounder adjustment: an addition to the epidemiologist’s toolbox. Epidemiologic Reviews, 43(1), pp.118-129.

Hansen, B. B. and Klopfer, S. O. (2006) <doi:10.1198/106186006X137047> Optimal full matching and related designs via network flows. Journal of Computational and Graphical Statistics, 15(3), 609-627. ('optmatch' package)

Hansen, B. B. (2007) <https://www.r-project.org/conferences/useR-2007/program/presentations/hansen.pdf> Flexible, optimal matching for observational studies. R News, 7, 18-24. ('optmatch' package)

Lee, K., Small, D. S. and Rosenbaum, P. R. (2018) <doi:10.1111/biom.12884> A powerful approach to the study of moderate effect modification in observational studies. Biometrics, 74(4), 1161-1170.

Love, Thomas E. (2002) Displaying covariate balance after adjustment for selection bias. Joint Statistical Meetings. Vol. 11. https://chrp.org/love/JSM_Aug11_TLove.pdf

Niknam, B. A. and Zubizarreta, J. R. (2022) <doi:10.1001/jama.2021.20555> Using cardinality matching to design balanced and representative samples for observational studies. JAMA, 327(2), 173-174.

Pimentel, S. D., Yoon, F., & Keele, L. (2015) <doi:10.1002/sim.6593> Variable‐ratio matching with fine balance in a study of the Peer Health Exchange. Statistics in Medicine, 34(30), 4070-4082.

Pimentel, S. D., Kelz, R. R., Silber, J. H. and Rosenbaum, P. R. (2015) <doi:10.1080/01621459.2014.997879> Large, sparse optimal matching with refined covariate balance in an observational study of the health outcomes produced by new surgeons. Journal of the American Statistical Association, 110, 515-527.

Rosenbaum, P. R. and Rubin, D. B. (1985) <doi:10.1080/00031305.1985.10479383> Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician, 39, 33-38.

Rosenbaum, P. R. (1989) <doi:10.1080/01621459.1989.10478868> Optimal matching for observational studies. Journal of the American Statistical Association, 84(408), 1024-1032.

Rosenbaum, P. R., Ross, R. N. and Silber, J. H. (2007) <doi:10.1198/016214506000001059> Minimum distance matched sampling with fine balance in an observational study of treatment for ovarian cancer. Journal of the American Statistical Association, 102, 75-83.

Rosenbaum, P. R. (2020a) <doi:10.1007/978-3-030-46405-9> Design of Observational Studies (2nd Edition). New York: Springer.

Rosenbaum, P. R. (2020b). <doi:10.1146/annurev-statistics-031219-041058> Modern algorithms for matching in observational studies. Annual Review of Statistics and Its Application, 7(1), 143-176.

Rosenbaum, P. R. and Zubizarreta, J. R. (2023) <doi:10.1201/9781003102670> Optimization Techniques in Multivariate Matching. Handbook of Matching and Weighting Adjustments for Causal Inference, pp. 63-86. Boca Raton, FL: Chapman and Hall/CRC Press.

Rosenbaum, P. R. (2025) <doi:10.1007/978-3-031-90494-3> Introduction to the Theory of Observational Studies. New York: Springer.

Rubin, D. B. (1980) <doi:10.2307/2529981> Bias reduction using Mahalanobis-metric matching. Biometrics, 36, 293-298.

Rubin, D. B. (2008) <doi:10.1214/08-AOAS187> For objective causal inference, design trumps analysis. Annals of Applied Statistics, 2, 808-840.

Stuart, E.A., (2010). <doi:10.1214/09-STS313> Matching methods for causal inference: A review and a look forward. Statistical Science, 25(1), 1-21.

Yang, D., Small, D. S., Silber, J. H. and Rosenbaum, P. R. (2012) <doi:10.1111/j.1541-0420.2011.01691.x> Optimal matching with minimal deviation from fine balance in a study of obesity and surgical outcomes. Biometrics, 68, 628-636.

Yu, R. and Rosenbaum, P. R. (2019) <doi:10.1111/biom.13098> Directional penalties for optimal matching in observational studies. Biometrics, 75(4), 1380-1390.

Yu, R., Silber, J. H., & Rosenbaum, P. R. (2020) <doi:10.1214/19-STS699> Matching methods for observational studies derived from large administrative databases. Statistical Science, 35(3), 338-355.

Yu, R. (2021) <doi:10.1111/biom.13374> Evaluating and improving a matched comparison of antidepressants and bone density. Biometrics, 77(4), 1276-1288.

Yu R. & Rosenbaum, P. R. (2022) <doi:10.1080/10618600.2022.2058001> Graded matching for large observational studies. Journal of Computational and Graphical Statistics, 31(4):1406-1415.

Yu, R. (2023) <doi:10.1111/biom.13771> How well can fine balance work for covariate balancing? Biometrics. 79(3), 2346-2356.

Zubizarreta, J. R. (2012) <doi:10.1080/01621459.2012.703874> Using mixed integer programming for matching in an observational study of kidney failure after surgery. Journal of the American Statistical Association, 107(500), 1360-1371.

Zubizarreta, J. R., Reinke, C. E., Kelz, R. R., Silber, J. H. and Rosenbaum, P. R. (2011) <doi:10.1198/tas.2011.11072> Matching for several sparse nominal variables in a case control study of readmission following surgery. The American Statistician, 65(4), 229-238.

Zubizarreta, J. R., Stuart, E. A., Small, D. S. and Rosenbaum, P. R., eds. (2023) <doi:10.1201/9781003102670> Handbook of Matching and Weighting Adjustments for Causal Inference. Boca Raton, FL: Chapman and Hall/CRC Press.

Examples


# The example below uses the binge data from the iTOS package.
# See the documentation for binge in the iTOS package for more information.
#
library(iTOS)
data(binge)
b2<-binge[binge$AlcGroup!="P",] # Match binge drinkers to nondrinkers
z<-1*(b2$AlcGroup=="B") # Treatment/control indicator
b2<-cbind(b2,z)
rm(z)
rownames(b2)<-b2$SEQN
attach(b2)
# Estimate a propensity score
pr<-stats::glm(z~age+female+education+bmi+vigor+
      smokenow+smokeQuit+bpRX,family=binomial)$fitted.values
#
#  Create nominal covariates to include in near or fine
#
smoke<-1*(smokenow==1)
dontSmoke<-1*(smokenow==3)
age50<-1*(age>=50)
bmi30<-1*(bmi>=30)
ed2<-1*(education<=2)
#
#  near contains covariates to be matched as exactly as possible
#
near<-cbind(female,dontSmoke)
#
# xm contains covariates in the robust Mahalanobis distance
# Includes some continuous covariates.
#
xm<-cbind(age,bmi,vigor,smokenow,education)
#
# fine contains covariates that will be balanced, but not matched
#
fine<-cbind(ed2,smoke,dontSmoke)

# variable to be used in xinteger
ageCi<-as.integer(ageC)
xbalance<-cbind(pr,age,female,education,bmi,vigor,smokenow,smokeQuit,bpRX,
   ageCi,ed2,smoke,dontSmoke,bmi30,smoke,ed2,age50)
b2<-cbind(b2,pr)
rm(bmi30,smoke,ed2,age50,dontSmoke)
detach(b2)

mc<-alittleArt(b2,b2$z,pr=pr,xm=xm,near=near,fine=fine,xinteger=ageCi,
   ncontrols=3,xbalance=xbalance,pr.penalty = c(3, 5, 50, 250))
#
#  Here are the first two 1-to-3 matched sets.
#
mc$match[1:8,]
#
#  You can check that every matched set is exactly matched for
#  female and nonsmoking.  This is from near-exact matching.
#  In some other data set, the number of mismatches might be
#  minimized, not driven to zero.
#
#  The balance table shows that large imbalances in covariates
#  existed before matching, but are much smaller after matching.
#  Look, for example, at the propensity score, female, and
#  the several versions of the smoking variable.
#
mc$balance
m<-mc$match
m<-m[m$matched,] # Remove the unmatched controls
table(m$z)
prop.table(table(m$ageC,m$z),2)
# You could improve this table by setting integer.penalty=500.
# Other things might suffer a bit.  The boxplot of age is good as is.
boxplot(m$age~m$z)
boxplot(m$pr~m$z)

Artless Automatic Matching (Old Version)

Description

Please use the newer version 2, artlessV2, not this old version, artless(). This old version, artless(), is unchanged to maintain backwards compatibility; see the details. Implements a simple version of multivariate matching using a propensity score, near-exact matching, near-fine balance, and robust Mahalanobis distance matching.

Usage

artless(dat, z, x, xm = NULL, near = NULL, fine = NULL,
   ncontrols = 1, rnd = 2, solver="rlemon")

Arguments

dat

A dataframe containing the data set that will be matched. Let N be the number of rows of dat.

z

A binary vector of length N where z[i]=1 if the ith row of dat describes a treated individual and z[i]=0 if the ith row of dat describes a control.

x

x is a numeric matrix with N rows. The covariates in x are used to estimate a propensity score using a linear logit model.

xm

xm is a numeric matrix with N rows. The covariates in xm are used to define a robust Mahalanobis distance between treated and control individuals.

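near

Covariates for near-exact matching. An attempt is made to match exactly for the covariates in near; see the notes.

fine

Covariates for fine balance. An attempt is made to balance the covariates in fine without necessarily pairing for them; see the notes.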

ncontrols

A positive integer. ncontrols is the number of controls to be matched to each treated individual.

rnd

A nonnegative integer. The balance table is rounded for display to rnd digits.

solver

Either "rlemon" or "rrelaxiv". The rlemon solver is automatically available without special installation. The rrelaxiv solver requires a special installation. See the note.

Details

Please use version 2, artlessV2, not this version, artless(). Although quite similar, the newer version, artlessV2(), changes various defaults and fixes some quirks, bugs, and oddities. Also, artlessV2() sets various defaults and calls another function, alittleArt. In alittleArt(), you can change the defaults to have fine control over the resulting match. You can easily make the step from artlessV2() to alittleArt() because, in fact, you have been using alittleArt() all along. The code for artlessV2() is now a single function call to alittleArt(). The documentation for artlessV2() is easy to read, because defaults are set for you without mention. The documentation for alittleArt() is explicit and comprehensive, because anyone who opts for alittleArt() has thereby expressed an interest in fine control of the match and its associated technical detail.

Value

match

balance

A matrix called the balance table. The matrix has one row for each covariate in x, xm, near and fine; so, some covariates may be repeated. It also has a first row for the propensity score. There are five columns. Column 1 is the mean of the covariate in the treated group. Column 2 is the mean of the covariate in the matched control group. Column 3 is the mean of the covariate among all controls prior to matching. Column 4 is the difference between columns 1 and 2 divided by a pooled estimate of the standard deviation of the covariate before matching. Column 5 is the difference between columns 1 and 3 divided by a pooled estimate of the standard deviation of the covariate before matching. Notice that columns 4 and 5 have the same denominator, but different numerators.
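
The five columns can be sketched for a single covariate as follows. This is an illustration, not the package's code: balance.row() is a hypothetical helper, the data are made up, and the pooled standard deviation is assumed here to be the square root of the average of the treated and control variances before matching.

```r
# Illustration only: the five balance-table columns for one covariate.
# v: covariate; z: 1 = treated, 0 = control;
# used: TRUE for controls that were selected by the match.
balance.row <- function(v, z, used) {
  m.treated <- mean(v[z == 1])
  m.matched <- mean(v[z == 0 & used])
  m.all <- mean(v[z == 0])
  # Assumed pooling: average of the two group variances before matching
  s <- sqrt((stats::var(v[z == 1]) + stats::var(v[z == 0])) / 2)
  c(treated = m.treated, matched.control = m.matched, all.control = m.all,
    std.dif.after = (m.treated - m.matched) / s,
    std.dif.before = (m.treated - m.all) / s)
}

age <- c(50, 60, 70, 20, 30, 55, 65, 40)     # hypothetical data
z <- c(1, 1, 1, 0, 0, 0, 0, 0)
used <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
b <- balance.row(age, z, used)
round(b, 2)
```

Because columns 4 and 5 share a denominator, their ratio equals the ratio of the two mean differences, so a smaller column 4 reflects a smaller difference in means and not a change in the scale.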

Note

– The following are some practical tips on how to use artless.

– Placing a covariate in x means that it is included in the propensity score. Most or all covariates that you want to balance should be placed in x.

– A limited number of nominal covariates with a few levels can be placed in near or in fine. Both near and fine covariates are given overriding importance; so, if you place too many covariates in near or fine, or if they have too many levels, they will override everything else, and the match quality will be poor. The same covariate can appear, perhaps in different forms, in x, xm, near and fine. In the example, a five-level education variable is in x and xm, and a two-level education variable formed from the five-level education variable is in fine.

– An attempt is made to exactly match for covariates in near. In the example, near contains two binary covariates, namely female and dontSmoke. This means that the match will try whenever possible to match women to women and men to men, nonsmokers to nonsmokers, and smokers to smokers. Other considerations are subordinated to this goal.

– An attempt is made to balance covariates in fine. In the example, fine includes a covariate expressing four broad age categories, one low education category (less than high school), and a binary covariate distinguishing daily-smokers from everyone else. This means that the match will work hard to have the same proportion of people with less-than-high-school education in treated and control groups, but it will not prioritize pairing two people with less-than-high-school education. Although subordinate to near-exact matching, fine balance is given more importance than other considerations.

– Two separate attempts are made: first, to balance the propensity score in the sense of fine balance, and second, to pair closely for the propensity score. More emphasis is given to balancing the propensity score, much less to pairing for it. The match also tries in a limited way to avoid using many controls whose propensity scores are below the minimum propensity score in the treated group.

– An attempt is made to pair closely for covariates in xm; however, this task has the lowest priority of the several goals. A continuous covariate, like age or bmi, might be placed in x and in xm. Covariates in xm are given roughly equal importance, so do not put unimportant covariates in xm.

– The covariates in x could include, say: (i) a quadratic in age, (age-mean(age))^2, (ii) an interaction, (age-mean(age))*(bmi-mean(bmi)), or (iii) spline terms computed from age. The function alittleArt() in this package permits you to import a propensity score, rather than automatically fitting it as a linear logit model fitted by maximum likelihood.

– Usually, the first match you construct is imperfect, and you see this in the balance table or in plots of the matched data. So, you make small adjustments to x, xm, near and fine to fix the imperfections. The match should be finalized before any outcome information is examined. Taking the first match without looking at it and improving it is not artless; it is incompetent.

– Once you have developed some experience with the artless function, you may want to learn about other artful tactics that can enhance your ability to remove imperfections in a match. In particular, alittleArt() in this package provides user control of various parameters or penalties that affect the match, greatly increasing user control of the resulting match, at the price of some added effort on the part of the user. Some of these tactics are implemented in the iTOS package that is called by artless.

– There are treated and control groups that cannot be matched. If all of the treated individuals are under age 20 and all of the controls are over age 50, then there is no way you can match for age. You could do regression or covariance adjustment for age, but of course it would be silly. Matching will often stop you from doing silly things, while regression will let you do silly things.

Note

Should you be artful rather than artless? Essentially, the artless() function is setting priorities by default. This makes artless() easy to use, but its default priorities might not be your priorities. One alternative is to use the alittleArt() function in this package; see the discussion below for the distinction between artless() and alittleArt(). Another alternative is to set your own priorities by using the matching methods in, say, the iTOS package. The artless() function calls the functions in the iTOS package, but it sets default priorities when it does this. There are also many more options in the iTOS package.

What can artful use of iTOS do that artless() cannot? artless() automatically sets priorities and penalties, but iTOS lets you adjust them. artless() automatically gives an emphasis to the propensity score, and does this in a particular way, but iTOS lets you decide. The directional penalties of Yu and Rosenbaum (2019) need to be titrated to produce desired effects; they are in iTOS but not in artless(). Near-exact and near-fine matching are implemented for nominal variables in artless(), but iTOS has other options for ordered categories. iTOS lets you give more emphasis to one covariate, less to another, but artless() does this only indirectly through the matrices x, xm, near and fine. In artless() all variables in near are treated as equally important, and all variables in fine are treated as equally important, but iTOS lets you decide. Caliper matching is possible in iTOS but not in artless(). artless() uses the control-control edge costs in Zhang et al. (2023) to avoid low propensity scores in the control group, but iTOS lets you use this feature any way you prefer. The iTOS package is associated with Rosenbaum (2025), especially its Chapters 5 and 6.

Note

This note provides some references and detail about what the package is actually doing. You do not have to read this note to use the package.

Matching using propensity scores and a Mahalanobis distance is discussed in Rosenbaum and Rubin (1985). The robust Mahalanobis distance is discussed in Section 9.3 of Rosenbaum (2020a) and more briefly in Section 4.1 of Rosenbaum (2020b).

Near-exact matching (also known as almost-exact matching) is an attempt to match exactly for a few nominal covariates, while also matching for other things. It is described in Sections 10.3 and 10.4 of Rosenbaum (2020a) and more briefly in Section 4.3 of Rosenbaum (2020b). Near-exact matching is implemented by a large penalty added to a covariate distance: if two people are not exactly matched for a near-exact covariate, then the covariate distance between them is very large. Near-exact matching minimizes the number of individuals who are not exactly matched.
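
The idea of adding a large penalty to a covariate distance can be sketched with a small treated-by-control distance matrix. The distances, the covariate values and the penalty of 1000 are all made up for illustration; this is not the package's internal code.

```r
# Illustration only: near-exact matching as a penalty on a distance matrix.
# Rows are 3 treated individuals, columns are 4 controls (hypothetical).
d <- matrix(c(1, 4, 2, 6,
              3, 1, 5, 2,
              2, 2, 1, 7), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("T", 1:3), paste0("C", 1:4)))
female.treated <- c(1, 0, 1)
female.control <- c(1, 1, 0, 0)

penalty <- 1000                                     # illustrative size
mismatch <- outer(female.treated, female.control, "!=")
d.near <- d + penalty * mismatch
d.near
```

Any pairing that mismatches female now costs at least 1000, so a minimum distance match uses such a pairing only when it cannot be avoided, which is why the number of mismatches is minimized.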

Fine balance attempts to balance a covariate without pairing for it. For example, female is balanced if the treated and control groups have the same proportion of females, but female is exactly matched if females are always matched to females. Fine balance is discussed in Chapter 11 of Rosenbaum (2020a) and more briefly in Section 4.4 of Rosenbaum (2020b). Fine balance was introduced in Section 3.2 of Rosenbaum (1989), and is further developed in Rosenbaum, Ross and Silber (2007). If one seeks a match as close as possible to fine balance, then one is doing near-fine balance. Near-fine balance is often implemented using penalties for imbalances; see Yang et al. (2012), Pimentel et al. (2015) and Zhang et al. (2023).

One can do near-exact matching and fine balancing of the same variable, perhaps leading the proportion of females to be exactly the same in treated and control groups, with pairs matched for female as often as is possible. See Zubizarreta et al. (2011) for discussion.
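
The distinction between balancing and pairing can be seen in a tiny made-up example of four matched pairs, in which the i-th control is paired with the i-th treated individual:

```r
# Illustration only: female is perfectly balanced but never exactly matched.
female.treated <- c(1, 1, 0, 0)
female.control <- c(0, 0, 1, 1)

# Same proportion of females in both groups: balanced
c(treated = mean(female.treated), control = mean(female.control))

# Number of pairs that are exactly matched for female
sum(female.treated == female.control)
```

Both groups are half female, yet no pair is exactly matched for female. Re-pairing the same controls in the order 3, 4, 1, 2 would keep the balance and match every pair exactly, which is the situation described above when both techniques are applied to one variable.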

artless() uses the control-control edge costs in Zhang et al. (2023) to moderately penalize the use of a control whose propensity score is below the minimum propensity score in the treated group. This penalty is smaller than the penalty for near-exact matching and for aspects of propensity score balancing, but it is larger than the penalty for each variable in near-fine matching.

This package implements a very specific version of two-criteria matching from Zhang et al. (2023) using functions from the iTOS package. Two-criteria matching integrates a number of earlier techniques into a single network structure. The package picks several one-size-fits-all penalties for distances for two-criteria matching. An artful match might vary penalties in a thoughtful way to achieve a better, closer, more balanced match with a larger value of ncontrols. The package does not use asymmetric calipers and directional penalties from Yu and Rosenbaum (2019) because these are not easily automated, but the artful use of these techniques can produce a better match.

The package uses optimal matching by minimum cost flow in a network. See Bertsekas (1990) for an introduction to this optimization technique, and see Rosenbaum (1989) for its application to matching in observational studies.
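
Why optimization matters can be seen in a toy pair-matching problem small enough to solve by enumeration. The distance matrix is made up, and brute-force enumeration stands in for the network solver only for illustration; it is not how the package works and would not scale beyond a handful of individuals.

```r
# Illustration only: greedy versus optimal pair matching, 3 treated, 3 controls.
d <- matrix(c( 1,   2, 50,
               2, 100, 60,
              40,  55,  3), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("T", 1:3), paste0("C", 1:3)))

# Greedy: each treated individual, in order, takes the closest unused control
avail <- 1:3
greedy <- integer(3)
for (i in 1:3) {
  j <- avail[which.min(d[i, avail])]
  greedy[i] <- j
  avail <- setdiff(avail, j)
}
greedy.total <- sum(d[cbind(1:3, greedy)])

# Optimal: enumerate all one-to-one assignments and keep the cheapest
g <- as.matrix(expand.grid(1:3, 1:3, 1:3))
perms <- g[apply(g, 1, function(p) length(unique(p)) == 3), , drop = FALSE]
totals <- apply(perms, 1, function(p) sum(d[cbind(1:3, p)]))
optimal <- unname(perms[which.min(totals), ])
optimal.total <- min(totals)

c(greedy = greedy.total, optimal = optimal.total)
```

Greedy matching gives T1 its closest control, C1, which forces poor matches for T2 and T3; the optimal assignment gives T1 a slightly worse control and achieves a far smaller total distance.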

The package indirectly uses the callrelax() function in Samuel Pimentel's rcbalance package. This function was originally intended to call the excellent RELAXIV Fortran code of Bertsekas and Tseng (1988,1994). Unfortunately, that code has an academic license and is not available from CRAN; so, by default it calls the rlemon function instead, which is available on CRAN. If you qualify as an academic, then you may be able to download the RELAXIV code from GitHub at <https://github.com/josherrickson/rrelaxiv/> and use it in artless by setting solver="rrelaxiv".

artless() uses a dense network, so it can match moderately large data sets, but not very large data sets. For very large data sets, see Yu et al. (2020) and Yu's bigmatch package in R.

Network optimization is only one of several optimization techniques that may be used in multivariate matching. See Niknam and Zubizarreta (2022), Zubizarreta (2012) and Rosenbaum and Zubizarreta (2023).