| ggm.estimate.pcor {GeneTS} | R Documentation |
ggm.estimate.pcor implements several point estimators of partial
correlation that are suitable for small-sample data sets. Their statistical
properties are investigated in detail in Schaefer and Strimmer (2005).
ggm.estimate.pcor(x, method = c("observed.pcor", "partial.bagged.cor", "bagged.pcor"), R = 1000, ...)
x |
data matrix (each row corresponds to one multivariate observation) |
method |
method used to estimate the partial correlation matrix. Available options are "observed.pcor" (default), "partial.bagged.cor", and "bagged.pcor". |
R |
number of bootstrap replicates (bagged estimators only) |
... |
options passed to partial.cor, bagged.cor, and bagged.pcor. |
The results of Schaefer and Strimmer (2005) can be summarized as follows (with n being the sample size and p the number of variables):
observed.pcor: Observed partial correlation (Pi-1). Should be used preferentially for n >> p. In this region the other two estimators perform equally well but are slower due to bagging.
partial.bagged.cor: Partial bagged correlation (Pi-2). Best used for small sample applications with n < p. Here the advantages of Pi-2 are its small variance, its high accuracy as a point estimate, and its overall best power and positive predictive value (PPV). In addition it is computationally less expensive than Pi-3.
bagged.pcor: Bagged partial correlation (Pi-3). May be used in the critical zone (n approx. p) and for sample sizes n slightly larger than the number of variables p.
Taken together, these results favor the partial bagged correlation Pi-2 as the estimator of choice for inferring GGM networks from small-sample (gene expression) data.
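To make the difference between the three estimators concrete, the following base-R sketch shows one plausible way to compute them. It is an illustration only and not the GeneTS implementation: the helper names pseudoinverse, cor.to.pcor and bagged.mean are hypothetical, and it assumes that Pi-2 is obtained by inverting a bootstrap-averaged correlation matrix, whereas Pi-3 averages the partial correlations computed on each bootstrap sample.

```r
# Hypothetical helper: Moore-Penrose pseudoinverse via SVD (base R only),
# so the sketch also works when the correlation matrix is singular (n < p).
pseudoinverse <- function(m, tol = sqrt(.Machine$double.eps)) {
  s <- svd(m)
  keep <- s$d > tol * max(s$d)
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

# Hypothetical helper: partial correlations from a correlation matrix.
# Off-diagonal entries are the negated, standardized (pseudo)inverse;
# cov2cor sets the diagonal to 1.
cor.to.pcor <- function(r) {
  w <- -pseudoinverse(r)
  diag(w) <- -diag(w)
  cov2cor(w)
}

# Hypothetical helper: bootstrap aggregation ("bagging") of a
# matrix-valued statistic by averaging it over R resamples of the rows.
bagged.mean <- function(x, stat, R) {
  n <- nrow(x)
  acc <- 0
  for (b in seq_len(R)) {
    idx <- sample.int(n, n, replace = TRUE)
    acc <- acc + stat(x[idx, , drop = FALSE])
  }
  acc / R
}

set.seed(1)
x <- matrix(rnorm(30 * 5), nrow = 30, ncol = 5)  # n = 30, p = 5

# Pi-1: observed partial correlation
pi1 <- cor.to.pcor(cor(x))
# Pi-2: partial correlation of the bagged correlation matrix
pi2 <- cor.to.pcor(bagged.mean(x, cor, R = 200))
# Pi-3: bagged partial correlation
pi3 <- bagged.mean(x, function(d) cor.to.pcor(cor(d)), R = 200)
```

Under these assumptions Pi-2 needs only one matrix inversion in total, while Pi-3 needs one per bootstrap replicate, which is consistent with the remark above that Pi-2 is computationally less expensive than Pi-3.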
An estimated partial correlation matrix.
Juliane Schaefer (http://www.stat.uni-muenchen.de/~schaefer/) and Korbinian Strimmer (http://www.stat.uni-muenchen.de/~strimmer/).
Schaefer, J., and Strimmer, K. (2005). An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics 21:754-764.
Schaefer, J., and Strimmer, K. (2005). Learning large-scale graphical Gaussian models from genomic data. Proceedings of CNET 2004, Aveiro, Pt. (AIP)
ggm.simulate.data, ggm.estimate.pcor.
# load GeneTS library
library(GeneTS)
# generate random network with 40 nodes
# it contains 780=40*39/2 edges of which 5 percent (=39) are non-zero
true.pcor <- ggm.simulate.pcor(40)
# simulate data set with 40 observations
m.sim <- ggm.simulate.data(40, true.pcor)
# simple estimate of partial correlations
estimated.pcor <- partial.cor(m.sim)
# comparison of estimated and true model
sum((true.pcor-estimated.pcor)^2)
# a slightly better estimate ...
estimated.pcor.2 <- ggm.estimate.pcor(m.sim, method = c("bagged.pcor"))
sum((true.pcor-estimated.pcor.2)^2)