snp.imputation {snpMatrix} | R Documentation |
Given two sets of SNPs typed in the same subjects, this function calculates rules which can be used to impute one set from the other in a subsequent sample.
snp.imputation(X, Y, pos.X, pos.Y, phase=FALSE, try = 50, stopping = c(0.95, 4, 0.05), use.hap = c(0.95, 0.1), em.cntrl=c(50, 0.01))
X |
An object of class "snp.matrix" or
"X.snp.matrix" containing observations
of the SNPs to be used for imputation ("regressor SNPs") |
Y |
An object of the same class as X containing observations
of the SNPs to be imputed in a future sample ("target SNPs") |
pos.X |
The positions of the regressor SNPs |
pos.Y |
The positions of the target SNPs |
phase |
See "Details" below |
try |
The number of potential regressor SNPs to be
considered in the stepwise regression procedure around each target
SNP. The nearest try regressor SNPs to each target SNP
will be considered |
stopping |
Parameters of the stopping rule for the stepwise regression (see below) |
use.hap |
Parameters to control use of the haplotype imputation method (see below) |
em.cntrl |
Parameters to control test for convergence of EM algorithm for fitting phased haplotypes (see below) |
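As a rough illustration of what "the nearest try regressor SNPs" could mean, the base R sketch below ranks regressor SNPs by absolute distance in position from one target SNP. The helper `nearest.regressors` and the toy positions are hypothetical, and the assumption that "nearest" means smallest absolute difference in position is ours; the package may select candidates differently.

```r
# Hypothetical helper: indices of the `try` regressor SNPs closest to one target SNP.
# Assumes "nearest" means smallest absolute difference in position.
nearest.regressors <- function(pos.X, pos.target, try = 50) {
  d <- abs(pos.X - pos.target)   # distance of every regressor SNP from the target
  head(order(d), try)            # at most `try` indices, closest first
}

# Toy positions, invented for illustration
pos.X <- c(100, 250, 400, 900, 1500)
nearest.regressors(pos.X, pos.target = 420, try = 3)
```

When fewer than `try` regressor SNPs are available, `head()` simply returns them all.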
The routine first carries out a series of step-wise regression analyses in which each Y SNP is regressed on the nearest try regressor (X) SNPs. If phase is TRUE, the regressions are calculated at the chromosome (haplotype) level, variances being simply p(1-p) and covariances estimated using the same algorithm used in ld.snp. Otherwise, the analysis is carried out at the diplotype level based on conventional variance and covariance estimates using the "all.obs" missing value treatment (see cov). New SNPs are added to the regression until either (a) the value of R^2 exceeds the first parameter of stopping, (b) the number of "tag" SNPs has reached the maximum set in the second parameter of stopping, or (c) the change in R^2 does not achieve the target set by the third parameter of stopping. If the third parameter of stopping is NA, this last test is replaced by a test for improvement in the Akaike information criterion (AIC).
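The stopping rule can be sketched in base R as a forward selection loop. This is only an illustration under stated assumptions: the helper `stepwise.tags` is hypothetical, genotypes are taken to be coded 0/1/2 with no missing values, R^2 is obtained from `lm()`, and the AIC variant used when the third parameter is NA is not covered. It is not the package's internal code.

```r
# Hypothetical helper: forward selection of tag SNPs with a rule of the form
# stopping = c(R^2 to reach, maximum number of tags, minimum gain in R^2).
stepwise.tags <- function(y, X, stopping = c(0.95, 4, 0.05)) {
  tags <- integer(0)
  r2 <- 0
  repeat {
    candidates <- setdiff(seq_len(ncol(X)), tags)
    if (length(candidates) == 0) break
    # R^2 obtained by adding each remaining candidate in turn
    r2.new <- sapply(candidates, function(j) {
      summary(lm(y ~ X[, c(tags, j), drop = FALSE]))$r.squared
    })
    best <- which.max(r2.new)
    # (c) stop if the gain in R^2 misses the target
    if (r2.new[best] - r2 < stopping[3]) break
    tags <- c(tags, candidates[best])
    r2 <- r2.new[best]
    # (a) R^2 is high enough, or (b) the maximum number of tags is reached
    if (r2 > stopping[1] || length(tags) >= stopping[2]) break
  }
  list(tags = tags, r.squared = r2)
}

# Simulated data: 10 regressor SNPs, target identical to regressor 3
set.seed(1)
n <- 200
X <- matrix(rbinom(n * 10, 2, 0.3), nrow = n)
y <- X[, 3]
fit <- stepwise.tags(y, X)
fit$tags
```

In this toy case the target duplicates one regressor, so a single tag SNP should satisfy condition (a) at the first step.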
If the prediction, as measured by R^2, has not achieved a threshold (the first parameter of use.hap) using more than one tag SNP, then a second imputation method is tried. Phased haplotype frequencies are estimated for the Y SNP plus the tag SNPs. The R^2 for prediction of the Y SNP using these haplotype frequencies is then calculated. If 1-R^2 is reduced by a proportion exceeding the second parameter of use.hap, then the haplotype imputation rule is saved in preference to the faster regression rule. The argument em.cntrl controls convergence testing for the EM algorithm for fitting haplotype frequencies. The first parameter is the maximum number of iterations, and the second parameter is the threshold for the change in log likelihood below which the iteration is judged to have converged.
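One reading of the use.hap rule is sketched below in base R. The helper `choose.rule` and the R^2 values passed to it are hypothetical; in particular, the assumption that the haplotype method is attempted only when regression used more than one tag SNP and fell short of the first threshold is our interpretation of the text above, not confirmed package behaviour.

```r
# Hypothetical helper: decide which imputation rule to keep for one target SNP.
choose.rule <- function(r2.reg, r2.hap, n.tags, use.hap = c(0.95, 0.1)) {
  # The haplotype method is only tried when regression on more than
  # one tag SNP falls short of the first threshold
  if (n.tags <= 1 || r2.reg >= use.hap[1]) return("regression")
  # Proportional reduction in the unexplained variance, 1 - R^2
  reduction <- ((1 - r2.reg) - (1 - r2.hap)) / (1 - r2.reg)
  if (reduction > use.hap[2]) "haplotype" else "regression"
}

choose.rule(0.80, 0.90, n.tags = 3)  # 1 - R^2 falls from 0.20 to 0.10
choose.rule(0.80, 0.81, n.tags = 3)  # 1 - R^2 falls from 0.20 to 0.19
choose.rule(0.97, 0.99, n.tags = 3)  # regression already above the threshold
```

With the default use.hap, the first call halves 1-R^2 and so favours the haplotype rule, while the second reduces it by only 5 percent and keeps the regression rule.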
The function returns an object of class "snp.reg.imputation".
The phase=TRUE option is not yet implemented.
David Clayton david.clayton@cimr.cam.ac.uk
Chapman J.M., Cooper J.D., Todd J.A. and Clayton D.G. (2003) Human Heredity, 56:18-31.
snp.reg.imputation-class, ld.snp, imputation.maf, imputation.r2
# Remove 5 SNPs from a dataset and derive imputation rules for them
library(snpMatrix)
data(for.exercise)
sel <- c(20, 1000, 2000, 3000, 5000)
# Target SNPs (to be imputed) and regressor SNPs (to impute from)
to.impute <- snps.10[,sel]
impute.from <- snps.10[,-sel]
# Positions of the target and regressor SNPs
pos.to <- snp.support$position[sel]
pos.fr <- snp.support$position[-sel]
imp <- snp.imputation(impute.from, to.impute, pos.fr, pos.to)