| find.a0 {siggenes} | R Documentation |
Provides the information required to choose the optimal fudge factor in an Empirical Bayes Analysis of Microarrays (EBAM) that uses the modified t statistics.
find.a0(data,cl,B=100,balanced=FALSE,mat.samp=NULL,delta=0.9,alpha=(0:9)/10,
include.0=TRUE,p0=NA,plot.legend=TRUE,na.rm=FALSE,rand=TRUE)
data |
a matrix, data frame or exprSet object containing the data that should be analyzed. Every row of this data set must correspond to a gene, and each column to a sample. |
cl |
a numeric vector of length ncol(data) containing the class
labels of the samples. In the two class paired case, cl can also
be a matrix with ncol(data) rows and 2 columns. If data is
an exprSet object, cl can also be a character string naming the column
of pData(data) that contains the class labels of the samples.
In the one-class case, cl should be a vector of 1's.
In the two class unpaired case, cl should be a vector containing 0's
(specifying the samples of, e.g., the control group) and 1's (specifying,
e.g., the case group).
In the two class paired case, cl can be either a vector or a matrix.
If it is a vector, then cl has to consist of the integers between -1 and
-n/2 (e.g., before treatment group) and between 1 and n/2 (e.g.,
after treatment group), where n is the length of cl and k
is paired with -k, k=1,...,n/2. If cl is a matrix, one
column should contain -1's and 1's specifying, e.g., the before and the after
treatment samples, respectively, and the other column should contain the integers
between 1 and n/2 specifying the n/2 pairs of observations.
For examples of how cl can be specified, see the manual of siggenes. |
B |
number of permutations used in the calculation of the null density. |
balanced |
if TRUE, only balanced permutations will be used. Default is
FALSE. |
mat.samp |
a permutation matrix. If specified, this matrix will be used, even if
rand and B are specified. |
delta |
a gene will be called differentially expressed if its posterior
probability of being differentially expressed is larger than or equal to
delta. |
alpha |
a vector of possible values for the fudge factor a0 in terms of quantiles of the standard deviations of the genes. |
include.0 |
if TRUE (default), a0=0 will also be a possible
choice for the fudge factor. |
p0 |
the prior probability that a gene is not differentially expressed. If not specified (NA), it will be computed automatically. |
plot.legend |
if TRUE (default), a legend will be added to the plot of the
expression scores vs. their logit-transformed posterior probability. |
na.rm |
if FALSE (default), the expression score of genes with one or more
missing values will be set to NA. If TRUE, the missing values
will be replaced by the genewise mean of the non-missing values. |
rand |
if specified, the random number generator will be put in a reproducible state. |
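The class-label conventions for cl and the missing-value replacement described for na.rm can be sketched in base R. This is only an illustration under stated assumptions: the toy matrix and all object names (toy, cl.one, cl.unpaired, cl.paired, cl.paired.mat, toy.filled) are hypothetical, and the mean replacement mirrors the description of na.rm=TRUE rather than the internal code of siggenes.

```r
# Hypothetical toy data: 6 genes (rows) x 8 samples (columns).
set.seed(1)
n <- 8
toy <- matrix(rnorm(6 * n), nrow = 6)

# One-class case: a vector of 1's.
cl.one <- rep(1, n)

# Two class unpaired case: 0 = e.g. control group, 1 = e.g. case group.
cl.unpaired <- rep(c(0, 1), each = n / 2)

# Two class paired case, vector form: sample -k is paired with sample k,
# k = 1, ..., n/2.
cl.paired <- c(-(1:(n / 2)), 1:(n / 2))

# Two class paired case, matrix form: one column with -1 (before) and
# 1 (after), the other with the pair index 1, ..., n/2.
cl.paired.mat <- cbind(rep(c(-1, 1), each = n / 2), rep(1:(n / 2), 2))

# Every specification has to match the number of columns of the data.
stopifnot(length(cl.one) == ncol(toy),
          length(cl.unpaired) == ncol(toy),
          length(cl.paired) == ncol(toy),
          nrow(cl.paired.mat) == ncol(toy))

# Illustration of the na.rm = TRUE description: replace each missing
# value by the mean of the non-missing values of the same gene (row).
toy[2, 3] <- NA
row.means <- rowMeans(toy, na.rm = TRUE)
missing <- which(is.na(toy), arr.ind = TRUE)
toy.filled <- toy
toy.filled[missing] <- row.means[missing[, "row"]]
```

Any one of the four cl objects could then be passed as the cl argument together with a data set that has n columns.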
a list of the numbers of genes called differentially expressed by the EBAM analysis for several choices of a0, and the plot of the expression scores vs. their corresponding logit-transformed posterior probability of being significant.
sig.a0 |
vector containing the number of differentially expressed genes for the specified set of values for a0. |
a0 |
the optimal choice of the fudge factor according to the criterion of Efron et al. (2001): among the candidate values, the a0 that leads to the largest number of differentially expressed genes is selected. |
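How alpha, include.0, delta and the selection criterion fit together can be sketched in base R. The standard deviations, posterior probabilities and counts below are made-up numbers, and the object names (s, a0.candidates, posterior, n.sig, a0.opt) are hypothetical; the sketch follows the description on this page and is not the computation that find.a0 performs internally.

```r
# Hypothetical genewise standard deviations of 1000 genes.
set.seed(123)
s <- sqrt(rchisq(1000, df = 5) / 5)

# Candidate fudge factors: the alpha quantiles of the standard
# deviations, plus a0 = 0 (as with include.0 = TRUE).
alpha <- c(0, 0.05, 0.1)
a0.candidates <- c(0, quantile(s, alpha, names = FALSE))

# The delta rule: a gene is called differentially expressed if its
# posterior probability is larger than or equal to delta.
delta <- 0.9
posterior <- c(0.95, 0.40, 0.90, 0.89, 0.99)  # hypothetical values
n.sig <- sum(posterior >= delta)

# Hypothetical numbers of differentially expressed genes, one per
# candidate a0 (in find.a0 these would come from the EBAM analysis).
sig.a0 <- c(12, 30, 41, 38)

# Criterion of Efron et al. (2001) as described above: choose the a0
# that leads to the most differentially expressed genes.
a0.opt <- a0.candidates[which.max(sig.a0)]
```

Note that a0 = 0 and the 0 quantile are different candidates: the latter is the smallest observed standard deviation.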
The results of find.a0 must be assigned to an object for further analysis
with ebam.
Holger Schwender, holger.schw@gmx.de
Efron, B., Tibshirani, R., Storey, J.D., and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment, JASA, 96, 1151-1160.
Storey, J.D., and Tibshirani, R. (2003). Statistical significance for genome-wide experiments, Technical Report, Department of Statistics, Stanford University.
Schwender, H. (2003). Assessing the false discovery rate in a statistical analysis of gene expression data, Chapter 7, Diploma thesis, Department of Statistics, University of Dortmund, http://de.geocities.com/holgerschw/thesis.pdf.
## Not run:
library(multtest)
# Load the data of Golub et al. (1999). data(golub) contains
# a 3051x38 gene expression matrix called golub, a vector of
# length 38 called golub.cl that consists of the class labels,
# and a matrix called golub.gnames whose third column contains
# the gene names.
data(golub)
# Now the optimal value for the fudge factor a0 is computed,
# where the possible values of a0 are 0 and the 0, 0.05 and
# 0.1 quantiles of the standard deviations of the genes.
# Setting rand=123 makes the results reproducible.
find.out<-find.a0(golub,golub.cl,alpha=c(0,0.05,0.1),rand=123)
## End(Not run)