| maSigPro {maSigPro} | R Documentation |
maSigPro performs a whole maSigPro analysis for a time series gene expression experiment.
The function successively calls the functions make.design.matrix (optional), p.vector, T.fit,
get.siggenes and see.genes.
maSigPro(data, edesign, matrix = "AUTO", groups.vector = NULL,
degree = 2, time.col = 1, repl.col = 2, group.cols = c(3:ncol(edesign)),
Q = 0.05, alfa = Q, nvar.correction = FALSE, step.method = "backward", rsq = 0.7,
min.obs = 3, vars = "groups", significant.intercept = "dummy", cluster.data = 1,
add.IDs = FALSE, IDs = NULL, matchID.col = 1, only.names = FALSE, k = 9, m = 1.45,
cluster.method = "hclust", distance = "cor", agglo.method = "ward", iter.max = 500,
summary.mode = "median", color.mode = "rainbow", trat.repl.spots = "none",
index = IDs[, (matchID.col + 1)], match = IDs[, matchID.col], rs = 0.7,
show.fit = TRUE, show.lines = TRUE, pdf = TRUE, cexlab = 0.8,
legend = TRUE, main = NULL, ...)
data |
matrix with normalized gene expression data. Genes must be in
rows and arrays in columns. Row names must contain geneIDs
(argument of p.vector) |
edesign |
matrix of experimental design. Row names must contain
arrayIDs
(argument of make.design.matrix and see.genes) |
matrix |
design matrix for regression analysis. By default the design is
calculated with make.design.matrix
(argument of p.vector and T.fit) |
groups.vector |
vector indicating experimental group of each variable
(argument of get.siggenes and
see.genes, by default computed by
make.design.matrix) |
degree |
the degree of the regression fit polynomial. degree = 1
returns linear regression, degree = 2 returns quadratic regression,
etc.
(argument of make.design.matrix) |
time.col |
column in edesign containing time values. Default is first
column
(argument of make.design.matrix and
see.genes) |
repl.col |
column in edesign containing coding for replicate
arrays. Default is second column
(argument of make.design.matrix and see.genes) |
group.cols |
columns in edesign indicating the coding for each
group of the experiment (see make.design.matrix)
(argument of make.design.matrix and
see.genes) |
Q |
level of false discovery rate (FDR) control
(argument of p.vector) |
alfa |
significance level used for variable selection in the stepwise
regression
(argument of T.fit) |
nvar.correction |
logical indicating whether to correct the stepwise
regression significance level
(argument of T.fit) |
step.method |
argument to be passed to the step function.
Can be either "backward", "forward",
"two.ways.backward" or "two.ways.forward" |
rsq |
cut-off level at the R-squared value for the stepwise regression
fit.
Only genes with R-squared greater than rsq are selected |
min.obs |
genes with fewer than this number of true numerical values
will be excluded from the analysis
(argument of p.vector and T.fit) |
vars |
variables for which to extract significant genes
(argument of get.siggenes) |
significant.intercept |
experimental groups for which significant
intercept coefficients are considered
(argument of get.siggenes) |
cluster.data |
Type of data used by the cluster algorithm
(argument of see.genes) |
add.IDs |
logical indicating whether to include additional gene IDs
in the significant genes result
(argument of get.siggenes) |
IDs |
matrix containing additional gene ID information (required when
add.IDs is TRUE)
(argument of get.siggenes) |
matchID.col |
number of the matching column in matrix IDs for adding gene
IDs
(argument of get.siggenes) |
only.names |
logical. If TRUE, expression values are omitted in the
significant genes result
(argument of get.siggenes) |
k |
number of clusters
(argument of see.genes) |
m |
m parameter when "mfuzz" clustering algorithm is used. See
mfuzz
(argument of see.genes) |
cluster.method |
clustering method for data partitioning
(argument of see.genes) |
distance |
distance measurement function used when
cluster.method is "hclust"
(argument of see.genes) |
agglo.method |
aggregation method used when cluster.method is
"hclust"
(argument of see.genes) |
iter.max |
number of iterations when cluster.method is
"kmeans"
(argument of see.genes) |
summary.mode |
the method to condense expression information when
more than one gene is present in the data.
Possible values are "representative" and "median"
(argument of PlotGroups) |
color.mode |
color scale for plotting profiles. Can be either
"rainbow" or "gray"
(argument of PlotProfiles) |
trat.repl.spots |
treatment given to replicate spots. Possible values are "none" and "average"
(argument of get.siggenes) |
index |
argument of the average.rows function to use when trat.repl.spots is "average"
(argument of get.siggenes) |
match |
argument of the average.rows function to use when trat.repl.spots is "average"
(argument of get.siggenes) |
rs |
minimum Pearson correlation coefficient for replicated spot profiles to be averaged
(argument of get.siggenes) |
show.fit |
logical indicating whether regression fit curves must be plotted
(argument of see.genes) |
show.lines |
logical indicating whether a line must be drawn joining plotted data points for each group
(argument of see.genes) |
pdf |
logical indicating whether a pdf results file must be generated
(argument of see.genes) |
cexlab |
graphical parameter magnification to be used for x labels in plotting functions |
legend |
logical indicating whether legend must be added when plotting profiles
(argument of see.genes) |
main |
title for pdf results file |
... |
other graphical function arguments |
maSigPro finds and displays genes with significant profile differences in time series gene expression experiments.
The main, compulsory input parameters for this function are a matrix of gene expression data (see p.vector for details)
and a matrix describing the experimental design (see make.design.matrix or p.vector for details). If extended
gene ID information is to be included in the significant genes result, a third matrix, IDs, containing this
information is required (see get.siggenes for details).
Basically, the function calls the successive steps of the maSigPro approach in turn: make.design.matrix (optional), p.vector, T.fit, get.siggenes and see.genes.
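As a rough illustration of the gene selection step associated with p.vector (p-values of a general regression model for each gene, followed by FDR control at level Q), the following base R sketch fits a quadratic model with a group dummy to each gene, takes the p-value of the overall F test and keeps the genes that pass a Benjamini-Hochberg adjustment at Q = 0.05. It does not call maSigPro: the toy data, the helper model.pvalue and the choice of the Benjamini-Hochberg adjustment are assumptions made for illustration, not the package's implementation.

```r
set.seed(1)

# Hypothetical toy design: 2 groups, 3 time points, 3 replicates per time point
Time  <- rep(rep(1:3, each = 3), 2)
Treat <- rep(c(0, 1), each = 9)   # dummy variable: 0 = control, 1 = treatment

# 50 flat genes plus 5 genes whose treatment profile rises with time
n.flat <- 50
n.changed <- 5
flat <- matrix(rnorm(n.flat * 18, sd = 0.1), nrow = n.flat)
changed <- t(replicate(n.changed, rnorm(18, mean = Treat * Time, sd = 0.1)))
expr <- rbind(flat, changed)
rownames(expr) <- paste0("gene", seq_len(nrow(expr)))

# Hypothetical helper: p-value of the overall F test of a quadratic
# regression model that includes the group dummy and its interactions
model.pvalue <- function(y) {
  fit <- lm(y ~ Time + I(Time^2) + Treat + Time:Treat + I(Time^2):Treat)
  f <- summary(fit)$fstatistic
  pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
}
p <- apply(expr, 1, model.pvalue)

# Benjamini-Hochberg FDR control at level Q (assumed adjustment method)
Q <- 0.05
significant <- names(p)[p.adjust(p, method = "BH") <= Q]
significant
```

In this sketch the overall F test plays the role of the general regression model, and a further variable selection step (the role of T.fit in the package) would then be applied only to the genes kept in significant.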
summary |
a vector or matrix listing significant genes for the variables given by the function parameters |
sig.genes |
a list with detailed information on the significant genes found for the variables given by the function parameters. Each element of the list is also a list containing:
sig.profiles: expression values of significant genes. The cluster assignment of each gene is given in the last column
coefficients: regression coefficients for significant genes
t.score: values of the t statistics of significant genes
sig.pvalues: p-values of the regression coefficients for significant genes
g: number of genes
...: arguments passed by previous functions |
input.data |
input analysis data |
G |
number of input genes |
edesign |
matrix of experimental design |
dis |
regression design matrix |
min.obs |
input value for minimal number of true observations |
p.vector |
vector containing the computed p-values of the general regression model for each gene |
variables |
variables in the general regression model |
g |
number of significant genes |
p.vector.alfa |
p-value at FDR = Q control |
step.method |
input step method for stepwise regression |
Q |
input value for false discovery rate (FDR) control |
step.alfa |
input significance level in stepwise regression |
influ.info |
data frame of genes containing influential data |
Ana Conesa, aconesa@ivia.es; Maria Jose Nueda, mj.nueda@ua.es
Conesa, A., Nueda, M.J., Ferrer, A., Talon, M. 2005. maSigPro: a Method to Identify Significant Differential Expression Profiles in Time-Course Microarray Experiments.
make.design.matrix, p.vector, T.fit, get.siggenes, see.genes
#### GENERATE TIME COURSE DATA
## generate n random gene expression profiles of a data set with
## one control plus 3 treatments, 3 time points and r replicates per time point.
tc.GENE <- function(n, r,
                    var11 = 0.01, var12 = 0.01, var13 = 0.01,
                    var21 = 0.01, var22 = 0.01, var23 = 0.01,
                    var31 = 0.01, var32 = 0.01, var33 = 0.01,
                    var41 = 0.01, var42 = 0.01, var43 = 0.01,
                    a1 = 0, a2 = 0, a3 = 0, a4 = 0,
                    b1 = 0, b2 = 0, b3 = 0, b4 = 0,
                    c1 = 0, c2 = 0, c3 = 0, c4 = 0)
{
  tc.dat <- NULL
  for (i in 1:n) {
    Ctl <- c(rnorm(r, a1, var11), rnorm(r, b1, var12), rnorm(r, c1, var13)) # Ctl group
    Tr1 <- c(rnorm(r, a2, var21), rnorm(r, b2, var22), rnorm(r, c2, var23)) # Tr1 group
    Tr2 <- c(rnorm(r, a3, var31), rnorm(r, b3, var32), rnorm(r, c3, var33)) # Tr2 group
    Tr3 <- c(rnorm(r, a4, var41), rnorm(r, b4, var42), rnorm(r, c4, var43)) # Tr3 group
    gene <- c(Ctl, Tr1, Tr2, Tr3)
    tc.dat <- rbind(tc.dat, gene)
  }
  tc.dat
}
## Create 270 flat profiles
flat <- tc.GENE(n = 270, r = 3)
## Create 10 genes with profile differences between Ctl and Tr1 groups
twodiff <- tc.GENE(n = 10, r = 3, b2 = 0.5, c2 = 1.3)
## Create 10 genes with profile differences between Ctl, Tr2, and Tr3 groups
threediff <- tc.GENE(n = 10, r = 3, b3 = 0.8, c3 = -1, a4 = -0.1, b4 = -0.8, c4 = -1.2)
## Create 10 genes with profile differences between Ctl and Tr2 and different variance
vardiff <- tc.GENE(n = 10, r = 3, a3 = 0.7, b3 = 1, c3 = 1.2, var32 = 0.03, var33 = 0.03)
## Create dataset
tc.DATA <- rbind(flat, twodiff, threediff, vardiff)
rownames(tc.DATA) <- paste("feature", c(1:300), sep = "")
colnames(tc.DATA) <- paste("Array", c(1:36), sep = "")
tc.DATA[sample(c(1:(300*36)), 300)] <- NA # introduce missing values
#### CREATE EXPERIMENTAL DESIGN
Time <- rep(c(rep(c(1:3), each = 3)), 4)
Replicates <- rep(c(1:12), each = 3)
Control <- c(rep(1, 9), rep(0, 27))
Treat1 <- c(rep(0, 9), rep(1, 9), rep(0, 18))
Treat2 <- c(rep(0, 18), rep(1, 9), rep(0,9))
Treat3 <- c(rep(0, 27), rep(1, 9))
edesign <- cbind(Time, Replicates, Control, Treat1, Treat2, Treat3)
rownames(edesign) <- paste("Array", c(1:36), sep = "")
#### RUN maSigPro
tc.test <- maSigPro(tc.DATA, edesign, degree = 2, vars = "groups", main = "Test")
tc.test$g # gives number of total significant genes
tc.test$summary # shows significant genes by experimental groups
tc.test$sig.genes$Treat1$sig.pvalues # shows pvalues of the significant coefficients
# in the regression models of the significant genes
# for Control.vs.Treat1 comparison
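
The defaults cluster.method = "hclust", distance = "cor" and agglo.method = "ward" describe hierarchical clustering of profiles on a correlation-based distance. A minimal base R sketch of that idea follows, assuming the distance is taken as 1 minus the Pearson correlation between profiles. The toy profiles are hypothetical, this is not the code run by see.genes, and "ward.D" is used because that is the name current versions of hclust give to the method formerly called "ward".

```r
set.seed(2)

# Hypothetical profiles: 10 rising and 10 falling genes over 6 conditions
x <- 1:6
rising  <- t(replicate(10,  x + rnorm(6, sd = 0.2)))
falling <- t(replicate(10, -x + rnorm(6, sd = 0.2)))
profiles <- rbind(rising, falling)
rownames(profiles) <- paste0("gene", 1:20)

# Correlation-based distance between gene profiles (rows); assumed form
d <- as.dist(1 - cor(t(profiles)))

# Ward agglomeration, then cut the tree into k clusters
tree <- hclust(d, method = "ward.D")
clusters <- cutree(tree, k = 2)
table(clusters)
```

With real data, k would be set to the number of clusters requested through the k argument.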