combineFeatures {MSnbase}    R Documentation

Combines features in an MSnSet object

Description

This function combines the features in an "MSnSet" instance by applying a summarisation function (see the fun argument) to sets of features defined by a factor (see the groupBy argument). Note that the feature names are automatically updated based on the groupBy parameter.
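The grouping-and-summarising idea can be sketched in base R on a plain matrix. The helper summarise_by_group() and the toy data are hypothetical illustrations, not the code MSnbase uses internally and not an MSnSet method; they only show what "apply fun to each set of features, sample by sample" means.

```r
## Hypothetical helper, not part of MSnbase: summarise the rows of an
## expression matrix (features x samples) by a grouping factor.
summarise_by_group <- function(m, group, fun, ...) {
  group <- as.factor(group)
  ## for each sample (column), apply 'fun' to the values of every group;
  ## extra arguments in '...' are forwarded to 'fun'
  res <- apply(m, 2, function(x) tapply(x, group, fun, ...))
  ## reshape so the result is always groups x samples, even for one group
  matrix(res, nrow = nlevels(group),
         dimnames = list(levels(group), colnames(m)))
}

## toy data: 5 features (rows) measured in 2 samples (columns)
m <- matrix(c(1, 2, 3, 4, 5,
              10, 20, 30, 40, 50), ncol = 2,
            dimnames = list(paste0("pep", 1:5), c("S1", "S2")))
grp <- factor(c(1, 1, 2, 2, 2))

res <- summarise_by_group(m, grp, sum)
res   # group 1: 3 and 30; group 2: 12 and 120

## a user-defined summary: range of the values within each group
rng <- summarise_by_group(m, grp, function(x) max(x) - min(x))
rng   # group 1: 1 and 10; group 2: 2 and 20
```

As in the description above, the result has one row per group and the same columns (samples) as the input.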

The coefficients of variation are automatically computed and added to the featureData slot. See the cv and cv.norm arguments for details.
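A rough base R sketch of per-group coefficients of variation (sd / mean), computed separately for each sample. group_cv() is a hypothetical helper, not MSnbase's featureCV(); in particular, treating "sum" normalisation as scaling each feature so that its intensities sum to one across samples is an assumption of this sketch, as is the "CV." column prefix.

```r
## Hypothetical helper, not MSnbase's featureCV(): coefficient of variation
## (sd / mean) of the features in each group, computed separately per sample.
group_cv <- function(m, group, norm = c("sum", "none")) {
  norm <- match.arg(norm)
  if (norm == "sum") {
    ## assumption: "sum" scales each feature (row) to sum to 1 across samples
    m <- m / rowSums(m, na.rm = TRUE)
  }
  group <- as.factor(group)
  cv <- do.call(rbind, lapply(levels(group), function(g) {
    sub <- m[group == g, , drop = FALSE]
    apply(sub, 2, sd, na.rm = TRUE) / colMeans(sub, na.rm = TRUE)
  }))
  dimnames(cv) <- list(levels(group), paste0("CV.", colnames(m)))
  cv
}

## toy data: pep1/pep2 and pep3/pep4 have proportional profiles
m <- matrix(c(10, 20, 30, 40,
              20, 40, 30, 40), ncol = 2,
            dimnames = list(paste0("pep", 1:4), c("S1", "S2")))
grp <- factor(c(1, 1, 2, 2))

cv_none <- group_cv(m, grp, norm = "none")  # about 0.47 (group 1), 0.20 (group 2)
cv_sum  <- group_cv(m, grp, norm = "sum")   # 0 everywhere
```

With sum normalisation, features whose profiles differ only by a scaling factor give a CV of zero, so the CV reflects disagreement in profile shape rather than differences in absolute intensity.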

NB: All the functions available as fun take an na.rm argument, which is FALSE by default. As a consequence, NA values are propagated to the higher-level (combined) features. It is generally advised to set na.rm = TRUE. See the examples below.
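The propagation described above follows the usual base R behaviour of summary functions. A minimal illustration on a plain numeric vector with made-up values (not an MSnSet):

```r
## toy intensities of three features belonging to the same group,
## in a single sample; one value is missing
x <- c(pep1 = 10, pep2 = NA, pep3 = 30)

sum(x)                  # NA: the missing value propagates
mean(x)                 # NA
sum(x, na.rm = TRUE)    # 40
mean(x, na.rm = TRUE)   # 20
```

In combineFeatures, na.rm = TRUE reaches fun through the ... argument, as in the missing-data example below.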

Usage


combineFeatures(object, groupBy, fun = c("mean", "median",
"weighted.mean", "sum", "medpolish", "iPQF", "NTR"), redundancy.handler =
c("unique", "multiple"), cv = TRUE, cv.norm = "sum", verbose =
isMSnbaseVerbose(), ...)

Arguments

object

An instance of class "MSnSet" whose features will be summarised.

groupBy

A factor, character, numeric or a list of the above defining how to summarise the features. A list must be of length nrow(object). Each element of the list is a vector describing the mapping of the corresponding feature. If the list is named, its names must match featureNames(object). See redundancy.handler for details about list-based grouping.

fun

The summarising function. Currently, mean, median, weighted mean, sum, median polish, iPQF (see iPQF for details) and NTR (see NTR for details) are implemented, but user-defined functions can also be supplied.

redundancy.handler

If groupBy is a list, one of "unique" (default) or "multiple" (ignored otherwise), defining how to handle peptides that can be associated with multiple higher-level features (proteins) upon combination. Using "unique" only considers uniquely matching features (features matching multiple proteins are discarded). Using "multiple" allows matching to multiple proteins, and each feature is tallied once for each possible matching protein.

cv

A logical defining if feature coefficients of variation should be computed and stored as feature meta-data. Default is TRUE.

cv.norm

A character defining how to normalise the feature intensities prior to CV calculation. Default is "sum". Use "none" to keep intensities as is. See featureCV for more details.

verbose

A logical indicating whether verbose output is to be printed out.

...

Additional arguments for the fun function.
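When groupBy is a list, the two redundancy.handler strategies can be illustrated in base R with the same peptide-to-protein mapping as grpl in the examples below. This is a hypothetical sketch of the bookkeeping, not MSnbase's implementation, and the intensity values are made up; sum stands in for fun.

```r
## Hypothetical illustration, not MSnbase code. Each list element gives the
## protein(s) a peptide maps to (same mapping as grpl in the examples).
grpl <- list(pep1 = c("A", "B"), pep2 = "A", pep3 = "A",
             pep4 = "C", pep5 = c("C", "B"))
## made-up intensities of the five peptides in a single sample
intensity <- c(pep1 = 1, pep2 = 2, pep3 = 3, pep4 = 4, pep5 = 5)

## "unique": keep only peptides that map to exactly one protein
unique_map <- unlist(grpl[lengths(grpl) == 1])   # pep2 -> A, pep3 -> A, pep4 -> C
unique_sum <- sapply(split(intensity[names(unique_map)], unique_map), sum)
unique_sum     # A = 5, C = 4; protein B has no unique peptide

## "multiple": repeat each peptide once for every protein it maps to
multiple_map <- data.frame(
  feature = rep(names(grpl), lengths(grpl)),
  protein = unlist(grpl, use.names = FALSE),
  stringsAsFactors = FALSE
)
multiple_sum <- sapply(split(intensity[multiple_map$feature],
                             multiple_map$protein), sum)
multiple_sum   # A = 6, B = 6, C = 9
```

The trade-off is visible in the output: "unique" can lose proteins that have no specific peptide (B here), while "multiple" keeps them but lets a shared peptide contribute to several proteins.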

Value

A new "MSnSet" instance is returned whose ncol (i.e. the number of samples) is unchanged, but whose nrow (i.e. the number of features) now equals the number of levels in groupBy. The feature metadata (featureData slot) is updated accordingly, and only the first occurrence of a feature in the original feature metadata is kept.
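The "first occurrence" rule for the feature metadata can be sketched in base R on a plain data frame. The column names and values below are hypothetical, and ordering the combined rows like levels(grp) is a choice of this sketch rather than a documented property of combineFeatures.

```r
## Hypothetical illustration, not MSnbase code: choose one metadata row per
## group, keeping the first occurrence of each group in the original order.
fdata <- data.frame(
  peptide   = paste0("pep", 1:5),
  accession = c("P1", "P1", "P2", "P2", "P2"),
  charge    = c(2, 3, 2, 2, 3),
  stringsAsFactors = FALSE
)
grp <- factor(fdata$accession)

## index of the first feature in each group, in the order of levels(grp)
first <- match(levels(grp), grp)
combined_fdata <- fdata[first, ]
rownames(combined_fdata) <- levels(grp)

nrow(combined_fdata) == nlevels(grp)   # TRUE: one row per group
combined_fdata$peptide                 # "pep1" "pep3"
```

Columns that vary within a group (charge here) therefore only reflect the first feature of that group after combination.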

Author(s)

Laurent Gatto <lg390@cam.ac.uk>

References

iPQF: a new peptide-to-protein summarization method using peptide spectra characteristics to improve protein quantification. Fischer M, Renard BY. Bioinformatics. 2016 Apr 1;32(7):1040-7. doi:10.1093/bioinformatics/btv675. Epub 2015 Nov 20. PubMed PMID:26589272.

See Also

featureCV to calculate coefficients of variation, nFeatures to document the number of features per group in the feature data, and aggvar to explore variability within protein groups.

iPQF for iPQF summarisation.

NTR for normalisation to reference summarisation.

Examples

data(msnset)
msnset <- msnset[11:15, ]
exprs(msnset)

## arbitrary grouping into two groups
grp <- as.factor(c(1, 1, 2, 2, 2))
msnset.comb <- combineFeatures(msnset, grp, "sum")
dim(msnset.comb)
exprs(msnset.comb)
fvarLabels(msnset.comb)

## grouping with a list
grpl <- list(c("A", "B"), "A", "A", "C", c("C", "B"))
## optional naming
names(grpl) <- featureNames(msnset)
exprs(combineFeatures(msnset, grpl, fun = "sum", redundancy.handler = "unique"))
exprs(combineFeatures(msnset, grpl, fun = "sum", redundancy.handler = "multiple"))

## missing data
exprs(msnset)[4, 4] <-
    exprs(msnset)[2, 2] <- NA
exprs(msnset)
## NAs propagate in the 115 and 117 channels
exprs(combineFeatures(msnset, grp, "sum"))
## NAs are removed before summing
exprs(combineFeatures(msnset, grp, "sum", na.rm = TRUE))

## using iPQF
data(msnset2)
res <- combineFeatures(msnset2,
                       groupBy = fData(msnset2)$accession,
                       redundancy.handler = "unique",
                       fun = "iPQF",
                       low.support.filter = FALSE,
                       ratio.calc = "sum",
                       method.combine = FALSE)
head(exprs(res))

[Package MSnbase version 2.4.2 Index]