runPCA {scater}    R Documentation

Plot PCA for a SingleCellExperiment object

Description

Produce a principal components analysis (PCA) plot of two or more principal components for a SingleCellExperiment dataset.

Usage

runPCA(object, ntop = 500, ncomponents = 2, exprs_values = "logcounts",
  feature_set = NULL, scale_features = TRUE, pca_data_input = "logcounts",
  selected_variables = NULL, detect_outliers = FALSE)

plotPCASCE(object, colour_by = NULL, shape_by = NULL, size_by = NULL,
  return_SCE = FALSE, draw_plot = TRUE, theme_size = 10,
  legend = "auto", rerun = FALSE, ncomponents = 2,
  detect_outliers = FALSE, ...)

## S4 method for signature 'SingleCellExperiment'
plotPCA(object, colour_by = NULL,
  shape_by = NULL, size_by = NULL, return_SCE = FALSE, draw_plot = TRUE,
  theme_size = 10, legend = "auto", rerun = FALSE, ncomponents = 2,
  detect_outliers = FALSE, ...)

Arguments

object

a SingleCellExperiment object

ntop

numeric scalar indicating the number of most variable features to use for the PCA. Default is 500, but any ntop argument is overridden if the feature_set argument is non-NULL.

ncomponents

numeric scalar indicating the number of principal components to plot, starting from the first principal component. Default is 2. If ncomponents is 2, then a scatterplot of PC2 vs PC1 is produced. If ncomponents is greater than 2, a pairs plot of the top components is produced.

exprs_values

character string indicating which values should be used as the expression values for this plot. Valid arguments are "tpm" (transcripts per million), "norm_tpm" (normalised TPM values), "fpkm" (FPKM values), "norm_fpkm" (normalised FPKM values), "counts" (counts for each feature), "norm_counts", "cpm" (counts-per-million), "norm_cpm" (normalised counts-per-million), "logcounts" (log-transformed count data; default), "norm_exprs" (normalised expression values), "stand_exprs" (standardised expression values), or any other named element of the assays slot of the SingleCellExperiment object that can be accessed with the assay function.

feature_set

character, numeric or logical vector indicating a set of features to use for the PCA. If character, entries must all be in featureNames(object). If numeric, values are taken to be indices for features. If logical, vector is used to index features and should have length equal to nrow(object).

scale_features

logical, should the expression values be standardised so that each feature has unit variance? Default is TRUE.

pca_data_input

character argument defining which data should be used as input for the PCA. Possible options are "logcounts" (default), which uses log-count data to produce a PCA at the cell level; "coldata" or "pdata" (for backwards compatibility) which uses numeric variables from colData(object) to do PCA at the cell level; and "rowdata" which uses numeric variables from rowData(object) to do PCA at the feature level.

selected_variables

character vector indicating which variables in colData(object) to use for the phenotype-data based PCA. Ignored if the argument pca_data_input is anything other than "coldata" (or "pdata", accepted for backwards compatibility).

detect_outliers

logical, should outliers be detected in the PC plot? Only available when the pca_data_input argument is "pdata". Default is FALSE.

colour_by

character string defining the column of pData(object) to be used as a factor by which to colour the points in the plot. Alternatively, a data frame with one column, containing values to map to colours for all cells.

shape_by

character string defining the column of pData(object) to be used as a factor by which to define the shape of the points in the plot. Alternatively, a data frame with one column containing values to map to shapes.

size_by

character string defining the column of pData(object) to be used as a factor by which to define the size of points in the plot. Alternatively, a data frame with one column containing values to map to sizes.

return_SCE

logical, should the function return a SingleCellExperiment object with principal component values for cells in the reducedDims slot? Default is FALSE, in which case a ggplot object is returned.

draw_plot

logical, should the plot be drawn on the current graphics device? Only used if return_SCE is TRUE; otherwise the plot is always produced.

theme_size

numeric scalar giving default font size for plotting theme (default is 10).

legend

character string specifying how the legend(s) should be shown. Default is "auto", which hides legends that have only one level and shows others. Alternatives are "all" (show all legends) or "none" (hide all legends).

rerun

logical, should PCA be recomputed even if object contains a "PCA" element in the reducedDims slot?

...

further arguments passed to plotPCASCE
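
The feature-selection arguments above (ntop and feature_set) can be illustrated with a small base R sketch. This is not the package's own code: the toy matrix, the object names, and the use of per-feature variance as the ranking criterion are assumptions made for illustration only.

```r
# Toy expression matrix: 6 features (rows) by 4 cells (columns); values are made up.
set.seed(1)
expr <- matrix(rnorm(24), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("cell", 1:4)))

# ntop-style selection: rank features by their variance across cells
# and keep the ntop most variable ones.
ntop <- 3
feature_var <- apply(expr, 1, var)
top_idx <- order(feature_var, decreasing = TRUE)[seq_len(min(ntop, nrow(expr)))]
by_ntop <- expr[top_idx, , drop = FALSE]

# feature_set-style selection: the same two features chosen by
# character names, numeric indices, and a logical vector of length nrow(expr).
by_name    <- expr[c("gene2", "gene5"), , drop = FALSE]
by_index   <- expr[c(2, 5), , drop = FALSE]
by_logical <- expr[rownames(expr) %in% c("gene2", "gene5"), , drop = FALSE]
```

All three feature_set-style subsets pick out the same rows, which mirrors how a non-NULL feature_set replaces the ntop ranking.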

Details

The function prcomp is used internally to do the PCA. The function checks whether the object has standardised expression values (by looking at stand_exprs(object)). If yes, the existing standardised expression values are used for the PCA. If not, then standardised expression values are computed using scale (with feature-wise unit variances or not according to the scale_features argument), added to the object and PCA is done using these new standardised expression values.
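
The standardise-then-prcomp procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: the simulated matrix and the variable names (expr, stand, pcs, percent_var) are hypothetical.

```r
# Simulated expression matrix: 20 features (rows) by 10 cells (columns).
set.seed(42)
expr <- matrix(rnorm(200), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("cell", 1:10)))

# Transpose so that cells are rows and features are columns, then centre
# each feature; scale_features controls whether features also get unit variance.
scale_features <- TRUE
stand <- scale(t(expr), center = TRUE, scale = scale_features)

# PCA at the cell level on the standardised values.
pca <- prcomp(stand)

# Keep the first ncomponents principal components for plotting.
ncomponents <- 2
pcs <- pca$x[, seq_len(ncomponents), drop = FALSE]

# Percentage of variance explained by each component.
percent_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
```

With scale_features set to FALSE, features are only centred, so highly variable features contribute more to the leading components.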

If the arguments detect_outliers and return_SCE are both TRUE, then the element $outlier is added to the pData (phenotype data) slot of the SingleCellExperiment object. This element indicates, for each cell, whether it has been designated as an outlier based on the PCA. These values can be accessed for filtering low-quality cells with, for example, example_sce$outlier.
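
Filtering on such an outlier indicator might look like the sketch below. A plain data frame stands in for the per-cell metadata so that the example runs without the package; the flag values are made up, and it is an assumption that the indicator is stored as a logical vector.

```r
# Stand-in for per-cell metadata; the outlier flags here are invented.
cell_info <- data.frame(
    cell = paste0("cell", 1:6),
    outlier = c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

# Keep only the cells that were not flagged as outliers.
keep <- !cell_info$outlier
filtered <- cell_info[keep, , drop = FALSE]

# With a SingleCellExperiment returned by a call using return_SCE = TRUE and
# detect_outliers = TRUE, the analogous column subsetting would be:
# example_sce <- example_sce[, !example_sce$outlier]
```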

Value

Either a ggplot object (the default) or, if return_SCE is TRUE, a SingleCellExperiment object.

Examples

## Set up an example SingleCellExperiment
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
    assays = list(counts = sc_example_counts), colData = sc_example_cell_info)
example_sce <- normalize(example_sce)
drop_genes <- apply(exprs(example_sce), 1, function(x) {var(x) == 0})
example_sce <- example_sce[!drop_genes, ]

## Examples plotting PC1 and PC2
plotPCA(example_sce)
plotPCA(example_sce, colour_by = "Cell_Cycle")
plotPCA(example_sce, colour_by = "Cell_Cycle", shape_by = "Treatment")
plotPCA(example_sce, colour_by = "Cell_Cycle", shape_by = "Treatment",
    size_by = "Mutation_Status")
plotPCA(example_sce, shape_by = "Treatment", size_by = "Mutation_Status")
plotPCA(example_sce, feature_set = 1:100, colour_by = "Treatment",
    shape_by = "Mutation_Status")

## experiment with legend
example_subset <- example_sce[, example_sce$Treatment == "treat1"]
plotPCA(example_subset, colour_by = "Cell_Cycle", shape_by = "Treatment", legend = "all")

## Return the SingleCellExperiment (with PCA results) instead of a ggplot object
plotPCA(example_sce, shape_by = "Treatment", return_SCE = TRUE)

## Examples plotting more than 2 PCs
plotPCA(example_sce, ncomponents = 8)
plotPCA(example_sce, ncomponents = 4, colour_by = "Treatment",
    shape_by = "Mutation_Status")


[Package scater version 1.6.3 Index]