
1 Introduction

The EpiDISH package provides tools to infer the proportions of a priori known cell subtypes present in a sample that represents a mixture of such cell types. Inference proceeds via one of three methods, chosen by the user: Robust Partial Correlations (RPC) (Teschendorff et al. 2017), Cibersort (CBS) (Newman et al. 2015), or Constrained Projection (CP) (Houseman et al. 2012). In addition, the package provides CellDMC, a method for identifying differentially methylated cell types and their direction of change.
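To give a feel for what RPC does, here is a minimal sketch, assuming the procedure described in (Teschendorff et al. 2017) as we understand it. Each sample's DNAm profile is regressed on the reference centroids with robust (Huber-type) regression via MASS::rlm. Negative coefficients are then set to zero, and the remaining coefficients are normalized to sum to 1. The function rpc_sketch and the simulated reference and mixture are hypothetical. For real analyses, use epidish(..., method = "RPC").

```r
library(MASS)  # provides rlm() for robust linear regression

# Hypothetical helper (illustration only, not EpiDISH's code): estimate the
# mixing fractions of one sample from a reference centroid matrix.
rpc_sketch <- function(beta.v, ref.m, maxit = 50) {
  fit <- rlm(beta.v ~ ref.m, maxit = maxit)  # robust fit, with intercept
  coef.v <- fit$coefficients[-1]             # drop the intercept
  coef.v[coef.v < 0] <- 0                    # enforce non-negativity
  frac.v <- coef.v / sum(coef.v)             # normalise so fractions sum to 1
  names(frac.v) <- colnames(ref.m)
  frac.v
}

set.seed(1)
# Simulated reference: 200 CpGs x 3 cell types, beta values in [0, 1]
ref.m <- matrix(runif(600), nrow = 200,
                dimnames = list(NULL, c("Epi", "Fib", "IC")))
true.v <- c(Epi = 0.2, Fib = 0.3, IC = 0.5)
# Simulated mixture: weighted sum of the centroids plus small noise
beta.v <- as.vector(ref.m %*% true.v) + rnorm(200, sd = 0.01)

est.v <- rpc_sketch(beta.v, ref.m)
round(est.v, 2)
```

With low noise, the estimates should land close to the simulated fractions (0.2, 0.3, 0.5). The clip-and-normalize step is what turns unconstrained regression coefficients into valid fractions.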

The package currently contains 4 references: two blood subtype references, one reference with epithelial cells, fibroblasts, and total immune cells, and one reference with epithelial cells, fibroblasts, fat cells, and total immune cells. These are described in (Teschendorff et al. 2017) and (Zheng, Webster, et al. 2018).

2 How to estimate cell-type fractions using DNAm data

As an example, we use a dummy beta value matrix, DummyBeta.m, which is stored in the package and contains 2000 CpGs and 10 samples.

We load the EpiDISH package, the EpiFibIC reference, and the beta value matrix.

library(EpiDISH)
data(centEpiFibIC.m)  # reference centroids for Epi, Fib and IC
data(DummyBeta.m)     # dummy beta value matrix: 2000 CpGs x 10 samples

Notice that centEpiFibIC.m has 3 columns, named Epi, Fib and IC. We now call the epidish function in RPC mode to infer the fractions.

out.l <- epidish(beta.m = DummyBeta.m, ref.m = centEpiFibIC.m, method = "RPC") 

Next, we inspect the output list. estF is the matrix of estimated cell-type fractions, ref is the reference centroid matrix that was used, and dataREF is the input data matrix restricted to the probes it shares with the reference.

out.l$estF
##                  Epi        Fib           IC
## GSM868022 0.08836819 0.06109607 0.8505357378
## GSM868018 0.07652115 0.57326994 0.3502089007
## GSM868022 0.15417391 0.75663136 0.0891947251
## GSM868020 0.77082647 0.04171941 0.1874541181
## GSM868018 0.03960599 0.31921224 0.6411817742
## GSM868020 0.12751711 0.79642919 0.0760537000
## GSM868018 0.18144315 0.72889883 0.0896580171
## GSM868022 0.20220823 0.40929344 0.3884983293
## GSM868022 0.19398079 0.80540932 0.0006098973
## GSM868018 0.27976647 0.23671333 0.4835201992
dim(out.l$ref)
## [1] 599   3
dim(out.l$dataREF)
## [1] 599  10
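Here is a quick sanity check on estF. With RPC, negative coefficients are set to zero and each row is normalized, so every row should be non-negative and sum to 1 up to rounding. The helper check_fractions is hypothetical; it is demonstrated on the first three rows of out.l$estF, copied from the output above. With other methods, for example CP with an inequality constraint, row sums may fall below 1, so the tolerance or the check itself may need adjusting.

```r
# Hypothetical helper: TRUE if all fractions are non-negative and every
# row (sample) sums to 1 within the given tolerance.
check_fractions <- function(frac.m, tol = 1e-6) {
  all(frac.m >= 0) && all(abs(rowSums(frac.m) - 1) < tol)
}

# First three rows of out.l$estF, copied from the output above
estF.m <- rbind(
  c(0.08836819, 0.06109607, 0.8505357378),
  c(0.07652115, 0.57326994, 0.3502089007),
  c(0.15417391, 0.75663136, 0.0891947251)
)
colnames(estF.m) <- c("Epi", "Fib", "IC")

check_fractions(estF.m)
```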

During quality control, bad probes may be removed from the full set of probes on the 450k or 850k array. As a result, not all probes in the reference may be present in the query data. By checking ref and dataREF, we can extract the probes actually used to infer the proportions. If most of the reference probes are missing, the estimated proportions may be compromised.
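A simple way to quantify this before running epidish is to compute the fraction of reference probes present in the data and to list the shared probes. The helper ref_coverage and the toy matrices with made-up probe IDs below are hypothetical. With real data, you would pass your own beta matrix and a reference such as centEpiFibIC.m.

```r
# Hypothetical helper: fraction of reference CpGs (rows of ref.m) that are
# also present in the data matrix beta.m, matched by rownames.
ref_coverage <- function(beta.m, ref.m) {
  mean(rownames(ref.m) %in% rownames(beta.m))
}

# Toy matrices with made-up probe IDs
ref.m <- matrix(0.5, nrow = 4, ncol = 3,
                dimnames = list(c("cg01", "cg02", "cg03", "cg04"),
                                c("Epi", "Fib", "IC")))
beta.m <- matrix(0.5, nrow = 3, ncol = 2,
                 dimnames = list(c("cg01", "cg03", "cg99"), c("S1", "S2")))

ref_coverage(beta.m, ref.m)                 # 2 of 4 reference probes found
shared.v <- intersect(rownames(ref.m), rownames(beta.m))
shared.v                                    # probes usable for inference
```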

3 How to estimate cell-type fractions in a two-step framework

HEpiDISH is an iterative hierarchical procedure built on EpiDISH. It uses two distinct DNAm references: a primary reference for estimating the fractions of several cell types, and a separate, non-overlapping secondary reference for estimating the subtype fractions of one of the cell types in the primary reference.

Fig. 1: HEpiDISH workflow

In this example, the third cell type in the primary reference is total immune cells (IC), and we would like to know the fractions of immune cell subtypes. We therefore use a secondary reference, centBloodSub.m, which contains 7 immune cell subtypes, although this example uses only columns 1, 2 and 5 (B, NK and Mono). Through h.CT.idx = 3, we tell the hepidish function that the third column of the primary reference is the total of the secondary reference.

data(centBloodSub.m)
frac.m <- hepidish(beta.m = DummyBeta.m, ref1.m = centEpiFibIC.m, ref2.m = centBloodSub.m[,c(1, 2, 5)], h.CT.idx = 3, method = 'RPC')
frac.m
##                  Epi        Fib            B           NK       Mono
## GSM868022 0.08836819 0.06109607 0.6446835622 0.0945693668 0.11128281
## GSM868018 0.07652115 0.57326994 0.0502766152 0.2999322854 0.00000000
## GSM868022 0.15417391 0.75663136 0.0381194625 0.0134501813 0.03762508
## GSM868020 0.77082647 0.04171941 0.1434958145 0.0211681974 0.02279011
## GSM868018 0.03960599 0.31921224 0.0167748647 0.1912747358 0.43313217
## GSM868020 0.12751711 0.79642919 0.0286647024 0.0252778983 0.02211110
## GSM868018 0.18144315 0.72889883 0.0515861314 0.0228453164 0.01522657
## GSM868022 0.20220823 0.40929344 0.1908434542 0.1772700742 0.02038480
## GSM868022 0.19398079 0.80540932 0.0003521377 0.0002577596 0.00000000
## GSM868018 0.27976647 0.23671333 0.2546961632 0.1008399798 0.12798406
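In the output above, the Epi and Fib columns are unchanged from out.l$estF in Section 2. The B, NK and Mono fractions of each sample add up to that sample's IC fraction; for the first sample, 0.6447 + 0.0946 + 0.1113 ≈ 0.8505. The sketch below shows one way to reproduce this combination step, assuming the secondary fractions are first normalized to sum to 1 per sample and then scaled by the primary fraction of the h.CT.idx cell type. The helper combine_hier and the toy secondary fractions are hypothetical; they are not hepidish's internals.

```r
# Hypothetical helper: merge primary and secondary (subtype) fractions so
# that the subtypes partition the primary cell type in column h.CT.idx.
combine_hier <- function(frac1.m, frac2.m, h.CT.idx) {
  frac2.m <- frac2.m / rowSums(frac2.m)     # subtypes sum to 1 per sample
  frac2.m <- frac2.m * frac1.m[, h.CT.idx]  # scale by the primary fraction
  cbind(frac1.m[, -h.CT.idx, drop = FALSE], frac2.m)
}

# Primary fractions: first two rows of out.l$estF from Section 2
frac1.m <- rbind(c(0.08836819, 0.06109607, 0.8505357378),
                 c(0.07652115, 0.57326994, 0.3502089007))
colnames(frac1.m) <- c("Epi", "Fib", "IC")
# Toy secondary fractions, made up for illustration
frac2.m <- rbind(c(0.6, 0.2, 0.2),
                 c(0.1, 0.9, 0.0))
colnames(frac2.m) <- c("B", "NK", "Mono")

res.m <- combine_hier(frac1.m, frac2.m, h.CT.idx = 3)
rowSums(res.m[, c("B", "NK", "Mono")])  # equals the IC column of frac1.m
```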

4 More info about different methods for cell-type fractions estimation

We compared CP and RPC in (Teschendorff et al. 2017). We have also written a review article (Teschendorff and Zheng 2017) that summarizes methods for tackling cell-type heterogeneity in DNAm data. Refer to the References section for more details.

5 How to identify differentially methylated cell-types and their directionality of change

After estimating cell-type fractions, we can identify differentially methylated cell types and their direction of change using the CellDMC function (Zheng, Breeze, et al. 2018). The workflow of CellDMC is illustrated in Fig. 2.

Fig. 2: CellDMC workflow

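# Binary phenotype: first 5 samples coded 0, last 5 samples coded 1
pheno.v <- rep(c(0, 1), each = 5)
# Use the HEpiDISH fractions (Epi, Fib, B, NK, Mono) estimated above
celldmc.o <- CellDMC(DummyBeta.m, pheno.v, frac.m)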

The predicted DMCTs are shown below. Note that these are dummy data: the sample size is far too small to detect DMCTs, so all entries are 0.

head(celldmc.o$dmct)
##            DMC Epi Fib B NK Mono
## cg17506061   0   0   0 0  0    0
## cg09300980   0   0   0 0  0    0
## cg18886245   0   0   0 0  0    0
## cg17470327   0   0   0 0  0    0
## cg26082174   0   0   0 0  0    0
## cg14737131   0   0   0 0  0    0

For more information, please refer to the CellDMC help page.
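The idea behind CellDMC, as described in (Zheng, Breeze, et al. 2018), is to model each CpG with interaction terms between the phenotype and the cell-type fractions. A significant interaction for a cell type flags it as a DMCT, and the sign of the coefficient gives the direction of change. The sketch below illustrates this on a single simulated CpG using base R lm(). The no-intercept parameterization, the variable names, and the simulated effect (only IC shifted by +0.3 in cases) are our assumptions, not CellDMC's exact design.

```r
set.seed(42)
n <- 400
# Simulated fractions for 3 cell types; each row sums to 1
raw.m <- matrix(runif(n * 3), ncol = 3)
frac.m <- raw.m / rowSums(raw.m)
colnames(frac.m) <- c("Epi", "Fib", "IC")
pheno.v <- rep(c(0, 1), each = n / 2)

# Cell-type-specific mean methylation; only IC shifts by +0.3 in cases
mu.v <- c(Epi = 0.2, Fib = 0.5, IC = 0.4)
beta.v <- as.vector(frac.m %*% mu.v) + 0.3 * pheno.v * frac.m[, "IC"] +
  rnorm(n, sd = 0.005)

# Fractions sum to 1, so fit without an intercept: one mean per cell type
# plus one phenotype interaction per cell type
dat <- data.frame(beta = beta.v, frac.m, pheno = pheno.v)
fit <- lm(beta ~ 0 + Epi + Fib + IC + Epi:pheno + Fib:pheno + IC:pheno,
          data = dat)
res.m <- coef(summary(fit))[c("Epi:pheno", "Fib:pheno", "IC:pheno"),
                            c("Estimate", "Pr(>|t|)")]
res.m
```

Only the IC interaction should come out large and highly significant, with a positive sign (hypermethylation in cases). The Epi and Fib interactions should stay near zero.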

6 Session info

## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.9-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.9-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] EpiDISH_2.0.0    BiocStyle_2.12.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.1         bookdown_0.9       quadprog_1.5-6    
##  [4] matrixStats_0.54.0 class_7.3-15       digest_0.6.18     
##  [7] MASS_7.3-51.4      magrittr_1.5       e1071_1.7-1       
## [10] evaluate_0.13      stringi_1.4.3      rmarkdown_1.12    
## [13] tools_3.6.0        stringr_1.4.0      parallel_3.6.0    
## [16] xfun_0.6           yaml_2.2.0         compiler_3.6.0    
## [19] BiocManager_1.30.4 htmltools_0.3.6    knitr_1.22

References

Houseman, Eugene Andres, William P Accomando, Devin C Koestler, Brock C Christensen, Carmen J Marsit, Heather H Nelson, John K Wiencke, and Karl T Kelsey. 2012. “DNA methylation arrays as surrogate measures of cell mixture distribution.” BMC Bioinformatics 13 (1):86.

Newman, Aaron M, Chih Long Liu, Michael R Green, Andrew J Gentles, Weiguo Feng, Yue Xu, Chuong D Hoang, Maximilian Diehn, and Ash A Alizadeh. 2015. “Robust enumeration of cell subsets from tissue expression profiles.” Nature Methods 12 (5):453–57.

Teschendorff, Andrew E, Charles E Breeze, Shijie C Zheng, and Stephan Beck. 2017. “A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies.” BMC Bioinformatics 18 (1):105.

Teschendorff, Andrew E, and Shijie C Zheng. 2017. “Cell-type deconvolution in epigenome-wide association studies: a review and recommendations.” Epigenomics 9 (5):757–68.

Zheng, Shijie C, Charles E Breeze, Stephan Beck, and Andrew E Teschendorff. 2018. “Identification of differentially methylated cell-types in Epigenome-Wide Association Studies.” Nature Methods 15 (12):1059–66.

Zheng, Shijie C, Amy P Webster, Danyue Dong, Andy Feber, David G Graham, Roisin Sullivan, Sarah Jevons, et al. 2018. “A novel cell-type deconvolution algorithm reveals substantial contamination by immune cells in saliva, buccal and cervix.” Epigenomics 10 (7):925–40.