Single-cell RNA-seq denoising using a deep count autoencoder

Cited by: 775
Authors
Eraslan, Goekcen [1, 2]
Simon, Lukas M. [1]
Mircea, Maria [1]
Mueller, Nikola S. [1]
Theis, Fabian J. [1, 2, 3]
Affiliations
[1] Helmholtz Zentrum Munchen, Inst Computat Biol, Neuherberg, Germany
[2] Tech Univ Munich, TUM Sch Life Sci Weihenstephan, Freising Weihenstephan, Germany
[3] Tech Univ Munich, Dept Math, Garching, Germany
Keywords
HETEROGENEITY; CHALLENGES; NOISE
DOI
10.1038/s41467-018-07931-2
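Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
070301 [Inorganic chemistry]; 070403 [Astrophysics]; 070507 [Natural resources and territorial spatial planning]; 090105 [Crop production systems and ecological engineering]
Abstract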
Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at a cellular resolution. However, noise due to amplification and dropout may obstruct analyses, so scalable denoising methods for increasingly large but sparse scRNA-seq data are needed. We propose a deep count autoencoder network (DCA) to denoise scRNA-seq datasets. DCA takes the count distribution, overdispersion and sparsity of the data into account using a negative binomial noise model with or without zero-inflation, and nonlinear gene-gene dependencies are captured. Our method scales linearly with the number of cells and can, therefore, be applied to datasets of millions of cells. We demonstrate that DCA denoising improves a diverse set of typical scRNA-seq data analyses using simulated and real datasets. DCA outperforms existing methods for data imputation in quality and speed, enhancing biological discovery.
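The noise model named in the abstract can be illustrated with a small sketch. The code below evaluates the log-probability of a single count under a negative binomial (NB) distribution and under its zero-inflated variant (ZINB), using the common mean/dispersion parameterization. The function names and the parameters `mu` (mean), `theta` (dispersion) and `pi` (dropout probability) are assumptions chosen for illustration; this is not code from the DCA package, which estimates such parameters per gene and cell with a neural network.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under NB with mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under zero-inflated NB.

    With probability pi (0 <= pi < 1) the count is a structural zero;
    otherwise it is drawn from NB(mu, theta).
    """
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from either the dropout point mass or the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb
```

Setting `pi = 0` recovers the plain NB model, which corresponds to the "without zero-inflation" case mentioned in the abstract. A denoising autoencoder of this kind would typically minimize the negative of this log-likelihood summed over all genes and cells.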
Pages: 14