MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data

Cited by: 1719
Authors
Finak, Greg [1]
McDavid, Andrew [1]
Yajima, Masanao [1]
Deng, Jingyuan [1]
Gersuk, Vivian [2]
Shalek, Alex K. [3, 4, 5, 6]
Slichter, Chloe K. [1]
Miller, Hannah W. [1]
McElrath, M. Juliana [1]
Prlic, Martin [1]
Linsley, Peter S. [2]
Gottardo, Raphael [1, 7]
Affiliations
[1] Fred Hutchinson Canc Res Ctr, Vaccine & Infect Dis Div, Seattle, WA 98109 USA
[2] Benaroya Res Inst Virginia Mason, Seattle, WA 98101 USA
[3] MIT, Inst Med Engn & Sci, Boston, MA 01239 USA
[4] MIT, Dept Chem, Boston, MA 01239 USA
[5] Ragon Inst MGH MIT & Harvard, Cambridge, MA 02139 USA
[6] Broad Inst MIT & Harvard, Cambridge, MA 02139 USA
[7] Fred Hutchinson Canc Res Ctr, Div Publ Hlth Sci, Seattle, WA 98109 USA
Source
GENOME BIOLOGY | 2015 / Vol. 16
Keywords
Bimodality; Cellular detection rate; Co-expression; Empirical Bayes; Generalized linear model; Gene set enrichment analysis; DIFFERENTIAL EXPRESSION ANALYSIS; STOCHASTIC GENE-EXPRESSION; MESSENGER-RNA; SEQ DATA; BIOCONDUCTOR; MOLECULES; DISCOVERY;
DOI
10.1186/s13059-015-0844-5
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Single-cell transcriptomics reveals gene expression heterogeneity but suffers from stochastic dropout and characteristic bimodal expression distributions in which expression is either strongly non-zero or non-detectable. We propose a two-part, generalized linear model for such bimodal data that parameterizes both of these features. We argue that the cellular detection rate, the fraction of genes expressed in a cell, should be adjusted for as a source of nuisance variation. Our model provides gene set enrichment analysis tailored to single-cell data. It provides insights into how networks of co-expressed genes evolve across an experimental treatment.
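The two ideas in the abstract, a per-cell detection rate and a two-part model that handles detection and expression level separately, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it omits any regularization and hypothesis testing, it runs on simulated data, and the function names (`cellular_detection_rate`, `fit_logistic`, `fit_two_part`) are hypothetical names chosen for illustration. It assumes a genes-by-cells matrix of log-scale expression in which zero means "not detected".

```python
import numpy as np


def cellular_detection_rate(expr, threshold=0.0):
    """Fraction of genes detected (value > threshold) in each cell.

    expr has shape (n_genes, n_cells); the result has shape (n_cells,).
    """
    return (expr > threshold).mean(axis=0)


def fit_logistic(X, y, n_iter=50, ridge=1e-6):
    """Logistic regression by Newton's method, with a tiny ridge for stability."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p) - ridge * beta
        hess = (X * w[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta


def fit_two_part(y, condition, cdr):
    """Fit one gene with a two-part model.

    Discrete part: logistic regression of detection (y > 0) on the covariates.
    Continuous part: least squares on expression, using detected cells only.
    Both parts use the design [intercept, condition, cdr], so the detection
    rate enters as a nuisance covariate.
    """
    X = np.column_stack([np.ones_like(cdr), condition, cdr])
    detected = y > 0
    beta_discrete = fit_logistic(X, detected.astype(float))
    beta_continuous, *_ = np.linalg.lstsq(X[detected], y[detected], rcond=None)
    return beta_discrete, beta_continuous


# Simulated data (an assumption for illustration, not data from the paper).
rng = np.random.default_rng(0)
n_genes, n_cells = 200, 300
condition = np.repeat([0.0, 1.0], n_cells // 2)

# Background genes: each cell has its own detection probability.
cell_rate = rng.uniform(0.2, 0.6, size=n_cells)
background = (rng.random((n_genes, n_cells)) < cell_rate) * rng.uniform(
    1.0, 5.0, size=(n_genes, n_cells)
)

# Gene of interest: the condition raises both detection and expression level.
p_detect = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * condition)))
level = np.clip(rng.normal(3.0 + condition, 0.5), 0.1, None)
gene = (rng.random(n_cells) < p_detect) * level

expr = np.vstack([gene, background])
cdr = cellular_detection_rate(expr)
beta_discrete, beta_continuous = fit_two_part(gene, condition, cdr)

print("condition effect on detection (log-odds):", beta_discrete[1])
print("condition effect on expression when detected:", beta_continuous[1])
```

Fitting the two parts separately lets a treatment effect appear as a change in how often a gene is detected, a change in its level when detected, or both.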
Pages: 13