A survey of best practices for RNA-seq data analysis

Cited by: 1595
Authors
Conesa, Ana [1, 2]
Madrigal, Pedro [3, 4]
Tarazona, Sonia [2, 5]
Gomez-Cabrero, David [6, 7, 8, 9]
Cervera, Alejandra [10, 11]
McPherson, Andrew [12]
Szczesniak, Michal Wojciech [13]
Gaffney, Daniel J. [3]
Elo, Laura L. [14, 15]
Zhang, Xuegong [16, 17, 18]
Mortazavi, Ali [19, 20]
Affiliations
[1] Univ Florida, Inst Food & Agr Sci, Dept Microbiol & Cell Sci, Gainesville, FL 32603 USA
[2] Ctr Invest Principe Felipe, Genom Gene Express Lab, Valencia 46012, Spain
[3] Wellcome Trust Sanger Inst, Wellcome Trust Genome Campus, Cambridge CB10 1SA, England
[4] Univ Cambridge, Dept Surg, Wellcome Trust Med Res Council Cambridge Stem Cel, Anne McLaren Lab Regenerat Med, Cambridge CB2 0SZ, England
[5] Univ Politecn Valencia, Dept Appl Stat Operat Res & Qual, Valencia 46020, Spain
[6] Karolinska Inst, Karolinska Univ Hosp, Dept Med, Unit Computat Med, S-17177 Stockholm, Sweden
[7] Karolinska Inst, Ctr Mol Med, S-17177 Stockholm, Sweden
[8] Karolinska Univ Hosp, Dept Med, Clin Epidemiol Unit, L8, S-17176 Stockholm, Sweden
[9] Sci Life Lab, S-17121 Solna, Sweden
[10] Univ Helsinki, Syst Biol Lab, Inst Biomed, FIN-00014 Helsinki, Finland
[11] Univ Helsinki, Genome Scale Biol Res Program, FIN-00014 Helsinki, Finland
[12] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[13] Adam Mickiewicz Univ, Inst Mol Biol & Biotechnol, Dept Bioinformat, PL-61614 Poznan, Poland
[14] Univ Turku, Turku Ctr Biotechnol, FI-20520 Turku, Finland
[15] Abo Akad Univ, FI-20520 Turku, Finland
[16] Tsinghua Univ, Key Lab Bioinformat, Bioinformat Div, TNLIST, Beijing 100084, Peoples R China
[17] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[18] Tsinghua Univ, Sch Life Sci, Beijing 100084, Peoples R China
[19] Univ Calif Irvine, Dept Dev & Cell Biol, Irvine, CA 92697 USA
[20] Univ Calif Irvine, Ctr Complex Biol Syst, Irvine, CA 92697 USA
Source
GENOME BIOLOGY | 2016 / Volume 17
Funding
Academy of Finland;
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS; SIMULTANEOUS ISOFORM DISCOVERY; INTEGRATED ANALYSIS MMIA; WEB-BASED TOOL; SINGLE-CELL; GENE-EXPRESSION; DNA-METHYLATION; COMPREHENSIVE EVALUATION; CHROMATIN ACCESSIBILITY; TRANSCRIPTOME ANALYSIS;
DOI
10.1186/s13059-016-0881-8
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics.
Pages: 19