Finding the active genes in deep RNA-seq gene expression studies

Cited by: 180
Authors
Hart, Traver [1]
Komori, H. Kiyomi [2]
LaMere, Sarah [2]
Podshivalova, Katie [2]
Salomon, Daniel R. [2]
Affiliations
[1] Univ Toronto, Banting & Best Dept Med Res, Donnelly Ctr, Toronto, ON, Canada
[2] Scripps Res Inst, Dept Mol & Expt Med, La Jolla, CA 92037 USA
Source
BMC GENOMICS | 2013 / Volume 14
Funding
National Institutes of Health (USA);
Keywords
QUANTIFICATION; TRANSCRIPTOME;
DOI
10.1186/1471-2164-14-778
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Early application of second-generation sequencing technologies to transcript quantitation (RNA-seq) has hinted at a vast mammalian transcriptome, including transcripts from nearly all known genes, which might be fully measured only by ultradeep sequencing. Subsequent studies suggested that low-abundance transcripts might be the result of technical or biological noise rather than active transcripts; moreover, most RNA-seq experiments did not provide enough read depth to generate high-confidence estimates of gene expression for low-abundance transcripts. As a result, the community adopted several heuristics for RNA-seq analysis, most notably an arbitrary expression threshold of 0.3-1 FPKM for downstream analysis. However, advances in RNA-seq library preparation, sequencing technology, and informatic analysis have addressed many of the systemic sources of uncertainty and undermined the assumptions that drove the adoption of these heuristics. We provide an updated view of the accuracy and efficiency of RNA-seq experiments, using genomic data from large-scale studies like the ENCODE project to provide orthogonal information against which to validate our conclusions.

Results: We show that a human cell's transcriptome can be divided into active genes carrying out the work of the cell and other genes that are likely the by-products of biological or experimental noise. We use ENCODE data on chromatin state to show that ultralow-expression genes are predominantly associated with repressed chromatin; we provide a novel normalization metric, zFPKM, that identifies the threshold between active and background gene expression; and we show that this threshold is robust to experimental and analytical variations.

Conclusions: The zFPKM normalization method accurately separates the biologically relevant genes in a cell, which are associated with active promoters, from the ultralow-expression noisy genes that have repressed promoters. A read depth of twenty to thirty million mapped reads allows high-confidence quantitation of genes expressed at this threshold, providing important guidance for the design of RNA-seq studies of gene expression. Moreover, we offer an example for using extensive ENCODE chromatin state information to validate RNA-seq analysis pipelines.
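The abstract names the zFPKM metric but does not spell out how it is computed. The sketch below is one plausible reading, not the authors' implementation: it assumes zFPKM is a z-score of log2(FPKM) against a Gaussian fitted to the main expression peak, with the spread estimated from the right-hand side of that peak only. The function names `zfpkm` and `active_mask`, the histogram-based peak finder, and the cutoff of -3 are all assumptions made for illustration.

```python
import numpy as np

def zfpkm(fpkm, bins=100):
    """Illustrative zFPKM-style transform (assumed form, not the published code).

    Returns (z, mu, sigma), where z is the z-scored log2(FPKM) per gene.
    Genes with FPKM == 0 have no log2 value and are assigned -inf.
    """
    fpkm = np.asarray(fpkm, dtype=float)
    expressed = fpkm > 0
    log_fpkm = np.log2(fpkm[expressed])

    # Assumption: the tallest histogram bin marks the peak of the
    # "active gene" population; a kernel density estimate could be used instead.
    counts, edges = np.histogram(log_fpkm, bins=bins)
    peak = np.argmax(counts)
    mu = 0.5 * (edges[peak] + edges[peak + 1])

    # Estimate sigma from values to the right of the peak only, so that the
    # low-expression shoulder on the left does not inflate the spread.
    right = log_fpkm[log_fpkm > mu]
    sigma = np.sqrt(np.mean((right - mu) ** 2))

    z = np.full(fpkm.shape, -np.inf)
    z[expressed] = (log_fpkm - mu) / sigma
    return z, mu, sigma

def active_mask(z, cutoff=-3.0):
    """Flag genes at or above the cutoff as active (the cutoff value is an assumption)."""
    return np.asarray(z) >= cutoff
```

Calling `z, mu, sigma = zfpkm(fpkm_values)` followed by `active_mask(z)` gives a boolean vector of putatively active genes. Because the transform is relative to each sample's own expression peak, the same cutoff can be applied across samples with different sequencing depths, which is the property the abstract attributes to the metric.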
Pages: 7