RNA-seq: technical variability and sampling

Cited by: 195
Authors
McIntyre, Lauren M. [1 ]
Lopiano, Kenneth K. [2 ]
Morse, Alison M. [1 ]
Amin, Victor [1 ]
Oberg, Ann L. [3 ]
Young, Linda J. [2 ]
Nuzhdin, Sergey V. [4 ]
Affiliations
[1] Univ Florida, Dept Mol Genet & Microbiol, Gainesville, FL 32610 USA
[2] Univ Florida, Dept Stat, Gainesville, FL 32611 USA
[3] Mayo Clin, Div Biomed Stat & Informat, Dept Hlth Sci Res, Rochester, MN USA
[4] Univ So Calif, Los Angeles, CA 90089 USA
Source
BMC GENOMICS | 2011 / Vol. 12
Funding
US National Institutes of Health; US National Science Foundation;
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS; STATISTICAL-METHODS; NORMALIZATION; ALIGNMENT; DESIGN; LENGTH;
DOI
10.1186/1471-2164-12-293
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: RNA-seq is revolutionizing the way we study transcriptomes. mRNA can be surveyed without prior knowledge of gene transcripts. Alternative splicing of transcript isoforms and the identification of previously unknown exons are being reported. Initial reports of differences in exon usage and splicing between samples, as well as quantitative differences among samples, are beginning to surface. Biological variation has been reported to be larger than technical variation. In addition, technical variation has been reported to be in line with expectations due to random sampling. However, strategies for dealing with technical variation will differ depending on its magnitude. The size of the technical variance and the role of sampling are examined in this manuscript.
Results: In this study, three independent Solexa/Illumina experiments containing technical replicates are analyzed. When coverage is low, large disagreements between technical replicates are apparent. Exon detection between technical replicates is highly variable when the coverage is less than 5 reads per nucleotide, and estimates of gene expression are more likely to disagree when coverage is low. However, large disagreements in the estimates of expression are observed at all levels of coverage.
Conclusions: Technical variability is too high to ignore. Technical variability results in inconsistent detection of exons at low levels of coverage. Further, the estimate of the relative abundance of a transcript can substantially disagree, even when coverage levels are high. This may be due to the low sampling fraction and, if so, it will persist as an issue needing to be addressed in experimental design even as the next wave of technology produces larger numbers of reads. We provide practical recommendations for dealing with the technical variability without dramatic cost increases.
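The role of random sampling mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the number of reads landing on an exon in each technical replicate is an independent Poisson count with the same expected value, and that an exon counts as "detected" when it receives at least one read. Both are simplifying assumptions made here for illustration, not the model or the detection rule used in the paper, and the function names are hypothetical.

```python
import math


def detection_probability(expected_reads: float) -> float:
    """P(at least one read) when the read count is Poisson(expected_reads)."""
    return 1.0 - math.exp(-expected_reads)


def discordance_probability(expected_reads: float) -> float:
    """P(exon is detected in exactly one of two independent replicates)."""
    p = detection_probability(expected_reads)
    return 2.0 * p * (1.0 - p)


def poisson_cv(expected_reads: float) -> float:
    """Coefficient of variation of a Poisson count: sd / mean = 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(expected_reads)


if __name__ == "__main__":
    # Expected read counts are illustrative values, not data from the paper.
    for lam in (0.5, 1.0, 2.0, 5.0, 20.0):
        print(
            f"expected reads={lam:5.1f}  "
            f"P(discordant detection)={discordance_probability(lam):.3f}  "
            f"CV={poisson_cv(lam):.2f}"
        )
```

Under this idealized model, discordant detection is most likely when the expected count is small and becomes rare as the expected count grows. The abstract reports large disagreements in expression estimates at all coverage levels, so the sketch should be read only as a sampling baseline, not as a description of the observed variability.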
Pages: 13