Comparison and calibration of transcriptome data from RNA-Seq and tiling arrays

Cited by: 79
Authors
Agarwal, Ashish [1]
Koppstein, David [1]
Rozowsky, Joel [1]
Sboner, Andrea [1]
Habegger, Lukas [1]
Hillier, LaDeana W. [3]
Sasidharan, Rajkumar [1]
Reinke, Valerie [4]
Waterston, Robert H. [3]
Gerstein, Mark [1, 2]
Affiliations
[1] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
[2] Yale Univ, Program Computat Biol & Bioinformat, New Haven, CT 06520 USA
[3] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
[4] Yale Univ, Sch Med, Dept Genet, New Haven, CT 06520 USA
Source
BMC GENOMICS | 2010 / Vol. 11
Keywords
GENE-EXPRESSION; EUKARYOTIC TRANSCRIPTOME; MICROARRAY; IDENTIFICATION; NORMALIZATION; ANNOTATION; PREDICTION;
DOI
10.1186/1471-2164-11-383
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Tiling arrays have been the tool of choice for probing an organism's transcriptome without prior assumptions about the transcribed regions, but RNA-Seq is becoming a viable alternative as the costs of sequencing continue to decrease. Understanding the relative merits of these technologies will help researchers select the appropriate technology for their needs.
Results: Here, we compare these two platforms using a matched sample of poly(A)-enriched RNA isolated from the second larval stage of C. elegans. We find that the raw signals from these two technologies are reasonably well correlated but that RNA-Seq outperforms tiling arrays in several respects, notably in exon boundary detection and dynamic range of expression. By exploring the accuracy of sequencing as a function of depth of coverage, we found that about 4 million reads are required to match the sensitivity of two tiling array replicates. The effects of cross-hybridization were analyzed using a "nearest neighbor" classifier applied to array probes; we describe a method for determining potential "black list" regions whose signals are unreliable. Finally, we propose a strategy for using RNA-Seq data as a gold standard set to calibrate tiling array data. All tiling array and RNA-Seq data sets have been submitted to the modENCODE Data Coordinating Center.
Conclusions: Tiling arrays effectively detect transcript expression levels at a low cost for many species while RNA-Seq provides greater accuracy in several regards. Researchers will need to carefully select the technology appropriate to the biological investigations they are undertaking. It will also be important to reconsider a comparison such as ours as sequencing technologies continue to evolve.
Pages: 16