Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing

Cited by: 168
Authors
Yassour, Moran [1 ,2 ,3 ]
Kaplan, Tommy [1 ,4 ]
Fraser, Hunter B. [2 ,3 ]
Levin, Joshua Z. [2 ,3 ]
Pfiffner, Jenna [2 ,3 ]
Adiconis, Xian [2 ,3 ]
Schroth, Gary [5 ]
Luo, Shujun [5 ]
Khrebtukova, Irina [5 ]
Gnirke, Andreas [2 ,3 ]
Nusbaum, Chad [2 ,3 ]
Thompson, Dawn-Anne [2 ,3 ]
Friedman, Nir [1 ]
Regev, Aviv [2 ,3 ,6 ]
Affiliations
[1] Hebrew Univ Jerusalem, Sch Comp Sci & Engn, IL-91904 Jerusalem, Israel
[2] MIT, Broad Inst, Cambridge, MA 02142 USA
[3] Harvard, Cambridge, MA 02142 USA
[4] Hebrew Univ Jerusalem, Fac Med, Dept Mol Genet & Biotechnol, IL-91120 Jerusalem, Israel
[5] Illumina Inc, Hayward, CA 94545 USA
[6] MIT, Dept Biol, Cambridge, MA 02142 USA
Funding
US National Institutes of Health (NIH);
Keywords
computational biology; RNAseq; next generation sequencing; transcriptome profiling; Saccharomyces cerevisiae; SACCHAROMYCES-CEREVISIAE; YEAST GENOME; RESOLUTION; DISCOVERY;
DOI
10.1073/pnas.0812841106
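Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code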
07 ; 0710 ; 09 ;
Abstract
Defining the transcriptome, the repertoire of transcribed regions encoded in the genome, is a challenging experimental task. Current approaches, relying on sequencing of ESTs or cDNA libraries, are expensive and labor-intensive. Here, we present a general approach for ab initio discovery of the complete transcriptome of the budding yeast, based only on the unannotated genome sequence and millions of short reads from a single massively parallel sequencing run. Using novel algorithms, we automatically construct a highly accurate transcript catalog. Our approach automatically and fully defines 86% of the genes expressed under the given conditions, and discovers 160 previously undescribed transcription units of 250 bp or longer. It correctly demarcates the 5' and 3' UTR boundaries of 86 and 77% of expressed genes, respectively. The method further identifies 83% of known splice junctions in expressed genes, and discovers 25 previously uncharacterized introns, including 2 cases of condition-dependent intron retention. Our framework is applicable to poorly understood organisms, and can lead to greater understanding of the transcribed elements in an explored genome.
Pages: 3264-3269
Page count: 6
Related papers
22 in total
  • [1] Whole-genome re-sequencing
    Bentley, David R.
    [J]. CURRENT OPINION IN GENETICS & DEVELOPMENT, 2006, 16 (06) : 545 - 552
  • [2] Brachmann CB, 1998, YEAST, V14, P115
  • [3] SGD: Saccharomyces Genome Database
    Cherry, JM
    Adler, C
    Ball, C
    Chervitz, SA
    Dwight, SS
    Hester, ET
    Jia, YK
    Juvik, G
    Roe, T
    Schroeder, M
    Weng, SA
    Botstein, D
    [J]. NUCLEIC ACIDS RESEARCH, 1998, 26 (01) : 73 - 79
  • [4] Stem cell transcriptome profiling via massive-scale mRNA sequencing
    Cloonan, Nicole
    Forrest, Alistair R. R.
    Kolle, Gabriel
    Gardiner, Brooke B. A.
    Faulkner, Geoffrey J.
    Brown, Mellissa K.
    Taylor, Darrin F.
    Steptoe, Anita L.
    Wani, Shivangi
    Bethel, Graeme
    Robertson, Alan J.
    Perkins, Andrew C.
    Bruce, Stephen J.
    Lee, Clarence C.
    Ranade, Swati S.
    Peckham, Heather E.
    Manning, Jonathan M.
    McKernan, Kevin J.
    Grimmond, Sean M.
    [J]. NATURE METHODS, 2008, 5 (07) : 613 - 619
  • [5] DALESSIO JM, 1988, NUCLEIC ACIDS RES, V16, P1999
  • [6] A high-resolution map of transcription in the yeast genome
    David, L
    Huber, W
    Granovskaia, M
    Toedling, J
    Palm, CJ
    Bofkin, L
    Jones, T
    Davis, RW
    Steinmetz, LM
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2006, 103 (14) : 5320 - 5325
  • [7] Genome sequence of the human malaria parasite Plasmodium falciparum
    Gardner, MJ
    Hall, N
    Fung, E
    White, O
    Berriman, M
    Hyman, RW
    Carlton, JM
    Pain, A
    Nelson, KE
    Bowman, S
    Paulsen, IT
    James, K
    Eisen, JA
    Rutherford, K
    Salzberg, SL
    Craig, A
    Kyes, S
    Chan, MS
    Nene, V
    Shallom, SJ
    Suh, B
    Peterson, J
    Angiuoli, S
    Pertea, M
    Allen, J
    Selengut, J
    Haft, D
    Mather, MW
    Vaidya, AB
    Martin, DMA
    Fairlamb, AH
    Fraunholz, MJ
    Roos, DS
    Ralph, SA
    McFadden, GI
    Cummings, LM
    Subramanian, GM
    Mungall, C
    Venter, JC
    Carucci, DJ
    Hoffman, SL
    Newbold, C
    Davis, RW
    Fraser, CM
    Barrell, B
    [J]. NATURE, 2002, 419 (6906) : 498 - 511
  • [8] NUMBER AND DISTRIBUTION OF POLYADENYLATED RNA SEQUENCES IN YEAST
    HEREFORD, LM
    ROSBASH, M
    [J]. CELL, 1977, 10 (03) : 453 - 462
  • [9] Whole-genome sequencing and variant discovery in C. elegans
    Hillier, LaDeana W.
    Marth, Gabor T.
    Quinlan, Aaron R.
    Dooling, David
    Fewell, Ginger
    Barnett, Derek
    Fox, Paul
    Glasscock, Jarret I.
    Hickenbotham, Matthew
    Huang, Weichun
    Magrini, Vincent J.
    Richt, Ryan J.
    Sander, Sacha N.
    Stewart, Donald A.
    Stromberg, Michael
    Tsung, Eric F.
    Wylie, Todd
    Schedl, Tim
    Wilson, Richard K.
    Mardis, Elaine R.
    [J]. NATURE METHODS, 2008, 5 (02) : 183 - 188
  • [10] Dissecting the regulatory circuitry of a eukaryotic genome
    Holstege, FCP
    Jennings, EG
    Wyrick, JJ
    Lee, TI
    Hengartner, CJ
    Green, MR
    Golub, TR
    Lander, ES
    Young, RA
    [J]. CELL, 1998, 95 (05) : 717 - 728