Incorporating RNA-seq data into the zebrafish Ensembl genebuild

Cited by: 79
Authors
Collins, John E. [1]
White, Simon [1]
Searle, Stephen M. J. [1]
Stemple, Derek L. [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Hinxton CB10 1SA, Cambs, England
Funding
Wellcome Trust (UK)
Keywords
EUKARYOTIC TRANSCRIPTOME; GENERATION; LANDSCAPE; REVEALS; YEAST;
DOI
10.1101/gr.137901.112
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010 ; 081704 ;
Abstract
Ensembl gene annotation provides a comprehensive catalog of transcripts aligned to the reference sequence. It relies on publicly available species-specific and orthologous transcripts plus their inferred protein sequence. The accuracy of gene models is improved by increasing the species-specific component that can be cost-effectively achieved using RNA-seq. Two zebrafish gene annotations are presented in Ensembl version 62 built on the Zv9 reference sequence. Firstly, RNA-seq data from five tissues and seven developmental stages were assembled into 25,748 gene models. A 3'-end capture and sequencing protocol was developed to predict the 3' ends of transcripts, and 46.1% of the original models were subsequently refined. Secondly, a standard Ensembl genebuild, incorporating carefully filtered elements from the RNA-seq-only build, followed by a merge with the manually curated VEGA database, produced a comprehensive annotation of 26,152 genes represented by 51,569 transcripts. The RNA-seq-only and the Ensembl/VEGA genebuilds contribute contrasting elements to the final genebuild. The RNA-seq genebuild was used to adjust intron/exon boundaries of orthologous defined models, confirm their expression, and improve 3' untranslated regions. Importantly, the inferred protein alignments within the Ensembl genebuild conferred proof of model contiguity for the RNA-seq models. The zebrafish gene annotation has been enhanced by the incorporation of RNA-seq data and the pipeline will be used for other organisms. Organisms with little species-specific cDNA data will generally benefit the most.
Pages: 2067-2078
Page count: 12