Incorporating RNA-seq data into the zebrafish Ensembl genebuild

Cited by: 79

Authors:

Collins, John E. [1]

White, Simon [1]

Searle, Stephen M. J. [1]

Stemple, Derek L. [1]

Affiliation:

[1] Wellcome Trust Sanger Inst, Hinxton CB10 1SA, Cambs, England

Source:

GENOME RESEARCH | 2012 / Vol. 22 / Issue 10

Funding:

Wellcome Trust (UK);

Keywords:

EUKARYOTIC TRANSCRIPTOME; GENERATION; LANDSCAPE; REVEALS; YEAST;

DOI:

10.1101/gr.137901.112

Chinese Library Classification:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Ensembl gene annotation provides a comprehensive catalog of transcripts aligned to the reference sequence. It relies on publicly available species-specific and orthologous transcripts plus their inferred protein sequence. The accuracy of gene models is improved by increasing the species-specific component that can be cost-effectively achieved using RNA-seq. Two zebrafish gene annotations are presented in Ensembl version 62 built on the Zv9 reference sequence. Firstly, RNA-seq data from five tissues and seven developmental stages were assembled into 25,748 gene models. A 3'-end capture and sequencing protocol was developed to predict the 3' ends of transcripts, and 46.1% of the original models were subsequently refined. Secondly, a standard Ensembl genebuild, incorporating carefully filtered elements from the RNA-seq-only build, followed by a merge with the manually curated VEGA database, produced a comprehensive annotation of 26,152 genes represented by 51,569 transcripts. The RNA-seq-only and the Ensembl/VEGA genebuilds contribute contrasting elements to the final genebuild. The RNA-seq genebuild was used to adjust intron/exon boundaries of orthologous defined models, confirm their expression, and improve 3' untranslated regions. Importantly, the inferred protein alignments within the Ensembl genebuild conferred proof of model contiguity for the RNA-seq models. The zebrafish gene annotation has been enhanced by the incorporation of RNA-seq data and the pipeline will be used for other organisms. Organisms with little species-specific cDNA data will generally benefit the most.


Pages: 2067-2078

Number of pages: 12
