Incorporating RNA-seq data into the zebrafish Ensembl genebuild

Cited by: 79
Authors
Collins, John E. [1]
White, Simon [1]
Searle, Stephen M. J. [1]
Stemple, Derek L. [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Hinxton CB10 1SA, Cambs, England
Funding
Wellcome Trust (UK);
Keywords
EUKARYOTIC TRANSCRIPTOME; GENERATION; LANDSCAPE; REVEALS; YEAST;
DOI
10.1101/gr.137901.112
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Ensembl gene annotation provides a comprehensive catalog of transcripts aligned to the reference sequence. It relies on publicly available species-specific and orthologous transcripts plus their inferred protein sequence. The accuracy of gene models is improved by increasing the species-specific component that can be cost-effectively achieved using RNA-seq. Two zebrafish gene annotations are presented in Ensembl version 62 built on the Zv9 reference sequence. Firstly, RNA-seq data from five tissues and seven developmental stages were assembled into 25,748 gene models. A 3'-end capture and sequencing protocol was developed to predict the 3' ends of transcripts, and 46.1% of the original models were subsequently refined. Secondly, a standard Ensembl genebuild, incorporating carefully filtered elements from the RNA-seq-only build, followed by a merge with the manually curated VEGA database, produced a comprehensive annotation of 26,152 genes represented by 51,569 transcripts. The RNA-seq-only and the Ensembl/VEGA genebuilds contribute contrasting elements to the final genebuild. The RNA-seq genebuild was used to adjust intron/exon boundaries of orthologous defined models, confirm their expression, and improve 3' untranslated regions. Importantly, the inferred protein alignments within the Ensembl genebuild conferred proof of model contiguity for the RNA-seq models. The zebrafish gene annotation has been enhanced by the incorporation of RNA-seq data and the pipeline will be used for other organisms. Organisms with little species-specific cDNA data will generally benefit the most.
Pages: 2067-2078
Page count: 12