The Ensembl gene annotation system

Cited by: 523
Authors
Aken, Bronwen L. [1, 2]
Ayling, Sarah [2, 3]
Barrell, Daniel [1, 2, 4]
Clarke, Laura [2, 5]
Curwen, Valery [2]
Fairley, Susan [2, 5]
Banet, Julio Fernandez [2, 6]
Billis, Konstantinos [1, 2]
Giron, Carlos Garcia [1, 2]
Hourlier, Thibaut [1, 2]
Howe, Kevin [2, 5]
Kahari, Andreas [2, 7]
Kokocinski, Felix [2]
Martin, Fergal J. [1, 2]
Murphy, Daniel N. [1, 2]
Nag, Rishi [1, 2]
Ruffier, Magali [2, 5]
Schuster, Michael [1, 8]
Tang, Y. Amy [2, 5]
Vogel, Jan-Hinnerk [2, 9]
White, Simon [2, 10]
Zadissa, Amonida [2, 5]
Flicek, Paul [1, 2]
Searle, Stephen M. J. [2]
Affiliations
[1] European Bioinformat Inst Wellcome Genome Campus, European Mol Biol Lab, Cambridge CB10 1SD, England
[2] Wellcome Trust Sanger Inst Wellcome Genome Campus, Cambridge CB10 1SA, England
[3] Genome Anal Ctr, Norwich Res Pk, Norwich NR4 7UH, Norfolk, England
[4] Eagle Genom Ltd, Babraham Res Campus, Cambridge CB22 3AT, England
[5] European Bioinformat Inst, European Mol Biol Lab, Wellcome Genome Campus, Cambridge CB10 1SD, England
[6] Pfizer Inc, 10646 Sci Ctr Dr, San Diego, CA 92121 USA
[7] Uppsala Univ, Inst Cell Molekylarbiol, Husargatan 3, S-75237 Uppsala, Sweden
[8] Austrian Acad Sci, CeMM Res Ctr Mol Med, A-1090 Vienna, Austria
[9] Genentech Inc, 1 DNA Way, San Francisco, CA 94080 USA
[10] Baylor Coll Med, Human Genome Sequencing Ctr, Houston, TX 77030 USA
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2016
Funding
Biotechnology and Biological Sciences Research Council (UK); Wellcome Trust (UK);
Keywords
GENOME PROVIDES INSIGHTS; SEQUENCE; REVEALS; EVOLUTION; ALIGNMENT; DATABASE; ZEBRAFISH; IDENTIFICATION; COMPLEXITY; GENERATION;
DOI
10.1093/database/baw093
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail.
Pages: 19