GENCODE: The reference human genome annotation for The ENCODE Project

Cited by: 3289
Authors
Harrow, Jennifer [1]
Frankish, Adam [1]
Gonzalez, Jose M. [1]
Tapanari, Electra [1]
Diekhans, Mark [2]
Kokocinski, Felix [1]
Aken, Bronwen L. [1]
Barrell, Daniel [1]
Zadissa, Amonida [1]
Searle, Stephen [1]
Barnes, If [1]
Bignell, Alexandra [1]
Boychenko, Veronika [1]
Hunt, Toby [1]
Kay, Mike [1]
Mukherjee, Gaurab [1]
Rajan, Jeena [1]
Despacio-Reyes, Gloria [1]
Saunders, Gary [1]
Steward, Charles [1]
Harte, Rachel [2]
Lin, Michael [3]
Howald, Cedric [4]
Tanzer, Andrea [5, 6]
Derrien, Thomas [4]
Chrast, Jacqueline [4]
Walters, Nathalie [4]
Balasubramanian, Suganthi [7]
Pei, Baikang [7]
Tress, Michael [8]
Manuel Rodriguez, Jose [8]
Ezkurdia, Iakes [8]
van Baren, Jeltje [9]
Brent, Michael [9]
Haussler, David [2]
Kellis, Manolis [3]
Valencia, Alfonso [8]
Reymond, Alexandre [4]
Gerstein, Mark [7]
Guigo, Roderic [5, 6]
Hubbard, Tim J. [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[2] Univ Calif Santa Cruz, Santa Cruz, CA 95064 USA
[3] MIT, Cambridge, MA 02139 USA
[4] Univ Lausanne, Ctr Integrat Genom, CH-1015 Lausanne, Switzerland
[5] Ctr Genom Regulat CRG, Barcelona 08003, Catalonia, Spain
[6] UPF, Barcelona 08003, Catalonia, Spain
[7] Yale Univ, New Haven, CT 06520 USA
[8] Spanish Natl Canc Res Ctr CNIO, E-28029 Madrid, Spain
[9] Ctr Genome Sci & Syst Biol, St Louis, MO 63130 USA
Funding
Wellcome Trust (UK); National Science Foundation (USA); National Institutes of Health (USA)
Keywords
GENE-EXPRESSION; NONCODING RNAS; IDENTIFICATION; SEQUENCES; REVEALS; PSEUDOGENE; PREDICTION; TOPOLOGY; TRANSCRIPTION; COMPLEXITY;
DOI
10.1101/gr.135350.111
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Pages: 1760-1774
Page count: 15