Loose ends: almost one in five human genes still have unresolved coding status

Cited by: 42
Authors
Abascal, Federico [1]
Juan, David [2]
Jungreis, Irwin [3, 4]
Martinez, Laura [5]
Rigau, Maria [6]
Manuel Rodriguez, Jose [7]
Vazquez, Jesus [7]
Tress, Michael L. [5]
Affiliations
[1] Wellcome Trust Sanger Inst, Hinxton CB10 1SA, Cambs, England
[2] Univ Pompeu Fabra, Comparat Genom Lab, Inst Biol Evolut, Barcelona, Spain
[3] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[4] Broad Inst MIT & Harvard, Cambridge, MA USA
[5] Spanish Natl Canc Res Ctr, Bioinformat Unit, Madrid, Spain
[6] Barcelona Supercomp Ctr, Computat Biol Life Sci Grp, Barcelona, Spain
[7] Ctr Nacl Invest Cardiovasc, Cardiovasc Prote Lab, Madrid, Spain
Funding
US National Institutes of Health (NIH)
Keywords
INTEGRATED MAP; PREDICTION; TOPOLOGY; PROTEOME; DATABASE; SEQUENCE; ISOFORMS; GENCODE; DOG
DOI
10.1093/nar/gky587
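Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
070307 [Chemical Biology]; 071010 [Biochemistry and Molecular Biology]
Abstract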
Seventeen years after the sequencing of the human genome, the human proteome is still under revision. One in eight of the 22 210 coding genes listed by the Ensembl/GENCODE, RefSeq and UniProtKB reference databases are annotated differently across the three sets. We have carried out an in-depth investigation of the 2764 genes classified as coding by one or more sets of manual curators and not coding by others. Data from large-scale genetic variation analyses suggest that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. A further 1470 genes annotated as coding in all three reference sets have characteristics that are typical of non-coding genes or pseudogenes. These potential non-coding genes also appear to be undergoing neutral evolution and have considerably less supporting transcript and protein evidence than other coding genes. We believe that the three reference databases currently overestimate the number of human coding genes by at least 2000, complicating and adding noise to large-scale biomedical experiments. Determining which potential non-coding genes do not code for proteins is a difficult but vitally important task since the human reference proteome is a fundamental pillar of most basic research and supports almost all large-scale biomedical projects.
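The proportions quoted in the title and abstract can be checked with a few lines of arithmetic. The sketch below uses only the three counts given in the abstract. The variable names are illustrative, and it assumes that the title's "almost one in five" is obtained by adding the 2764 disputed genes to the 1470 genes that all three sets call coding but that look non-coding; the abstract does not state this derivation explicitly.

```python
# Counts taken from the abstract; names are illustrative, not the authors'.
TOTAL_CODING_GENES = 22210  # coding genes listed across the three reference sets
DISPUTED = 2764             # coding in at least one set, not coding in another
SUSPECT_AGREED = 1470       # coding in all three sets, but with non-coding traits


def share(count, total=TOTAL_CODING_GENES):
    """Return count as a fraction of the total number of coding genes."""
    return count / total


# Genes annotated differently across the sets ("one in eight" = 0.125).
disputed_share = share(DISPUTED)

# Assumption: "unresolved coding status" means disputed plus suspect genes
# ("almost one in five" = just under 0.2).
unresolved_share = share(DISPUTED + SUSPECT_AGREED)

print(f"disputed:   {disputed_share:.1%}")
print(f"unresolved: {unresolved_share:.1%}")
```

Under that assumption, the disputed genes come to roughly 12% of the total and the combined group to roughly 19%, in line with the "one in eight" and "almost one in five" wording.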
Pages: 7070-7084
Page count: 15