The challenge of increasing Pfam coverage of the human proteome

Cited by: 21
Authors
Mistry, Jaina [1 ,2 ]
Coggill, Penny [1 ,2 ]
Eberhardt, Ruth Y. [1 ,2 ]
Deiana, Antonio [3 ]
Giansanti, Andrea [3 ,4 ]
Finn, Robert D. [5 ]
Bateman, Alex [1 ]
Punta, Marco [1 ,2 ]
Affiliations
[1] EMBL European Bioinformat Inst, Cambridge CB10 1SD, England
[2] Sanger Inst, Cambridge CB10 1SA, England
[3] Univ Roma La Sapienza, Dept Phys, I-00185 Rome, Italy
[4] Ist Nazl Fis Nucl, Sez Roma1, I-00185 Rome, Italy
[5] HHMI Janelia Farm Res Campus, Ashburn, VA 20147 USA
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2013
Funding
Wellcome Trust (UK); Biotechnology and Biological Sciences Research Council (UK);
Keywords
INTRINSICALLY DISORDERED PROTEINS; COMPARATIVE GENOMICS; DATABASE; DOMAIN; COMPLEXITY; UNFOLDOMICS; ANNOTATION; SEQUENCES; RESOURCE; HOMOLOGY;
DOI
10.1093/database/bat023
Chinese Library Classification number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
It is a worthy goal to completely characterize all human proteins in terms of their domains. Here, using the Pfam database, we asked how far we have progressed in this endeavour. Ninety per cent of proteins in the human proteome matched at least one of 5494 manually curated Pfam-A families. In contrast, human residue coverage by Pfam-A families was <45%, with 9418 automatically generated Pfam-B families adding a further 10%. Even after excluding predicted signal peptide regions and short regions (<50 consecutive residues) unlikely to harbour new families, for ~38% of the human protein residues, there was no information in Pfam about conservation and evolutionary relationship with other protein regions. This uncovered portion of the human proteome was found to be distributed over almost 25 000 distinct protein regions. Comparison with proteins in the UniProtKB database suggested that the human regions that exhibited similarity to thousands of other sequences were often either divergent elements or N- or C-terminal extensions of existing families. Thirty-four per cent of regions, on the other hand, matched fewer than 100 sequences in UniProtKB. Most of these did not appear to share any relationship with existing Pfam-A families, suggesting that thousands of new families would need to be generated to cover them. Also, these latter regions were particularly rich in amino acid compositional bias such as the one associated with intrinsic disorder. This could represent a significant obstacle toward their inclusion into new Pfam families. Based on these observations, a major focus for increasing Pfam coverage of the human proteome will be to improve the definition of existing families. New families will also be built, prioritizing those that have been experimentally functionally characterized.
Pages: 10