Pfam 10 years on: 10 000 families and still growing

Cited by: 89
Authors
Sammut, Stephen John [2]
Finn, Robert D.
Bateman, Alex [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[2] Univ Malta, Msida, Malta
Funding
Wellcome Trust
Keywords
Pfam; protein families; classification; coverage; hidden Markov model
DOI
10.1093/bib/bbn010
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Classifications of proteins into groups of related sequences are in some respects like a periodic table for biology, allowing us to understand the underlying molecular biology of any organism. Pfam is a large collection of protein domains and families. Its scientific goal is to provide a complete and accurate classification of protein families and domains. The next release of the database will contain over 10 000 entries, which leads us to reflect on how far we are from completing this work. Currently Pfam matches 72% of known protein sequences, but for proteins with known structure Pfam matches 95%, which we believe represents the likely upper bound. Based on our analysis, a further 28 000 families would be required to achieve this level of coverage for the current sequence database. We also show that as more sequences are added to the sequence databases the fraction of sequences that Pfam matches is reduced, suggesting that continued addition of new families is essential to maintain its relevance.
Pages: 210-219
Page count: 10