Multi-domain proteins in the three kingdoms of life: Orphan domains and other unassigned regions

Cited by: 182
Authors
Ekman, D [1]
Björklund, ÅK [1]
Frey-Skött, J [1]
Elofsson, A [1]
Affiliations
[1] Stockholm Univ, Stockholm Bioinformat Ctr, SE-10691 Stockholm, Sweden
Keywords
protein domains; multi-domain protein; comparative genomics; kingdoms of life; proteome;
DOI
10.1016/j.jmb.2005.02.007
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Comparative studies of the proteomes from different organisms have provided valuable information about protein domain distribution in the kingdoms of life. Earlier studies have been limited by the fact that only about 50% of the proteomes could be matched to a domain. Here, we have extended these studies by including less well-defined domain definitions, Pfam-B and clustered domains, MAS, in addition to Pfam-A and SCOP domains. It was found that a significant fraction of these domain families are homologous to Pfam-A or SCOP domains. Further, we show that all regions that do not match a Pfam-A or SCOP domain contain a significantly higher fraction of disordered structure. These unstructured regions may be contained within orphan domains or function as linkers between structured domains. Using several different definitions we have re-estimated the number of multi-domain proteins in different organisms and found that several methods all predict that eukaryotes have approximately 65% multi-domain proteins, while the prokaryotes consist of approximately 40% multi-domain proteins. However, these numbers are strongly dependent on the exact choice of cut-off for domains in unassigned regions. In conclusion, all eukaryotes have similar fractions of multidomain proteins and disorder, whereas a high fraction of repeating domain is distinguished only in multicellular eukaryotes. This implies a role for repeats in cell-cell contacts while the other two features are important for intracellular functions. (c) 2005 Published by Elsevier Ltd.
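The abstract notes that the estimated share of multi-domain proteins depends strongly on the cut-off used to count domains in unassigned regions. The following is a minimal sketch of that dependence, not the authors' method: the `Protein` record, the rule that an unassigned region counts as one orphan-like domain once it reaches `cutoff` residues, the cut-off values, and the toy data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Protein:
    # Hypothetical record: number of domains matched to a known family,
    # plus the lengths (in residues) of regions with no domain match.
    assigned_domains: int
    unassigned_lengths: List[int]


def domain_count(protein: Protein, cutoff: int) -> int:
    """Assigned domains plus unassigned regions of at least `cutoff` residues."""
    orphan_like = sum(1 for length in protein.unassigned_lengths if length >= cutoff)
    return protein.assigned_domains + orphan_like


def multi_domain_fraction(proteins: List[Protein], cutoff: int) -> float:
    """Fraction of proteins counted as having two or more domains."""
    if not proteins:
        return 0.0
    multi = sum(1 for p in proteins if domain_count(p, cutoff) >= 2)
    return multi / len(proteins)


# Toy data, invented for illustration only (not taken from the paper).
toy_proteome = [
    Protein(assigned_domains=1, unassigned_lengths=[30]),
    Protein(assigned_domains=1, unassigned_lengths=[120]),
    Protein(assigned_domains=2, unassigned_lengths=[]),
    Protein(assigned_domains=0, unassigned_lengths=[80, 150]),
]

for cutoff in (50, 100, 200):
    print(cutoff, multi_domain_fraction(toy_proteome, cutoff))
```

On this toy set the fraction falls as the cut-off rises, because fewer unassigned regions are long enough to be counted as domains.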
Pages: 231-243
Number of pages: 13
Related papers
42 in total
[1]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[2]   Comparison of ARM and HEAT protein repeats [J].
Andrade, MA ;
Petosa, C ;
O'Donoghue, SI ;
Müller, CW ;
Bork, P .
JOURNAL OF MOLECULAR BIOLOGY, 2001, 309 (01) :1-18
[3]  
Apic G, 2001, Bioinformatics, V17 Suppl 1, pS83
[4]   Domain combinations in archaeal, eubacterial and eukaryotic proteomes [J].
Apic, G ;
Gough, J ;
Teichmann, SA .
JOURNAL OF MOLECULAR BIOLOGY, 2001, 310 (02) :311-325
[5]   Domain insertions in protein structures [J].
Aroul-Selvam, R ;
Hubbard, T ;
Sasidharan, R .
JOURNAL OF MOLECULAR BIOLOGY, 2004, 338 (04) :633-641
[6]   Rosetta predictions in CASP5: Successes, failures, and prospects for complete automation [J].
Bradley, P ;
Chivian, D ;
Meiler, J ;
Misura, KMS ;
Rohl, CA ;
Schief, WR ;
Wedemeyer, WJ ;
Schueler-Furman, O ;
Murphy, P ;
Schonbrun, J ;
Strauss, CEM ;
Baker, D .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2003, 53 :457-468
[7]   Evolutionary rate heterogeneity in proteins with long disordered regions [J].
Brown, CJ ;
Takayama, S ;
Campen, AM ;
Vise, P ;
Marshall, TW ;
Oldfield, CJ ;
Williams, CJ ;
Dunker, AK .
JOURNAL OF MOLECULAR EVOLUTION, 2002, 55 (01) :104-110
[8]  
CONSORTIUM TF, 2003, NUCLEIC ACIDS RES, V31, P172
[9]  
DOLINSKI K, 2004, METHOD ENZYMOL, V266, P554
[10]   A comparison of sequence and structure protein domain families as a basis for structural genomics [J].
Elofsson, A ;
Sonnhammer, ELL .
BIOINFORMATICS, 1999, 15 (06) :480-500