Exploiting protein structure data to explore the evolution of protein function and biological complexity

Cited by: 18
Authors
Marsden, RL [1]
Ranea, JAG [1]
Sillero, A [1]
Redfern, O [1]
Yeats, C [1]
Maibaum, M [1]
Lee, D [1]
Addou, S [1]
Reeves, GA [1]
Dallman, TJ [1]
Orengo, CA [1]
Affiliation
[1] UCL, Dept Biochem & Mol Biol, London WC1E 6BT, England
Keywords
protein structure; function; evolution; genome analysis
DOI
10.1098/rstb.2005.1801
Chinese Library Classification
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
New directions in biology are being driven by the complete sequencing of genomes, which has given us the protein repertoires of diverse organisms from all kingdoms of life. In tandem with this accumulation of sequence data, worldwide structural genomics initiatives, advanced by the development of improved technologies in X-ray crystallography and NMR, are expanding our knowledge of structural families and increasing our fold libraries. Methods for detecting remote sequence similarities have also been made more sensitive and this means that we can map domains from these structural families onto genome sequences to understand how these families are distributed throughout the genomes and reveal how they might influence the functional repertoires and biological complexities of the organisms. We have used robust protocols to assign sequences from completed genomes to domain structures in the CATH database, allowing up to 60% of domain sequences in these genomes, depending on the organism, to be assigned to a domain family of known structure. Analysis of the distribution of these families throughout bacterial genomes identified more than 300 universal families, some of which had expanded significantly in proportion to genome size. These highly expanded families are primarily involved in metabolism and regulation and appear to make major contributions to the functional repertoire and complexity of bacterial organisms. When comparisons are made across all kingdoms of life, we find a smaller set of universal domain families (approx. 140), of which families involved in protein biosynthesis are the largest conserved component. Analysis of the behaviour of other families reveals that some (e.g. those involved in metabolism, regulation) have remained highly innovative during evolution, making it harder to trace their evolutionary ancestry. 
Structural analyses of metabolic families provide some insights into the mechanisms of functional innovation, which include changes in domain partnerships and significant structural embellishments leading to modulation of active sites and protein interactions.
Pages: 425-440
Page count: 16