Clustering of protein domains for functional and evolutionary studies

Cited by: 5
Authors
Goldstein, Pavle [2]
Zucko, Jurica [1, 5]
Vujaklija, Dusica [3]
Krisko, Anita [4, 7]
Hranueli, Daslav [5]
Long, Paul F. [6]
Etchebest, Catherine [8]
Basrak, Bojan [2]
Cullum, John [1]
Affiliations
[1] Univ Kaiserslautern, Dept Genet, D-67653 Kaiserslautern, Germany
[2] Univ Zagreb, Dept Math, Zagreb 10000, Croatia
[3] Rudjer Boskovic Inst, Dept Mol Biol, Zagreb 10000, Croatia
[4] Mediterranean Inst Life Sci, Split 21000, Croatia
[5] Univ Zagreb, Fac Food Technol & Biotechnol, Zagreb 10000, Croatia
[6] Univ London, Sch Pharm, London WC1N 1AX, England
[7] Univ Paris 05, Fac Med, INSERM, U571, F-75730 Paris 15, France
[8] Univ Paris 07, INSERM, U726, Equipe Bioinformat Genom & Mol, F-75251 Paris 05, France
Keywords
MULTIPLE SEQUENCE ALIGNMENT; SUBSTRATE-SPECIFICITY; CRYSTAL-STRUCTURE; GENE CLUSTERS; PREDICTION; STEREOSPECIFICITY; TRANSACYLASE; RESIDUES; MATRICES; PROGRAM;
DOI
10.1186/1471-2105-10-335
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
070307 [Chemical Biology];
Abstract
Background: The number of protein family members defined by DNA sequencing is usually much larger than the number characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering.
Results: An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than a single amino acid. The user selects columns (typically the top-ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well-characterized protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families.
Conclusion: The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score.
Pages: 11
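The subtype-assignment step mentioned in the abstract (a procedure related to deterministic annealing EM) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives no formulas, so the mixture model (independent categorical distributions per motif position), the annealing schedule, the pseudocount, and the function name `daem_cluster` are all assumptions made for illustration. The returned score is simply the posterior probability of the assigned subtype, a stand-in for the specificity score described in the abstract.

```python
import math
import random

# 20 amino acids plus the gap character (assumed alphabet for this sketch).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def daem_cluster(motifs, k=2, betas=(0.5, 0.75, 1.0), iters=50,
                 pseudo=0.1, seed=0):
    """Cluster equal-length motif strings with a tempered EM loop.

    Hypothetical model: each subtype has an independent categorical
    distribution at every motif position. Returns (labels, scores).
    """
    rng = random.Random(seed)
    n, length = len(motifs), len(motifs[0])

    # Random soft initial responsibilities, one row per sequence.
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    for beta in betas:  # inverse temperature rises towards 1
        for _ in range(iters):
            # M-step: mixing weights and per-position residue frequencies.
            weights = [max(sum(resp[i][c] for i in range(n)) / n, 1e-12)
                       for c in range(k)]
            probs = []
            for c in range(k):
                columns = []
                for j in range(length):
                    counts = {a: pseudo for a in ALPHABET}
                    for i in range(n):
                        counts[motifs[i][j]] += resp[i][c]
                    total = sum(counts.values())
                    columns.append({a: v / total for a, v in counts.items()})
                probs.append(columns)

            # Tempered E-step: posterior proportional to (prior * likelihood)^beta.
            for i in range(n):
                logp = []
                for c in range(k):
                    ll = math.log(weights[c])
                    for j in range(length):
                        ll += math.log(probs[c][j][motifs[i][j]])
                    logp.append(beta * ll)
                top = max(logp)
                expd = [math.exp(v - top) for v in logp]
                total = sum(expd)
                resp[i] = [v / total for v in expd]

    labels = [max(range(k), key=lambda c: resp[i][c]) for i in range(n)]
    scores = [max(resp[i]) for i in range(n)]
    return labels, scores


if __name__ == "__main__":
    # Toy three-residue motifs (invented data, not from the paper).
    toy = ["DAG", "DAG", "DAG", "DSG", "NVH", "NVH", "NIH", "NVH"]
    for motif, label, score in zip(toy, *daem_cluster(toy)):
        print(motif, label, round(score, 3))
```

In this sketch a sequence whose score is close to 1/k is poorly resolved between subtypes, which mirrors the abstract's observation that false assignments usually had a low specificity score.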