Clustering protein environments for function prediction: finding PROSITE motifs in 3D

Cited by: 24
Authors
Yoon, Sungroh [1]
Ebert, Jessica C.
Chung, Eui-Young
De Micheli, Giovanni
Altman, Russ B.
Affiliations
[1] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[2] Stanford Univ, Comp Syst Lab, Stanford, CA 94305 USA
[3] Yonsei Univ, Sch Elect & Elect Engn, Seoul 120749, South Korea
[4] Ecole Polytech Fed Lausanne, Swiss Fed Inst Technol, Ctr Integrated Syst, CH-1015 Lausanne, Switzerland
[5] Intel Corp, Santa Clara, CA 95054 USA
DOI
10.1186/1471-2105-8-S4-S10
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Structural genomics initiatives are producing increasing numbers of three-dimensional (3D) structures for which there is little functional information. Structure-based annotation of molecular function is therefore becoming critical. We previously presented FEATURE, a method for describing microenvironments around functional sites in proteins. However, FEATURE uses supervised machine learning and so is limited to building models for sites of known importance and location. We hypothesized that proteins contain a large number of function-associated sites that have not yet been recognized. To test this, we developed a method for clustering protein microenvironments and used it to evaluate the potential for discovering sites that have not been previously identified.

Results: We have prototyped a computational method for rapid clustering of millions of microenvironments in order to discover residues whose surrounding environments are similar and which may therefore share a functional or structural role. We clustered nearly 2,000,000 environments from 9,600 protein chains and defined 4,550 clusters. As a preliminary validation, we asked whether known 3D environments associated with PROSITE motifs were "rediscovered". We found examples of clusters highly enriched for residues that share PROSITE sequence motifs.

Conclusion: Our results demonstrate that we can cluster protein environments successfully using a simplified representation and the K-means clustering algorithm. The rediscovery of known 3D motifs allows us to calibrate the size and intercluster distances that characterize useful clusters. This information will then allow us to find new clusters with similar characteristics that represent novel structural or functional sites.
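The two steps named in the abstract, K-means clustering of environment vectors and a check for clusters enriched in a known motif, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the two-dimensional toy vectors stand in for the paper's microenvironment descriptors, the farthest-first initialization is just a simple deterministic choice, and the hypergeometric tail probability is one common way to score enrichment, not necessarily the statistic the authors used. The function names `kmeans` and `enrichment_pvalue` are hypothetical.

```python
import math


def sqdist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Lloyd's K-means with deterministic farthest-first initialization.

    Returns (centroids, labels). Assumes len(points) >= k.
    """
    # Initialization (assumption): start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def enrichment_pvalue(cluster_size, hits_in_cluster, total, total_hits):
    """Upper-tail hypergeometric probability P(X >= hits_in_cluster).

    X is the number of motif-bearing items in a random draw of cluster_size
    items from a population of `total` items, `total_hits` of which bear
    the motif.
    """
    upper = min(cluster_size, total_hits)
    numerator = sum(
        math.comb(total_hits, x) * math.comb(total - total_hits, cluster_size - x)
        for x in range(hits_in_cluster, upper + 1)
    )
    return numerator / math.comb(total, cluster_size)


# Toy data (hypothetical): six "environments", three of which carry a motif.
points = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]]
has_motif = [True, True, True, False, False, False]

centroids, labels = kmeans(points, k=2)

# Score cluster 0 for enrichment in motif-bearing environments.
in_cluster = [m for m, lab in zip(has_motif, labels) if lab == 0]
p_value = enrichment_pvalue(
    cluster_size=len(in_cluster),
    hits_in_cluster=sum(in_cluster),
    total=len(points),
    total_hits=sum(has_motif),
)
```

On data at the scale the abstract reports (millions of environments, thousands of clusters), a pure-Python loop like this would be far too slow; the sketch is meant only to make the cluster-then-score idea concrete.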
Pages: 12