Characterization and prediction of residues determining protein functional specificity

Cited by: 93
Authors
Capra, John A. [1]
Singh, Mona [1]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Lewis Sigler Inst Integrat Gen, Princeton, NJ 08540 USA
Keywords
DOI
10.1093/bioinformatics/btn214
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Within a homologous protein family, proteins may be grouped into subtypes that share specific functions that are not common to the entire family. Often, the amino acids present in a small number of sequence positions determine each protein's particular functional specificity. Knowledge of these specificity determining positions (SDPs) aids in protein function prediction, drug design and experimental analysis. A number of sequence-based computational methods have been introduced for identifying SDPs; however, their further development and evaluation have been hindered by the limited number of known experimentally determined SDPs.
Results: We combine several bioinformatics resources to automate a process, typically undertaken manually, to build a dataset of SDPs. The resulting large dataset, which consists of SDPs in enzymes, enables us to characterize SDPs in terms of their physicochemical and evolutionary properties. It also facilitates the large-scale evaluation of sequence-based SDP prediction methods. We present a simple sequence-based SDP prediction method, GroupSim, and show that, surprisingly, it is competitive with a representative set of current methods. We also describe ConsWin, a heuristic that considers sequence conservation of neighboring amino acids, and demonstrate that it improves the performance of all methods tested on our large dataset of enzyme SDPs.
Pages: 1473-1480
Number of pages: 8
Related papers
40 in total
[1]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[2]  
[Anonymous], P 23 INT C MACH LEAR
[3]   The universal protein resource (UniProt) [J].
Bairoch, A ;
Apweiler, R ;
Wu, CH ;
Barker, WC ;
Boeckmann, B ;
Ferro, S ;
Gasteiger, E ;
Huang, HZ ;
Lopez, R ;
Magrane, M ;
Martin, MJ ;
Natale, DA ;
O'Donovan, C ;
Redaschi, N ;
Yeh, LSL .
NUCLEIC ACIDS RESEARCH, 2005, 33 :D154-D159
[4]   The ENZYME database in 2000 [J].
Bairoch, A .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :304-305
[5]   Analysis of catalytic residues in enzyme active sites [J].
Bartlett, GJ ;
Porter, CT ;
Borkakoti, N ;
Thornton, JM .
JOURNAL OF MOLECULAR BIOLOGY, 2002, 324 (01) :105-121
[6]   The Protein Data Bank [J].
Berman, HM ;
Westbrook, J ;
Feng, Z ;
Gilliland, G ;
Bhat, TN ;
Weissig, H ;
Shindyalov, IN ;
Bourne, PE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :235-242
[7]   Automated protein subfamily identification and classification [J].
Brown, Duncan P. ;
Krishnamurthy, Nandini ;
Sjoelander, Kimmen .
PLOS COMPUTATIONAL BIOLOGY, 2007, 3 (08) :1526-1538
[8]   A gold standard set of mechanistically diverse enzyme superfamilies [J].
Brown, SD ;
Gerlt, JA ;
Seffernick, JL ;
Babbitt, PC .
GENOME BIOLOGY, 2006, 7 (01)
[9]   Predicting functionally important residues from sequence conservation [J].
Capra, John A. ;
Singh, Mona .
BIOINFORMATICS, 2007, 23 (15) :1875-1882
[10]   A METHOD TO PREDICT FUNCTIONAL RESIDUES IN PROTEINS [J].
CASARI, G ;
SANDER, C ;
VALENCIA, A .
NATURE STRUCTURAL BIOLOGY, 1995, 2 (02) :171-178