Prediction of Interactiveness Between Small Molecules and Enzymes by Combining Gene Ontology and Compound Similarity

Times cited: 26
Authors
Chen, Lei [2, 3]
Qian, Ziliang [4]
Feng, Kaiyan [5]
Cai, Yudong [1]
Affiliations
[1] Shanghai Univ, Inst Syst Biol, Shanghai 200444, Peoples R China
[2] E China Normal Univ, Shanghai Key Lab Trustworthy Comp, Shanghai 200062, Peoples R China
[3] Fudan Univ, Ctr Computat Syst Biol, Shanghai 200433, Peoples R China
[4] Chinese Acad Sci, Dept Combinator & Geometry, CAS MPG Partner Inst Computat Biol, Shanghai Inst Biol Sci, Shanghai 200031, Peoples R China
[5] Univ Manchester, Sch Med, Div Imaging Sci, Manchester M13 9PT, Lancs, England
Keywords
molecule-enzyme couple; compound similarity; functional domain composition; gene ontology; blast; metabolic pathway; nearest neighbor algorithm (NNA); simple majority voting system; SUPPORT VECTOR MACHINES; PROTEIN LOCALIZATION; CLASSIFICATION; LIGAND; GO; LOCATIONS; DATABASE
DOI
10.1002/jcc.21467
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
070301 [Inorganic Chemistry]
Abstract
Determining whether a small organic molecule interacts with an enzyme can help in understanding the molecular and cellular functions of organisms, as well as metabolic pathways. In this research, we present a prediction model that combines compound similarity and enzyme similarity to predict the interactiveness between small molecules and enzymes. A dataset consisting of 2859 positive small molecule-enzyme couples and 286,056 negative couples was employed. Compound similarity, a measure of how similar two small molecules are, was proposed by Hattori et al. (J Am Chem Soc 2003, 125, 11853) and is available at http://www.genome.jp/ligand-bin/search_compound. Enzyme similarity was obtained in three ways: the BLAST method, gene ontology (GO) terms, and functional domain composition. A new distance between a pair of couples was then defined, and the nearest neighbor algorithm (NNA) was employed to predict the interactiveness of enzymes and small molecules. To achieve a better balance between positive and negative samples when training the prediction model, a data distribution strategy was adopted: one-fourth of the couples were singled out as testing samples, and the remaining data were divided into seven training datasets. Only the negative samples were divided; all remaining positive samples were added to each training dataset. In this way, seven NNAs were built. Finally, a simple majority voting system was applied to integrate these seven models to predict the testing dataset, and it was shown to give better prediction results than any single prediction model. The highest overall prediction accuracy reached 97.30%. (C) 2009 Wiley Periodicals, Inc. J Comput Chem 31: 1766-1776, 2010
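The abstract describes the prediction pipeline only in outline. The Python sketch below shows one way its pieces could fit together: a distance between two molecule-enzyme couples built from a compound similarity and an enzyme similarity, a 1-nearest-neighbor predictor, a one-fourth hold-out, seven training sets that share all positives but split the negatives, and a simple majority vote. Everything specific here is an assumption for illustration, not the authors' published method: the product-form distance `1 - compound_sim * enzyme_sim`, the hypothetical function names (`couple_distance`, `nna_predict`, `hold_out`, `build_training_sets`, `vote`), the interleaved negative split, and the premise that both similarity scores lie in [0, 1].

```python
import random
from typing import Callable, List, Sequence, Tuple

Couple = Tuple[str, str]           # (small_molecule_id, enzyme_id)
Labeled = Tuple[Couple, int]       # label 1 = interacting, 0 = non-interacting
Sim = Callable[[str, str], float]  # similarity score, ASSUMED to lie in [0, 1]


def couple_distance(a: Couple, b: Couple, compound_sim: Sim, enzyme_sim: Sim) -> float:
    """HYPOTHETICAL couple distance: 0 when both compounds and both enzymes are
    identical, growing as either pair becomes less similar."""
    return 1.0 - compound_sim(a[0], b[0]) * enzyme_sim(a[1], b[1])


def nna_predict(query: Couple, train: Sequence[Labeled],
                compound_sim: Sim, enzyme_sim: Sim) -> int:
    """1-nearest-neighbor: return the label of the closest training couple
    (ties go to the earliest item, because min() keeps the first minimum)."""
    _, label = min(
        train, key=lambda item: couple_distance(query, item[0], compound_sim, enzyme_sim)
    )
    return label


def hold_out(items: Sequence, test_frac: float = 0.25, seed: int = 0):
    """Shuffle, then single out a fraction (one-fourth by default) as test samples.
    Returns (remaining, test)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def build_training_sets(positives: Sequence[Couple], negatives: Sequence[Couple],
                        n_sets: int = 7, seed: int = 0) -> List[List[Labeled]]:
    """Every training set receives all positives; only the negatives are split
    into n_sets disjoint parts, one part per set."""
    rng = random.Random(seed)
    neg = list(negatives)
    rng.shuffle(neg)
    pos = [(c, 1) for c in positives]
    return [pos + [(c, 0) for c in neg[k::n_sets]] for k in range(n_sets)]


def vote(query: Couple, training_sets: Sequence[Sequence[Labeled]],
         compound_sim: Sim, enzyme_sim: Sim) -> int:
    """Simple majority vote over one NNA per training set."""
    votes = [nna_predict(query, ts, compound_sim, enzyme_sim) for ts in training_sets]
    return 1 if 2 * sum(votes) > len(votes) else 0
```

With toy identity similarities (1.0 for identical IDs, 0.0 otherwise), a query couple that exactly matches a training positive lies at distance 0 from it in every training set, so all seven NNAs vote 1. Because every training set contains all positives, each NNA sees a much less skewed class ratio than the full dataset, and an odd number of voters (seven here) means a binary majority vote can never tie.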
Pages: 1766-1776
Page count: 11