A maximum common substructure-based algorithm for searching and predicting drug-like compounds

Cited by: 136
Authors
Cao, Yiqun [1]
Jiang, Tao [1]
Girke, Thomas [2]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[2] Univ Calif Riverside, Dept Bot & Plant Sci, Riverside, CA 92521 USA
Funding
US National Science Foundation
DOI
10.1093/bioinformatics/btn186
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The prediction of biologically active compounds is of great importance for high-throughput screening (HTS) approaches in drug discovery and chemical genomics. Many computational methods in this area focus on measuring the structural similarities between chemical structures. However, traditional similarity measures are often too rigid or consider only global similarities between structures. The maximum common substructure (MCS) approach provides a more promising and flexible alternative for predicting bioactive compounds.

Results: In this article, a new backtracking algorithm for MCS is proposed and compared to global similarity measurements. Our algorithm provides high flexibility in the matching process, and it is very efficient in identifying local structural similarities. To predict and cluster biologically active compounds more efficiently, the concept of basis compounds is proposed that enables researchers to easily combine the MCS-based and traditional similarity measures with modern machine learning techniques. Support vector machines (SVMs) are used to test how the MCS-based similarity measure and the basis compound vectorization method perform on two empirically tested datasets. The test results show that MCS complements the well-known atom pair descriptor-based similarity measure. By combining these two measures, our SVM-based model predicts the biological activities of chemical compounds with higher specificity and sensitivity.
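
The basis compound idea in the abstract can be illustrated with a small sketch: each compound is turned into a numeric vector whose entries are its similarities to a fixed set of basis compounds, and that vector can then be passed to a learner such as an SVM. This is not the authors' implementation. The `Compound` representation (a set of descriptor strings), the `tanimoto` function used in place of an MCS-based measure, and the `vectorize` helper are all assumptions made for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical stand-in: a compound is modelled as a set of descriptor strings
# (loosely in the spirit of atom pair descriptors), not as a molecular graph.
Compound = FrozenSet[str]


def tanimoto(a: Compound, b: Compound) -> float:
    """Tanimoto coefficient on feature sets: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def vectorize(
    compound: Compound,
    basis: Sequence[Compound],
    similarity: Callable[[Compound, Compound], float] = tanimoto,
) -> List[float]:
    """Describe a compound by its similarity to each basis compound.

    Any pairwise similarity can be plugged in here; an MCS-based measure
    would replace `tanimoto` without changing the rest of the pipeline.
    """
    return [similarity(compound, b) for b in basis]


# Toy data (made up): two basis compounds and one query compound.
basis = [frozenset({"C-C", "C-N"}), frozenset({"C-O", "C-C"})]
query = frozenset({"C-C", "C-N", "C-O"})
vector = vectorize(query, basis)
```

Because the similarity function is a parameter, vectors built from two different measures can be concatenated before training, which is one plausible reading of how the MCS-based and atom pair-based measures might be combined.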
Pages: I366-I374
Page count: 9