Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity

Cited by: 124
Authors
Swamidass, SJ [1 ]
Chen, J [1 ]
Phung, P [1 ]
Ralaivola, L [1 ]
Baldi, P [1 ]
Affiliations
[1] Univ Calif Irvine, Sch Informat & Comp Sci, Inst Genom & Bioinformat, Irvine, CA 92717 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
DOI
10.1093/bioinformatics/bti1055
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Small molecules play a fundamental role in organic chemistry and biology. They can be used to probe biological systems and to discover new drugs and other useful compounds. As increasing numbers of large datasets of small molecules become available, it is necessary to develop computational methods that can deal with molecules of variable size and structure and predict their physical, chemical and biological properties.
Results: Here we develop several new classes of kernels for small molecules using their 1D, 2D and 3D representations. In 1D, we consider string kernels based on SMILES strings. In 2D, we introduce several similarity kernels based on conventional or generalized fingerprints. Generalized fingerprints are derived by counting, in different ways, the subpaths contained in the graph of bonds, using depth-first searches. In 3D, we consider similarity measures between histograms of pairwise distances between atom classes. These kernels can be computed efficiently and are applied to problems of classification and prediction of mutagenicity, toxicity and anti-cancer activity on three publicly available datasets. The results derived using cross-validation methods are state-of-the-art. Tradeoffs between various kernels are briefly discussed.
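Illustrative sketch (not part of the original record): the 2D approach in the abstract, counting labeled subpaths of the bond graph by depth-first search and comparing the resulting fingerprints, might look roughly like the following. Everything here is an assumption made for illustration: the function names, the graph encoding (atom labels plus labeled bonds, hydrogens implicit), the restriction to simple paths, and the choice of the Tanimoto and MinMax similarity measures, which are common for fingerprints but are not named in the abstract.

```python
from collections import Counter


def path_fingerprint(atoms, bonds, max_depth=2):
    """Count labeled simple paths with at most max_depth bonds (hypothetical helper).

    atoms: list of atom labels, e.g. ["C", "C", "O"]
    bonds: list of (i, j, bond_label) tuples, e.g. (0, 1, "-")
    """
    adjacency = {i: [] for i in range(len(atoms))}
    for i, j, label in bonds:
        adjacency[i].append((j, label))
        adjacency[j].append((i, label))

    counts = Counter()

    def dfs(node, visited, path, depth):
        # Every prefix reached by the search is itself a subpath: count it.
        counts[path] += 1
        if depth == max_depth:
            return
        for neighbor, label in adjacency[node]:
            if neighbor not in visited:  # simple paths only (an assumption)
                dfs(neighbor, visited | {neighbor},
                    path + label + atoms[neighbor], depth + 1)

    # Start one depth-first search from every atom.
    for start in range(len(atoms)):
        dfs(start, {start}, atoms[start], 0)
    return counts


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity on path presence/absence (binary fingerprint)."""
    keys_a, keys_b = set(fp_a), set(fp_b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


def minmax(fp_a, fp_b):
    """MinMax similarity on path counts (count-based fingerprint)."""
    keys = set(fp_a) | set(fp_b)
    numerator = sum(min(fp_a[k], fp_b[k]) for k in keys)
    denominator = sum(max(fp_a[k], fp_b[k]) for k in keys)
    return numerator / denominator if denominator else 0.0


# Toy molecules with implicit hydrogens: ethanol (C-C-O) and methanol (C-O).
ethanol = path_fingerprint(["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")])
methanol = path_fingerprint(["C", "O"], [(0, 1, "-")])

print(tanimoto(ethanol, methanol))
print(minmax(ethanol, methanol))
```

Either similarity function could then be supplied as a precomputed kernel matrix to a kernel method such as a support vector machine. Which variants of path counting and similarity the authors actually use is detailed in the paper itself, not in this record.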
Pages: i359-i368
Page count: 10