Graph kernels for chemical informatics

Cited by: 293
Authors
Ralaivola, L
Swamidass, SJ
Saigo, H
Baldi, P [1]
Affiliations
[1] Univ Calif Irvine, Sch Informat & Comp Sci, Irvine, CA 92697 USA
[2] Univ Calif Irvine, Inst Genom & Bioinformat, Irvine, CA 92697 USA
Funding
National Institutes of Health (NIH), USA
Keywords
kernel methods; graph kernels; convolution kernels; spectral kernels; computational chemistry; chemical informatics; toxicity; activity; drug design; recursive neural networks
DOI
10.1016/j.neunet.2005.07.009
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Increased availability of large repositories of chemical compounds is creating new challenges and opportunities for the application of machine learning methods to problems in computational chemistry and chemical informatics. Because chemical compounds are often represented by the graph of their covalent bonds, machine learning methods in this domain must be capable of processing graphical structures of variable size. Here, we first briefly review the literature on graph kernels and then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea of molecular fingerprints and on counting labeled paths of depth up to d using depth-first search from each possible vertex. The kernels are applied to three classification problems to predict mutagenicity, toxicity, and anti-cancer activity on three publicly available data sets. The kernels achieve performances at least comparable, and most often superior, to those previously reported in the literature, reaching accuracies of 91.5% on the Mutag dataset, 65-67% on the PTC (Predictive Toxicology Challenge) dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and tradeoffs of these kernels, as well as other proposed kernels that leverage 1D or 3D representations of molecules, are briefly discussed. © 2005 Elsevier Ltd. All rights reserved.
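The path-counting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a toy molecule encoding (a list of atom labels plus a dictionary of labeled bonds), counts only simple paths (no vertex is revisited), and measures depth d as the number of bonds in a path. The function names `labeled_paths`, `tanimoto_kernel`, and `minmax_kernel` are hypothetical. Tanimoto is computed here as the ratio of shared to total distinct paths, and MinMax as the ratio of summed minimum to summed maximum path counts; the Hybrid kernel is not covered.

```python
from collections import Counter


def labeled_paths(atoms, bonds, d):
    """Count labeled paths with at most d bonds, by DFS from every vertex.

    atoms: list of atom labels, e.g. ["C", "C", "O"]      (assumed encoding)
    bonds: dict mapping (i, j) index pairs to a bond label (assumed encoding)
    """
    adj = {i: [] for i in range(len(atoms))}
    for (i, j), label in bonds.items():
        adj[i].append((j, label))
        adj[j].append((i, label))

    counts = Counter()

    def dfs(v, visited, path):
        counts[path] += 1
        # path alternates atom, bond, atom, ...; len(path) // 2 is the bond count
        if len(path) // 2 == d:
            return
        for w, label in adj[v]:
            if w not in visited:  # assumption: simple paths only
                dfs(w, visited | {w}, path + (label, atoms[w]))

    for v in adj:
        dfs(v, frozenset([v]), (atoms[v],))
    return counts


def tanimoto_kernel(p, q):
    """Shared distinct paths divided by total distinct paths (binary features)."""
    a, b = set(p), set(q)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def minmax_kernel(p, q):
    """Sum of per-path minimum counts divided by sum of per-path maximum counts."""
    keys = set(p) | set(q)
    den = sum(max(p[k], q[k]) for k in keys)
    return sum(min(p[k], q[k]) for k in keys) / den if den else 0.0


# Toy heavy-atom skeletons (hypothetical inputs): C-C-O and C-C-N
chain_o = labeled_paths(["C", "C", "O"], {(0, 1): "-", (1, 2): "-"}, d=2)
chain_n = labeled_paths(["C", "C", "N"], {(0, 1): "-", (1, 2): "-"}, d=2)
print(tanimoto_kernel(chain_o, chain_n), minmax_kernel(chain_o, chain_n))
```

Each kernel returns 1.0 when a molecule is compared with itself. Unlike Tanimoto, MinMax is sensitive to how many times a path occurs, so the two can disagree when a substructure is repeated.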
Pages: 1093-1110
Number of pages: 18