Learning to Predict Chemical Reactions

Cited by: 135
Authors
Kayala, Matthew A. [1]
Azencott, Chloe-Agathe [1]
Chen, Jonathan H. [1]
Baldi, Pierre [1]
Affiliations
[1] Univ Calif Irvine, Sch Informat & Comp Sci, Inst Genom & Bioinformat, Irvine, CA 92612 USA
Keywords
FINDING SADDLE-POINTS; NUDGED ELASTIC BAND; MECHANISM; SYSTEM; KNOWLEDGE; MODEL
DOI
10.1021/ci200207y
Chinese Library Classification number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Approaches to the reaction prediction problem can be organized around three poles corresponding to: (1) physical laws; (2) rule-based expert systems; and (3) inductive machine learning. Previous approaches at these poles, respectively, are not high throughput, are not generalizable or scalable, and lack sufficient data and structure to be implemented. We propose a new approach to reaction prediction utilizing elements from each pole. Using a physically inspired conceptualization, we describe single mechanistic reactions as interactions between coarse approximations of molecular orbitals (MOs) and use topological and physicochemical attributes as descriptors. Using an existing rule-based system (Reaction Explorer), we derive a restricted chemistry data set consisting of 1630 full multistep reactions with 2358 distinct starting materials and intermediates, associated with 2989 productive mechanistic steps and 6.14 million unproductive mechanistic steps. From machine learning, we pose identifying productive mechanistic steps as a statistical ranking, information retrieval problem: given a set of reactants and a description of conditions, learn a ranking model over potential filled-to-unfilled MO interactions such that the top-ranked mechanistic steps yield the major products. The machine learning implementation follows a two-stage approach, in which we first train atom-level reactivity filters to prune 94.00% of nonproductive reactions with a 0.01% error rate. Then, we train an ensemble of ranking models on pairs of interacting MOs to learn a relative productivity function over mechanistic steps in a given system. Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanism at the top 89.05% of the time, rising to 99.86% of the time when the top four are considered. Furthermore, the system is generalizable, making reasonable predictions over reactants and conditions which the rule-based expert system does not handle. A web interface to the machine learning based mechanistic reaction predictor is accessible through our chemoinformatics portal (http://cdb.ics.uci.edu) under the Toolkits section.
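The two-stage approach described in the abstract (filter reactive sites, then rank the surviving filled-to-unfilled MO pairs and check whether a productive step appears among the top k) can be illustrated with a minimal sketch. This is not the authors' implementation: the orbital names, the reactivity and pair scores, the threshold, and the helper functions `filter_candidates`, `rank_steps`, and `top_k_hit` are all hypothetical, and fixed toy numbers stand in for the trained filters and the ranking ensemble.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """A candidate mechanistic step: a filled (source) MO paired with an unfilled (sink) MO."""
    source: str
    sink: str


def filter_candidates(sources, sinks, source_reactivity, sink_reactivity, threshold=0.5):
    """Stage 1: keep only pairs whose source and sink both pass a reactivity filter."""
    live_sources = [s for s in sources if source_reactivity[s] >= threshold]
    live_sinks = [k for k in sinks if sink_reactivity[k] >= threshold]
    return [Step(s, k) for s in live_sources for k in live_sinks]


def rank_steps(steps, pair_score):
    """Stage 2: order the surviving steps by a productivity score, highest first."""
    return sorted(steps, key=lambda step: pair_score[step], reverse=True)


def top_k_hit(ranked, productive, k):
    """Return True if any productive step is among the k top-ranked steps."""
    return any(step in productive for step in ranked[:k])


# Hypothetical toy system: three filled and three unfilled orbitals.
sources = ["amine_lone_pair", "alkene_pi", "CH_sigma"]
sinks = ["carbonyl_pi_star", "CBr_sigma_star", "CC_sigma_star"]

# Made-up filter outputs; in the paper these come from trained atom-level filters.
source_reactivity = {"amine_lone_pair": 0.9, "alkene_pi": 0.6, "CH_sigma": 0.05}
sink_reactivity = {"carbonyl_pi_star": 0.8, "CBr_sigma_star": 0.7, "CC_sigma_star": 0.02}

candidates = filter_candidates(sources, sinks, source_reactivity, sink_reactivity)

# Made-up pair scores standing in for the output of a trained ranking ensemble.
pair_score = {
    Step("amine_lone_pair", "carbonyl_pi_star"): 0.72,
    Step("amine_lone_pair", "CBr_sigma_star"): 0.63,
    Step("alkene_pi", "carbonyl_pi_star"): 0.48,
    Step("alkene_pi", "CBr_sigma_star"): 0.42,
}
ranked = rank_steps(candidates, pair_score)

# Assume, for illustration only, that this is the single productive step.
productive = {Step("amine_lone_pair", "CBr_sigma_star")}

print(len(candidates), "of", len(sources) * len(sinks), "pairs survive the filter")
print("top-1 hit:", top_k_hit(ranked, productive, 1))
print("top-2 hit:", top_k_hit(ranked, productive, 2))
```

In this toy system the filter keeps 4 of the 9 possible pairs, and the assumed productive step is ranked second, so it counts as a miss at k = 1 and a hit at k = 2. The top-1 and top-4 percentages reported in the abstract are the fraction of systems for which such a check succeeds.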
Pages: 2209-2222
Page count: 14