Enhancing Retrosynthetic Reaction Prediction with Deep Learning Using Multiscale Reaction Classification

Cited by: 71
Authors
Baylon, Javier L. [1, 2]
Cilfone, Nicholas A. [1, 2]
Gulcher, Jeffrey R. [1, 3]
Chittenden, Thomas W. [1, 2, 4]
Affiliations
[1] WuXi NextCODE, Adv Artificial Intelligence Res Lab, Computat Stat & Bioinformat Grp, Cambridge, MA 02142 USA
[2] Complex Biol Syst Alliance, Medford, MA 02155 USA
[3] WuXi NextCODE, Canc Genet Grp, Cambridge, MA 02142 USA
[4] Harvard Med Sch, Boston Childrens Hosp, Div Genet & Genom, Boston, MA 02215 USA
Keywords
CHEMICAL-REACTIONS; NEURAL-NETWORKS; ASYMMETRIC-SYNTHESIS; ORGANIC-CHEMISTRY; KNOWLEDGE-BASE; COMPUTER; DESIGN; EFFICIENT; REPRESENTATION; METHODOLOGY;
DOI
10.1021/acs.jcim.8b00801
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Chemical synthesis planning is a key aspect of many fields of chemistry, especially drug discovery. Recent applications of machine learning and artificial intelligence to retrosynthetic analysis have shown great potential to improve computational methods for synthesis planning. Herein, we present a multiscale, data-driven approach to retrosynthetic analysis with deep highway networks (DHN). We automatically extracted reaction rules (i.e., ways in which a molecule is produced) from a data set of chemical reactions derived from U.S. patents. We performed the retrosynthetic reaction prediction task in two steps. First, we built a DHN model to predict which group of reactions (consisting of chemically similar reaction rules) was employed to produce a molecule. Second, once a reaction group was identified, a DHN trained on the subset of reactions within that group was employed to predict the transformation rule used to produce the molecule. To validate our approach, we predicted the first retrosynthetic reaction step for 40 approved drugs using our multiscale model and compared its predictive performance with that of a control, a conventional model trained on all machine-extracted reaction rules. Our multiscale approach showed a success rate of 82.9% at generating valid reactants from retrosynthetic reaction predictions. By comparison, the control model yielded a success rate of 58.5% on the same validation set of 40 pharmaceutical molecules, indicating a statistically significant improvement in matching the known first synthetic reaction of the drugs tested in this study. Although our multiscale approach did not outperform state-of-the-art rule-based systems curated by expert chemists, multiscale classification represents a marked enhancement in retrosynthetic analysis and can be easily adapted for use in a range of artificial intelligence strategies.
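The two-step prediction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the deep highway networks are replaced by toy scoring functions, and every name here (`predict_rule`, `toy_group_model`, `toy_rule_models`, `toy_rules`, and the rule labels) is a hypothetical stand-in chosen for illustration. The sketch only shows the dispatch logic: one model selects a reaction group, then a model specific to that group selects a rule within it.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A scorer maps a molecular fingerprint to one score per class.
Scorer = Callable[[Sequence[float]], List[float]]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def predict_rule(
    fingerprint: Sequence[float],
    group_model: Scorer,
    rule_models: Dict[int, Scorer],
    rules_in_group: Dict[int, List[str]],
) -> Tuple[int, str]:
    """Two-step prediction: pick a reaction group, then a rule inside it."""
    group = argmax(group_model(fingerprint))          # step 1: coarse scale
    local = argmax(rule_models[group](fingerprint))   # step 2: fine scale
    return group, rules_in_group[group][local]


# Toy stand-ins for trained models (hypothetical, for illustration only).
def toy_group_model(fp: Sequence[float]) -> List[float]:
    return [fp[0], fp[1]]  # scores for group 0 and group 1


toy_rule_models: Dict[int, Scorer] = {
    0: lambda fp: [fp[2], 1.0 - fp[2]],
    1: lambda fp: [0.2, 0.5, 0.3],
}
toy_rules: Dict[int, List[str]] = {
    0: ["amide_coupling", "ester_hydrolysis"],
    1: ["suzuki", "buchwald", "sn_ar"],
}

result = predict_rule([0.9, 0.1, 0.3], toy_group_model, toy_rule_models, toy_rules)
print(result)
```

In this arrangement each second-stage model only has to discriminate among the rules of its own group, which is a smaller classification problem than choosing among all extracted rules at once.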
Pages: 673-688
Page count: 16