Data Augmentation and Pretraining for Template-Based Retrosynthetic Prediction in Computer-Aided Synthesis Planning

Cited by: 51
Authors
Fortunato, Michael E. [1]
Coley, Connor W. [1]
Barnes, Brian C. [2]
Jensen, Klavs F. [1]
Affiliations
[1] MIT, Dept Chem Engn, Cambridge, MA 02139 USA
[2] CCDC Army Res Lab, Detonat Sci & Modeling Branch, Aberdeen Proving Ground, MD 21005 USA
Keywords
NEURAL-NETWORKS;
DOI
10.1021/acs.jcim.0c00403
Chinese Library Classification number
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
This work presents efforts to augment the performance of data-driven machine learning algorithms for reaction template recommendation used in computer-aided synthesis planning software. Machine learning models designed to prioritize reaction templates or molecular transformations often focus on reporting high-accuracy metrics for the one-to-one mapping of product molecules in reaction databases to the template extracted from the recorded reaction. The templates selected for inclusion in these models have previously been limited to those that appear frequently in the reaction databases, which excludes potentially useful transformations. By augmenting open-access data sets of organic reactions with explicitly calculated template applicability and pretraining a template-relevance neural network on this augmented applicability data set, we report an increase in the template applicability recall and an increase in the diversity of predicted precursors. The augmentation and pretraining effectively teach the neural network an increased set of templates that could theoretically lead to successful reactions for a given target. Even on a small data set of well-curated reactions, the data augmentation and pretraining methods resulted in an increase in top-1 accuracy, especially for rare templates, indicating that these strategies can be very useful for small data sets.
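The contrast the abstract draws between one-to-one template labels and explicitly calculated applicability labels can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three template names are hypothetical, substring tests on SMILES strings stand in for real substructure matching, and `applicability_recall` is one plausible reading of "template applicability recall" (the fraction of applicable templates found among the top-k scored templates).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for retrosynthetic templates. Each maps a product
# SMILES string to True when the template "applies". A real pipeline would use
# substructure matching on molecules, not substring tests on strings.
TEMPLATES: Dict[str, Callable[[str], bool]] = {
    "amide_disconnection": lambda smi: "C(=O)N" in smi,
    "ester_disconnection": lambda smi: "C(=O)O" in smi,
    "aryl_bromide_coupling": lambda smi: "Br" in smi,
}
TEMPLATE_NAMES: List[str] = list(TEMPLATES)


def one_hot_label(recorded_template: str) -> List[int]:
    """One-to-one label: only the template extracted from the recorded reaction."""
    return [int(name == recorded_template) for name in TEMPLATE_NAMES]


def applicability_vector(product: str) -> List[int]:
    """Multi-hot label: every template that applies to the product."""
    return [int(TEMPLATES[name](product)) for name in TEMPLATE_NAMES]


def augment(records: List[Tuple[str, str]]) -> List[dict]:
    """Attach both label types to each (product, recorded_template) record."""
    return [
        {
            "product": product,
            "recorded": one_hot_label(template),
            "applicable": applicability_vector(product),
        }
        for product, template in records
    ]


def applicability_recall(scores: List[float], applicable: List[int], k: int) -> float:
    """Fraction of applicable templates among the k highest-scoring templates."""
    n_applicable = sum(applicable)
    if n_applicable == 0:
        return 0.0
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(applicable[i] for i in top_k) / n_applicable


if __name__ == "__main__":
    data = augment([("CC(=O)NC1=CC=C(Br)C=C1", "amide_disconnection")])
    print(data[0]["recorded"], data[0]["applicable"])
```

In this toy setting the multi-hot vector would serve as the pretraining target, and the one-hot vector as the target for subsequent training on recorded reactions; the actual network architecture and training schedule are described in the paper itself and are not reproduced here.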
Pages: 3398-3407
Number of pages: 10