Machine Learning in Computer-Aided Synthesis Planning

Cited by: 445
Authors
Coley, Connor W. [1]
Green, William H. [1]
Jensen, Klavs F. [1]
Affiliations
[1] MIT, Dept Chem Engn, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
REACTION PREDICTION; NEURAL-NETWORKS; SYNTHESIS DESIGN; CHEMISTRY; RETROSYNTHESIS; DISCOVERY; SYSTEM; TOOL;
DOI
10.1021/acs.accounts.8b00087
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
CONSPECTUS: Computer-aided synthesis planning (CASP) is focused on the goal of accelerating the process by which chemists decide how to synthesize small molecule compounds. The ideal CASP program would take a molecular structure as input and output a sorted list of detailed reaction schemes that each connect that target to purchasable starting materials via a series of chemically feasible reaction steps. Early work in this field relied on expert-crafted reaction rules and heuristics to describe possible retrosynthetic disconnections and selectivity rules but suffered from incompleteness, infeasible suggestions, and human bias. With the relatively recent availability of large reaction corpora (such as the United States Patent and Trademark Office (USPTO), Reaxys, and SciFinder databases), consisting of millions of tabulated reaction examples, it is now possible to construct and validate purely data-driven approaches to synthesis planning. As a result, synthesis planning has been opened to machine learning techniques, and the field is advancing rapidly. In this Account, we focus on two critical aspects of CASP and recent machine learning approaches to both challenges. First, we discuss the problem of retrosynthetic planning, which requires a recommender system to propose synthetic disconnections starting from a target molecule. We describe how the search strategy, necessary to overcome the exponential growth of the search space with increasing number of reaction steps, can be assisted through a learned synthetic complexity metric. We also describe how the recursive expansion can be performed by a straightforward nearest neighbor model that makes clever use of reaction data to generate high quality retrosynthetic disconnections.
Second, we discuss the problem of anticipating the products of chemical reactions, which can be used to validate proposed reactions in a computer-generated synthesis plan (i.e., reduce false positives) to increase the likelihood of experimental success. While we introduce this task in the context of reaction validation, its utility extends to the prediction of side products and impurities, among other applications. We describe neural network-based approaches that we and others have developed for this forward prediction task that can be trained on previously published experimental data. Machine learning and artificial intelligence have revolutionized a number of disciplines, not limited to image recognition, dictation, translation, content recommendation, advertising, and autonomous driving. While there is a rich history of using machine learning for structure-activity models in chemistry, it is only now that it is being successfully applied more broadly to organic synthesis and synthesis design. As reported in this Account, machine learning is rapidly transforming CASP, but there are several remaining challenges and opportunities, many pertaining to the availability and standardization of both data and evaluation metrics, which must be addressed by the community at large.
Pages: 1281-1289
Page count: 9