Automated extraction of chemical synthesis actions from experimental procedures

Cited by: 144
Authors
Vaucher, Alain C. [1]
Zipoli, Federico [1]
Geluykens, Joppe [1]
Nair, Vishnu H. [1]
Schwaller, Philippe [1]
Laino, Teodoro [1]
Affiliations
[1] IBM Res Europe, Saumerstr 4, CH-8803 Ruschlikon, Switzerland
Keywords
TRANSFORMER; MODEL;
DOI
10.1038/s41467-020-17266-6
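Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code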
07 ; 0710 ; 09 ;
Abstract
Experimental procedures for chemical synthesis are commonly reported in prose in patents or in the scientific literature. The extraction of the details necessary to reproduce and validate a synthesis in a chemical laboratory is often a tedious task requiring extensive human intervention. We present a method to convert unstructured experimental procedures written in English to structured synthetic steps (action sequences) reflecting all the operations needed to successfully conduct the corresponding chemical reactions. To achieve this, we design a set of synthesis actions with predefined properties and a deep-learning sequence-to-sequence model based on the transformer architecture to convert experimental procedures to action sequences. The model is pretrained on vast amounts of data generated automatically with a custom rule-based natural language processing approach and refined on manually annotated samples. Predictions on our test set result in a perfect (100%) match of the action sequence for 60.8% of sentences, a 90% match for 71.3% of sentences, and a 75% match for 82.4% of sentences.
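The idea of an action sequence and of a thresholded match can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Action` class, the action names and property keys, and the helper functions are hypothetical, and the similarity score uses `difflib.SequenceMatcher` from the Python standard library as a stand-in, since the abstract does not state how the match percentage is computed.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Action:
    """One synthesis action with its properties (hypothetical schema)."""
    name: str
    properties: tuple = ()  # tuple of (key, value) pairs, kept hashable


def match_ratio(predicted, reference):
    """Similarity in [0, 1] between two action sequences.

    Assumption: difflib's ratio stands in for the paper's match measure.
    """
    matcher = SequenceMatcher(None, predicted, reference, autojunk=False)
    return matcher.ratio()


def fraction_at_least(pairs, threshold):
    """Fraction of (predicted, reference) pairs scoring >= threshold."""
    hits = sum(1 for pred, ref in pairs if match_ratio(pred, ref) >= threshold)
    return hits / len(pairs)


# Illustrative target for a sentence such as
# "The mixture was stirred at room temperature for 2 h and then filtered."
reference = [
    Action("Stir", (("duration", "2 h"), ("temperature", "room temperature"))),
    Action("Filter"),
]
exact_prediction = list(reference)
partial_prediction = reference[:1]  # the Filter step is missing

pairs = [(exact_prediction, reference), (partial_prediction, reference)]
print(fraction_at_least(pairs, 1.0))   # share of perfect matches
print(fraction_at_least(pairs, 0.75))  # share of matches at 75% or better
```

Reporting the same predictions at several thresholds, as the abstract does with 100%, 90%, and 75%, shows how many sentences are nearly right rather than only how many are exactly right.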
Pages: 11