Semi-supervised machine-learning classification of materials synthesis procedures

Cited by: 110
Authors
Huo, Haoyan [1, 2]
Rong, Ziqin [1]
Kononova, Olga [1]
Sun, Wenhao [2]
Botari, Tiago [1, 2]
He, Tanjin [1, 2]
Tshitoyan, Vahe [2, 3]
Ceder, Gerbrand [1, 2]
Affiliations
[1] Univ Calif Berkeley, Dept Mat Sci & Engn, Berkeley, CA 94720 USA
[2] Lawrence Berkeley Natl Lab, Mat Sci Div, Berkeley, CA 94720 USA
[3] Google LLC, 1600 Amphitheatre Pkwy, Mountain View, CA 94043 USA
Funding
US National Science Foundation;
Keywords
NEURAL-NETWORKS; TEMPERATURE;
DOI
10.1038/s41524-019-0204-1
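Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code
070304; 081704;
Abstract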
Digitizing large collections of scientific literature can enable new informatics approaches for scientific analysis and meta-analysis. However, most content in the scientific literature is locked-up in written natural language, which is difficult to parse into databases using explicitly hard-coded classification rules. In this work, we demonstrate a semi-supervised machine-learning method to classify inorganic materials synthesis procedures from written natural language. Without any human input, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental materials synthesis steps, such as "grinding" and "heating", "dissolving" and "centrifuging", etc. Guided by a modest amount of annotation, a random forest classifier can then associate these steps with different categories of materials synthesis, such as solid-state or hydrothermal synthesis. Finally, we show that a Markov chain representation of the order of experimental steps accurately reconstructs a flowchart of possible synthesis procedures. Our machine-learning approach enables a scalable approach to unlock the large amount of inorganic materials synthesis information from the literature and to process it into a standardized, machine-readable database.
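The final step described in the abstract, representing the order of experimental steps as a Markov chain, can be illustrated with a minimal sketch. It assumes each procedure has already been reduced to an ordered list of step labels; the function name `transition_probabilities` and the three example sequences are hypothetical, invented for illustration, and are not taken from the paper's data or code.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities from step sequences."""
    counts = defaultdict(Counter)
    for steps in sequences:
        # Count each observed (current step -> next step) pair.
        for current, following in zip(steps, steps[1:]):
            counts[current][following] += 1

    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        # Normalize counts so outgoing probabilities from each step sum to 1.
        probabilities[current] = {
            step: n / total for step, n in followers.items()
        }
    return probabilities


# Hypothetical step sequences (assumption: not real data from the paper).
procedures = [
    ["grinding", "heating", "grinding", "heating"],
    ["dissolving", "heating", "centrifuging"],
    ["grinding", "heating"],
]

transitions = transition_probabilities(procedures)
for step, followers in transitions.items():
    print(step, "->", followers)
```

Reading the resulting dictionary as a directed graph, with steps as nodes and probabilities as edge weights, gives a simple flowchart of which step tends to follow which.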
Pages: 7