A look inside the black box: Using graph-theoretical descriptors to interpret a Continuous-Filter Convolutional Neural Network (CF-CNN) trained on the global and local minimum energy structures of neutral water clusters

Cited by: 19
Authors
Bilbrey, Jenna A. [1]
Heindel, Joseph P. [2]
Schram, Malachi [3]
Bandyopadhyay, Pradipta [4]
Xantheas, Sotiris S. [2,3]
Choudhury, Sutanay [3]
Affiliations
[1] Pacific Northwest Natl Lab, Comp & Analyt Div, 902 Battelle Blvd,POB 999, Richland, WA 99352 USA
[2] Univ Washington, Dept Chem, Seattle, WA 98195 USA
[3] Pacific Northwest Natl Lab, Adv Comp Math & Data Div, 902 Battelle Blvd,POB 999, Richland, WA 99352 USA
[4] Jawaharlal Nehru Univ, Sch Computat & Integrat Sci, New Delhi 110067, India
Keywords
HYDROGEN-BONDING NETWORK; (H2O)_20; DYNAMICS; MODELS
DOI
10.1063/5.0009933
Chinese Library Classification
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code
070304; 081704
Abstract
We describe a method for the post-hoc interpretation of a neural network (NN) trained on the global and local minima of neutral water clusters. We use the structures reported in a recently published database containing over 5 × 10^6 unique water cluster networks (H2O)_N of size N = 3-30. The structural properties were first characterized using chemical descriptors derived from graph theory, identifying important trends in topology, connectivity, and polygon structure of the networks associated with the various minima. The code to generate the molecular graphs and compute the descriptors is available at https://github.com/exalearn/molecular-graph-descriptors, and the graphs are available alongside the original database at https://sites.uw.edu/wdbase/. A Continuous-Filter Convolutional Neural Network (CF-CNN) was trained on a subset of 500 000 networks to predict the potential energy, yielding a mean absolute error of 0.002 ± 0.002 kcal/mol per water molecule. Clusters of sizes not included in the training set exhibited errors of the same magnitude, indicating that the CF-CNN protocol accurately predicts energies of networks both smaller and larger than those used during training. The graph-theoretical descriptors were further employed to interpret the predictive power of the CF-CNN. Topological measures, such as the Wiener index, the average shortest path length, and the similarity index, suggested that all networks from the test set were within the same range of values as those from the training set. The graph analysis suggests that larger errors appear when the mean degree and the number of polygons in the cluster lie farther from the mean of the training set. This indicates that the structural space, and not just the chemical space, is an important factor to consider when designing training sets, as predictive errors can result when the structural composition is sufficiently different from the bulk of those in the training set. To this end, the developed descriptors are quite effective in explaining the results of the CF-CNN (a.k.a. the "black box") model.
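As an illustration of the topological measures named in the abstract, the following minimal sketch computes the Wiener index, the average shortest path length, and the mean degree for a toy hydrogen-bond network. It assumes each water molecule is a node and each hydrogen bond is an undirected edge, which is a simplification chosen here for clarity. The function names `shortest_path_lengths` and `graph_descriptors` are hypothetical and are not taken from the molecular-graph-descriptors repository.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def graph_descriptors(edges):
    """Compute three descriptors for a connected, undirected graph.

    `edges` is a list of (u, v) pairs; here each pair stands for one
    hydrogen bond between two water molecules (illustrative assumption).
    """
    # Build an adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    if n < 2:
        raise ValueError("need at least two nodes")

    # Sum shortest-path lengths over all ordered pairs, then halve to get
    # the Wiener index (sum over unordered pairs).
    total = 0
    for node in adj:
        dist = shortest_path_lengths(adj, node)
        if len(dist) != n:
            raise ValueError("graph must be connected")
        total += sum(dist.values())
    wiener = total // 2

    n_pairs = n * (n - 1) // 2
    mean_degree = sum(len(nbrs) for nbrs in adj.values()) / n
    return {
        "wiener_index": wiener,
        "average_shortest_path_length": wiener / n_pairs,
        "mean_degree": mean_degree,
    }


# Toy example: a cyclic tetramer, i.e. four molecules bonded in a ring.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(graph_descriptors(ring))
```

For the four-membered ring, each node sees two neighbors at distance 1 and one at distance 2, so the Wiener index is 8 and the mean degree is 2. Comparing such values between training and test structures is the kind of check the abstract describes.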
Pages: 15