Communication: Understanding molecular representations in machine learning: The role of uniqueness and target similarity

Cited by: 219
Authors
Huang, Bing
von Lilienfeld, O. Anatole [1]
Affiliations
[1] Univ Basel, Inst Phys Chem, Klingelbergstr 80, CH-4056 Basel, Switzerland
Funding
Swiss National Science Foundation;
Keywords
CHEMICAL UNIVERSE; MODELS;
DOI
10.1063/1.4964627
Chinese Library Classification
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Subject classification codes
070304 ; 081704 ;
Abstract
The predictive accuracy of Machine Learning (ML) models of molecular properties depends on the choice of the molecular representation. Inspired by the postulates of quantum mechanics, we introduce a hierarchy of representations which meet uniqueness and target similarity criteria. To systematically control target similarity, we simply rely on interatomic many body expansions, as implemented in universal force-fields, including Bonding, Angular (BA), and higher order terms. Addition of higher order contributions systematically increases similarity to the true potential energy and predictive accuracy of the resulting ML models. We report numerical evidence for the performance of BAML models trained on molecular properties pre-calculated at electron-correlated and density functional theory level of theory for thousands of small organic molecules. Properties studied include enthalpies and free energies of atomization, heat capacity, zero-point vibrational energies, dipole-moment, polarizability, HOMO/LUMO energies and gap, ionization potential, electron affinity, and electronic excitations. After training, BAML predicts energies or electronic properties of out-of-sample molecules with unprecedented accuracy and speed. Published by AIP Publishing.
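The idea of building a representation order by order from a many-body expansion can be illustrated with a toy sketch. This is not the BAML representation from the paper: the function names, the choice of inverse distances for the two-body terms, the cosines for the three-body terms, and the approximate water geometry are all assumptions made for illustration only.

```python
import itertools
import numpy as np

def two_body_terms(coords):
    """Inverse distances 1/r_ij over all atom pairs (assumed toy 'bonding' terms)."""
    return np.array([1.0 / np.linalg.norm(coords[i] - coords[j])
                     for i, j in itertools.combinations(range(len(coords)), 2)])

def three_body_terms(coords):
    """Cosine of the angle i-j-k for every centre j and pair (i, k) (assumed toy 'angular' terms)."""
    terms = []
    n = len(coords)
    for j in range(n):
        others = [i for i in range(n) if i != j]
        for i, k in itertools.combinations(others, 2):
            a = coords[i] - coords[j]
            b = coords[k] - coords[j]
            terms.append(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return np.array(terms)

def representation(coords, order=2):
    """Concatenate sorted many-body terms up to the requested order (2 or 3)."""
    coords = np.asarray(coords, dtype=float)
    feats = [np.sort(two_body_terms(coords))]
    if order >= 3:
        feats.append(np.sort(three_body_terms(coords)))
    return np.concatenate(feats)

# Approximate water geometry in angstrom (illustrative values, not from the paper).
water = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
rep2 = representation(water, order=2)  # two-body terms only
rep3 = representation(water, order=3)  # two-body plus three-body terms
```

Sorting each block makes this toy vector invariant to atom ordering, and using only distances and angles makes it invariant to translation and rotation. It ignores element types, however, so it does not satisfy the uniqueness criterion the abstract emphasizes; a realistic representation would have to encode nuclear charges as well.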
Pages: 6