Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies

Cited by: 485
Authors
Hansen, Katja [1]
Montavon, Gregoire [2]
Biegler, Franziska [2]
Fazli, Siamac [2]
Rupp, Matthias [3]
Scheffler, Matthias [1]
von Lilienfeld, O. Anatole [4]
Tkatchenko, Alexandre [1]
Mueller, Klaus-Robert [2, 5]
Affiliations
[1] Max Planck Gesell, Fritz Haber Inst, Berlin, Germany
[2] TU Berlin, Machine Learning Grp, Berlin, Germany
[3] Swiss Fed Inst Technol, Inst Pharmaceut Sci, Zurich, Switzerland
[4] Argonne Natl Lab, Argonne Leadership Comp Facil, Lemont, IL USA
[5] Korea Univ, Dept Brain & Cognit Engn, Seoul, South Korea
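Funding
Natural Sciences and Engineering Research Council of Canada; European Research Council; National Research Foundation of Singapore
Keywords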
MIXED-EFFECTS MODELS; DEEP; SURFACES; REGRESSION; SELECTION; BIAS;
DOI
10.1021/ct400195d
Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Subject classification codes
070304; 081704
Abstract
The accurate and reliable prediction of properties of molecules typically requires computationally intensive quantum-chemical calculations. Recently, machine learning techniques applied to ab initio calculations have been proposed as an efficient approach for describing the energies of molecules in their given ground-state structure throughout chemical compound space (Rupp et al. Phys. Rev. Lett. 2012, 108, 058301). In this paper, we outline a number of established machine learning techniques and investigate the influence of the molecular representation on the methods' performance. The best methods achieve prediction errors of 3 kcal/mol for the atomization energies of a wide variety of molecules. Rationales for this performance improvement are given, together with pitfalls and challenges when applying machine learning approaches to the prediction of quantum-mechanical observables.
Pages: 3404-3419
Page count: 16