Classifying Crystal Structures of Binary Compounds AB through Cluster Resolution Feature Selection and Support Vector Machine Analysis

Cited by: 77
Authors
Oliynyk, Anton O. [1]
Adutwum, Lawrence A. [1]
Harynuk, James J. [1]
Mar, Arthur [1]
Affiliations
[1] Univ Alberta, Dept Chem, Edmonton, AB T6G 2G2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
IONIZATION-POTENTIALS; GASOLINE; PREDICTION; SCALE; ELECTRONEGATIVITY; CLASSIFICATION; CHEMISTRY; SAMPLES;
DOI
10.1021/acs.chemmater.6b02905
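Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract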
Partial least-squares discriminant analysis (PLS-DA) and support vector machine (SVM) techniques were applied to develop a crystal structure predictor for binary AB compounds. Models were trained and validated on the classification of 706 AB compounds adopting the seven most common structure types (CsCl, NaCl, ZnS, CuAu, TlI, β-FeB, and NiAs), using data extracted from Pearson's Crystal Data and the ASM Alloy Phase Diagram Database. Out of 56 initial variables (descriptors based on elemental properties only), 31 were selected in as unbiased a manner as possible through a procedure of forward selection and backward elimination, with the quality of the model evaluated by measuring the cluster resolution at each step. For the validation set, PLS-DA gave a sensitivity of 96.5%, a specificity of 66.0%, and an accuracy of 77.1%, whereas SVM gave a sensitivity of 94.2%, a specificity of 92.7%, and an accuracy of 93.2%, a significant improvement. Radii, electronegativity, and valence electrons, previously chosen intuitively in structure maps, were confirmed as important variables. Unlike semiclassical approaches, PLS-DA and SVM could also make quantitative predictions for hypothetical compounds. The new compound RhCd was predicted to have the CsCl-type structure by PLS-DA (0.669 probability) and, at an even stronger confidence level, by SVM (0.918 probability). RhCd was synthesized by reaction of the elements at 800 °C and confirmed by X-ray diffraction to adopt the CsCl-type structure. SVM is thus a superior classification method in crystallography that is fast and makes correct, quantitative predictions; it may be more broadly applicable to help identify the structure of unknown compounds with any arbitrary composition.
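The abstract compares the two classifiers by sensitivity, specificity, and accuracy. As a rough illustration of how these three figures relate to each other, the sketch below computes them for one structure type in a one-vs-rest fashion. The function name `one_vs_rest_metrics` and the ten labels are hypothetical and are not taken from the paper; the exact averaging scheme used by the authors is not stated in this record, so this should be read as one plausible convention rather than their procedure.

```python
from collections import Counter


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Return (sensitivity, specificity, accuracy), treating `positive`
    as the target class and every other label as the negative class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            # Actual member of the target class: hit or miss.
            counts["tp" if pred == positive else "fn"] += 1
        else:
            # Actual non-member: false alarm or correct rejection.
            counts["fp" if pred == positive else "tn"] += 1
    tp, fn, fp, tn = (counts[k] for k in ("tp", "fn", "fp", "tn"))
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


# Hypothetical validation labels (illustrative only, not data from the paper).
y_true = ["CsCl", "CsCl", "CsCl", "CsCl", "NaCl",
          "NaCl", "NiAs", "ZnS", "NaCl", "NiAs"]
y_pred = ["CsCl", "CsCl", "CsCl", "NaCl", "NaCl",
          "CsCl", "NiAs", "ZnS", "NaCl", "NiAs"]

sens, spec, acc = one_vs_rest_metrics(y_true, y_pred, "CsCl")
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

A model can score high on sensitivity while scoring low on specificity if it assigns too many compounds to the target class, which is the pattern the abstract reports for PLS-DA (96.5% versus 66.0%).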
Pages: 6672-6681
Page count: 10