A new efficient approach for variable selection based on multiregression: Prediction of gas chromatographic retention times and response factors

Times cited: 107
Authors
Lucic, B
Trinajstic, N
Sild, S
Karelson, M
Katritzky, AR
Affiliations
[1] Rudjer Boskovic Inst, HR-10001 Zagreb, Croatia
[2] Univ Florida, Dept Chem, Ctr Heterocycl Cpds, Gainesville, FL 32611 USA
[3] Tartu State Univ, Dept Chem, EE-2400 Tartu, Estonia
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 1999 / Vol. 39 / No. 3
DOI
10.1021/ci980161a
Chinese Library Classification number
O6 [Chemistry]
Subject classification code
0703
Abstract
The selection of the most relevant variables is a frequent problem in the analysis of chemical data, especially given the large amounts of data now generated by increased computing power and analytical resolution. A novel procedure for variable selection based on multiregression (MR) analysis is developed and applied to quantitative structure-property relationship (QSPR) modeling of gas chromatographic retention times t_R and Dietz response factors RF for 152 diverse chemical compounds. Using 296 descriptors generated by the CODESSA program, "absolutely the best" linear MR models containing from 1 to 5 descriptors were first selected (approximately 2 × 10^10 models were checked), and then "the best" linear stepwise MR models with six and seven descriptors were obtained through "i by i" stepwise selection. In this paper i was varied from 1 to 4, so that in each subsequent step i descriptors were added to those already selected. Nonlinear models were developed by including cross-products of the initial descriptors. The most important descriptors selected for t_R are the number of C-H and C-X bonds, the third-order connectivity indices, the highest normal mode vibrational frequency, and the rotational entropy of the molecule at 300 K. For RF, the most important descriptors are those related to the relative number and weight of effective C atoms, the orbital electronic population, and the bond order and valency of C and H atoms. Compared with the best six-descriptor models obtained by the standard CODESSA procedure, the nonlinear seven-descriptor MR models obtained here achieve standard errors of estimate that are 30% (0.3520 vs 0.5032) and 12% (0.0472 vs 0.0530) lower for t_R and RF, respectively. Our procedure for selecting a small number of the most important descriptors from a data set extracts more useful information than the procedure implemented in CODESSA.
Thus, our new procedure enables the selection of the best possible MR models from among 10^10 possibilities. By introducing cross-product terms, we obtained nonlinear MR models that are superior to the corresponding linear models.
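The abstract describes the search only in outline. The sketch below illustrates its three ingredients (exhaustive search over all k-descriptor subsets, "i by i" stepwise growth, and augmentation with cross-product terms) in pure Python under our own assumptions: descriptors are plain lists of values, models are ranked by the standard error of estimate s = sqrt(SSE / (n - k - 1)) (for fixed k this is the same as ranking by SSE), and "cross-products" are taken to mean pairwise products x_a * x_b with a < b. The function names (`ols_sse`, `std_error`, `best_exhaustive`, `stepwise_i_by_i`, `add_cross_products`, `n_subsets`) and the toy data are hypothetical; they are not taken from CODESSA or from the authors' code, and a pure-Python exhaustive search over 296 descriptors would be far too slow in practice.

```python
import itertools
import math


def ols_sse(cols, y):
    """Least-squares fit of y = b0 + sum_j b_j * x_j; return the residual sum of squares.

    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan elimination.
    Returns math.inf when the chosen descriptors are (near-)collinear.
    """
    n = len(y)
    A = [[1.0] + [col[r] for col in cols] for r in range(n)]  # intercept column first
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(p)]
         + [sum(A[r][i] * y[r] for r in range(n))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))  # partial pivoting
        if abs(M[piv][c]) < 1e-12:
            return math.inf  # singular system: reject this descriptor subset
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, p + 1):
                    M[r][k] -= f * M[c][k]
    b = [M[i][p] / M[i][i] for i in range(p)]
    return sum((y[r] - sum(b[j] * A[r][j] for j in range(p))) ** 2 for r in range(n))


def std_error(X, y, subset):
    """Standard error of estimate s = sqrt(SSE / (n - k - 1)) for the descriptors in subset."""
    n, k = len(y), len(subset)
    return math.sqrt(ols_sse([X[j] for j in subset], y) / (n - k - 1))


def best_exhaustive(X, y, k):
    """Check every k-descriptor subset; return (s, subset) with the lowest s."""
    return min((std_error(X, y, sub), sub)
               for sub in itertools.combinations(range(len(X)), k))


def stepwise_i_by_i(X, y, start, target, i):
    """Grow `start` by the best block of i new descriptors per step until it has `target`."""
    current = tuple(start)
    while len(current) < target:
        step = min(i, target - len(current))
        rest = [j for j in range(len(X)) if j not in current]
        _, current = min((std_error(X, y, current + add), current + add)
                         for add in itertools.combinations(rest, step))
    return current


def add_cross_products(X):
    """Append all pairwise products x_a * x_b (a < b) as extra candidates (assumed form)."""
    prods = [[u * v for u, v in zip(X[a], X[b])]
             for a, b in itertools.combinations(range(len(X)), 2)]
    return X + prods


def n_subsets(n_desc, k_max):
    """Number of descriptor subsets of size 1..k_max drawn from n_desc descriptors."""
    return sum(math.comb(n_desc, k) for k in range(1, k_max + 1))


# Hypothetical toy data: 6 descriptors for 20 compounds; y depends only on descriptors 1 and 4.
X_demo = [[math.sin(1.3 * (r + 1) * (j + 1)) for r in range(20)] for j in range(6)]
y_demo = [0.5 + 2.0 * a - 3.0 * b for a, b in zip(X_demo[1], X_demo[4])]
print(best_exhaustive(X_demo, y_demo, 2)[1])  # (1, 4)
print(n_subsets(296, 5))  # 18621073250
```

For 296 descriptors and models of 1 to 5 descriptors, `n_subsets(296, 5)` gives 18,621,073,250, about 1.9 × 10^10, which is consistent with the abstract's figure of roughly 2 × 10^10 models checked. The i-by-i step scores every block of i unused descriptors at once, so its cost grows combinatorially with i.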
Pages: 610-621
Number of pages: 12