Quantitative structure-retention relationship for the Kovats retention indices of a large set of terpenes: A combined data splitting-feature selection strategy

Cited by: 47
Authors
Hemmateenejad, Bahram [1]
Javadnia, Katayoun
Elyasi, Maryam
Affiliations
[1] Shiraz Univ, Dept Chem, Shiraz 71454, Iran
[2] Shiraz Univ Med Sci, Med & Nat Prod Chem Res Ctr, Shiraz, Iran
[3] Univ Tehran Med Sci, Fac Pharm, Pharmaceut Sci Res Ctr, Tehran, Iran
Keywords
terpenoids; quantitative structure property relationship; Kovats index; data splitting; feature selection; combined data splitting-feature selection
DOI
10.1016/j.aca.2007.04.009
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification codes
070302 ; 081704 ;
Abstract
A data set consisting of a large number of terpenoids, compounds that are widely distributed in nature and abundant in higher plants, has been used to develop a quantitative structure property relationship (QSPR) for their Kovats retention indices. QSPR models are usually obtained by splitting the data into two sets: calibration (or training) and prediction (or validation). All model-building steps, especially the feature selection procedure, are performed using this initial split, and therefore the performance of the resulting models is highly dependent on it. To investigate the effect of data splitting on feature selection, in the current article we propose a combined data splitting-feature selection (CDFS) methodology for QSPR model development, which produces several different training/validation/test sets and repeats all of the model-building steps on each. In this method, data splitting is performed many times, and feature selection is carried out for each split. The resulting models are compared for similarities and differences among the selected descriptors. The final model is the one whose descriptors are the variables common to all of the resulting models. The method was applied to a QSPR study of a large data set containing the Kovats retention indices of 573 terpenoids. A final eight-parameter multilinear model with constitutional and topological indices was obtained. Cross-validation indicated that the model could reproduce more than 90% of the variance in the Kovats retention data. The relative error of prediction for an external test set of 50 compounds was 3.2%. Finally, to improve the results, the structure-retention relationships were also modeled with a nonlinear approach using artificial neural networks, which gave better results. (C) 2007 Elsevier B.V. All rights reserved.
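The CDFS idea described in the abstract (split the data many times, run feature selection on each split, keep the descriptors common to all runs) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `forward_select` and `cdfs`, the greedy forward selection based on the training residual sum of squares, the purely random splitting, and the default parameter values are not taken from the paper, which may use a different selection algorithm and splitting scheme.

```python
import numpy as np


def forward_select(X, y, n_features):
    """Greedy forward selection (illustrative stand-in for the paper's
    feature selection step): repeatedly add the descriptor column that
    gives the lowest residual sum of squares of a least-squares fit."""
    selected = []
    for _ in range(n_features):
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # Design matrix: intercept plus the candidate descriptor subset.
            A = np.column_stack([np.ones(len(y)), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected


def cdfs(X, y, n_splits=20, train_fraction=0.8, n_features=3, seed=0):
    """Hypothetical CDFS loop: draw several random training sets, select
    descriptors on each one, and return the indices chosen in every run."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(train_fraction * n)
    common = None
    for _ in range(n_splits):
        train = rng.permutation(n)[:n_train]  # random training subset
        chosen = set(forward_select(X[train], y[train], n_features))
        common = chosen if common is None else common & chosen
    return sorted(common)


# Synthetic usage example: only descriptors 0 and 4 carry signal.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(200, 10))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 4] + 0.1 * demo_rng.normal(size=200)
common_descriptors = cdfs(X_demo, y_demo, n_splits=10, n_features=3)
print(common_descriptors)
```

In this toy setting the two informative descriptors are selected on every split, whereas a descriptor picked up by chance on one split is likely to drop out of the intersection. That is the intuition behind using the common variables as the final model.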
Pages: 72-81
Page count: 10