Adaptive weighted least square support vector machine regression integrated with outlier detection and its application in QSAR

Cited by: 72
Authors
Cui, Wentong [1]
Yan, Xuefeng [1]
Affiliation
[1] E China Univ Sci & Technol, Automat Inst, Shanghai 200237, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Outlier; Robust 3 sigma principle; Weight; Least square support vector machine regression; Quantitative structure-activity relationships; INHIBITORS;
DOI
10.1016/j.chemolab.2009.05.008
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
To eliminate the influence of unavoidable outliers in the training sample on a model's performance, a novel least square support vector machine regression that combines an outlier detection approach with adaptive weight values for the training sample is proposed and named adaptive weighted least square support vector machine regression (AWLS-SVM). First, the robust 3 sigma principle is used to detect marked outliers in the training sample. Second, using the training sample without the marked outliers, least square support vector machine regression (LS-SVM) is employed to develop the model, and the fitting error of each sample is obtained. Third, the initial weight of each sample is calculated from its fitting error: the larger the fitting error, the smaller the weight. Thus, potential outliers that are not detected by the robust 3 sigma principle but have larger fitting errors receive smaller weights, which reduces their influence on the model's performance. LS-SVM is then applied to the weighted sample to develop the model again. Finally, via the proposed iterative weight method, the weight values of the training sample converge, and a model with good predictive performance is obtained. To illustrate the performance of AWLS-SVM, a simulation experiment is designed to produce a training sample with a marked outlier and some unmarked outliers. AWLS-SVM, AWLS-SVM without the robust 3 sigma principle, LS-SVM with the robust 3 sigma principle, LS-SVM, and a radial basis function network are applied to develop models based on the designed sample. The results show that the influence of marked and unmarked outliers on the model's performance is eliminated by AWLS-SVM, and that the predictive performance of AWLS-SVM is the best.
Furthermore, the AWLS-SVM method was applied to develop a quantitative structure-activity relationships (QSAR) model of HIV-1 protease inhibitors, and a satisfactory result was obtained. (c) 2009 Elsevier B.V. All rights reserved.
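The procedure outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the robust 3 sigma rule is taken to be median plus or minus three times 1.4826 times the median absolute deviation, applied to the response values; the weighted LS-SVM uses the standard linear system with an RBF kernel; and the weight function `1 / (1 + (e / s)^2)` is a stand-in for the paper's own weight formula. The function names, the hyperparameters `gamma` and `width`, and the synthetic data are all hypothetical.

```python
import numpy as np

def robust_3sigma_mask(v):
    """True where v lies within 3 robust sigmas of the median (assumed rule)."""
    med = np.median(v)
    sigma = 1.4826 * np.median(np.abs(v - med))  # MAD-based sigma estimate
    return np.abs(v - med) <= 3.0 * sigma

def rbf_kernel(A, B, width):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_weighted_lssvm(X, y, w, gamma, width):
    """Solve the weighted LS-SVM linear system; returns (alpha, b)."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    # A small weight w_i inflates the diagonal, so sample i constrains the fit less.
    A[1:, 1:] = rbf_kernel(X, X, width) + np.diag(1.0 / (gamma * w))
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]

def predict(X_train, alpha, b, X_new, width):
    return rbf_kernel(X_new, X_train, width) @ alpha + b

def awls_svm(X, y, gamma=100.0, width=1.0, max_iter=50, tol=1e-4):
    keep = robust_3sigma_mask(y)          # step 1: drop marked outliers
    Xk, yk = X[keep], y[keep]
    w = np.ones(len(yk))                  # step 2: plain LS-SVM on the first pass
    for _ in range(max_iter):
        alpha, b = fit_weighted_lssvm(Xk, yk, w, gamma, width)
        err = yk - predict(Xk, alpha, b, Xk, width)
        scale = 1.4826 * np.median(np.abs(err - np.median(err))) + 1e-12
        # step 3 (assumed weight function): larger fitting error, smaller weight
        w_new = np.maximum(1.0 / (1.0 + (err / scale) ** 2), 1e-4)
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:                     # stop once the weights settle
            break
    alpha, b = fit_weighted_lssvm(Xk, yk, w, gamma, width)  # final weighted fit
    return Xk, alpha, b, w, keep

# Synthetic demonstration: one marked outlier and two unmarked outliers.
rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 60).reshape(-1, 1)
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(60)
y[10] = 25.0     # marked outlier, caught by the robust 3 sigma rule
y[30] += 1.5     # unmarked outliers, handled by the adaptive weights
y[45] -= 1.5

Xk, alpha, b, w, keep = awls_svm(X, y)
kept_idx = np.flatnonzero(keep)
w30 = w[np.where(kept_idx == 30)[0][0]]
w45 = w[np.where(kept_idx == 45)[0][0]]
mae = np.mean(np.abs(predict(Xk, alpha, b, X, 1.0) - np.sin(X[:, 0])))
print(keep.sum(), w30, w45, mae)
```

In this sketch the marked outlier is removed before any model is fitted, while the two smaller perturbations stay in the sample and are suppressed only through their weights, which mirrors the two-stage idea described in the abstract.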
Pages: 130-135
Number of pages: 6