Development of a new regression analysis method using independent component analysis

Cited by: 42
Authors
Kaneko, Hiromasa [1]
Arakawa, Masamoto [1]
Funatsu, Kimito [1]
Affiliation
[1] Univ Tokyo, Dept Chem Syst Engn, Bunkyo Ku, Tokyo 1138656, Japan
DOI
10.1021/ci700245f
Chinese Library Classification (CLC) number
R914 [Medicinal chemistry]
Subject classification code
100701
Abstract
In this paper, independent component analysis (ICA) and regression analysis are combined to extract significant components. ICA extracts mutually independent components from the explanatory variables. The relationship between the independent components and an objective variable is then fitted by the least-squares method. This method is named ICA-MLR (MLR = multiple linear regression). We verified the superiority of ICA-MLR over partial least squares (PLS) on simulation data and applied the method to a quantitative structure-property relationship analysis of aqueous solubility. We constructed models between aqueous solubility and 173 molecular descriptors. PLS and genetic algorithm PLS models were constructed for comparison with ICA-MLR. The R², Q², and R²_pred values of the PLS model are 0.836, 0.819, and 0.848, respectively; those of the ICA-MLR model are 0.937, 0.868, and 0.894. ICA-MLR thus achieved higher predictive accuracy than PLS. ICA-MLR could extract effective components from the explanatory variables and construct a regression model with high predictive accuracy. In addition, the regression coefficients b_ICA-MLR indicate the magnitude of the contribution of each descriptor in the analysis of aqueous solubility.
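The two-step idea in the abstract (extract independent components, then fit them to the objective variable by least squares) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's FastICA as the ICA algorithm, synthetic data in place of the 173 descriptors, 5-fold cross-validation for Q², and a simple hold-out split for R²_pred. All variable names are hypothetical, and mapping the coefficients back to descriptor space via the unmixing matrix is one plausible reading of b_ICA-MLR.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 3 non-Gaussian sources mixed
# linearly into 10 "descriptors"; y depends on the hidden sources.
n_samples, n_sources, n_descriptors = 200, 3, 10
S = rng.uniform(-1.0, 1.0, size=(n_samples, n_sources))
A = rng.normal(size=(n_sources, n_descriptors))
X = S @ A + 0.01 * rng.normal(size=(n_samples, n_descriptors))
y = S @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# ICA-MLR sketch: autoscale, extract independent components,
# then ordinary least squares on the components.
model = Pipeline([
    ("scale", StandardScaler()),
    ("ica", FastICA(n_components=n_sources, whiten="unit-variance",
                    max_iter=1000, random_state=0)),
    ("mlr", LinearRegression()),
])
model.fit(X_train, y_train)

# R^2 on training data, Q^2 from cross-validated predictions,
# R^2_pred on the held-out test set.
r2_train = r2_score(y_train, model.predict(X_train))
q2 = r2_score(y_train, cross_val_predict(model, X_train, y_train, cv=5))
r2_pred = r2_score(y_test, model.predict(X_test))

# Map component coefficients back to the (autoscaled) descriptors:
# components = (X_scaled - mean) @ W.T, so b_descriptor = W.T @ b_component.
ica, mlr = model.named_steps["ica"], model.named_steps["mlr"]
b_descriptor = ica.components_.T @ mlr.coef_

# Check that the descriptor-space coefficients reproduce the predictions.
X_test_scaled = model.named_steps["scale"].transform(X_test)
manual_pred = (X_test_scaled - ica.mean_) @ b_descriptor + mlr.intercept_

print(f"R2={r2_train:.3f}  Q2={q2:.3f}  R2_pred={r2_pred:.3f}")
```

In this sketch the number of components is fixed to the known number of sources; on real descriptor data it would have to be chosen, for example by cross-validation.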
Pages: 534-541
Page count: 8