Comparative analysis of local and consensus quantitative structure-activity relationship approaches for the prediction of bioconcentration factor

Cited by: 14
Authors
Piir, G. [1]
Sild, S. [1]
Maran, U. [1]
Affiliations
[1] Univ Tartu, Inst Chem, EE-50090 Tartu, Estonia
Keywords
BCF; QSAR; local models; global models; consensus models; bioconcentration factor; clustering; QSAR MODELS; BOILING POINTS; QSPR; DESCRIPTORS; REGRESSION; CHEMICALS; TOXICITY; OPTIMIZE; SCIENCE; SETS;
DOI
10.1080/1062936X.2012.762426
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Quantitative structure-activity relationships (QSARs) are broadly classified as global or local, depending on their molecular constitution. Global models use large and diverse training sets covering a wide range of chemical space. Local models focus on smaller, structurally or chemically similar subsets that are conventionally selected by human experts or, alternatively, by clustering analysis. The current study presents a comparative analysis of different clustering algorithms (expectation-maximization, K-means and hierarchical), applied to seven different descriptor sets as structural characteristics, and of two rule-based approaches for selecting subsets on which to build local QSAR models. A total of 111 local QSAR models were developed for predicting bioconcentration factor. Predictions from the local models were compared with the corresponding predictions from the global model. The comparison of coefficients of determination (r²) and standard deviations for local models with those for similar subsets from the global model shows improved prediction quality in 97% of cases. The descriptor content of the derived QSARs is discussed and analyzed. Local QSAR models were further consolidated within the framework of a consensus approach. All of the consensus approaches increased performance over the global and local models. The consensus approach reduced the number of strongly deviating predictions by evening out prediction errors that were produced by some local QSARs.
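The workflow summarized in the abstract (cluster the compounds, fit one local model per cluster, then combine local predictions into a consensus) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, a single made-up descriptor stands in for the seven descriptor sets, a plain one-dimensional K-means replaces the three clustering algorithms studied, and the consensus is an unweighted mean of two local-model predictions. It is not the authors' implementation and does not reproduce their results.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def nearest(cents, x):
    """Index of the centroid closest to x."""
    return min(range(len(cents)), key=lambda i: abs(x - cents[i]))

def kmeans_1d(xs, k, iters=25):
    """Plain K-means on one descriptor with a deterministic quantile start."""
    srt = sorted(xs)
    cents = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(cents, x)].append(x)
        # An empty cluster keeps its previous centroid.
        cents = [mean(g) if g else c for g, c in zip(groups, cents)]
    return cents

def local_predictions(xs, ys, k):
    """Cluster on the descriptor, fit one line per cluster, predict in-sample."""
    cents = kmeans_1d(xs, k)
    labels = [nearest(cents, x) for x in xs]
    models = {}
    for c in set(labels):
        cx = [x for x, l in zip(xs, labels) if l == c]
        cy = [y for y, l in zip(ys, labels) if l == c]
        models[c] = fit_line(cx, cy)
    return [models[l][0] + models[l][1] * x for x, l in zip(xs, labels)]

def rmse(ys, preds):
    return (sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)) ** 0.5

# Synthetic data (hypothetical): two "chemical classes" with different trends.
rng = random.Random(0)
xs = [rng.uniform(0, 4) for _ in range(20)] + [rng.uniform(6, 10) for _ in range(20)]
ys = [(0.5 * x if x < 5 else 5.0 - 0.3 * x) + rng.gauss(0, 0.1) for x in xs]

# Global model: one line fitted to all compounds.
a, b = fit_line(xs, ys)
global_pred = [a + b * x for x in xs]

# Local models from two different clusterings, then an unweighted consensus.
local_k2 = local_predictions(xs, ys, 2)
local_k3 = local_predictions(xs, ys, 3)
consensus = [(p + q) / 2 for p, q in zip(local_k2, local_k3)]

rmse_global = rmse(ys, global_pred)
rmse_local_k2 = rmse(ys, local_k2)
rmse_consensus = rmse(ys, consensus)
print(f"global {rmse_global:.3f}  local(k=2) {rmse_local_k2:.3f}  consensus {rmse_consensus:.3f}")
```

The errors here are computed on the training data, so the local fits cannot be worse than the global fit by construction; a real comparison of the kind the abstract reports would use external validation compounds.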
Pages: 175-199
Number of pages: 25