In silico prediction of Tetrahymena pyriformis toxicity for diverse industrial chemicals with substructure pattern recognition and machine learning methods

Cited by: 79
Authors
Cheng, Feixiong [1]
Shen, Jie [1]
Yu, Yue [1]
Li, Weihua [1]
Liu, Guixia [1]
Lee, Philip W. [1, 2]
Tang, Yun [1]
Affiliations
[1] E China Univ Sci & Technol, Sch Pharm, Dept Pharmaceut Sci, Shanghai 200237, Peoples R China
[2] Kyoto Univ, Grad Sch Agr, Sakyo Ku, Kyoto 6068502, Japan
Keywords
Tetrahymena pyriformis toxicity; Quantitative structure-toxicity relationship; Substructure pattern recognition; Support vector machine; Machine learning; Information gain; APPLICABILITY DOMAIN; QSAR; PHENOLS; CLASSIFICATION; DESCRIPTORS; BENZENES; MODES
DOI
10.1016/j.chemosphere.2010.11.043
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
There is an increasing need for rapid safety assessment of chemicals by both industry and regulatory agencies throughout the world. In silico techniques are practical alternatives for environmental hazard assessment, especially for addressing the persistence, bioaccumulation, and toxicity potential of organic chemicals. Tetrahymena pyriformis toxicity is often used as a toxicity endpoint. In this study, 1571 diverse, unique chemicals were collected from the literature to form the largest diverse data set for T. pyriformis toxicity. Classification models of T. pyriformis toxicity were developed using substructure pattern recognition and several machine learning methods: support vector machine (SVM), C4.5 decision tree, k-nearest neighbors, and random forest. The results of 5-fold cross-validation showed that the SVM method performed better than the other algorithms. The overall predictive accuracy of the SVM classification model with a radial basis function kernel was 92.2% for the 5-fold cross-validation and 92.6% for the external validation set. Furthermore, several representative substructure patterns for characterizing T. pyriformis toxicity were identified via information gain analysis. (C) 2010 Elsevier Ltd. All rights reserved.
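As a rough illustration of the information gain analysis mentioned in the abstract, the sketch below computes the standard information gain of a binary substructure flag with respect to a toxic/non-toxic label. The abstract does not specify the authors' exact procedure, so the function names and the toy data here are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(has_substructure, labels):
    """Reduction in label entropy after splitting on a binary substructure flag."""
    present = [y for x, y in zip(has_substructure, labels) if x]
    absent = [y for x, y in zip(has_substructure, labels) if not x]
    total = len(labels)
    conditional = (
        (len(present) / total) * entropy(present)
        + (len(absent) / total) * entropy(absent)
    )
    return entropy(labels) - conditional


# Hypothetical toy data (not from the paper): 1 = toxic, 0 = non-toxic.
labels = [1, 1, 0, 0]
perfect_flag = [1, 1, 0, 0]        # substructure present exactly in the toxic compounds
uninformative_flag = [1, 0, 1, 0]  # substructure unrelated to the label

print(information_gain(perfect_flag, labels))
print(information_gain(uninformative_flag, labels))
```

A substructure that separates the two classes perfectly recovers the full label entropy, while one distributed evenly across both classes yields no gain; ranking substructure patterns by this score is one common way to surface the most discriminating ones.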
Pages: 1636-1643
Page count: 8