Data quality in predictive toxicology: Identification of chemical structures and calculation of chemical properties

Cited by: 15
Authors
Helma, C
Kramer, S
Pfahringer, B
Gottmann, E
Affiliations
[1] Univ Freiburg, Inst Comp Sci, Machine Learning Lab, D-79110 Freiburg, Germany
[2] Univ Vienna, Canc Res Inst, Vienna, Austria
[3] Univ Vienna, Inst Environm Hyg, Vienna, Austria
[4] Austrian Res Inst Artificial Intelligence, Vienna, Austria
Keywords
carcinogenicity; knowledge discovery; machine learning; predictive toxicology; quality assurance; structure-activity relationships
DOI
10.1289/ehp.001081029
Chinese Library Classification number
X [Environmental Science, Safety Science]
Subject classification code
08; 0830
Abstract
Every technique for toxicity prediction and for the detection of structure-activity relationships relies on the accurate estimation and representation of chemical and toxicologic properties. In this paper we discuss potential sources of error associated with the identification of compounds, the representation of their structures, and the calculation of chemical descriptors. The discussion is based on a case study in which machine learning techniques were applied to data on noncongeneric compounds and a complex toxicologic end point (carcinogenicity). We propose methods applicable to the routine quality control of large chemical datasets, but our main intention is to raise awareness of this topic and to open a discussion about quality assurance in predictive toxicology. The accuracy and reproducibility of toxicity data will be reported in another paper.
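The abstract does not spell out the proposed quality-control methods. As a hedged illustration only, and not necessarily a procedure used in the paper, one inexpensive check on compound identification is to validate CAS Registry Numbers against their built-in check digit. The last digit of a CAS number equals the weighted sum of the preceding digits modulo 10, with the rightmost preceding digit weighted 1, the next 2, and so on. Many typing errors therefore produce an inconsistent check digit, although not every error does. The function names `is_valid_cas` and `flag_suspect_records`, and the assumption that records arrive as (name, CAS number) pairs, are hypothetical.

```python
import re

# CAS Registry Numbers have the form NNNNNNN-NN-N:
# 2 to 7 digits, then 2 digits, then a single check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well-formed and its check digit is consistent.

    The check digit must equal the weighted sum of the other digits modulo 10,
    where the rightmost non-check digit has weight 1, the next weight 2, etc.
    """
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False  # wrong layout, e.g. misplaced hyphens or stray characters
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check


def flag_suspect_records(records):
    """Return the (name, cas) pairs whose CAS number fails the format or check-digit test.

    Hypothetical helper: `records` is assumed to be an iterable of (name, cas) pairs.
    """
    return [(name, cas) for name, cas in records if not is_valid_cas(cas)]


if __name__ == "__main__":
    records = [
        ("water", "7732-18-5"),
        ("benzene", "71-43-2"),
        ("water, mistyped", "7732-18-4"),
        ("water, misplaced hyphen", "77321-8-5"),
    ]
    for name, cas in flag_suspect_records(records):
        print(f"check identifier for {name!r}: {cas}")
```

A passing check digit only shows that a number is well-formed. It does not confirm that the number belongs to the structure it is paired with, so identifiers still need to be cross-checked against the structures themselves.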
Pages: 1029-1033
Number of pages: 5