SAR modeling of unbalanced data sets

Cited by: 11
Authors
Rosenkranz, HS [1]
Cunningham, AR [1]
Affiliations
[1] Univ Pittsburgh, Grad Sch Publ Hlth, Dept Environm & Occupat Hlth, Pittsburgh, PA 15261 USA
Keywords
unbalanced data; SAR; case/multicase; optimum models
DOI
10.1080/10629360108032916
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The increased acceptance of SAR approaches to hazard identification has led us to investigate methods to improve the predictive performance of SAR models. In the present study we demonstrate that, although on theoretical grounds the ratio of active to inactive chemicals in the learning set should be unity, SAR models can "tolerate" unbalanced ratios ranging from 3:1 (i.e., 75% actives) to 1:2 (i.e., 33% actives) and still perform adequately. On the other hand, SAR models derived from learning sets with ratios in excess of 4:1 (80% actives), even when corrected for the initial ratio, do not perform satisfactorily.
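The ratio-to-percentage arithmetic quoted in the abstract can be checked with a minimal sketch. The function names and the three category labels below are illustrative assumptions, not part of the paper; the abstract does not say how ratios between 3:1 and 4:1, or below 1:2, behave, so the sketch reports those as "not stated".

```python
from fractions import Fraction


def active_fraction(actives: int, inactives: int) -> Fraction:
    """Share of actives in a learning set given an actives:inactives ratio."""
    return Fraction(actives, actives + inactives)


def abstract_category(actives: int, inactives: int) -> str:
    """Map an actives:inactives ratio onto the ranges named in the abstract.

    Hypothetical helper: the labels are ours, and `inactives` must be positive.
    """
    ratio = Fraction(actives, inactives)
    # Range reported as still performing adequately: 1:2 up to 3:1.
    if Fraction(1, 2) <= ratio <= Fraction(3, 1):
        return "tolerated"
    # Ratios in excess of 4:1 are reported as unsatisfactory.
    if ratio > Fraction(4, 1):
        return "unsatisfactory"
    # Everything else is not characterized in the abstract.
    return "not stated"


if __name__ == "__main__":
    for a, i in [(3, 1), (1, 1), (1, 2), (4, 1), (5, 1)]:
        pct = float(active_fraction(a, i)) * 100
        print(f"{a}:{i} -> {pct:.0f}% actives, {abstract_category(a, i)}")
```

Exact fractions are used so that the boundary ratios 3:1 and 1:2 compare without floating-point error.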
Pages: 267-274
Page count: 8