Model validation software for classification models using repeated partitioning: MVREP

Cited by: 3
Authors
Li, W
Arena, VC
Sussman, NB
Mazumdar, S
Affiliations
[1] Univ Pittsburgh, Dept Biostat, Pittsburgh, PA 15261 USA
[2] Univ Pittsburgh, Dept Environm & Occupat Hlth, Pittsburgh, PA 15261 USA
Keywords
classification models; model validation; prediction error; repartitioning
DOI
10.1016/S0169-2607(02)00119-0
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The process of assessing the prediction ability of a computational model is called model validation. For models predicting a categorical response, prediction ability is usually quantified by prediction measures such as sensitivity, specificity, and accuracy. This paper presents MVREP (Model Validation using Repeated Partitioning), software that implements a computer-intensive, nonparametric approach to model validation, which we call the re-partitioning method. MVREP, developed in the SAS macro language, randomly partitions a dataset and then performs a standard model validation procedure, such as cross-validation; it repeats this process a large number of times to generate the empirical sampling distributions of the prediction measures. The means of the sampling distributions serve as the point estimates of the model's prediction measures. The variances of the sampling distributions provide a direct assessment of the variability of those point estimates. An example using a mouse developmental toxicity chemical dataset illustrates how the software can be used to assess structure-activity relationship models. © 2002 Elsevier Science Ireland Ltd. All rights reserved.
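The re-partitioning idea described in the abstract can be sketched in a few lines. The following is a minimal Python illustration, not the MVREP SAS macro itself: the function names (prediction_measures, fit_midpoint, repartition, make_toy_data), the one-feature midpoint classifier, and the synthetic data are all assumptions made for this example. It also assumes that predictions are pooled over the k folds before the measures are computed, which is one reasonable reading of "standard cross-validation" and may differ from what MVREP does.

```python
import random
import statistics


def prediction_measures(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = positive).

    Assumes y_true contains at least one positive and one negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def fit_midpoint(xs, ys):
    """Hypothetical toy classifier: cut at the midpoint of the two class means.

    Assumes positives tend to have larger feature values than negatives.
    """
    pos = [v for v, t in zip(xs, ys) if t == 1]
    neg = [v for v, t in zip(xs, ys) if t == 0]
    if not pos or not neg:  # degenerate training set: predict the only class seen
        only = 1 if pos else 0
        return lambda v: only
    cut = (statistics.mean(pos) + statistics.mean(neg)) / 2
    return lambda v: 1 if v > cut else 0


def repartition(x, y, fit, k=5, repeats=200, seed=0):
    """Repeat random k-fold partitioning and summarise each prediction measure.

    Returns {measure: (mean, variance)} over the `repeats` random partitions.
    """
    rng = random.Random(seed)
    n = len(y)
    samples = {"sensitivity": [], "specificity": [], "accuracy": []}
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                      # a fresh random partition
        folds = [idx[i::k] for i in range(k)]
        pred = [None] * n
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            model = fit([x[i] for i in train], [y[i] for i in train])
            for i in fold:                    # each case is predicted exactly once
                pred[i] = model(x[i])
        for name, value in prediction_measures(y, pred).items():
            samples[name].append(value)
    return {
        name: (statistics.mean(v), statistics.variance(v))
        for name, v in samples.items()
    }


def make_toy_data(seed=1, n_per_class=30):
    """Synthetic one-feature data; NOT the mouse toxicity dataset from the paper."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    x += [rng.gauss(1.5, 1.0) for _ in range(n_per_class)]
    y = [0] * n_per_class + [1] * n_per_class
    return x, y


if __name__ == "__main__":
    x, y = make_toy_data()
    for name, (mean, var) in repartition(x, y, fit_midpoint).items():
        print(f"{name}: mean={mean:.3f} variance={var:.5f}")
```

In this sketch, each measure's mean over the repeated partitions plays the role of the point estimate, and its variance reflects how much that estimate depends on the particular random split.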
Pages: 81-87
Page count: 7