Bias arising from missing data in predictive models

Cited by: 60
Author
Gorelick, Marc H.
Affiliations
[1] Med Coll Wisconsin, Dept Pediat, Sect Emergency Med, Milwaukee, WI 53226 USA
[2] Med Coll Wisconsin, Dept Epidemiol, Milwaukee, WI 53226 USA
Keywords
bias; logistic models; Monte Carlo method; forecasting; risk assessment; missing data
DOI
10.1016/j.jclinepi.2004.11.029
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
Objective: To determine the effect of three common approaches to handling missing data on the results of a predictive model. Study Design and Setting: A Monte Carlo study using simulated data was performed. A baseline logistic regression using complete data was fitted to predict hospital admission from the white blood cell count (WBC; dichotomized as normal or high), presence of fever, and procedures performed (PROC). A series of simulations was then performed in which WBC data were deleted for varying proportions (15-85%) of patients under various patterns of missingness. Three analytic approaches were used: analysis restricted to cases with complete data (CC), missing data assumed to be normal (MAN), and use of imputed values. Results: In the baseline analysis, all three predictors were significantly associated with admission. Using either the MAN approach or imputation, the odds ratio (OR) for WBC was substantially over- or underestimated depending on the missingness pattern, and there was considerable bias toward the null in the OR estimates for fever. In the CC analyses, the OR for WBC was consistently biased toward the null, the OR for PROC was biased away from the null, and the OR for fever was biased toward or away from the null. Estimates of overall model discrimination were substantially biased under all analytic approaches. Conclusions: All three methods of handling large amounts of missing data can lead to biased estimates of the OR and of model performance in predictive models. Predictor variables that are measured inconsistently can affect the validity of such models. © 2006 Elsevier Inc. All rights reserved.
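The simulation design described in the abstract can be illustrated with a minimal sketch. It is not the author's code: the prevalences, coefficients, sample size, and the single missingness pattern (WBC deleted completely at random) are all assumptions chosen for illustration, and the helper names `fit_logistic`, `odds_ratios`, and `run` are hypothetical. Only the complete-case and missing-assumed-normal approaches are shown, because the abstract does not specify the imputation method.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def odds_ratios(wbc, fever, proc, admit):
    """Return the odds ratios for WBC, fever, and PROC (intercept dropped)."""
    X = np.column_stack([np.ones(len(admit)), wbc, fever, proc]).astype(float)
    return np.exp(fit_logistic(X, admit.astype(float))[1:])

def run(n=20000, frac_missing=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed prevalences and log-odds coefficients; none are taken from the paper.
    wbc = rng.binomial(1, 0.3, n)
    fever = rng.binomial(1, 0.4, n)
    proc = rng.binomial(1, 0.2, n)
    logit = -2.0 + 1.0 * wbc + 0.7 * fever + 1.2 * proc
    admit = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

    # One assumed missingness pattern: WBC deleted completely at random.
    missing = rng.random(n) < frac_missing
    observed = ~missing
    return {
        # Baseline: all WBC values available.
        "baseline": odds_ratios(wbc, fever, proc, admit),
        # Complete-case (CC): drop every patient whose WBC was deleted.
        "complete_case": odds_ratios(
            wbc[observed], fever[observed], proc[observed], admit[observed]
        ),
        # Missing assumed normal (MAN): recode deleted WBC values as 0 (normal).
        "missing_as_normal": odds_ratios(
            np.where(missing, 0, wbc), fever, proc, admit
        ),
    }

if __name__ == "__main__":
    for name, ors in run().items():
        print(name, np.round(ors, 2))
```

Under this particular pattern, recoding deleted values as normal misclassifies some high-WBC patients, so the MAN estimate of the WBC odds ratio should fall below the baseline estimate. Other patterns, for example deletion that depends on fever or on admission, would need a different `missing` mask and can shift the estimates in other directions, as the abstract reports.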
Pages: 1115-1123
Page count: 9