Prognostic modeling with logistic regression analysis: In search of a sensible strategy in small data sets

Cited by: 444
Authors
Steyerberg, EW
Eijkemans, MJC
Harrell, FE
Habbema, JDF
Affiliations
[1] Erasmus Univ, Ctr Clin Decis Sci, Dept Publ Hlth, Rotterdam, Netherlands
[2] Univ Virginia, Div Biostat & Epidemiol, Dept Hlth Evaluat Sci, Charlottesville, VA 22903 USA
Keywords
regression analysis; logistic models; bias; variable selection; prediction
DOI
10.1177/0272989X0102100106
Chinese Library Classification
R19 [Health care organizations and services (health services management)]
Abstract
Clinical decision making often requires estimates of the likelihood of a dichotomous outcome in individual patients. When empirical data are available, these estimates may well be obtained from a logistic regression model. Several strategies may be followed in the development of such a model. In this study, the authors compare alternative strategies in 23 small subsamples from a large data set of patients with an acute myocardial infarction, where they developed predictive models for 30-day mortality. Evaluations were performed in an independent part of the data set. Specifically, the authors studied the effect of coding of covariables and stepwise selection on discriminative ability of the resulting model, and the effect of statistical "shrinkage" techniques on calibration. As expected, dichotomization of continuous covariables implied a loss of information. Remarkably, stepwise selection resulted in less discriminating models compared to full models including all available covariables, even when more than half of these were randomly associated with the outcome. Using qualitative information on the sign of the effect of predictors slightly improved the predictive ability. Calibration improved when shrinkage was applied to the standard maximum likelihood estimates of the regression coefficients. In conclusion, a sensible strategy in small data sets is to apply shrinkage methods in full models that include well-coded predictors that are selected based on external information.
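As a rough illustration of the shrinkage idea mentioned in the abstract, the sketch below multiplies the slope coefficients of a fitted logistic model by one uniform factor. It assumes the heuristic factor (model chi-square minus number of predictors) divided by model chi-square, which is one common choice and not necessarily the estimator used in the paper. The function names and the example numbers are hypothetical.

```python
def heuristic_shrinkage_factor(model_chi2, n_predictors):
    """Uniform shrinkage factor s = (chi2 - p) / chi2.

    model_chi2   : likelihood-ratio chi-square of the fitted model
                   (assumed to be supplied by the fitting software)
    n_predictors : number of estimated slope coefficients (p)
    """
    if model_chi2 <= 0:
        raise ValueError("model_chi2 must be positive")
    # Clamp at 0 so a very weak model is shrunk to 'no effect', not sign-flipped.
    return max(0.0, (model_chi2 - n_predictors) / model_chi2)


def shrink_slopes(slopes, factor):
    """Multiply every slope coefficient by the same shrinkage factor."""
    return [factor * b for b in slopes]


# Hypothetical example: 8 predictors, model chi-square of 40.
factor = heuristic_shrinkage_factor(40.0, 8)
shrunk = shrink_slopes([0.5, -1.0], factor)
print(factor, shrunk)
```

After shrinking the slopes, the intercept would normally be re-estimated so that the mean predicted probability still matches the observed event rate; that step needs the original data and is omitted here.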
Pages: 45-56
Number of pages: 12