Purposeful selection of variables in logistic regression

Cited by: 2771
Authors
Bursac, Zoran [1]
Gauss, C. Heath [1]
Williams, David Keith [1]
Hosmer, David W. [2]
Affiliations
[1] Univ Arkansas Med Sci, Biostat, Little Rock, AR 72205 USA
[2] Univ Massachusetts, Biostat, Amherst, MA 01003 USA
Source
SOURCE CODE FOR BIOLOGY AND MEDICINE | 2008 / Vol. 3 / Issue 01
DOI
10.1186/1751-0473-3-17
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
Background: The main problem in many model-building situations is choosing, from a large set of covariates, those that should be included in the "best" model. A decision to keep a variable in the model might be based on clinical or statistical significance. Several variable selection algorithms exist, but they are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates in which the analyst makes a variable selection decision at each step of the modeling process.
Methods: In this paper we introduce an algorithm that automates that process. We conduct a simulation study to compare the performance of this algorithm with three well-documented variable selection procedures in SAS PROC LOGISTIC: FORWARD, BACKWARD, and STEPWISE.
Results: We show that this approach is advantageous when the analyst is interested in risk factor modeling and not just prediction. In addition to significant covariates, this variable selection procedure can retain important confounding variables, potentially resulting in a slightly richer model. Application of the macro is further illustrated with the Hosmer and Lemeshow Worcester Heart Attack Study (WHAS) data.
Conclusion: An analyst who needs an algorithm to help guide the retention of significant covariates as well as confounding ones should consider this macro as an alternative tool.
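The abstract notes that the procedure can retain confounding variables in addition to significant ones. One common way to flag a confounder is a change-in-estimate check: drop the candidate covariate, refit, and see whether any remaining coefficient shifts by more than a chosen percentage. The following is a minimal sketch of that check only, not the authors' SAS macro. The function names, the 15% default threshold, the convention of measuring change relative to the fuller model, and the coefficient values are all illustrative assumptions that do not come from the abstract.

```python
from typing import Dict


def percent_change(full: float, reduced: float) -> float:
    """Absolute change in a coefficient, as a percentage of its value in the
    fuller model (assumed convention)."""
    if full == 0:
        raise ValueError("coefficient in the fuller model must be non-zero")
    return abs(reduced - full) / abs(full) * 100.0


def is_confounder(
    full_betas: Dict[str, float],
    reduced_betas: Dict[str, float],
    threshold_pct: float = 15.0,  # illustrative assumption, not from the abstract
) -> bool:
    """Return True if removing the candidate covariate shifts any remaining
    coefficient by more than threshold_pct percent.

    full_betas: coefficients of the remaining covariates in the model that
        still contains the candidate.
    reduced_betas: coefficients of the same covariates after the candidate
        has been removed and the model refitted.
    """
    return any(
        percent_change(full_betas[name], reduced_betas[name]) > threshold_pct
        for name in reduced_betas
    )


# Hypothetical coefficients, made up for illustration only.
with_candidate = {"age": 0.050, "sex": 0.40}
without_candidate_large_shift = {"age": 0.062, "sex": 0.41}  # age moves by about 24%
without_candidate_small_shift = {"age": 0.052, "sex": 0.41}  # age 4%, sex 2.5%

keep_as_confounder = is_confounder(with_candidate, without_candidate_large_shift)
drop_candidate = not is_confounder(with_candidate, without_candidate_small_shift)
```

Under these assumptions, a candidate that is not statistically significant would still be kept when `is_confounder` returns True, which is how a selection procedure of this kind can end up with a slightly richer model than purely significance-driven selection.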
Pages: 8