Logic regression

Cited by: 233
Authors
Ruczinski, I [1]
Kooperberg, C [1]
LeBlanc, M [1]
Affiliations
[1] Johns Hopkins Univ, Bloomberg Sch Publ Hlth, Dept Biostat, Baltimore, MD 21205 USA
Keywords
adaptive model selection; Boolean logic; binary variables; interactions; simulated annealing; SNP data
DOI
10.1198/1061860032238
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Logic regression is an adaptive regression methodology that attempts to construct predictors as Boolean combinations of binary covariates. In many regression problems a model is developed that relates the main effects (the predictors or transformations thereof) to the response, while interactions are usually kept simple (two- to three-way interactions at most). Often, especially when all predictors are binary, the interaction between many predictors may be what causes the differences in response. This issue arises, for example, in the analysis of SNP microarray data or in some data mining problems. In the proposed methodology, given a set of binary predictors we create new predictors such as "$X_1$, $X_2$, $X_3$, and $X_4$ are true," or "$X_5$ or $X_6$ but not $X_7$ are true." In more specific terms: we try to fit regression models of the form $g(E[Y]) = b_0 + b_1 L_1 + \cdots + b_n L_n$, where $L_j$ is any Boolean expression of the predictors. The $L_j$ and $b_j$ are estimated simultaneously using a simulated annealing algorithm. This article discusses how to fit logic regression models, how to carry out model selection for these models, and gives some examples.
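The model form in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes an identity link g, fixes the two example Boolean predictors quoted in the abstract instead of searching for them, and reads "X5 or X6 but not X7" as (X5 or X6) and not X7. The function names `logic_terms` and `fit_coefficients`, the synthetic data, and the coefficient values are hypothetical. The simulated annealing search over the Boolean expressions is not shown.

```python
import numpy as np

def logic_terms(X):
    """Build the two example Boolean predictors quoted in the abstract.

    X is an (n_samples, 7) boolean array; columns 0..6 stand for X1..X7.
    The grouping of the second term is an assumed reading of the abstract.
    """
    L1 = X[:, 0] & X[:, 1] & X[:, 2] & X[:, 3]   # X1 and X2 and X3 and X4
    L2 = (X[:, 4] | X[:, 5]) & ~X[:, 6]          # (X5 or X6) and not X7
    return np.column_stack([L1, L2]).astype(float)

def fit_coefficients(L, y):
    """Least-squares fit of y = b0 + b1*L1 + b2*L2 (identity link assumed)."""
    design = np.column_stack([np.ones(len(y)), L])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Synthetic, noise-free data generated from known (made-up) coefficients.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 7)).astype(bool)
L = logic_terms(X)
y = 1.0 + 2.0 * L[:, 0] - 3.0 * L[:, 1]

coef = fit_coefficients(L, y)
print(coef)  # estimated b0, b1, b2
```

Because the logic terms are fixed here, only the coefficients are estimated. In the method described above, the Boolean expressions themselves are also searched over, which is what makes the fit adaptive.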
Pages: 475-511
Number of pages: 37