Sequence analysis using logic regression

Cited by: 97
Authors
Kooperberg, C [1]
Ruczinski, I [1]
LeBlanc, ML [1]
Hsu, L [1]
Affiliations
[1] Fred Hutchinson Cancer Research Center, Division of Public Health Sciences, Seattle, WA 98109 USA
Keywords
adaptive estimation; Boolean combinations; simulated annealing; SNP
DOI
10.1002/gepi.2001.21.s1.s626
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Logic Regression is a new adaptive regression methodology that attempts to construct predictors as Boolean combinations of (binary) covariates. In this paper we use this algorithm to deal with single-nucleotide polymorphism (SNP) sequence data. The predictors that are found are interpretable as risk factors of the disease. Significance of these risk factors is assessed using techniques like cross-validation, permutation tests, and independent test sets. These model selection techniques remain valid when data are dependent, as is the case for the family data used here. In our analysis of the Genetic Analysis Workshop 12 data we identify the exact locations of mutations on gene 1 and gene 6 and a number of mutations on gene 2 that are associated with the affected status, without selecting any false positives. © 2001 Wiley-Liss, Inc.
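As an illustration of the idea in the abstract, the following minimal sketch scores one Boolean combination of binary SNP covariates and assesses it with a label permutation test. It is not the authors' implementation: the expression `(snp0 AND NOT snp1) OR snp2`, the toy data, and all function names are hypothetical, the simulated annealing search over expressions is omitted, and the naive shuffle treats observations as independent, so it does not account for family structure.

```python
import random


def logic_predictor(row):
    # Hypothetical Boolean combination of three binary SNP covariates:
    # (snp0 AND NOT snp1) OR snp2
    return int((row[0] and not row[1]) or row[2])


def misclassification(predict, X, y):
    # Fraction of observations whose predicted status differs from y.
    return sum(predict(r) != t for r, t in zip(X, y)) / len(y)


def permutation_p_value(predict, X, y, n_perm=200, seed=0):
    # Shuffle the outcome labels and count how often the shuffled data
    # fit the fixed predictor at least as well as the observed data.
    rng = random.Random(seed)
    observed = misclassification(predict, X, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if misclassification(predict, X, y_perm) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy data (assumption): 100 individuals, 3 binary SNPs, and an affected
# status generated exactly by the hypothetical expression above.
rng = random.Random(1)
X = [[rng.randint(0, 1) for _ in range(3)] for _ in range(100)]
y = [logic_predictor(r) for r in X]

observed_error = misclassification(logic_predictor, X, y)
p_value = permutation_p_value(logic_predictor, X, y)
print(observed_error, p_value)
```

In a full logic regression fit, the expression itself would be searched for (for example by simulated annealing), and the permutation test would be applied to the whole search procedure rather than to a single fixed expression.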
Pages: S626-S631
Number of pages: 6