Identifying interacting SNPs using Monte Carlo logic regression

Cited by: 152
Authors
Kooperberg, C
Ruczinski, I
Affiliations
[1] Fred Hutchinson Cancer Research Center, Division of Public Health Sciences, Seattle, WA 98109, USA
[2] Johns Hopkins University, Bloomberg School of Public Health, Department of Biostatistics, Baltimore, MD, USA
Keywords
association studies; binary variables; Boolean logic; haplotype
DOI
10.1002/gepi.20042
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Interactions are frequently at the center of interest in single-nucleotide polymorphism (SNP) association studies. When interacting SNPs are in the same gene or in genes that are close in sequence, such interactions may suggest which haplotypes are associated with a disease. Interactions between unrelated SNPs may suggest genetic pathways. Unfortunately, data sets are often still too small to definitively determine whether interactions between SNPs occur. Also, competing sets of interactions could often be of equal interest. Here we propose Monte Carlo logic regression, an exploratory tool that combines Markov chain Monte Carlo and logic regression, an adaptive regression methodology that attempts to construct predictors as Boolean combinations of binary covariates such as SNPs. The goal of Monte Carlo logic regression is to generate a collection of (interactions of) SNPs that may be associated with a disease outcome, and that warrant further investigation. As such, the models that are fitted in the Markov chain are not combined into a single model, as is often done in Bayesian model averaging procedures. Instead, the most frequently occurring patterns in these models are tabulated. The method is applied to a study of heart disease with 779 participants and 89 SNPs. A simulation study is carried out to investigate the performance of the Monte Carlo logic regression approach. © 2004 Wiley-Liss, Inc.
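The idea in the abstract (sample many Boolean models with a Markov chain, then tabulate which SNPs and SNP pairs recur instead of averaging the models) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the simulated data, the restriction to a single AND-combination of literals rather than general logic trees, the profile-likelihood score with a fixed size penalty, the move set, and all function names. Proposal ratios are ignored, so this is a Metropolis-style stochastic search, not an exact sampler and not the authors' algorithm.

```python
import math
import random
from collections import Counter
from itertools import combinations


def simulate(n=400, p=8, seed=0):
    """Toy data (hypothetical): binary SNP indicators; risk is raised when SNP0 AND SNP1 are 1."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < (0.7 if row[0] and row[1] else 0.2) else 0 for row in X]
    return X, y


def evaluate(model, row):
    """A model is a frozenset of (snp_index, negated) literals joined by AND."""
    return all((row[j] == 0) if neg else (row[j] == 1) for j, neg in model)


def log_score(model, X, y, size_penalty=2.0):
    """Bernoulli log-likelihood at the fitted group rates, minus an assumed size penalty."""
    counts = {False: [0, 0], True: [0, 0]}
    for row, yi in zip(X, y):
        counts[evaluate(model, row)][yi] += 1
    total = 0.0
    for n0, n1 in counts.values():
        for c in (n0, n1):
            if c > 0:
                total += c * math.log(c / (n0 + n1))
    return total - size_penalty * len(model)


def propose(model, p, max_size, rng):
    """Randomly add, drop, or replace one literal (an assumed, simplified move set)."""
    literals = set(model)
    move = rng.choice(["add", "drop", "swap"])
    if move in ("drop", "swap") and literals:
        literals.remove(rng.choice(sorted(literals)))
    if move in ("add", "swap") and len(literals) < max_size:
        used = {j for j, _ in literals}
        free = [j for j in range(p) if j not in used]
        if free:
            literals.add((rng.choice(free), rng.random() < 0.5))
    return frozenset(literals)


def mc_logic_search(X, y, n_iter=3000, burn_in=500, max_size=3, seed=1):
    """Run the chain and tabulate how often SNPs and SNP pairs appear in visited models."""
    rng = random.Random(seed)
    p = len(X[0])
    model = frozenset()
    score = log_score(model, X, y)
    singles, pairs, sizes = Counter(), Counter(), Counter()
    for it in range(n_iter):
        cand = propose(model, p, max_size, rng)
        cand_score = log_score(cand, X, y)
        # Metropolis-style acceptance; proposal asymmetry is ignored for brevity.
        if cand_score >= score or rng.random() < math.exp(cand_score - score):
            model, score = cand, cand_score
        if it >= burn_in:
            snps = sorted(j for j, _ in model)
            sizes[len(snps)] += 1
            singles.update(snps)
            pairs.update(combinations(snps, 2))
    return singles, pairs, sizes


if __name__ == "__main__":
    X, y = simulate()
    singles, pairs, sizes = mc_logic_search(X, y)
    print("most visited SNPs:", singles.most_common(3))
    print("most visited SNP pairs:", pairs.most_common(3))
```

The output is a set of frequency tables, not a single fitted model, which mirrors the exploratory use described in the abstract: SNPs or pairs that recur in many visited models are candidates for follow-up, not confirmed effects.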
Pages: 157-170
Page count: 14