Inference from genome-wide association studies using a novel Markov model

被引:5
作者
Hosking, Fay J. [1 ]
Sterne, Jonathan A. C. [2 ]
Smith, George Davey [2 ]
Green, Peter J. [1 ]
机构
[1] Univ Bristol, Dept Math, Bristol BS8 4PB, Avon, England
[2] Univ Bristol, Dept Social Med, Bristol BS8 4PB, Avon, England
基金
英国工程与自然科学研究理事会; 英国医学研究理事会;
关键词
Markov chain Monte Carlo; hidden Markov model; logistic regression; SNP-base population association study;
D O I
10.1002/gepi.20322
中图分类号
Q3 [遗传学];
学科分类号
071007 ; 090102 ;
摘要
In this paper we propose a Bayesian modeling approach to the analysis of genome-wide association studies based on single nucleotide polymorphism (SNP) data. Our latent seed model combines various aspects of k-means clustering, hidden Markov models (HMMs) and logistic regression into a fully Bayesian model. It is fitted using the Markov chain Monte Carlo stochastic simulation method, with Metropolis-Hastings update steps. The approach is flexible, both in allowing different types of genetic models, and because it can be easily extended while remaining computationally feasible due to the use of fast algorithms for HMMs. It allows for inference primarily on the location of the causal locus and also on other parameters of interest. The latent seed model is used here to analyze three data sets, using both synthetic and real disease phenotypes with real SNP data, and shows promising results. Our method is able to correctly identify the causal locus in examples,where single SNP analysis is both successful and unsuccessful at identifying the causal SNP.
引用
收藏
页码:497 / 504
页数:8
相关论文
共 22 条
[1]   A haplotype map of the human genome [J].
Altshuler, D ;
Brooks, LD ;
Chakravarti, A ;
Collins, FS ;
Daly, MJ ;
Donnelly, P ;
Gibbs, RA ;
Belmont, JW ;
Boudreau, A ;
Leal, SM ;
Hardenbol, P ;
Pasternak, S ;
Wheeler, DA ;
Willis, TD ;
Yu, FL ;
Yang, HM ;
Zeng, CQ ;
Gao, Y ;
Hu, HR ;
Hu, WT ;
Li, CH ;
Lin, W ;
Liu, SQ ;
Pan, H ;
Tang, XL ;
Wang, J ;
Wang, W ;
Yu, J ;
Zhang, B ;
Zhang, QR ;
Zhao, HB ;
Zhao, H ;
Zhou, J ;
Gabriel, SB ;
Barry, R ;
Blumenstiel, B ;
Camargo, A ;
Defelice, M ;
Faggart, M ;
Goyette, M ;
Gupta, S ;
Moore, J ;
Nguyen, H ;
Onofrio, RC ;
Parkin, M ;
Roy, J ;
Stahl, E ;
Winchester, E ;
Ziaugra, L ;
Shen, Y .
NATURE, 2005, 437 (7063) :1299-1320
[2]   General methods for monitoring convergence of iterative simulations [J].
Brooks, SP ;
Gelman, A .
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 1998, 7 (04) :434-455
[3]   Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls [J].
Burton, Paul R. ;
Clayton, David G. ;
Cardon, Lon R. ;
Craddock, Nick ;
Deloukas, Panos ;
Duncanson, Audrey ;
Kwiatkowski, Dominic P. ;
McCarthy, Mark I. ;
Ouwehand, Willem H. ;
Samani, Nilesh J. ;
Todd, John A. ;
Donnelly, Peter ;
Barrett, Jeffrey C. ;
Davison, Dan ;
Easton, Doug ;
Evans, David ;
Leung, Hin-Tak ;
Marchini, Jonathan L. ;
Morris, Andrew P. ;
Spencer, Chris C. A. ;
Tobin, Martin D. ;
Attwood, Antony P. ;
Boorman, James P. ;
Cant, Barbara ;
Everson, Ursula ;
Hussey, Judith M. ;
Jolley, Jennifer D. ;
Knight, Alexandra S. ;
Koch, Kerstin ;
Meech, Elizabeth ;
Nutland, Sarah ;
Prowse, Christopher V. ;
Stevens, Helen E. ;
Taylor, Niall C. ;
Walters, Graham R. ;
Walker, Neil M. ;
Watkins, Nicholas A. ;
Winzer, Thilo ;
Jones, Richard W. ;
McArdle, Wendy L. ;
Ring, Susan M. ;
Strachan, David P. ;
Pembrey, Marcus ;
Breen, Gerome ;
St Clair, David ;
Caesar, Sian ;
Gordon-Smith, Katherine ;
Jones, Lisa ;
Fraser, Christine ;
Green, Elain K. .
NATURE, 2007, 447 (7145) :661-678
[5]   Efficient computation of significance levels for multiple associations in large studies of correlated data, including genomewide association studies [J].
Dudbridge, F ;
Koeleman, BPC .
AMERICAN JOURNAL OF HUMAN GENETICS, 2004, 75 (03) :424-435
[6]  
Gelman A., 1992, Statistical Science, V7, DOI [DOI 10.1214/SS/1177011136, 10.1214/ss/1177011136]
[7]   The International HapMap Project [J].
Gibbs, RA ;
Belmont, JW ;
Hardenbol, P ;
Willis, TD ;
Yu, FL ;
Yang, HM ;
Ch'ang, LY ;
Huang, W ;
Liu, B ;
Shen, Y ;
Tam, PKH ;
Tsui, LC ;
Waye, MMY ;
Wong, JTF ;
Zeng, CQ ;
Zhang, QR ;
Chee, MS ;
Galver, LM ;
Kruglyak, S ;
Murray, SS ;
Oliphant, AR ;
Montpetit, A ;
Hudson, TJ ;
Chagnon, F ;
Ferretti, V ;
Leboeuf, M ;
Phillips, MS ;
Verner, A ;
Kwok, PY ;
Duan, SH ;
Lind, DL ;
Miller, RD ;
Rice, JP ;
Saccone, NL ;
Taillon-Miller, P ;
Xiao, M ;
Nakamura, Y ;
Sekine, A ;
Sorimachi, K ;
Tanaka, T ;
Tanaka, Y ;
Tsunoda, T ;
Yoshino, E ;
Bentley, DR ;
Deloukas, P ;
Hunt, S ;
Powell, D ;
Altshuler, D ;
Gabriel, SB ;
Qiu, RZ .
NATURE, 2003, 426 (6968) :789-796
[8]  
HASTINGS WK, 1970, BIOMETRIKA, V57, P97, DOI 10.1093/biomet/57.1.97
[9]   Linkage disequilibrium mapping identifies a 390 kb region associated with CYP2D6 poor drug metabolising activity [J].
Hosking L.K. ;
Boyd P.R. ;
Xu C.F. ;
Nissum M. ;
Cantone K. ;
Purvis I.J. ;
Khakhar R. ;
Barnes M.R. ;
Liberwirth U. ;
Hagen-Mann K. ;
Ehm M.G. ;
Riley J.H. .
The Pharmacogenomics Journal, 2002, 2 (3) :165-175
[10]   A block-free hidden Markov model for genotypes and its application to disease association [J].
Kimmel, G ;
Shamir, R .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2005, 12 (10) :1243-1260