Scalable Variational Inference for Bayesian Variable Selection in Regression, and Its Accuracy in Genetic Association Studies

Cited by: 153
Authors
Carbonetto, Peter [1]
Stephens, Matthew [1, 2]
Affiliations
[1] Univ Chicago, Dept Human Genet, Chicago, IL 60637 USA
[2] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
Source
Bayesian Analysis | 2012 / Vol. 7 / No. 1
Keywords
variable selection; variational inference; genetic association studies; Monte Carlo; GENOME-WIDE ASSOCIATION; MODEL; SHRINKAGE; LASSO;
DOI
10.1214/12-BA703
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
The Bayesian approach to variable selection in regression is a powerful tool for tackling many scientific problems. Inference for variable selection models is usually implemented using Markov chain Monte Carlo (MCMC). Because MCMC can impose a high computational cost in studies with a large number of variables, we assess an alternative to MCMC based on a simple variational approximation. Our aim is to retain useful features of Bayesian variable selection at a reduced cost. Using simulations designed to mimic genetic association studies, we show that this simple variational approximation yields posterior inferences in some settings that closely match exact values. In less restrictive (and more realistic) conditions, we show that posterior probabilities of inclusion for individual variables are often incorrect, but variational estimates of other useful quantities including posterior distributions of the hyperparameters are remarkably accurate. We illustrate how these results guide the use of variational inference for a genome-wide association study with thousands of samples and hundreds of thousands of variables.
Pages: 73-107
Page count: 35