A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis

Cited by: 75
Authors
Logsdon, Benjamin A. [1]
Hoffman, Gabriel E. [1]
Mezey, Jason G. [1, 2]
Affiliations
[1] Cornell Univ, Dept Biol Stat & Computat Biol, Ithaca, NY 14850 USA
[2] Weill Cornell Med Coll, Dept Med Genet, New York, NY USA
Source
BMC BIOINFORMATICS | 2010 / Vol. 11
Keywords
QUANTITATIVE TRAIT LOCI; COMPLEX TRAITS; LASSO; INFERENCE; SELECTION;
DOI
10.1186/1471-2105-11-58
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: The success achieved by genome-wide association (GWA) studies in the identification of candidate loci for complex diseases has been accompanied by an inability to explain the bulk of heritability. Here, we describe the algorithm V-Bay, a variational Bayes algorithm for multiple locus GWA analysis, which is designed to identify weaker associations that may contribute to this missing heritability.

Results: V-Bay provides a novel solution to the computational scaling constraints of most multiple locus methods and can complete a simultaneous analysis of a million genetic markers in a few hours, when using a desktop. Using a range of simulated genetic and GWA experimental scenarios, we demonstrate that V-Bay is highly accurate, and reliably identifies associations that are too weak to be discovered by single-marker testing approaches. V-Bay can also outperform a multiple locus analysis method based on the lasso, which has similar scaling properties for large numbers of genetic markers. For demonstration purposes, we also use V-Bay to confirm associations with gene expression in cell lines derived from the Phase II individuals of HapMap.

Conclusions: V-Bay is a versatile, fast, and accurate multiple locus GWA analysis tool for the practitioner interested in identifying weaker associations without high false positive rates.
Pages: 13