Leveraging Hierarchical Population Structure in Discrete Association Studies

Times cited: 30
Authors
Carlson, Jonathan [1, 2]
Kadie, Carl [2]
Mallal, Simon [3]
Heckerman, David [1]
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
[2] Microsoft Res, Machine Learning & Appl Stat Grp, Redmond, WA USA
[3] Royal Perth Hosp, Ctr Clin Immunol & Biomed Stat, Perth, WA, Australia
Source
PLOS ONE | 2007 / Vol. 2 / Issue 7
DOI
10.1371/journal.pone.0000591
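Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09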
Abstract
Population structure can confound the identification of correlations in biological data. Such confounding has been recognized in multiple biological disciplines, resulting in a disparate collection of proposed solutions. We examine several methods that correct for confounding on discrete data with hierarchical population structure and identify two distinct confounding processes, which we call coevolution and conditional influence. We describe these processes in terms of generative models and show that these generative models can be used to correct for the confounding effects. Finally, we apply the models to three problems: identification of escape mutations in HIV-1 in response to specific HLA-mediated immune pressure, prediction of coevolving residues in an HIV-1 peptide, and a search for genotypes that are associated with bacterial resistance traits in Arabidopsis thaliana. We show that coevolution is a better description of confounding in some applications and conditional influence is better in others. That is, we show that no single method is best for addressing all forms of confounding. Analysis tools based on these models are available on the internet as both web-based applications and downloadable source code at http://atom.research.microsoft.com/bio/phylod.aspx.
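A minimal sketch of how population structure can confound a discrete association, using hypothetical counts for two clades. Within each clade the binary feature and trait are independent (odds ratio 1), yet pooling the clades produces a strong spurious association. The correction shown is a flat Mantel-Haenszel stratification, swapped in purely for illustration; it is not the coevolution or conditional-influence models described in this paper. The helpers `odds_ratio`, `pool`, and `mantel_haenszel_or` and all counts are hypothetical.

```python
from typing import List, Tuple

# A 2x2 table as (a, b, c, d):
#   a = feature present & trait present, b = feature present & trait absent,
#   c = feature absent & trait present,  d = feature absent & trait absent
Table = Tuple[int, int, int, int]


def odds_ratio(t: Table) -> float:
    """Crude odds ratio a*d / (b*c) for a single 2x2 table."""
    a, b, c, d = t
    return (a * d) / (b * c)


def pool(tables: List[Table]) -> Table:
    """Collapse all strata into one table, ignoring population structure."""
    a, b, c, d = (sum(col) for col in zip(*tables))
    return (a, b, c, d)


def mantel_haenszel_or(tables: List[Table]) -> float:
    """Mantel-Haenszel common odds ratio across strata (flat stratification)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical counts (not from the paper): feature and trait are independent
# within each clade, but both are common in clade 1 and rare in clade 2.
clade_1: Table = (64, 16, 16, 4)
clade_2: Table = (4, 16, 16, 64)
strata = [clade_1, clade_2]

print(odds_ratio(clade_1), odds_ratio(clade_2))  # 1.0 1.0
print(odds_ratio(pool(strata)))                  # 4.515625 (spurious association)
print(mantel_haenszel_or(strata))                # 1.0 (association vanishes)
```

This flat stratification assumes the grouping is known and has a single level; the paper instead targets hierarchical population structure, where such a fixed partition is not given.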
Pages: 13