Multimarker analysis and imputation of multiple platform pooling-based genome-wide association studies

Cited by: 16
Authors
Homer, Nils [1, 2]
Tembe, Waibhav D. [1]
Szelinger, Szabolcs [1]
Redman, Margot [1]
Stephan, Dietrich A. [1]
Pearson, John V. [1]
Nelson, Stanley F. [2]
Craig, David [1]
Affiliations
[1] Translational Genomics Research Institute (TGen), Phoenix, AZ 85004, USA
[2] University of California, Los Angeles, Department of Computer Science, Los Angeles, CA 90095, USA
DOI
10.1093/bioinformatics/btn333
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
For many genome-wide association (GWA) studies, individually genotyping one million or more SNPs provides a marginal increase in coverage at a substantial cost. Much of the information gained is redundant due to the correlation structure inherent in the human genome. Pooling-based GWA studies could benefit significantly by utilizing this redundancy to reduce noise, improve the accuracy of the observations and increase genomic coverage. We introduce a measure of correlation between individual genotyping and pooling, under the same framework in which r² provides a measure of linkage disequilibrium (LD) between pairs of SNPs. We then report a new non-haplotype, multimarker, multi-locus method that leverages the correlation structure between SNPs in the human genome to increase the efficacy of pooling-based GWA studies. We first give a theoretical framework and derivation of our multimarker method. Next, we evaluate simulations using this multimarker approach in comparison to single-marker analysis. Finally, we experimentally evaluate our method using different pools of HapMap individuals on the Illumina 450S Duo, Illumina 550K and Affymetrix 5.0 platforms, for a combined total of 1,333,631 SNPs. Our results show that multimarker analysis reduces noise specific to pooling-based studies, allows for efficient integration of multiple microarray platforms and provides more accurate measures of significance than single-marker analysis. Additionally, this approach can be extended to impute the association significance of SNPs not directly observed, using neighboring SNPs in LD. This multimarker method can now be used to cost-effectively complete pooling-based GWA studies with multiple platforms across over one million SNPs, and to impute neighboring SNPs weighted for the loss of information due to pooling.
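The abstract frames its pooling-correlation measure by analogy with r², the standard pairwise LD statistic. As a minimal illustration of that reference point only (it is not the authors' pooling measure or multimarker method, neither of which the abstract specifies), the standard r² between two biallelic SNPs can be computed from haplotype and allele frequencies as r² = D² / (p_A(1 - p_A) p_B(1 - p_B)), where D = p_AB - p_A p_B. The function name `ld_r_squared` and its arguments are hypothetical.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Standard LD r^2 between two biallelic SNPs (illustrative sketch).

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    # r^2 is undefined when either SNP is monomorphic (zero variance).
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic (0 < frequency < 1)")
    # Disequilibrium coefficient: observed minus expected haplotype frequency.
    d = p_ab - p_a * p_b
    # Normalize D^2 by the product of the two allelic variances.
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


if __name__ == "__main__":
    print(ld_r_squared(0.5, 0.5, 0.5))            # perfect LD
    print(ld_r_squared(0.25, 0.5, 0.5))           # independent SNPs
    print(round(ld_r_squared(0.4, 0.5, 0.5), 2))  # partial LD
```

An r² near 1 means one SNP almost fully predicts the other, which is the kind of redundancy the abstract proposes to exploit; an r² of 0 means the neighboring SNP carries no information about it.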
Pages: 1896-1902
Number of pages: 7