Tracing Sub-Structure in the European American Population with PCA-Informative Markers

Cited by: 53
Authors
Paschou, Peristera [1]
Drineas, Petros [2]
Lewis, Jamey [2]
Nievergelt, Caroline M. [3, 4]
Nickerson, Deborah A. [5]
Smith, Joshua D. [5]
Ridker, Paul M. [6, 7]
Chasman, Daniel I. [7]
Krauss, Ronald M. [8]
Ziv, Elad [9, 10]
Affiliations
[1] Democritus Univ Thrace, Dept Mol Biol & Genet, Alexandroupolis, Greece
[2] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
[3] Scripps Res Inst, Dept Mol & Expt Med, La Jolla, CA USA
[4] Univ Calif San Diego, Dept Psychiat, La Jolla, CA 92093 USA
[5] Univ Washington, Dept Genome Sci, Seattle, WA USA
[6] Brigham & Womens Hosp, Div Cardiovasc Dis, Ctr Cardiovasc Dis Prevent, Boston, MA 02115 USA
[7] Brigham & Womens Hosp, Div Prevent Med, Boston, MA 02115 USA
[8] Childrens Hosp Oakland, Res Inst, Oakland, CA 94609 USA
[9] Univ Calif San Francisco, Inst Human Genet, Div Gen Internal Med, San Francisco, CA 94143 USA
[10] Univ Calif San Francisco, Ctr Comprehens Canc, San Francisco, CA 94143 USA
Source
PLOS GENETICS | 2008 / Vol. 4 / Issue 07
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
DOI
10.1371/journal.pgen.1000114
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Genetic structure in the European American population reflects waves of migration and recent gene flow among different populations. This complex structure can introduce bias in genetic association studies. Using Principal Components Analysis (PCA), we analyze the structure of two independent European American datasets (1,521 individuals, 307,315 autosomal SNPs). Individual variation lies along a continuum, with some individuals showing high degrees of admixture with non-European populations, as demonstrated through joint analysis with HapMap data. The CEPH Europeans represent only a small fraction of the variation encountered in the larger European American datasets we studied. We interpret the first eigenvector of these data as correlated with ancestry, and we apply an algorithm that we have previously described to select PCA-informative markers (PCAIMs) that can reproduce this structure. Importantly, we develop a novel method that removes redundancy from the selected SNP panels and show that it effectively eliminates correlated markers, thus increasing genotyping savings. Only 150-200 PCAIMs suffice to accurately predict fine structure in European American datasets, as identified by PCA. Simulating association studies, we couple our method with a PCA-based stratification correction tool and demonstrate that a small number of PCAIMs can efficiently remove false correlations with almost no loss in power. The structure-informative SNPs that we propose are an important resource for genetic association studies of European Americans. Furthermore, our redundancy removal algorithm can be applied to sets of ancestry-informative markers selected with any method in order to select the most uncorrelated SNPs and significantly decrease genotyping costs.
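The marker-selection idea summarized in the abstract can be illustrated with a minimal sketch on simulated data. It assumes that a SNP's informativeness is scored by its squared loadings on the top principal components, and it uses a simple greedy r² filter as a stand-in for redundancy removal; this filter is not the redundancy-removal algorithm of the paper. The function names, the parameters `n_components`, `n_markers`, and `max_r2`, and the simulated genotypes are all hypothetical.

```python
import numpy as np

def simulate_two_groups(seed=0, n_per_group=100, n_snps=300, n_informative=20):
    """Simulate 0/1/2 genotypes for two groups (illustrative data only)."""
    rng = np.random.default_rng(seed)
    freq_a = np.full(n_snps, 0.5)
    freq_b = np.full(n_snps, 0.5)
    # The first n_informative SNPs differ strongly in allele frequency.
    freq_a[:n_informative] = 0.1
    freq_b[:n_informative] = 0.9
    group_a = rng.binomial(2, freq_a, size=(n_per_group, n_snps))
    group_b = rng.binomial(2, freq_b, size=(n_per_group, n_snps))
    genotypes = np.vstack([group_a, group_b]).astype(float)
    # Make the last SNP an exact copy of SNP 0 to create a redundant marker.
    genotypes[:, -1] = genotypes[:, 0]
    return genotypes

def pca_informative_markers(genotypes, n_components=1, n_markers=150):
    """Rank SNPs (columns) by summed squared loadings on the top components."""
    centered = genotypes - genotypes.mean(axis=0)
    # Rows of vt are the right singular vectors, i.e. per-SNP loadings.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = (vt[:n_components] ** 2).sum(axis=0)
    order = np.argsort(scores)[::-1]
    return [int(j) for j in order[:n_markers]]

def drop_correlated(genotypes, ranked, max_r2=0.95):
    """Greedy stand-in filter: keep a SNP only if its r^2 with every
    previously kept SNP is below max_r2."""
    kept = []
    for j in ranked:
        redundant = False
        for i in kept:
            r = np.corrcoef(genotypes[:, i], genotypes[:, j])[0, 1]
            if r * r >= max_r2:
                redundant = True
                break
        if not redundant:
            kept.append(j)
    return kept

genotypes = simulate_two_groups(seed=0)
ranked = pca_informative_markers(genotypes, n_components=1, n_markers=21)
panel = drop_correlated(genotypes, ranked, max_r2=0.95)
print(len(ranked), "markers ranked;", len(panel), "kept after filtering")
```

On real data the number of components, the panel size, and the correlation threshold would all need to be chosen and validated against the full-genome PCA, as the abstract describes for the 150-200 marker panels.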
Pages: 13