Tracing Sub-Structure in the European American Population with PCA-Informative Markers

Cited by: 53
Authors
Paschou, Peristera [1]
Drineas, Petros [2]
Lewis, Jamey [2]
Nievergelt, Caroline M. [3, 4]
Nickerson, Deborah A. [5]
Smith, Joshua D. [5]
Ridker, Paul M. [6, 7]
Chasman, Daniel I. [7]
Krauss, Ronald M. [8]
Ziv, Elad [9, 10]
Affiliations
[1] Democritus Univ Thrace, Dept Mol Biol & Genet, Alexandroupolis, Greece
[2] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
[3] Scripps Res Inst, Dept Mol & Expt Med, La Jolla, CA USA
[4] Univ Calif San Diego, Dept Psychiat, La Jolla, CA 92093 USA
[5] Univ Washington, Dept Genome Sci, Seattle, WA USA
[6] Brigham & Womens Hosp, Div Cardiovasc Dis, Ctr Cardiovasc Dis Prevent, Boston, MA 02115 USA
[7] Brigham & Womens Hosp, Div Prevent Med, Boston, MA 02115 USA
[8] Childrens Hosp Oakland, Res Inst, Oakland, CA 94609 USA
[9] Univ Calif San Francisco, Inst Human Genet, Div Gen Internal Med, San Francisco, CA 94143 USA
[10] Univ Calif San Francisco, Ctr Comprehens Canc, San Francisco, CA 94143 USA
Source
PLOS GENETICS | 2008 / Volume 4 / Issue 07
Funding
National Institutes of Health (USA); National Science Foundation (USA);
DOI
10.1371/journal.pgen.1000114
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007; 090102;
Abstract
Genetic structure in the European American population reflects waves of migration and recent gene flow among different populations. This complex structure can introduce bias in genetic association studies. Using Principal Components Analysis (PCA), we analyze the structure of two independent European American datasets (1,521 individuals; 307,315 autosomal SNPs). Individual variation lies across a continuum, with some individuals showing high degrees of admixture with non-European populations, as demonstrated through joint analysis with HapMap data. The CEPH Europeans represent only a small fraction of the variation encountered in the larger European American datasets we studied. We interpret the first eigenvector of these data as correlated with ancestry, and we apply an algorithm that we have previously described to select PCA-informative markers (PCAIMs) that can reproduce this structure. Importantly, we develop a novel method that removes redundancy from the selected SNP panels and show that it effectively removes correlated markers, thus increasing genotyping savings. Only 150-200 PCAIMs suffice to accurately predict fine structure in European American datasets, as identified by PCA. Simulating association studies, we couple our method with a PCA-based stratification correction tool and demonstrate that a small number of PCAIMs can efficiently remove false correlations with almost no loss in power. The structure-informative SNPs that we propose are an important resource for genetic association studies of European Americans. Furthermore, our redundancy removal algorithm can be applied to sets of ancestry-informative markers selected with any method in order to select the most uncorrelated SNPs, which significantly decreases genotyping costs.
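The two steps summarized in the abstract (scoring SNPs by how strongly they load on the top principal components, then discarding redundant markers) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `pca_scores` and `select_markers`, the parameters `k` and `max_r2`, and the greedy pairwise-correlation filter are all assumptions made for illustration; in particular, the greedy filter is a simple stand-in for the paper's redundancy removal method, whose details are not given here. The input is assumed to be a genotype matrix with individuals in rows and SNPs in columns, coded as 0/1/2 allele counts.

```python
import numpy as np


def pca_scores(G, k=1):
    """Score each SNP by its squared loadings on the top-k principal components.

    G is an (individuals x SNPs) genotype matrix (assumed 0/1/2 coding).
    """
    X = G - G.mean(axis=0)  # center each SNP column
    # Thin SVD: the rows of Vt hold the SNP loadings of each component.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return (Vt[:k] ** 2).sum(axis=0)


def select_markers(G, n_markers, k=1, max_r2=0.8):
    """Greedily pick high-scoring SNPs, skipping any SNP too correlated
    (r^2 > max_r2) with one already chosen.

    The correlation filter is a hypothetical stand-in for the paper's
    redundancy removal step, not a reproduction of it.
    """
    scores = pca_scores(G, k)
    order = np.argsort(scores)[::-1]  # best-scoring SNPs first
    std = G.std(axis=0)
    chosen = []
    for j in order:
        if std[j] == 0:
            continue  # monomorphic SNP carries no information
        redundant = False
        for i in chosen:
            r = np.corrcoef(G[:, i], G[:, j])[0, 1]
            if r * r > max_r2:
                redundant = True
                break
        if not redundant:
            chosen.append(int(j))
        if len(chosen) == n_markers:
            break
    return chosen
```

Calling `select_markers(G, n_markers=150, k=1)` would return up to 150 column indices of `G`. The value 150 echoes the panel size quoted in the abstract, while the threshold `max_r2=0.8` is an arbitrary illustrative choice.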
Pages: 13