A Genealogical Interpretation of Principal Components Analysis

Cited by: 383
Authors
McVean, Gil [1]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
Source
PLOS GENETICS | 2009 / Vol. 5 / Issue 10
Keywords
SIMULATION;
DOI
10.1371/journal.pgen.1000686
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Principal components analysis, PCA, is a statistical method commonly used in population genetics to identify structure in the distribution of genetic variation across geographical location and ethnic background. However, while the method is often used to inform about historical demographic processes, little is known about the relationship between fundamental demographic parameters and the projection of samples onto the primary axes. Here I show that for SNP data the projection of samples onto the principal components can be obtained directly from considering the average coalescent times between pairs of haploid genomes. The result provides a framework for interpreting PCA projections in terms of underlying processes, including migration, geographical isolation, and admixture. I also demonstrate a link between PCA and Wright's F_ST and show that SNP ascertainment has a largely simple and predictable effect on the projection of samples. Using examples from human genetics, I discuss the application of these results to empirical data and the implications for inference.
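The abstract's central claim, that PCA projections follow from average pairwise coalescent times, can be illustrated with a minimal sketch. It assumes (this record does not state the formula) that the expected sample covariance between haploid genomes i and j is proportional to t̄_i + t̄_j − t̄ − t_ij, where t_ij is the expected coalescent time and bars denote row and overall averages. The function name, the two-population example, the time values 1.0 and 2.0, and the convention t_ii = 0 are all hypothetical choices made for illustration.

```python
import numpy as np

def expected_pca_from_coalescent_times(T):
    """Return eigenvalues and sample coordinates implied by a matrix of
    expected pairwise coalescent times T (symmetric, n x n).

    Assumption: expected covariance M_ij is proportional to
    tbar_i + tbar_j - tbar - t_ij, i.e. M = -J T J with J the
    centering matrix. Proportionality constants are ignored.
    """
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    M = -J @ T @ J                        # negative double-centering of T
    eigvals, eigvecs = np.linalg.eigh(M)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]     # re-sort to descending
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    # Scale each eigenvector by the square root of its (non-negative) eigenvalue.
    coords = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, coords

# Hypothetical example: two populations of three haploid genomes each.
labels = np.array([0, 0, 0, 1, 1, 1])
same_pop = labels[:, None] == labels[None, :]
T = np.where(same_pop, 1.0, 2.0)  # within-population vs between-population times
np.fill_diagonal(T, 0.0)          # assumed convention: a genome with itself

eigvals, coords = expected_pca_from_coalescent_times(T)
pc1 = coords[:, 0]
print(np.round(pc1, 3))
```

In this toy case the first axis separates the two populations, because between-population pairs have longer expected coalescent times than within-population pairs; the sign of the axis is arbitrary.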
Pages: 10