Inference of Population Structure using Dense Haplotype Data

Cited by: 782
Authors
Lawson, Daniel John [1]
Hellenthal, Garrett [2]
Myers, Simon [3]
Falush, Daniel [4, 5]
Affiliations
[1] Univ Bristol, Dept Math, Bristol BS8 1TW, Avon, England
[2] Wellcome Trust Ctr Human Genet, Oxford, England
[3] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[4] Univ Coll Cork, Environm Res Inst, Cork, Ireland
[5] Max Planck Inst Evolutionary Anthropol, Leipzig, Germany
Funding
Wellcome Trust (UK);
Keywords
MULTILOCUS GENOTYPE DATA; LINKAGE DISEQUILIBRIUM; PRINCIPAL-COMPONENTS; STATISTICAL-MODEL; GENETIC-STRUCTURE; HUMAN GENOME; ANCESTRY; HISTORY; RECOMBINATION; INDIVIDUALS;
DOI
10.1371/journal.pgen.1002453
Chinese Library Classification
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this "chromosome painting" can be summarized as a "coancestry matrix," which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. 
The software used for this article, ChromoPainter and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.
Pages: 16