Estimating and interpreting FST: The impact of rare variants

Cited by: 385
Authors
Bhatia, Gaurav [1, 2]
Patterson, Nick [2]
Sankararaman, Sriram [2, 3]
Price, Alkes L. [2, 4, 5]
Affiliations
[1] Harvard Massachusetts Inst Technol MIT, Div Hlth Sci & Technol, Cambridge, MA 02139 USA
[2] Broad Inst Harvard & MIT, Cambridge, MA 02142 USA
[3] Harvard Univ, Sch Med, Dept Genet, Boston, MA 02115 USA
[4] Harvard Univ, Sch Publ Hlth, Dept Epidemiol, Boston, MA 02115 USA
[5] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
Keywords
GEOGRAPHICALLY STRUCTURED POPULATIONS; ASCERTAINMENT BIASES; GENETIC DIVERSITY; FIXATION INDEXES; DIFFERENTIATION; POLYMORPHISM; COEFFICIENTS; STATISTICS; DIVERGENCE; DISTANCE;
DOI
10.1101/gr.154831.113
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
In a pair of seminal papers, Sewall Wright and Gustave Malécot introduced FST as a measure of structure in natural populations. In the decades that followed, a number of papers provided differing definitions, estimation methods, and interpretations beyond Wright's. While this diversity in methods has enabled many studies in genetics, it has also introduced confusion regarding how to estimate FST from available data. Considering this confusion, wide variation in published estimates of FST for pairs of HapMap populations is a cause for concern. These estimates changed, in some cases more than twofold, when comparing estimates from genotyping arrays to those from sequence data. Indeed, changes in FST from sequencing data might be expected due to population genetic factors affecting rare variants. While rare variants do influence the result, we show that this is largely through differences in estimation methods. Correcting for this yields estimates of FST that are much more concordant between sequence and genotype data. These differences relate to three specific issues: (1) estimating FST for a single SNP, (2) combining estimates of FST across multiple SNPs, and (3) selecting the set of SNPs used in the computation. Changes in each of these aspects of estimation may result in FST estimates that are highly divergent from one another. Here, we clarify these issues and propose solutions.
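Issues (1) and (2) in the abstract can be illustrated with a small sketch. It assumes a Hudson-style per-SNP estimator (the abstract itself does not name an estimator), treats `n1` and `n2` as the number of sampled chromosomes in each population, and uses made-up allele frequencies; the function names are hypothetical. It compares two ways of combining per-SNP values: a ratio of averages and an average of ratios.

```python
def hudson_components(p1, p2, n1, n2):
    """Numerator and denominator of a Hudson-style FST estimate for one SNP.

    p1, p2: sample allele frequencies in the two populations.
    n1, n2: sampled chromosomes per population (an assumption of this sketch).
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def fst_ratio_of_averages(snps, n1, n2):
    """Sum numerators and denominators over SNPs, then divide once."""
    parts = [hudson_components(p1, p2, n1, n2) for p1, p2 in snps]
    return sum(n for n, _ in parts) / sum(d for _, d in parts)


def fst_average_of_ratios(snps, n1, n2):
    """Compute a per-SNP ratio, then average; SNPs with zero denominator are skipped."""
    parts = [hudson_components(p1, p2, n1, n2) for p1, p2 in snps]
    ratios = [n / d for n, d in parts if d > 0]
    return sum(ratios) / len(ratios)


# Illustrative data only: one common SNP and one rare SNP.
snps = [(0.50, 0.10), (0.01, 0.00)]
roa = fst_ratio_of_averages(snps, 200, 200)
aor = fst_average_of_ratios(snps, 200, 200)
print(f"ratio of averages: {roa:.4f}")
print(f"average of ratios: {aor:.4f}")
```

With these toy inputs the rare SNP contributes a near-zero per-SNP ratio, so the average of ratios comes out at roughly half the ratio of averages. This is only a demonstration of how the combining rule and the SNP set can move the estimate, not a reproduction of the paper's results.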
Pages: 1514-1521
Page count: 8