Geography and genography: prediction of continental origin using randomly selected single nucleotide polymorphisms

Cited by: 23
Authors
Allocco, Dominic J. [1]
Song, Qing
Gibbons, Gary H.
Ramoni, Marco F.
Kohane, Isaac S.
Affiliations
[1] Harvard Univ, MIT, Div Hlth Sci & Technol, Childrens Hosp,Informat Program, Boston, MA 02115 USA
[2] Beth Israel Deaconess Med Ctr, Div Cardiol, Boston, MA 02215 USA
[3] Morehouse Sch Med, Cardiovasc Res Inst, Atlanta, GA USA
[4] Harvard Univ, Partners Ctr Genet & Genom, Boston, MA 02115 USA
DOI
10.1186/1471-2164-8-68
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: Recent studies have shown that when individuals are grouped on the basis of genetic similarity, group membership corresponds closely to continental origin. There has been considerable debate about the implications of these findings in the context of larger debates about race and the extent of genetic variation between groups. Some have argued that clustering according to continental origin demonstrates the existence of significant genetic differences between groups and that these differences may have important implications for differences in health and disease. Others argue that clustering according to continental origin requires the use of large amounts of genetic data or specifically chosen markers and is indicative only of very subtle genetic differences that are unlikely to have biomedical significance.
Results: We used small numbers of randomly selected single nucleotide polymorphisms (SNPs) from the International HapMap Project to train naive Bayes classifiers for prediction of ancestral continent of origin. Predictive accuracy was tested on two independent data sets. Genetically similar groups should be difficult to distinguish, especially if only a small number of genetic markers are used. The genetic differences between continentally defined groups are sufficiently large that one can accurately predict ancestral continent of origin using only a minute, randomly selected fraction of the genetic variation present in the human genome. Genotype data from only 50 random SNPs was sufficient to predict ancestral continent of origin in our primary test data set with an average accuracy of 95%. Genetic variations informative about ancestry were common and widely distributed throughout the genome.
Conclusion: Accurate characterization of ancestry is possible using small numbers of randomly selected SNPs. The results presented here show how investigators conducting genetic association studies can use small numbers of arbitrarily chosen SNPs to identify stratification in study subjects and avoid false positive genotype-phenotype associations. Our findings also demonstrate the extent of variation between continentally defined groups and argue strongly against the contention that genetic differences between groups are too small to have biomedical significance.
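The classification approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use HapMap data. It assumes three hypothetical groups ("A", "B", "C"), 50 synthetic SNPs whose allele frequencies are invented and differ between groups more than real continental frequencies typically would, genotypes coded as 0, 1 or 2 copies of one allele, and add-one smoothing. All function and variable names are illustrative.

```python
import math
import random

def simulate_individual(freqs, rng):
    """Draw one genotype (0, 1 or 2 allele copies) per SNP from allele frequencies."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]

def train_naive_bayes(genotypes, labels):
    """Estimate log priors and per-SNP genotype log probabilities for each group."""
    n_snps = len(genotypes[0])
    groups = sorted(set(labels))
    # Start every genotype count at 1 (add-one smoothing) to avoid log(0).
    counts = {g: [[1.0, 1.0, 1.0] for _ in range(n_snps)] for g in groups}
    group_sizes = {g: 0 for g in groups}
    for row, g in zip(genotypes, labels):
        group_sizes[g] += 1
        for j, value in enumerate(row):
            counts[g][j][value] += 1.0
    model = {}
    for g in groups:
        log_prior = math.log(group_sizes[g] / len(labels))
        log_lik = [[math.log(c / sum(snp)) for c in snp] for snp in counts[g]]
        model[g] = (log_prior, log_lik)
    return model

def predict(model, row):
    """Return the group with the highest posterior score, treating SNPs as independent."""
    def score(g):
        log_prior, log_lik = model[g]
        return log_prior + sum(log_lik[j][v] for j, v in enumerate(row))
    return max(model, key=score)

rng = random.Random(0)
N_SNPS = 50  # matches the SNP count quoted in the abstract; the data are synthetic

# Hypothetical allele frequencies: a shared base value plus a group-specific shift.
base = [rng.uniform(0.1, 0.9) for _ in range(N_SNPS)]
group_freqs = {
    g: [min(0.99, max(0.01, p + rng.gauss(0.0, 0.15))) for p in base]
    for g in ("A", "B", "C")
}

def make_dataset(n_per_group):
    rows, labels = [], []
    for g, freqs in group_freqs.items():
        for _ in range(n_per_group):
            rows.append(simulate_individual(freqs, rng))
            labels.append(g)
    return rows, labels

train_rows, train_labels = make_dataset(60)
test_rows, test_labels = make_dataset(40)

model = train_naive_bayes(train_rows, train_labels)
correct = sum(predict(model, r) == y for r, y in zip(test_rows, test_labels))
accuracy = correct / len(test_labels)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here reflects only the simulated frequency differences and says nothing about the 95% figure reported in the abstract; the sketch shows the mechanics of training on one sample and testing on an independent one.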
Pages: 8
Related papers
29 in total
[1]   The importance of race and ethnic background in biomedical research and clinical practice [J].
Burchard, EG ;
Ziv, E ;
Coyle, N ;
Gomez, SL ;
Tang, H ;
Karter, AJ ;
Mountain, JL ;
Pérez-Stable, EJ ;
Sheppard, D ;
Risch, N .
NEW ENGLAND JOURNAL OF MEDICINE, 2003, 348 (12) :1170-1175
[2]   Classifying humans [J].
Calafell, F .
NATURE GENETICS, 2003, 33 (04) :435-436
[3]   The application of molecular genetic approaches to the study of human evolution [J].
Cavalli-Sforza, LL ;
Feldman, MW .
NATURE GENETICS, 2003, 33 :266-275
[4]   New goals for the US Human Genome Project: 1998-2003 [J].
Collins, FS ;
Patrinos, A ;
Jordan, E ;
Chakravarti, A ;
Gesteland, R ;
Walters, L ;
Fearon, E ;
Hartwell, L ;
Langley, CH ;
Mathies, RA ;
Olson, M ;
Pawson, AJ ;
Pollard, T ;
Williamson, A ;
Wold, B ;
Buetow, K ;
Branscomb, E ;
Capecchi, M ;
Church, G ;
Garner, H ;
Gibbs, RA ;
Hawkins, T ;
Hodgson, K ;
Knotek, M ;
Meisler, M ;
Rubin, GM ;
Smith, LM ;
Smith, RF ;
Westerfield, M ;
Clayton, EW ;
Fisher, NL ;
Lerman, CE ;
McInerney, JD ;
Nebo, W ;
Press, N ;
Valle, D .
SCIENCE, 1998, 282 (5389) :682-689
[5]   Race and genomics [J].
Cooper, RS ;
Kaufman, JS ;
Ward, R .
NEW ENGLAND JOURNAL OF MEDICINE, 2003, 348 (12) :1166-1170
[6]  
Corander J, 2003, GENETICS, V163, P367
[7]   A Bayesian approach to the identification of panmictic populations and the assignment of individuals [J].
Dawson, KJ ;
Belkhir, K .
GENETICAL RESEARCH, 2001, 78 (01) :59-77
[8]   On the optimality of the simple Bayesian classifier under zero-one loss [J].
Domingos, P ;
Pazzani, M .
MACHINE LEARNING, 1997, 29 (2-3) :103-130
[9]   A second generation human haplotype map of over 3.1 million SNPs [J].
Frazer, Kelly A. ;
Ballinger, Dennis G. ;
Cox, David R. ;
Hinds, David A. ;
Stuve, Laura L. ;
Gibbs, Richard A. ;
Belmont, John W. ;
Boudreau, Andrew ;
Hardenbol, Paul ;
Leal, Suzanne M. ;
Pasternak, Shiran ;
Wheeler, David A. ;
Willis, Thomas D. ;
Yu, Fuli ;
Yang, Huanming ;
Zeng, Changqing ;
Gao, Yang ;
Hu, Haoran ;
Hu, Weitao ;
Li, Chaohua ;
Lin, Wei ;
Liu, Siqi ;
Pan, Hao ;
Tang, Xiaoli ;
Wang, Jian ;
Wang, Wei ;
Yu, Jun ;
Zhang, Bo ;
Zhang, Qingrun ;
Zhao, Hongbin ;
Zhao, Hui ;
Zhou, Jun ;
Gabriel, Stacey B. ;
Barry, Rachel ;
Blumenstiel, Brendan ;
Camargo, Amy ;
Defelice, Matthew ;
Faggart, Maura ;
Goyette, Mary ;
Gupta, Supriya ;
Moore, Jamie ;
Nguyen, Huy ;
Onofrio, Robert C. ;
Parkin, Melissa ;
Roy, Jessica ;
Stahl, Erich ;
Winchester, Ellen ;
Ziaugra, Liuda ;
Altshuler, David ;
Shen, Yan .
NATURE, 2007, 449 (7164) :851-U3
[10]   Genetics - FDA races in wrong direction [J].
Haga, SB ;
Venter, JC .
SCIENCE, 2003, 301 (5632) :466-466