Identifying Personal Genomes by Surname Inference

Cited by: 710
Authors
Gymrek, Melissa [1, 2, 3, 4, 5]
McGuire, Amy L. [6]
Golan, David [7]
Halperin, Eran [8, 9, 10]
Erlich, Yaniv [1]
Affiliations
[1] Whitehead Inst Biomed Res, Cambridge, MA 02142 USA
[2] MIT, Harvard-MIT Div Hlth Sci & Technol, Cambridge, MA 02139 USA
[3] Broad Inst MIT & Harvard, Program Med & Populat Genet, Cambridge, MA 02142 USA
[4] Massachusetts Gen Hosp, Dept Mol Biol, Boston, MA 02114 USA
[5] Massachusetts Gen Hosp, Diabet Unit, Boston, MA 02114 USA
[6] Baylor Coll Med, Ctr Med Eth & Hlth Policy, Houston, TX 77030 USA
[7] Tel Aviv Univ, Dept Stat & Operat Res, IL-69978 Tel Aviv, Israel
[8] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
[9] Tel Aviv Univ, Dept Mol Microbiol & Biotechnol, IL-69978 Tel Aviv, Israel
[10] Int Comp Sci Inst, Berkeley, CA 94704 USA
Keywords
Y-STR LOCI; CHROMOSOME; PRIVACY; FOUNDERS; PROJECT; SCIENCE
DOI
10.1126/science.1229566
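Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09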
Abstract
Sharing sequencing data sets without identifiers has become a common practice in genomics. Here, we report that surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. We show that a combination of a surname with other types of metadata, such as age and state, can be used to triangulate the identity of the target. A key feature of this technique is that it entirely relies on free, publicly accessible Internet resources. We quantitatively analyze the probability of identification for U.S. males. We further demonstrate the feasibility of this technique by tracing back with high probability the identities of multiple participants in public sequencing projects.
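The two steps the abstract describes, recovering a surname from a Y-STR haplotype and then narrowing the candidates with metadata such as age and state, can be illustrated with a toy sketch. It assumes that a small in-memory list stands in for a genetic genealogy database and another for a public metadata source, and it uses a simple mismatch-count rule with hypothetical thresholds (`min_loci`, `max_mismatches`). This is not the scoring used in the study; all records, helper functions, and parameters below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A Y-STR haplotype as a mapping from locus name to repeat count (toy representation).
Haplotype = Dict[str, int]


@dataclass
class GenealogyRecord:
    """Hypothetical genealogy-database entry: a surname tagged with a haplotype."""
    surname: str
    haplotype: Haplotype


@dataclass
class Person:
    """Hypothetical entry in a public metadata source."""
    name: str
    surname: str
    age: int
    state: str


def compare(a: Haplotype, b: Haplotype) -> Tuple[int, int]:
    """Return (mismatching loci, loci compared), using only loci typed in both."""
    shared = a.keys() & b.keys()
    mismatches = sum(1 for locus in shared if a[locus] != b[locus])
    return mismatches, len(shared)


def infer_surname(query: Haplotype, database: List[GenealogyRecord],
                  min_loci: int = 10, max_mismatches: int = 1) -> Optional[str]:
    """Return the surname of the closest qualifying record, or None.

    A record qualifies if at least `min_loci` loci overlap and at most
    `max_mismatches` of them differ; ties keep the first record seen.
    """
    best: Optional[Tuple[int, str]] = None
    for record in database:
        mismatches, compared = compare(query, record.haplotype)
        if compared < min_loci or mismatches > max_mismatches:
            continue
        if best is None or mismatches < best[0]:
            best = (mismatches, record.surname)
    return best[1] if best else None


def triangulate(surname: str, age: int, state: str,
                population: List[Person]) -> List[Person]:
    """Keep only people whose surname, age, and state all match."""
    return [p for p in population
            if p.surname == surname and p.age == age and p.state == state]


# Toy data, invented for illustration only.
database = [
    GenealogyRecord("Smith", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
    GenealogyRecord("Jones", {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS393": 13}),
]
population = [
    Person("Person 1", "Smith", 45, "UT"),
    Person("Person 2", "Smith", 30, "UT"),
    Person("Person 3", "Jones", 45, "UT"),
]
query = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}

surname = infer_surname(query, database, min_loci=4)
print(surname)  # Smith
print([p.name for p in triangulate(surname, 45, "UT", population)])  # ['Person 1']
```

Counting exact repeat-count matches is the simplest possible comparison; a more realistic sketch would model Y-STR mutation between paternal relatives rather than apply a fixed mismatch cutoff. Each metadata filter shrinks the candidate list, which is why a surname combined with age and state can narrow the search to very few people.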
Pages: 321-324
Page count: 4