An integrated map of genetic variation from 1,092 human genomes

Cited by: 4874
Authors
Altshuler, David M. [3 ]
Durbin, Richard M. [5 ]
Abecasis, Goncalo R. [6 ]
Bentley, David R. [7 ]
Chakravarti, Aravinda [8 ]
Clark, Andrew G. [9 ]
Donnelly, Peter [1 ,2 ]
Eichler, Evan E. [10 ,11 ]
Flicek, Paul [12 ]
Gabriel, Stacey B. [3 ]
Gibbs, Richard A. [13 ]
Green, Eric D.
Hurles, Matthew E. [5 ]
Knoppers, Bartha M. [14 ]
Korbel, Jan O. [15 ]
Lander, Eric S.
Lee, Charles [16 ]
Lehrach, Hans [17 ]
Mardis, Elaine R. [18 ]
Marth, Gabor T. [19 ]
McVean, Gil A. [1 ]
Nickerson, Deborah A. [20 ]
Schmidt, Jeanette P. [21 ]
Sherry, Stephen T. [22 ]
Wang, Jun [23 ]
Wilson, Richard K. [18 ]
Gibbs, Richard A. [13 ]
Dinh, Huyen [13 ]
Kovar, Christie [13 ]
Lee, Sandra [13 ]
Lewis, Lora [13 ]
Muzny, Donna [13 ]
Reid, Jeff [13 ]
Wang, Min [13 ]
Wang, Jun [23 ]
Fang, Xiaodong [23 ]
Guo, Xiaosen [23 ]
Jian, Min [23 ]
Jiang, Hui [23 ]
Jin, Xin [23 ]
Li, Guoqing [23 ]
Li, Jingxiang [23 ]
Li, Yingrui [23 ]
Li, Zhuo [23 ]
Liu, Xiao [23 ]
Lu, Yao [23 ]
Ma, Xuedi [23 ]
Su, Zhe [23 ]
Tai, Shuaishuai [23 ]
Tang, Meifang [23 ]
Affiliations
[1] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford OX3 7BN, England
[2] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[3] Broad Inst MIT & Harvard, Cambridge, MA 02142 USA
[4] Harvard Univ, Sch Med, Dept Genet, Cambridge, MA 02142 USA
[5] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[6] Univ Michigan, Ctr Stat Genet, Ann Arbor, MI 48109 USA
[7] Illumina United Kingdom, Near Saffron Walden CB10 1XL, Essex, England
[8] Johns Hopkins Univ, Sch Med, McKusick Nathans Inst Genet Med, Baltimore, MD 21205 USA
[9] Cornell Univ, Ctr Comparat & Populat Genom, Ithaca, NY 14850 USA
[10] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
[11] Howard Hughes Med Inst, Seattle, WA 98195 USA
[12] European Bioinformat Inst, Cambridge CB10 1SD, England
[13] Baylor Coll Med, Human Genome Sequencing Ctr, Houston, TX 77030 USA
[14] McGill Univ, Ctr Genom & Policy, Montreal, PQ H3A 1A4, Canada
[15] European Mol Biol Lab, Genome Biol Res Unit, D-69117 Heidelberg, Germany
[16] Brigham & Womens Hosp, Dept Pathol, Boston, MA 02115 USA
[17] Max Planck Inst Mol Genet, D-14195 Berlin, Germany
[18] Washington Univ, Sch Med, Genome Ctr, St Louis, MO 63108 USA
[19] Boston Coll, Dept Biol, Chestnut Hill, MA 02467 USA
[20] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
[21] Affymetrix Inc, Santa Clara, CA 95051 USA
[22] US Natl Inst Hlth, Natl Ctr Biotechnol Informat, Bethesda, MD 20892 USA
[23] BGI Shenzhen, Shenzhen 518083, Peoples R China
[24] Alacris Theranost GmbH, D-14195 Berlin, Germany
[25] Albert Einstein Coll Med, Dept Genet, Bronx, NY 10461 USA
[26] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
[27] Mt Sinai Sch Med, Seaver Autism Ctr, New York, NY 10029 USA
[28] Dankook Univ, Dept Nanobiomed Sci, Cheonan 330714, South Korea
[29] Dankook Univ, Dept Biol Sci, Cheonan 330714, South Korea
[30] Cornell Univ, Dept Biol Stat & Computat Biol, Ithaca, NY 14853 USA
[31] Harvard Univ, Ctr Syst Biol, Cambridge, MA 02138 USA
[32] Harvard Univ, Dept Organism & Evolutionary Biol, Cambridge, MA 02138 USA
[33] Cardiff Univ, Sch Med, Inst Med Genet, Cardiff CF14 4XN, S Glam, Wales
[34] Illumina Inc, San Diego, CA 92122 USA
[35] Leiden Univ, Med Ctr, Dept Med Stat & Bioinformat, Mol Epidemiol Sect, NL-2333 ZA Leiden, Netherlands
[36] Louisiana State Univ, Dept Biol Sci, Baton Rouge, LA 70803 USA
[37] Massachusetts Gen Hosp, Analyt & Translat Genet Unit, Boston, MA 02114 USA
[38] Penn State Univ, Dept Anthropol, University Pk, PA 16802 USA
[39] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[40] Ancestry Com, San Francisco, CA 94107 USA
[41] Tel Aviv Univ, Blavatnik Sch Comp Sci, IL-69978 Tel Aviv, Israel
[42] Tel Aviv Univ, Dept Microbiol, IL-69978 Tel Aviv, Israel
[43] Int Comp Sci Inst, Berkeley, CA 94704 USA
[44] Translat Genom Res Inst, Phoenix, AZ 85004 USA
[45] Life Technol, Beverly, MA 01915 USA
[46] Univ Calif Los Angeles, David Geffen Sch of Medicine, Dept Human Genet, Los Angeles, CA 90024 USA
[47] Univ Calif San Diego, Dept Psychiat, La Jolla, CA 92093 USA
[48] Univ Calif San Diego, Dept Cellular & Mol Med, La Jolla, CA 92093 USA
[49] Univ Calif San Diego, Dept Comp Sci, La Jolla, CA 92093 USA
[50] Albert Einstein Coll Med, Dept Epidemiol & Populat Hlth, Bronx, NY 10461 USA
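Funding
Swiss National Science Foundation; Biotechnology and Biological Sciences Research Council (UK); Wellcome Trust (UK); Medical Research Council (UK); National Natural Science Foundation of China; National Institutes of Health (USA);
Keywords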
COPY NUMBER VARIATION; WIDE ASSOCIATION; POPULATION-STRUCTURE; RARE; VARIANTS; LOCI; MUTATION; RISK;
DOI
10.1038/nature11632
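Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code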
07 ; 0710 ; 09 ;
Abstract
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
Pages: 56-65
Page count: 10