Identification of deleterious mutations within three human genomes

Cited by: 801
Authors
Chun, Sung [1]
Fay, Justin C. [1,2]
Affiliations
[1] Washington Univ, Computat Biol Program, St Louis, MO 63108 USA
[2] Washington Univ, Dept Genet, St Louis, MO 63108 USA
Funding
National Institutes of Health (USA);
Keywords
SEQUENCE; POLYMORPHISMS; SUBSTITUTIONS; PREDICTION; CONSTRAINT; ALLELES; DISEASE; FITNESS; SNPS;
DOI
10.1101/gr.092619.109
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010; 081704;
Abstract
Each human carries a large number of deleterious mutations. Together, these mutations make a significant contribution to human disease. Identification of deleterious mutations within individual genome sequences could substantially impact an individual's health through personalized prevention and treatment of disease. Yet, distinguishing deleterious mutations from the massive number of nonfunctional variants that occur within a single genome is a considerable challenge. Using a comparative genomics data set of 32 vertebrate species, we show that a likelihood ratio test (LRT) can accurately identify a subset of deleterious mutations that disrupt highly conserved amino acids within protein-coding sequences, which are likely to be unconditionally deleterious. The LRT is also able to identify known human disease alleles and performs as well as two commonly used heuristic methods, SIFT and PolyPhen. Application of the LRT to three human genomes reveals 796-837 deleterious mutations per individual, approximately 40% of which are estimated to be at <5% allele frequency. However, the overlap between predictions made by the LRT, SIFT, and PolyPhen is low; 76% of predictions are unique to one of the three methods, and only 5% of predictions are shared across all three methods. Our results indicate that only a small subset of deleterious mutations can be reliably identified, but that this subset provides the raw material for personalized medicine.
Pages: 1553-1561
Page count: 9