A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3

Cited by: 7845
Authors
Cingolani, Pablo [1 ,2 ,3 ,4 ]
Platts, Adrian [5 ]
Wang, Le Lily [1 ]
Coon, Melissa [2 ]
Tung Nguyen [6 ]
Wang, Luan [1 ,2 ]
Land, Susan J. [2 ]
Lu, Xiangyi [1 ]
Ruden, Douglas M. [1 ,2 ]
Affiliations
[1] Wayne State Univ, Inst Environm Hlth Sci, Detroit, MI 48202 USA
[2] Wayne State Univ, Sch Med, Dept Obstet & Gynecol, CS Mott Ctr, Detroit, MI 48201 USA
[3] McGill Univ, Sch Comp Sci, Quebec City, PQ, Canada
[4] McGill Univ, Genome Quebec Innovat Ctr, Quebec City, PQ, Canada
[5] McGill Univ, Dept Bioinformat, Quebec City, PQ, Canada
[6] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA
Keywords
personal genomes; Drosophila melanogaster; whole-genome SNP analysis; next generation DNA sequencing; NONSENSE MUTATION; EVOLUTION; PROTEIN; GENES; YEAST; GENERATION; IDENTIFY; BRAIN; RATES; RNA;
DOI
10.4161/fly.19695
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w1118; iso-2; iso-3 strain and the reference y1; cn1 bw1 sp1 strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5'UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5' and 3'UTRs are reservoirs for genetic variation that changes the termini of proteins during evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.
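The coding-effect categories named in the abstract can be illustrated with a minimal Python sketch. It is not SnpEff's implementation: the function name `classify_codon_change` and the effect labels are hypothetical, and it assumes the standard genetic code and a single in-frame codon substitution. The last lines recompute the N/S ratio from the SNP counts quoted in the abstract.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Return an illustrative effect label for one codon substitution.

    The labels are hypothetical and are not SnpEff's own output strings.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "non_synonymous"


# SNP counts taken from the abstract.
synonymous_snps = 15842
non_synonymous_snps = 4467
ns_ratio = non_synonymous_snps / synonymous_snps

print(classify_codon_change("GAA", "GAG"))  # Glu -> Glu
print(classify_codon_change("TGG", "TGA"))  # Trp -> stop
print(round(ns_ratio, 2))
```

A real annotator would also need gene models, strand handling, and reading-frame offsets; the sketch deliberately leaves those out.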
Pages: 80-92
Page count: 13