ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data

Cited by: 10498
Authors
Wang, Kai [1]
Li, Mingyao [2]
Hakonarson, Hakon [1, 3]
Affiliations
[1] Childrens Hosp Philadelphia, Ctr Appl Genom, Philadelphia, PA 19104 USA
[2] Univ Penn, Dept Biostat & Epidemiol, Philadelphia, PA 19104 USA
[3] Univ Penn, Dept Pediat, Philadelphia, PA 19104 USA
Keywords
SNPS; ASSOCIATION; GENOMES
DOI
10.1093/nar/gkq603
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ~4 min to perform gene-based annotation and ~15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
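The stepwise 'variants reduction' idea in the abstract can be illustrated with a minimal Python sketch. This is not ANNOVAR's implementation or interface: the `Variant` record, its field names, the particular filters, their order, and the "two or more remaining variants per gene" rule for a recessive disease are all assumptions made for illustration, chosen only to mirror the annotation types the abstract lists (gene consequence, conserved regions, 1000 Genomes Project, dbSNP).

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical effect labels treated as protein-altering (an assumption).
PROTEIN_ALTERING = {"nonsynonymous", "splicing", "frameshift", "stopgain"}


@dataclass(frozen=True)
class Variant:
    """Hypothetical, already-annotated variant record (not an ANNOVAR format)."""
    gene: str
    effect: str          # functional consequence on the gene
    in_conserved: bool   # falls in a conserved region
    in_1000g: bool       # reported in the 1000 Genomes Project
    in_dbsnp: bool       # reported in dbSNP


def reduce_variants(variants):
    """Apply the filters one at a time and record how many variants survive."""
    steps = [
        ("protein-altering", lambda v: v.effect in PROTEIN_ALTERING),
        ("in conserved region", lambda v: v.in_conserved),
        ("absent from 1000 Genomes", lambda v: not v.in_1000g),
        ("absent from dbSNP", lambda v: not v.in_dbsnp),
    ]
    remaining = list(variants)
    counts = [("input", len(remaining))]
    for name, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((name, len(remaining)))
    return remaining, counts


def recessive_candidate_genes(variants, min_hits=2):
    """Assumed recessive model: keep genes with at least `min_hits` variants."""
    hits = Counter(v.gene for v in variants)
    return sorted(gene for gene, n in hits.items() if n >= min_hits)


if __name__ == "__main__":
    demo = [
        Variant("GENE_A", "nonsynonymous", True, False, False),
        Variant("GENE_A", "splicing", True, False, False),
        Variant("GENE_B", "nonsynonymous", True, False, False),
        Variant("GENE_C", "synonymous", True, False, False),
        Variant("GENE_D", "nonsynonymous", True, True, True),
    ]
    kept, counts = reduce_variants(demo)
    for step, n in counts:
        print(f"{step}: {n}")
    print(recessive_candidate_genes(kept))
```

Recording the count after every step makes it easy to see which filter removes the most variants, which is the kind of stepwise narrowing the abstract describes when going from 4.7 million variants to a short list of candidate genes.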
Pages: 7