A variant by any name: quantifying annotation discordance across tools and clinical databases

Cited by: 54
Authors
Yen, Jennifer L. [1]
Garcia, Sarah [1,2]
Montana, Aldrin [1]
Harris, Jason [1]
Chervitz, Stephen [1]
Morra, Massimo [1]
West, John [1]
Chen, Richard [1]
Church, Deanna M. [1,2]
Affiliations
[1] Personalis, 1330 OBrien Dr, Menlo Pk, CA 94025 USA
[2] 10X Genom, 7068 Koll Ctr Pkwy 401, Pleasanton, CA 94566 USA
Source
GENOME MEDICINE | 2017 / Vol. 9
Keywords
HGVS; Clinical testing; Genomics; Annotation; Sequencing; Syntax; Precision medicine; Variant; SEQUENCE VARIANTS; NOMENCLATURE; GENOME; MUTATIONS; DIAGNOSIS;
DOI
10.1186/s13073-016-0396-7
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Background: Clinical genomic testing depends on the robust identification and reporting of variant-level information in relation to disease. With the shift to high-throughput sequencing, a major challenge for clinical diagnostics is the cross-identification of variants called on their genomic position to resources that rely on transcript- or protein-based descriptions.

Methods: We evaluated the accuracy of three tools (SnpEff, Variant Effect Predictor, and Variation Reporter) that generate transcript- and protein-based variant nomenclature from genomic coordinates according to guidelines by the Human Genome Variation Society (HGVS). Our evaluation was based on transcript-controlled comparisons to a manually curated set of 126 test variants of various types drawn from data sources, each with HGVS-compliant transcript and protein descriptors. We further evaluated the concordance between annotations generated by SnpEff and Variant Effect Predictor and those in major germline and cancer databases: ClinVar and COSMIC, respectively.

Results: We find that there is substantial discordance between the annotation tools and databases in the description of insertions and/or deletions. Using our ground truth set of variants, constructed specifically to identify challenging events, accuracy was between 80 and 90% for coding changes and between 50 and 70% for protein changes, for 114 to 126 variants. Exact concordance for SNV syntax was over 99.5% between ClinVar and both Variant Effect Predictor and SnpEff, but less than 90% for non-SNV variants. For COSMIC, exact concordance for coding and protein SNVs was between 65 and 88%, and less than 15% for insertions. Across the tools and datasets, there was a wide range of different but equivalent expressions describing protein variants.

Conclusions: Our results reveal significant inconsistency in variant representation across tools and databases. While some of these syntax differences may be clear to a clinician, they can confound variant matching, an important step in variant classification. These results highlight the urgent need for the adoption of, and adherence to, uniform standards in variant annotation, with consistent reporting on the genomic reference, to enable accurate and efficient data-driven clinical care.
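The idea of "exact concordance" and of "different but equivalent expressions" can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`to_one_letter`, `exact_concordance`), the variant identifiers, and the example annotations are all hypothetical, and treating parenthesised (predicted) protein descriptions as equivalent to unparenthesised ones is an assumption made here for illustration only.

```python
import re

# Standard three-letter to one-letter amino acid codes, plus Ter for a stop codon.
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}
_AA3_PATTERN = re.compile("|".join(AA3_TO_1))


def to_one_letter(protein_hgvs: str) -> str:
    """Rewrite three-letter residue codes as one-letter codes.

    Assumption: parentheses (used for predicted consequences) are dropped,
    so "p.(Arg97Ter)" and "p.R97*" compare as equal.
    """
    stripped = protein_hgvs.replace("(", "").replace(")", "")
    return _AA3_PATTERN.sub(lambda m: AA3_TO_1[m.group(0)], stripped)


def exact_concordance(left: dict, right: dict, normalize=None) -> float:
    """Fraction of shared variants whose descriptions are string-identical.

    If `normalize` is given, it is applied to both strings before comparing.
    """
    shared = left.keys() & right.keys()
    if not shared:
        raise ValueError("no variants in common")
    norm = normalize or (lambda s: s)
    matches = sum(norm(left[k]) == norm(right[k]) for k in shared)
    return matches / len(shared)


# Hypothetical protein annotations for the same four variants from two sources.
source_a = {
    "var1": "p.Gly12Asp",
    "var2": "p.Val600Glu",
    "var3": "p.(Arg97Ter)",
    "var4": "p.Leu858Arg",
}
source_b = {
    "var1": "p.G12D",        # same change, one-letter syntax
    "var2": "p.Val600Glu",   # identical string
    "var3": "p.R97*",        # same change, different stop and parenthesis style
    "var4": "p.Leu858Gln",   # genuinely different change
}

raw = exact_concordance(source_a, source_b)
normalized = exact_concordance(source_a, source_b, normalize=to_one_letter)
print(f"exact: {raw:.2f}, after normalization: {normalized:.2f}")
```

In this toy data only one of four strings matches exactly, while three of four describe the same protein change once the syntax is normalized. This is the kind of gap between string matching and biological equivalence that the abstract describes as confounding variant matching.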
Pages: 14