A variant by any name: quantifying annotation discordance across tools and clinical databases

Cited by: 54
Authors
Yen, Jennifer L. [1]
Garcia, Sarah [1, 2]
Montana, Aldrin [1]
Harris, Jason [1]
Chervitz, Stephen [1]
Morra, Massimo [1]
West, John [1]
Chen, Richard [1]
Church, Deanna M. [1, 2]
Affiliations
[1] Personalis, 1330 O'Brien Dr, Menlo Park, CA 94025, USA
[2] 10x Genomics, 7068 Koll Center Pkwy 401, Pleasanton, CA 94566, USA
Source
GENOME MEDICINE | 2017 / Vol. 9
Keywords
HGVS; Clinical testing; Genomics; Annotation; Sequencing; Syntax; Precision medicine; Variant; SEQUENCE VARIANTS; NOMENCLATURE; GENOME; MUTATIONS; DIAGNOSIS;
DOI
10.1186/s13073-016-0396-7
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Background: Clinical genomic testing is dependent on the robust identification and reporting of variant-level information in relation to disease. With the shift to high-throughput sequencing, a major challenge for clinical diagnostics is the cross-identification of variants called on their genomic position to resources that rely on transcript- or protein-based descriptions.
Methods: We evaluated the accuracy of three tools (SnpEff, Variant Effect Predictor, and Variation Reporter) that generate transcript and protein-based variant nomenclature from genomic coordinates according to guidelines by the Human Genome Variation Society (HGVS). Our evaluation was based on transcript-controlled comparisons to a manually curated set of 126 test variants of various types drawn from data sources, each with HGVS-compliant transcript and protein descriptors. We further evaluated the concordance between annotations generated by SnpEff and Variant Effect Predictor and those in major germline and cancer databases: ClinVar and COSMIC, respectively.
Results: We find that there is substantial discordance between the annotation tools and databases in the description of insertions and/or deletions. Using our ground truth set of variants, constructed specifically to identify challenging events, accuracy was between 80 and 90% for coding and 50 and 70% for protein changes for 114 to 126 variants. Exact concordance for SNV syntax was over 99.5% between ClinVar and Variant Effect Predictor and SnpEff, but less than 90% for non-SNV variants. For COSMIC, exact concordance for coding and protein SNVs was between 65 and 88% and less than 15% for insertions. Across the tools and datasets, there was a wide range of different but equivalent expressions describing protein variants.
Conclusions: Our results reveal significant inconsistency in variant representation across tools and databases. While some of these syntax differences may be clear to a clinician, they can confound variant matching, an important step in variant classification. These results highlight the urgent need for the adoption of and adherence to uniform standards in variant annotation, with consistent reporting on the genomic reference, to enable accurate and efficient data-driven clinical care.
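The distinction the abstract draws between exact concordance and "different but equivalent expressions" can be illustrated with a minimal Python sketch. It is not the authors' evaluation pipeline: the helper names `normalize_protein` and `concordance` are hypothetical, the example strings are made up for illustration rather than taken from the study, and the normalizer only handles simple substitutions (one-letter versus three-letter amino acid codes and optional parentheses).

```python
import re

# Standard one-letter to three-letter amino acid codes; "*" (stop) is written as Ter.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}

# Character class containing only the codes defined above.
_AA = "".join(re.escape(code) for code in ONE_TO_THREE)
_ONE_LETTER_SUBST = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


def normalize_protein(desc: str) -> str:
    """Hypothetical normalizer for simple protein substitutions only."""
    body = desc[2:] if desc.startswith("p.") else desc
    # Drop the parentheses that mark a predicted (not observed) consequence.
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    match = _ONE_LETTER_SUBST.match(body)
    if match:
        ref, pos, alt = match.groups()
        body = f"{ONE_TO_THREE[ref]}{pos}{ONE_TO_THREE[alt]}"
    return "p." + body


def concordance(pairs, normalize=None):
    """Fraction of (a, b) pairs whose descriptions match after optional normalization."""
    transform = normalize or (lambda s: s)
    matches = sum(transform(a) == transform(b) for a, b in pairs)
    return matches / len(pairs)


# Illustrative pairs (assumed, not from the paper): tool output vs. database entry.
pairs = [
    ("p.Gly12Val", "p.G12V"),        # equivalent, different amino acid codes
    ("p.(Gly12Val)", "p.Gly12Val"),  # equivalent, parentheses differ
    ("p.Arg97Ter", "p.Arg97Ter"),    # exact string match
    ("p.Gly12Val", "p.Gly12Asp"),    # genuinely different variants
]

exact = concordance(pairs)
equivalent = concordance(pairs, normalize_protein)
```

In this toy set only one of four pairs matches as a literal string, while three of four match after normalization, which is the kind of gap that can confound variant matching even when the underlying variants agree.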
Pages: 14