Choice of transcripts and software has a large effect on variant annotation

Cited by: 129
Authors
McCarthy, Davis J. [1 ,2 ]
Humburg, Peter [2 ]
Kanapin, Alexander [2 ]
Rivas, Manuel A. [2 ]
Gaulton, Kyle [2 ]
Cazier, Jean-Baptiste [3 ]
Donnelly, Peter [1 ,2 ]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[2] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford OX1 3TG, England
[3] Univ Oxford, Dept Oncol, Oxford OX1 3TG, England
Source
GENOME MEDICINE | 2014 / Vol. 6
Funding
Wellcome Trust (UK);
Keywords
GENOME ANNOTATION; MUTATIONS; GENCODE; DATABASE; FRAMEWORK; EXOME;
DOI
10.1186/gm543
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Background: Variant annotation is a crucial step in the analysis of genome sequencing data. Functional annotation results can have a strong influence on the ultimate conclusions of disease studies. Incorrect or incomplete annotations can cause researchers both to overlook potentially disease-relevant DNA variants and to dilute interesting variants in a pool of false positives. Researchers are aware of these issues in general, but the extent of the dependency of final results on the choice of transcripts and software used for annotation has not been quantified in detail.

Methods: This paper quantifies the extent of differences in annotation of 80 million variants from a whole-genome sequencing study. We compare results using the REFSEQ and ENSEMBL transcript sets as the basis for variant annotation with the software ANNOVAR, and also compare the results from two annotation software packages, ANNOVAR and VEP (ENSEMBL's Variant Effect Predictor), when using ENSEMBL transcripts.

Results: We found only 44% agreement in annotations for putative loss-of-function variants when using the REFSEQ and ENSEMBL transcript sets as the basis for annotation with ANNOVAR. The rate of matching annotations for loss-of-function and nonsynonymous variants combined was 79% and for all exonic variants it was 83%. When comparing results from ANNOVAR and VEP using ENSEMBL transcripts, matching annotations were seen for only 65% of loss-of-function variants and 87% of all exonic variants, with splicing variants revealed as the category with the greatest discrepancy. Using these comparisons, we characterised the types of apparent errors made by ANNOVAR and VEP and discuss their impact on the analysis of DNA variants in genome sequencing studies.

Conclusions: Variant annotation is not yet a solved problem. Choice of transcript set can have a large effect on the ultimate variant annotations obtained in a whole-genome sequencing study. Choice of annotation software can also have a substantial effect. The annotation step in the analysis of a genome sequencing study must therefore be considered carefully, and a conscious choice made as to which transcript set and software are used for annotation.
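
The "rate of matching annotations" reported in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the variant IDs, the category labels, and the rule for which variants enter the denominator (any variant that either annotation set places in a category of interest) are assumptions made for illustration, and the toy data below are invented.

```python
# Hypothetical toy annotations: variant ID -> consequence category.
# These values are invented for illustration and are not data from the study.
refseq_calls = {
    "var1": "stop_gain",
    "var2": "nonsynonymous",
    "var3": "synonymous",
    "var4": "splicing",
    "var5": "frameshift",
}
ensembl_calls = {
    "var1": "stop_gain",
    "var2": "synonymous",
    "var3": "synonymous",
    "var4": "intronic",
    "var5": "frameshift",
}


def match_rate(calls_a, calls_b, categories=None):
    """Fraction of shared variants whose two annotations are identical.

    If `categories` is given, only variants that at least one of the two
    annotation sets places in one of those categories are counted
    (an assumed definition, not one taken from the paper).
    """
    shared = calls_a.keys() & calls_b.keys()
    if categories is not None:
        shared = {
            v for v in shared
            if calls_a[v] in categories or calls_b[v] in categories
        }
    if not shared:
        return float("nan")
    matches = sum(calls_a[v] == calls_b[v] for v in shared)
    return matches / len(shared)


# Assumed grouping of loss-of-function categories for this toy example.
LOF_CATEGORIES = {"stop_gain", "splicing", "frameshift"}

overall = match_rate(refseq_calls, ensembl_calls)
lof = match_rate(refseq_calls, ensembl_calls, categories=LOF_CATEGORIES)
print(f"all variants: {overall:.2f}, loss-of-function: {lof:.2f}")
```

The denominator rule matters: counting a variant when either set calls it loss-of-function gives a lower match rate than counting only variants both sets agree are in that class, so any reported agreement figure should state which rule was used.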
Pages: 15