Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing

Cited by: 310
Authors
O'Rawe, Jason [1, 2]
Jiang, Tao [3]
Sun, Guangqing [3]
Wu, Yiyang [1, 2]
Wang, Wei [4]
Hu, Jingchu [3]
Bodily, Paul [5]
Tian, Lifeng [6]
Hakonarson, Hakon [6]
Johnson, W. Evan [7]
Wei, Zhi [4]
Wang, Kai [8, 9]
Lyon, Gholson J. [1, 2, 9]
Affiliations
[1] Cold Spring Harbor Lab, Stanley Inst Cognit Genom, Cold Spring Harbor, NY 11724 USA
[2] SUNY Stony Brook, Stony Brook, NY 11794 USA
[3] BGI Shenzhen, Shenzhen 518000, Peoples R China
[4] New Jersey Inst Technol, Newark, NJ 07103 USA
[5] Brigham Young Univ, Provo, UT 84606 USA
[6] Childrens Hosp Philadelphia, Philadelphia, PA 19104 USA
[7] Boston Univ, Sch Med, Boston, MA 02118 USA
[8] Univ So Calif, Los Angeles, CA 90089 USA
[9] Utah Fdn Biomed Res, Salt Lake City, UT 84106 USA
Source
GENOME MEDICINE | 2013 / Volume 5
Keywords
DE-NOVO MUTATIONS; GENOTYPE IMPUTATION; SMALL INSERTIONS; ASSOCIATION; FRAMEWORK; DELETIONS; ALIGNMENT; RATES; TOOL;
DOI
10.1186/gm432
Chinese Library Classification
Q3 [Genetics];
Subject classification code
071007 ; 090102 ;
Abstract
Background: To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be.
Methods: We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 capture kit), with approximately 120X mean coverage. We analyzed the raw data using near-default parameters with five different alignment and variant-calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools). We additionally sequenced a single whole genome using the sequencing and analysis pipeline from Complete Genomics (CG), with 95% of the exome region being covered by 20 or more reads per base. Finally, we validated 919 single-nucleotide variations (SNVs) and 841 insertions and deletions (indels), including similar fractions of GATK-only, SOAP-only, and shared calls, on the MiSeq platform by amplicon sequencing with approximately 5000X mean coverage.
Results: SNV concordance between five Illumina pipelines across all 15 exomes was 57.4%, while 0.5 to 5.1% of variants were called as unique to each pipeline. Indel concordance was only 26.8% between three indel-calling pipelines, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. There were 11% of CG variants falling within targeted regions in exome sequencing that were not called by any of the Illumina-based exome analysis pipelines. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2%, and 99.1% of the GATK-only, SOAP-only and shared SNVs could be validated, but only 54.0%, 44.6%, and 78.1% of the GATK-only, SOAP-only and shared indels could be validated. Additionally, our analysis of two families (one with four individuals and the other with seven), demonstrated additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family.
Conclusions: Our results suggest that more caution should be exercised in genomic medicine settings when analyzing individual genomes, including interpreting positive and negative findings with scrutiny, especially for indels. We advocate for renewed collection and sequencing of multi-generational families to increase the overall accuracy of whole genomes.
Pages: 18