Understanding the limitations of next generation sequencing informatics, an approach to clinical pipeline validation using artificial data sets

Cited by: 73
Authors
Daber, Robert [1]
Sukhadia, Shrey [1]
Morrissette, Jennifer J. D. [1]
Affiliations
[1] Univ Penn, Sch Med, Ctr Personalized Diagnost, Philadelphia, PA 19104 USA
Keywords
Next generation sequencing; bioinformatics; validation; sensitivity; artificial data set; DIAGNOSTIC LABORATORIES; GENOME; TECHNOLOGIES; CHALLENGES; FRAMEWORK; PROGRAM; CANCER; FORMAT
DOI
10.1016/j.cancergen.2013.11.005
Chinese Library Classification
R73 [Oncology]
Discipline classification code
100214
Abstract
The advantages of massively parallel sequencing are quickly being realized through the adoption of comprehensive genomic panels across the spectrum of genetic testing. Despite such widespread utilization of next generation sequencing (NGS), a major bottleneck in the implementation and capitalization of this technology remains in the data processing steps, or bioinformatics. Here we describe our approach to defining the limitations of each step in the data processing pipeline by utilizing artificial amplicon data sets to simulate a wide spectrum of genomic alterations. Through this process, we identified limitations of insertion, deletion (indel), and single nucleotide variant (SNV) detection using standard approaches and described novel strategies to improve overall somatic mutation detection. Using these artificial data sets, we were able to demonstrate that NGS assays can have robust mutation detection if the data can be processed in a way that does not lead to large genomic alterations landing in the unmapped data (i.e., trash). By using these pipeline modifications and a new variant caller, Absolute Var, we have been able to validate SNV mutation detection to 100% sensitivity and specificity with an allele frequency as low as 4% and detection of indels as large as 90 bp. Clinical validation of NGS relies on the ability to detect mutations across a wide array of genetic anomalies, and the utility of artificial data sets demonstrates a mechanism to intelligently test a vast array of mutation types.
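As a rough illustration of the idea of an artificial amplicon data set, the sketch below builds a set of error-free reads in which a chosen fraction carries a known SNV, and separately constructs a read carrying a deletion of a chosen length. It is not the authors' tooling and is unrelated to Absolute Var; the function names (`make_artificial_reads`, `make_deletion_read`, `count_alt`), the toy amplicon sequence, and the read depth are all hypothetical and chosen only for illustration.

```python
import random


def make_artificial_reads(amplicon, pos, alt, allele_freq, depth, seed=0):
    """Return `depth` error-free reads; a fraction `allele_freq` carry an SNV.

    Hypothetical helper: every read spans the full amplicon and has no
    sequencing errors, so the spiked-in variant is the only difference.
    """
    if amplicon[pos] == alt:
        raise ValueError("alt base must differ from the reference base")
    n_mut = round(depth * allele_freq)
    mutant = amplicon[:pos] + alt + amplicon[pos + 1:]
    reads = [mutant] * n_mut + [amplicon] * (depth - n_mut)
    random.Random(seed).shuffle(reads)  # deterministic shuffle for repeatability
    return reads


def make_deletion_read(amplicon, start, length):
    """Return the amplicon with `length` bases removed starting at `start`."""
    return amplicon[:start] + amplicon[start + length:]


def count_alt(reads, pos, alt):
    """Count reads whose base at `pos` equals the alternate allele."""
    return sum(1 for r in reads if r[pos] == alt)


# Toy 120 bp amplicon (assumed sequence, not from the paper).
amplicon = "ACGT" * 30

# SNV at a 4% allele frequency in 1000 reads, the lower bound quoted above.
snv_reads = make_artificial_reads(amplicon, pos=3, alt="A",
                                  allele_freq=0.04, depth=1000)

# A 90 bp deletion, the largest indel size quoted above.
del_read = make_deletion_read(amplicon, start=10, length=90)
```

Because the spiked-in variants are known in advance, the output of each pipeline step can be compared against the truth set: here `count_alt(snv_reads, 3, "A")` gives the number of mutant reads that a caller should recover, and `del_read` is the kind of read that an aligner might fail to map.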
Pages: 441-448
Page count: 8