ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data

Cited by: 14
Authors
Cabanski, Christopher R. [2]
Cavin, Keary [3]
Bizon, Chris [3]
Wilkerson, Matthew D. [1]
Parker, Joel S. [1, 4]
Wilhelmsen, Kirk C. [1, 3, 4]
Perou, Charles M. [1, 4]
Marron, J. S. [1, 2]
Hayes, D. Neil [1, 5]
Affiliations
[1] Univ N Carolina, Lineberger Comprehens Canc Ctr, Chapel Hill, NC 27599 USA
[2] Dept Stat & Operat Res, Chapel Hill, NC USA
[3] Renaissance Comp Ctr, Chapel Hill, NC USA
[4] Dept Genet, Chapel Hill, NC USA
[5] Univ N Carolina, Multidisciplinary Thorac Oncol Program, Div Med Oncol, Dept Internal Med, Chapel Hill, NC USA
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Funding
National Institutes of Health (USA);
Keywords
Next-generation sequencing; Quality score; Recalibration; Bioinformatics; Bioconductor; SNP DISCOVERY;
DOI
10.1186/1471-2105-13-221
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores assigned to each base have been shown to be inaccurate. If these scores are used in downstream analyses, the inaccuracies can significantly affect the results.
Results: Here we present ReQON, a tool that uses logistic regression to recalibrate the base quality scores from an input BAM file of aligned sequencing data. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that the quality scores ReQON produces are more accurate than the original scores, in the sense that they correspond more closely to the probability of a sequencing error, and that they discriminate better between sequencing errors and non-errors. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy.
Conclusion: ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration.
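The idea of recalibrating quality scores with logistic regression can be illustrated with a minimal sketch. This is not ReQON's code or API. It assumes the standard Phred relation Q = -10 log10(p), uses the reported quality as the only covariate, and takes made-up error counts per quality value (for example, mismatches against a reference). The function names and the toy counts are hypothetical, and the abstract does not state which covariates ReQON actually uses.

```python
import math


def phred_to_prob(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Phred quality score implied by an error probability."""
    return -10 * math.log10(p)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(groups, iters=25):
    """Fit P(error) = sigmoid(b0 + b1 * x) by Newton's method.

    groups: iterable of (x, n_bases, n_errors) tuples.
    Returns (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n, k in groups:
            p = sigmoid(b0 + b1 * x)
            w = n * p * (1 - p)      # binomial variance weight
            r = k - n * p            # observed minus expected errors
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:         # singular system: stop updating
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical training data: (reported quality, bases seen, errors seen).
# The observed error rates are higher than the reported scores imply.
training = [(10, 1000, 200), (20, 1000, 40), (30, 1000, 10), (40, 1000, 2)]

intercept, slope = fit_logistic(training)

# Recalibrated score: the Phred value of the fitted error probability.
recalibrated = {
    q: prob_to_phred(sigmoid(intercept + slope * q)) for q, _, _ in training
}

for q, n, k in training:
    print(f"reported Q{q}: implied p={phred_to_prob(q):.4f}, "
          f"observed p={k / n:.4f}, recalibrated Q={recalibrated[q]:.1f}")
```

In this toy data the reported scores are overconfident, so the fitted model maps each reported score to a lower recalibrated score. A real recalibration would use additional covariates and a training set drawn from the aligned reads.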
Pages: 10