NGS QC Toolkit: A Toolkit for Quality Control of Next Generation Sequencing Data

Cited by: 2363
Authors
Patel, Ravi K. [1]
Jain, Mukesh [1]
Affiliations
[1] National Institute of Plant Genome Research (NIPGR), Functional Genomics and Bioinformatics Laboratory, New Delhi, India
Keywords
METAGENOMIC DATASETS; GENE DISCOVERY; IDENTIFICATION; CHICKPEA
DOI
10.1371/journal.pone.0030619
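Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
070301 [Inorganic chemistry]; 070403 [Astrophysics]; 070507 [Natural resources and territorial spatial planning]; 090105 [Crop production systems and ecological engineering]
Abstract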
Next generation sequencing (NGS) technologies provide a high-throughput means to generate large amounts of sequence data. However, quality control (QC) of sequence data generated from these technologies is extremely important for meaningful downstream analysis. Further, highly efficient and fast processing tools are required to handle the large volume of data. Here, we have developed an application, NGS QC Toolkit, for quality check and filtering of high-quality data. This toolkit is a standalone and open source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application have been implemented in the Perl programming language. The toolkit comprises user-friendly tools for QC of sequencing data generated using Roche 454 and Illumina platforms, and additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options have been provided to facilitate QC at user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data to facilitate better downstream analysis.
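As an illustration of the kind of quality filtering the abstract describes, the following is a minimal sketch in Python (the toolkit itself is written in Perl, and this is not its code). It keeps a FASTQ read only when a given fraction of its bases reach a minimum Phred score. The function names, the Phred+33 encoding, and the thresholds (score 20, fraction 0.7) are assumptions chosen for the example, not options taken from the abstract.

```python
from typing import Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from simple 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (s.rstrip("\n") for s in lines[i:i + 4])
        yield header, seq, qual


def is_high_quality(qual: str,
                    min_phred: int = 20,        # assumed cutoff score
                    min_fraction: float = 0.7,  # assumed fraction of good bases
                    offset: int = 33) -> bool:  # assumed Phred+33 encoding
    """Return True if enough bases in the read meet the quality cutoff."""
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_phred)
    return good / len(qual) >= min_fraction


def filter_reads(lines: List[str]) -> List[Read]:
    """Keep only the reads that pass the quality check."""
    return [r for r in parse_fastq(lines) if is_high_quality(r[2])]


example = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",  # Phred 40 at every base
    "@read2", "ACGTACGTAC", "+", "IIII######",  # only 4 of 10 bases reach Phred 20
]
kept = filter_reads(example)
print([header for header, _, _ in kept])  # prints ['@read1']
```

Exposing the cutoff score and the required fraction as parameters mirrors the abstract's point that QC should run at user-defined settings; a real pipeline would also need to handle the older Illumina Phred+64 encoding and paired-end reads.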
Pages: 7