SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data

Cited by: 1070

Authors:

Cox, Murray P. [1]

Peterson, Daniel A. [1]

Biggs, Patrick J. [2,3]

Affiliations:

[1] Massey Univ, Inst Mol BioSci, Palmerston North 4442, New Zealand

[2] Massey Univ, Inst Vet Anim & Biomed Sci, Palmerston North 4442, New Zealand

[3] Massey Univ, Massey Genome Serv, Palmerston North 4442, New Zealand

Source:

BMC BIOINFORMATICS | 2010 / Volume 11

Acknowledgments:

We thank members of the Massey Genome Service for trialing earlier versions of this software package. DAP was supported by a summer research scholarship from the Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand. PJB was partly supported by the Marsden Fund of the Royal Society of New Zealand (MAU0802). We thank Nigel French (Massey University) for pre-publication access to Campylobacter genome data.

DOI:

10.1186/1471-2105-11-485

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Discipline classification codes:

071010; 081704

Abstract:

Background: Illumina's second-generation sequencing platform is playing an increasingly prominent role in modern DNA and RNA sequencing efforts. However, rapid, simple, standardized and independent measures of run quality are currently lacking, as are tools to process sequences for use in downstream applications based on read-level quality data. Results: We present SolexaQA, a user-friendly software package designed to generate detailed statistics and at-a-glance graphics of sequence data quality both quickly and in an automated fashion. This package contains associated software to trim sequences dynamically using the quality scores of bases within individual reads. Conclusion: The SolexaQA package produces standardized outputs within minutes, thus facilitating ready comparison between flow cell lanes and machine runs, as well as providing immediate diagnostic information to guide the manipulation of sequence data for downstream analyses.

Pages: 6