The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants

Cited by: 1094
Authors
Cock, Peter J. A. [1]
Fields, Christopher J. [2]
Goto, Naohisa [3]
Heuer, Michael L. [4]
Rice, Peter M. [5]
Affiliations
[1] SCRI, Plant Pathol, Dundee DD2 5DA, Scotland
[2] Univ Illinois, Inst Genom Biol, Urbana, IL 61801 USA
[3] Osaka Univ, Microbial Dis Res Inst, Genome Informat Res Ctr, Suita, Osaka 5650871, Japan
[4] Harbinger Partners Inc, St Paul, MN 55127 USA
[5] European Bioinformat Inst, EMBL Outstn Hinxton, Cambridge CB10 1SD, England
Funding
Biotechnology and Biological Sciences Research Council (UK)
Keywords
MOLECULAR-BIOLOGY; HUMAN GENOME; BIOINFORMATICS; ALIGNMENT; TOOLS
DOI
10.1093/nar/gkp1137
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
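
The conversion between variants mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the article's reference implementation: the ASCII offsets (33 for Sanger, 64 for Solexa and Illumina 1.3+) and the PHRED/Solexa score formulas are the commonly documented conventions and are not stated in this record, and all function names here are hypothetical.

```python
import math

# Assumed conventions (not stated in this record):
SANGER_OFFSET = 33    # Sanger variant: PHRED scores, ASCII offset 33
ILLUMINA_OFFSET = 64  # Solexa and Illumina 1.3+ variants: ASCII offset 64


def solexa_to_phred(q_solexa):
    """Map a Solexa score (odds-based) to a PHRED score (probability-based)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_solexa(q_phred):
    """Map a PHRED score to a Solexa score; undefined for q_phred == 0."""
    return 10 * math.log10(10 ** (q_phred / 10) - 1)


def decode_quality(qual, offset):
    """Turn a FASTQ quality string into a list of integer scores."""
    return [ord(c) - offset for c in qual]


def illumina_to_sanger(qual):
    """Re-encode an Illumina 1.3+ quality string (PHRED, offset 64)
    as a Sanger quality string (PHRED, offset 33)."""
    return "".join(chr(ord(c) - ILLUMINA_OFFSET + SANGER_OFFSET) for c in qual)
```

For example, under these assumptions `decode_quality("I", SANGER_OFFSET)` gives a PHRED score of 40, and the same score in the Illumina 1.3+ encoding is the character `h`. Solexa-encoded files additionally need the score mapping in `solexa_to_phred`, since their scores are not PHRED scores; the two scales differ most at low quality and converge at high quality.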
Pages: 1767-1771
Number of pages: 5