Sequence-specific error profile of Illumina sequencers

Cited by: 397
Authors
Nakamura, Kensuke [1]
Oshima, Taku [2]
Morimoto, Takuya [2, 3]
Ikeda, Shun [1]
Yoshikawa, Hirofumi [4, 5]
Shiwa, Yuh [5]
Ishikawa, Shu [2]
Linak, Margaret C. [6]
Hirai, Aki [1]
Takahashi, Hiroki [1]
Altaf-Ul-Amin, Md. [1]
Ogasawara, Naotake [2]
Kanaya, Shigehiko [1]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara 6300192, Japan
[2] Nara Inst Sci & Technol, Grad Sch Biol Sci, Nara 6300192, Japan
[3] Kao Corp, Biol Sci Labs, Haga, Tochigi 3213497, Japan
[4] Tokyo Univ Agr, Dept Biosci, Setagaya Ku, Tokyo 1568502, Japan
[5] Tokyo Univ Agr, Genome Res Ctr, NODAI Res Inst, Setagaya Ku, Tokyo 1568502, Japan
[6] Univ Minnesota, Dept Chem Engn & Mat Sci, Minneapolis, MN 55455 USA
Keywords
READ ALIGNMENT; RNA-SEQ; GENOME; GENERATION; TOOL; SEARCH; SANGER; BLAST
DOI
10.1093/nar/gkr344
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
We identified the sequence-specific starting positions of consecutive miscalls in the mapping of reads obtained from the Illumina Genome Analyzer (GA). Detailed analysis of the miscall pattern indicated that the underlying mechanism involves sequence-specific interference of the base elongation process during sequencing. The two major sequence patterns that trigger this sequence-specific error (SSE) are: (i) inverted repeats and (ii) GGC sequences. We speculate that these sequences favor dephasing by inhibiting single-base elongation, by: (i) folding single-stranded DNA and (ii) altering enzyme preference. This phenomenon is a major cause of sequence coverage variability and of the unfavorable bias observed for population-targeted methods such as RNA-seq and ChIP-seq. Moreover, SSE is a potential cause of false single-nucleotide polymorphism (SNP) calls and also significantly hinders de novo assembly. This article highlights the importance of recognizing SSE and its underlying mechanisms in the hope of enhancing the potential usefulness of the Illumina sequencers.
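As a rough illustration of the two sequence patterns named in the abstract, the following Python sketch scans a DNA string for GGC occurrences and for short inverted repeats (a segment followed, after a small gap, by its own reverse complement). It is not the authors' analysis method: the function names, the arm length, and the maximum loop size are all assumptions chosen only to make the patterns concrete.

```python
# Illustrative only: names and thresholds below are hypothetical,
# not taken from the paper.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_motif(seq: str, motif: str = "GGC") -> list[int]:
    """Return 0-based start positions of every (overlapping) motif match."""
    seq = seq.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def find_inverted_repeats(seq: str, arm: int = 5, max_loop: int = 4) -> list[tuple[int, int]]:
    """Return (left_start, right_start) pairs where the arm starting at
    left_start is followed, within max_loop bases, by its reverse complement."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        target = reverse_complement(seq[i:i + arm])
        for loop in range(max_loop + 1):
            j = i + arm + loop
            if j + arm > len(seq):
                break  # right arm would run past the end of the sequence
            if seq[j:j + arm] == target:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    print(find_motif("ATGGCCGGCTT"))               # GGC start positions
    print(find_inverted_repeats("ACGGTTTTACCGT"))  # arms ACGGT ... ACCGT
```

Positions flagged this way are only candidate sites with the sequence features the abstract describes; whether a given site actually produces miscalls would have to be checked against mapped reads.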
Pages: 13