Fragment assembly with short reads

Cited by: 124
Authors
Chaisson, M [1]
Pevzner, P
Tang, HX
Affiliations
[1] Univ Calif San Diego, Bioinformat Program, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
DOI
10.1093/bioinformatics/bth205
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Current DNA sequencing technology produces reads of about 500-750 bp, with typical coverage under 10x. New sequencing technologies are emerging that produce shorter reads (length 80-200 bp) but allow one to generate significantly higher coverage (30x and higher) at low cost. Modern assembly programs and error correction routines have been tuned to work well with current read technology but were not designed for assembly of short reads.

Results: We analyze the limitations of assembling reads generated by these new technologies and present a routine for base-calling in reads prior to their assembly. We demonstrate that while it is feasible to assemble such short reads, the resulting contigs will require significant (if not prohibitive) finishing efforts.
Pages: 2067-2074
Number of pages: 8