ALLPATHS: De novo assembly of whole-genome shotgun microreads

Cited by: 558
Authors
Butler, Jonathan [1]
MacCallum, Iain [1]
Kleber, Michael [1]
Shlyakhter, Ilya A. [1]
Belmonte, Matthew K. [1, 2]
Lander, Eric S. [1, 3, 4, 5]
Nusbaum, Chad [1]
Jaffe, David B. [1]
Affiliations
[1] Broad Inst MIT & Harvard, Cambridge, MA 02141 USA
[2] Cornell Univ, Dept Human Dev, Ithaca, NY 14853 USA
[3] MIT, Whitehead Inst Biomed Res, Cambridge, MA 02139 USA
[4] MIT, Dept Biol, Cambridge, MA 02139 USA
[5] Harvard Univ, Sch Med, Dept Syst Biol, Boston, MA 02115 USA
DOI
10.1101/gr.7337908
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
New DNA sequencing technologies deliver data at dramatically lower costs but demand new analytical methods to take full advantage of the very short reads that they produce. We provide an initial, theoretical solution to the challenge of de novo assembly from whole-genome shotgun "microreads." For 11 genomes of sizes up to 39 Mb, we generated high-quality assemblies from 80x coverage by paired 30-base simulated reads modeled after real Illumina-Solexa reads. The bacterial genomes of Campylobacter jejuni and Escherichia coli assemble optimally, yielding single perfect contigs, and larger genomes yield assemblies that are highly connected and accurate. Assemblies are presented in a graph form that retains intrinsic ambiguities such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. For both C. jejuni and E. coli, this assembly graph is a single edge encompassing the entire genome. Larger genomes produce more complicated graphs, but the vast majority of the bases in their assemblies are present in long edges that are nearly always perfect. We describe a general method for genome assembly that can be applied to all types of DNA sequence data, not only short read data, but also conventional sequence reads.
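As a rough illustration of the scale implied by the abstract's figures (80x coverage, 30-base reads, genomes up to 39 Mb), the following sketch computes how many reads such a data set would contain. The function name `reads_for_coverage` is hypothetical and not from the paper; it assumes only the standard definition of coverage as total sequenced bases divided by genome size.

```python
def reads_for_coverage(genome_size_bp: int, coverage: int, read_length_bp: int) -> int:
    """Reads needed so that total sequenced bases >= coverage * genome size.

    Hypothetical helper for illustration; not part of ALLPATHS.
    """
    total_bases = coverage * genome_size_bp
    # Integer ceiling division: a partial read still counts as a whole read.
    return (total_bases + read_length_bp - 1) // read_length_bp


if __name__ == "__main__":
    # Figures taken from the abstract: 39 Mb genome, 80x coverage, 30-base reads.
    reads = reads_for_coverage(39_000_000, 80, 30)
    pairs = reads // 2  # reads are paired, so two reads per fragment
    print(f"{reads:,} reads, i.e. {pairs:,} read pairs")
```

For the largest genome mentioned, this comes to roughly a hundred million reads, which gives a sense of why short-read assembly called for new analytical methods.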
Pages: 810-820
Page count: 11