Hybrid error correction and de novo assembly of single-molecule sequencing reads

Cited by: 787
Authors
Koren, Sergey [1, 2]
Schatz, Michael C. [3]
Walenz, Brian P. [4]
Martin, Jeffrey [5]
Howard, Jason T. [6]
Ganapathy, Ganeshkumar [6]
Wang, Zhong [5]
Rasko, David A. [7]
McCombie, W. Richard [3]
Jarvis, Erich D. [6]
Phillippy, Adam M. [1]
Affiliations
[1] Natl Biodef Anal & Countermeasures Ctr, Frederick, MD USA
[2] Univ Maryland, Ctr Bioinformat & Computat Biol, College Pk, MD 20742 USA
[3] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
[4] J Craig Venter Inst, Rockville, MD USA
[5] DOE Joint Genome Inst, Walnut Creek, CA USA
[6] Duke Univ, Med Ctr, Howard Hughes Med Inst, Dept Neurobiol, Durham, NC 27710 USA
[7] Univ Maryland, Sch Med, Dept Microbiol & Immunol, Inst Genome Sci, Baltimore, MD 21201 USA
Funding
US National Institutes of Health; US National Science Foundation
Keywords
PYROSEQUENCING READS; STRUCTURAL VARIATION; LARGE GENOMES; GENE; EXPRESSION; SPEECH; FOXP2; COMPLEXITY; GENERATION; RECEPTORS;
DOI
10.1038/nbt.2280
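Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005 [Microbiology]; 090105 [Crop Production Systems and Ecological Engineering]
Abstract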
Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
Pages: 692 / +
Number of pages: 10