Fast and accurate long-read alignment with Burrows-Wheeler transform

被引:7700
作者
Li, Heng [1 ]
Durbin, Richard [1 ]
机构
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
基金
英国惠康基金;
关键词
DNA-SEQUENCES; SEARCH; GENOME; BLAST;
D O I
10.1093/bioinformatics/btp698
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Many programs for aligning short sequencing reads to a reference genome have been developed in the last 2 years. Most of them are very efficient for short reads but inefficient or not applicable for reads >200 bp because the algorithms are heavily and specifically tuned for short queries with low sequencing error rate. However, some sequencing platforms already produce longer reads and others are expected to become available soon. For longer reads, hashing-based software such as BLAT and SSAHA2 remain the only choices. Nonetheless, these methods are substantially slower than short-read aligners in terms of aligned bases per unit time. Results: We designed and implemented a new algorithm, Burrows-Wheeler Aligner's Smith-Waterman Alignment (BWA-SW), to align long sequences up to 1 Mb against a large sequence database (e. g. the human genome) with a few gigabytes of memory. The algorithm is as accurate as SSAHA2, more accurate than BLAT, and is several to tens of times faster than both.
引用
收藏
页码:589 / 595
页数:7
相关论文
共 22 条
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   THE SMALLEST AUTOMATION RECOGNIZING THE SUBWORDS OF A TEXT [J].
BLUMER, A ;
BLUMER, J ;
HAUSSLER, D ;
EHRENFEUCHT, A ;
CHEN, MT ;
SEIFERAS, J .
THEORETICAL COMPUTER SCIENCE, 1985, 40 (01) :31-55
[3]  
Burrows M, 1994, BLOCK SORTING LOSSLE
[4]   Real-Time DNA Sequencing from Single Polymerase Molecules [J].
Eid, John ;
Fehr, Adrian ;
Gray, Jeremy ;
Luong, Khai ;
Lyle, John ;
Otto, Geoff ;
Peluso, Paul ;
Rank, David ;
Baybayan, Primo ;
Bettman, Brad ;
Bibillo, Arkadiusz ;
Bjornson, Keith ;
Chaudhuri, Bidhan ;
Christians, Frederick ;
Cicero, Ronald ;
Clark, Sonya ;
Dalal, Ravindra ;
deWinter, Alex ;
Dixon, John ;
Foquet, Mathieu ;
Gaertner, Alfred ;
Hardenbol, Paul ;
Heiner, Cheryl ;
Hester, Kevin ;
Holden, David ;
Kearns, Gregory ;
Kong, Xiangxu ;
Kuse, Ronald ;
Lacroix, Yves ;
Lin, Steven ;
Lundquist, Paul ;
Ma, Congcong ;
Marks, Patrick ;
Maxham, Mark ;
Murphy, Devon ;
Park, Insil ;
Pham, Thang ;
Phillips, Michael ;
Roy, Joy ;
Sebra, Robert ;
Shen, Gene ;
Sorenson, Jon ;
Tomaney, Austin ;
Travers, Kevin ;
Trulson, Mark ;
Vieceli, John ;
Wegener, Jeffrey ;
Wu, Dawn ;
Yang, Alicia ;
Zaccarin, Denis .
SCIENCE, 2009, 323 (5910) :133-138
[5]   Opportunistic data structures with applications [J].
Ferragina, P ;
Manzini, G .
41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2000, :390-398
[6]   SeqMap: mapping massive amount of oligonucleotides to the genome [J].
Jiang, Hui ;
Wong, Wing Hung .
BIOINFORMATICS, 2008, 24 (20) :2395-2396
[7]  
Kent WJ, 2002, GENOME RES, V12, P656, DOI [10.1101/gr.229202, 10.1101/gr.229202. Article published online before March 2002]
[8]   Compressed indexing and local alignment of DNA [J].
Lam, T. W. ;
Sung, W. K. ;
Tam, S. L. ;
Wong, C. K. ;
Yiu, S. M. .
BIOINFORMATICS, 2008, 24 (06) :791-797
[9]   Ultrafast and memory-efficient alignment of short DNA sequences to the human genome [J].
Langmead, Ben ;
Trapnell, Cole ;
Pop, Mihai ;
Salzberg, Steven L. .
GENOME BIOLOGY, 2009, 10 (03)
[10]   Mapping short DNA sequencing reads and calling variants using mapping quality scores [J].
Li, Heng ;
Ruan, Jue ;
Durbin, Richard .
GENOME RESEARCH, 2008, 18 (11) :1851-1858