Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory

Times cited: 890
Authors
Chaisson, Mark J. [1]
Tesler, Glenn [2]
Affiliations
[1] Pacific Biosci, Dept Secondary Anal, Menlo Pk, CA USA
[2] Univ Calif San Diego, Dept Math, La Jolla, CA 92093 USA
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Funding
National Institutes of Health (NIH)
Keywords
BURROWS-WHEELER TRANSFORM; MULTIPLE ALIGNMENT; DNA-SEQUENCES; SEARCH; MOUSE; TOOL; IDENTIFICATION; DUPLICATION; GENERATION; ALGORITHM
DOI
10.1186/1471-2105-13-238
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Methods have recently been developed to perform high-throughput DNA sequencing by Single Molecule Sequencing (SMS). While Next-Generation sequencing methods may produce reads up to several hundred bases long, SMS produces reads up to tens of kilobases long. Existing alignment methods are either too inefficient for high-throughput datasets or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing reads.
Results: We describe BLASR (Basic Local Alignment with Successive Refinement), a method for mapping SMS reads that are thousands of bases long, where the divergence between the read and the genome is dominated by insertion and deletion errors. The method is benchmarked using both simulated reads and reads from a bacterial sequencing project. We also present a combinatorial model of sequencing error that explains why our approach is effective.
Conclusions: The results indicate that SMS reads can be mapped with high accuracy and speed. Furthermore, the inferences about the mappability of SMS reads drawn from our combinatorial model of sequencing error agree with the mapping accuracy demonstrated on simulated reads.
Pages: 17