Slider-maximum use of probability information for alignment of short sequence reads and SNP detection

Cited by: 31
Authors
Malhis, Nawar [1]
Butterfield, Yaron S. N. [1]
Ester, Martin [2]
Jones, Steven J. M. [1]
Affiliations
[1] BC Canc Agcy, Genome Sci Ctr, Vancouver, BC, Canada
[2] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
Keywords
QUALITY SCORES; SEARCH;
DOI
10.1093/bioinformatics/btn565
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: A plethora of alignment tools have been created that are designed to best fit different types of alignment conditions. While some of these are made for aligning Illumina Sequence Analyzer reads, none of these are fully utilizing its probability (prb) output. In this article, we will introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each read base's probabilities given in the prb files. Results: Compared with other aligners, Slider has higher alignment accuracy and efficiency. In addition, given that Slider matches bases with probabilities other than the most probable, it significantly reduces the percentage of base mismatches. The result is that its SNP predictions are more accurate than other SNP prediction approaches used today that start from the most probable sequence, including those using base quality.
Pages: 6-13
Number of pages: 8