Exact and complete short-read alignment to microbial genomes using Graphics Processing Unit programming

Cited by: 64
Authors
Blom, Jochen [1]
Jakobi, Tobias
Doppmeier, Daniel [1]
Jaenicke, Sebastian
Kalinowski, Joern [1]
Stoye, Jens [2, 3]
Goesmann, Alexander [3, 4]
Affiliations
[1] Univ Bielefeld, CeBiTec, Inst Genome Res & Syst Biol, Bielefeld, Germany
[2] Univ Bielefeld, Fac Technol, Bielefeld, Germany
[3] Univ Bielefeld, Inst Bioinformat, Bielefeld, Germany
[4] Univ Bielefeld, CeBiTec, Bioinformat Resource Facil, Bielefeld, Germany
Keywords
SEQUENCE;
DOI
10.1093/bioinformatics/btr151
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: The introduction of next-generation sequencing techniques, and especially the high-throughput systems Solexa (Illumina Inc.) and SOLiD (ABI), has made the mapping of short reads to reference sequences a standard application in modern bioinformatics. Short-read alignment is needed for reference-based re-sequencing of complete genomes as well as for gene expression analysis based on transcriptome sequencing. Several approaches developed in recent years allow fast alignment of short sequences to a given template. The methods available to date use heuristic techniques to speed up the alignments and therefore miss possible alignment positions. Furthermore, most approaches return only one best hit for every query sequence, thus losing the potentially valuable information of alternative alignment positions with identical scores.
Results: We developed SARUMAN (Semiglobal Alignment of short Reads Using CUDA and NeedleMAN-Wunsch), a mapping approach that returns all possible alignment positions of a read in a reference sequence under a given error threshold, together with one optimal alignment for each of these positions. Alignments are computed in parallel on graphics hardware, which considerably speeds up this normally time-consuming step. Combining our filter algorithm with CUDA-accelerated alignments, we were able to align reads to microbial genomes in a time comparable to, or even shorter than, that of all published approaches, while still providing an exact, complete and optimal result. At the same time, SARUMAN runs on every standard Linux PC with a CUDA-compatible graphics accelerator.
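The idea of reporting every alignment position of a read under a given error threshold can be illustrated with a small CPU-only sketch. It is not SARUMAN's implementation: it has no filter step and no CUDA parallelism, and it assumes unit costs for mismatches and indels. The function name `semiglobal_hits` and its parameters are hypothetical and chosen only for this illustration. It fills a Needleman-Wunsch-style matrix column by column, with free start and end positions in the reference (semiglobal alignment), and records each reference position where the whole read can end within the error budget.

```python
def semiglobal_hits(read, reference, max_errors):
    """Return (end_position, errors) for every 1-based reference position at
    which the whole read can end with at most max_errors edits.

    Illustrative assumption: unit cost for mismatches, insertions, deletions.
    """
    m = len(read)
    # prev[i] = minimal edits to align read[:i] against some reference
    # substring ending at the previous reference position.
    # Before any reference character is consumed, read[:i] costs i insertions.
    prev = list(range(m + 1))
    hits = []
    for j, ref_char in enumerate(reference, start=1):
        # curr[0] stays 0: an alignment may start anywhere in the reference.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            mismatch = 0 if read[i - 1] == ref_char else 1
            curr[i] = min(
                prev[i - 1] + mismatch,  # match or mismatch
                prev[i] + 1,             # reference character without a read partner
                curr[i - 1] + 1,         # read character without a reference partner
            )
        # The last row holds the cost of aligning the complete read ending at j.
        if curr[m] <= max_errors:
            hits.append((j, curr[m]))
        prev = curr
    return hits
```

For example, `semiglobal_hits("ACGT", "TTACGTTTACCTAA", 0)` returns `[(6, 0)]`, the single exact occurrence. Raising `max_errors` to 1 additionally reports the one-mismatch site ending at position 12, along with neighbouring end positions reachable with one indel. A real mapper would collapse such neighbouring hits and trace back one optimal alignment per position.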
Pages: 1351-1358
Page count: 8