High-throughput sequence alignment using Graphics Processing Units

Cited by: 166
Authors
Schatz, Michael C. [1 ,2 ]
Trapnell, Cole [1 ,2 ]
Delcher, Arthur L. [1 ,2 ]
Varshney, Amitabh
Affiliations
[1] Univ Maryland, Ctr Bioinformat & Computat Biol, College Pk, MD 20742 USA
[2] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
Source
BMC BIOINFORMATICS | 2007 / Vol. 8
Keywords
COMPUTATION; TOOLS;
DOI
10.1186/1471-2105-8-474
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high-end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion: MUMmerGPU is a low-cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.
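The abstract's core idea, many queries matched independently against one shared reference index, can be illustrated with a toy sketch. This is not MUMmerGPU's implementation. It assumes a naive suffix trie in place of the paper's suffix tree, a CPU thread pool in place of CUDA threads, and a simplified "longest exact match from each query offset" rule. The names `build_suffix_trie`, `align_query`, `align_all`, and `min_len` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Reserved key for the start positions stored at each node. Edge labels are
# single characters, so this multi-character key cannot collide with them.
POSITIONS = "#pos"


def build_suffix_trie(reference):
    """Build a naive suffix trie (quadratic size; a stand-in for a suffix tree)."""
    root = {}
    for start in range(len(reference)):
        node = root
        for ch in reference[start:]:
            node = node.setdefault(ch, {})
            # Record every suffix start whose path passes through this node.
            node.setdefault(POSITIONS, []).append(start)
    return root


def align_query(trie, query, min_len):
    """Return (query_offset, reference_offset, length) for exact matches.

    For each query offset, walk the trie as far as the characters agree and
    report the match if it is at least min_len long. Matches are not filtered
    for maximality, so a hit may be contained in an earlier, longer hit.
    """
    hits = []
    for q_start in range(len(query)):
        node, length = trie, 0
        for ch in query[q_start:]:
            child = node.get(ch)
            if child is None:
                break
            node, length = child, length + 1
        if length >= min_len:
            hits.extend((q_start, r_start, length)
                        for r_start in node.get(POSITIONS, []))
    return hits


def align_all(reference, queries, min_len=3):
    """Align every query against one shared, read-only index.

    Assumption: one worker per query stands in for one GPU thread per query.
    """
    trie = build_suffix_trie(reference)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda q: align_query(trie, q, min_len), queries))


if __name__ == "__main__":
    for query, hits in zip(["TTAC", "ACAG", "CCC"],
                           align_all("GATTACA", ["TTAC", "ACAG", "CCC"])):
        print(query, hits)
```

Because the index is built once and only read afterwards, the queries need no coordination with one another. That independence is the property that lets the work be spread across many parallel workers.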
Pages: 10