Heterogeneous Cloud Framework for Big Data Genome Sequencing

Cited by: 33
Authors
Wang, Chao [1]
Li, Xi [2]
Chen, Peng [1]
Wang, Aili [3]
Zhou, Xuehai [2]
Yu, Hong [4]
Affiliations
[1] Univ Sci & Technol China, Dept Comp Sci, Hefei 230027, Anhui, Peoples R China
[2] Univ Sci & Technol China, Suzhou Inst, Suzhou 215123, Jiangsu, Peoples R China
[3] Univ Sci & Technol China, Sch Software Engn, Suzhou 215123, Jiangsu, Peoples R China
[4] Chinese Acad Sci, Ctr Plant Gene Res, Inst Genet & Dev Biol, Beijing, Peoples R China
Funding
US National Science Foundation
Keywords
Short reads; genome sequencing; mapping; reconfigurable hardware; FPGA; read alignment; architecture
DOI
10.1109/TCBB.2014.2351800
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
Next-generation genome sequencing with short (long) reads is an emerging field in numerous scientific and big data research domains. However, data sizes are growing along with ease of access for scientific researchers, and most current methodologies rely on a single acceleration approach, so they cannot meet the requirements imposed by explosive data scales and complexities. In this paper, we propose a novel FPGA-based acceleration solution that uses a MapReduce framework across multiple hardware accelerators. The combination of hardware acceleration and the MapReduce execution flow could greatly accelerate the task of aligning short reads to a known reference genome. To evaluate performance and other metrics, we conducted a theoretical speedup analysis on a MapReduce programming platform, which demonstrates that the proposed architecture has the potential to improve speedup for large-scale genome sequencing applications. As a practical study, we also built a hardware prototype on a real Xilinx FPGA chip and evaluated speedup, sensitivity, mapping quality, error rate, and hardware cost. Experimental results demonstrate that the proposed platform can efficiently accelerate next-generation sequencing with satisfactory accuracy and acceptable hardware cost.
Pages: 166-178
Page count: 13