Comparative analysis of algorithms for next-generation sequencing read alignment

Cited by: 136
Authors
Ruffalo, Matthew [1]
LaFramboise, Thomas [2, 3]
Koyutuerk, Mehmet [1, 3]
Affiliations
[1] Case Western Reserve Univ, Dept Elect Engn & Comp Sci, Cleveland, OH 44106 USA
[2] Case Western Reserve Univ, Dept Genet, Cleveland, OH 44106 USA
[3] Case Western Reserve Univ, Ctr Prote & Bioinformat, Cleveland, OH 44106 USA
Funding
National Science Foundation (USA);
Keywords
ULTRAFAST;
DOI
10.1093/bioinformatics/btr477
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: The advent of next-generation sequencing (NGS) techniques opens novel opportunities for many applications in the life sciences. The vast number of short reads produced by these techniques, however, poses significant computational challenges. The first step in many types of genomic analysis is the mapping of short reads to a reference genome, and several groups have developed dedicated algorithms and software packages to perform this function. Because the developers of these packages optimize their algorithms with respect to different considerations, the relative merits of the packages remain unclear. For scientists who generate and use NGS data in their own research projects, however, choosing the software most suitable for their application is an important consideration.
Results: To compare existing short-read alignment software, we develop a simulation and evaluation suite, Seal, which simulates NGS runs for different configurations of various factors, including sequencing error, indels and coverage. We also develop criteria for comparing the performance of software with disparate output structures (e.g. some packages return a single alignment, while others return multiple possible alignments). Using these criteria, we comprehensively evaluate the performance of Bowtie, BWA, mr- and mrsFAST, Novoalign, SHRiMP and SOAPv2 with regard to accuracy and runtime.
Conclusion: We expect that the results presented here will be useful to investigators in choosing the alignment software that is most suitable for their specific research aims. Our results also provide insights into the factors that should be considered in order to use alignment results effectively. Seal can also be used to evaluate the performance of algorithms that use deep sequencing data for various purposes (e.g. identification of genomic variants).
Pages: 2790-2796
Number of pages: 7