A novel and well-defined benchmarking method for second generation read mapping

Times cited: 53
Authors
Holtgrewe, Manuel [1]
Emde, Anne-Katrin [1, 2]
Weese, David [1]
Reinert, Knut [1]
Affiliations
[1] Free Univ Berlin, Dept Comp Sci, D-14195 Berlin, Germany
[2] Max Planck Inst Mol Genet, D-14195 Berlin, Germany
Source
BMC BIOINFORMATICS | 2011 / Volume 12
Keywords
GENOME; EFFICIENT; ALGORITHM; ULTRAFAST; ALIGNMENT; SEQUENCE;
DOI
10.1186/1471-2105-12-210
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Second generation sequencing technologies yield DNA sequence data at ultra high-throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. The assessment of the quality of read mapping results is not straightforward and has not been formalized so far. Hence, it has not been easy to compare different read mapping approaches in a unified way and to determine which program is the best for what task. Results: We present a new benchmark method, called Rabema (Read Alignment BEnchMArk), for read mappers. It consists of a strict definition of the read mapping problem and of tools to evaluate the result of arbitrary read mappers supporting the SAM output format. Conclusions: We show the usefulness of the benchmark program by performing a comparison of popular read mappers. The tools supporting the benchmark are licensed under the GPL and available from http://www.seqan.de/projects/rabema.html.
Pages: 10