Classification of DNA sequences using Bloom filters

Times cited: 44
Authors
Stranneheim, Henrik [1]
Kaller, Max [2]
Allander, Tobias [3]
Andersson, Bjorn [4]
Arvestad, Lars [5]
Lundeberg, Joakim [1]
Affiliations
[1] KTH Royal Inst Technol, Sci Life Lab, SE-10044 Stockholm, Sweden
[2] LingVitae AB, S-11421 Stockholm, Sweden
[3] Karolinska Inst, Karolinska Univ Hosp, Dept Microbiol, Lab Clin Microbiol Tumor & Cell Biol, SE-17176 Stockholm, Sweden
[4] Karolinska Inst, Dept Cell & Mol Biol, SE-17177 Stockholm, Sweden
[5] Royal Inst Technol, AlbaNova Univ Ctr, Stockholm Bioinformat Ctr, Sch Comp Sci & Commun, S-10691 Stockholm, Sweden
Funding
Swedish Research Council
Keywords
ALIGNMENT; GENOME; PARVOVIRUS
DOI
10.1093/bioinformatics/btq230
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: New-generation sequencing technologies produce increasingly complex datasets and therefore demand new, efficient and specialized sequence analysis algorithms. Often only the 'novel' sequences in a complex dataset are of interest, and the superfluous sequences need to be removed.
Results: A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that accurately and rapidly classifies sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to BLAT and SSAHA2 while being at least 21 times faster in classifying sequences.
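The abstract does not spell out how Bloom filters are used for classification. As a rough illustration only, the sketch below shows the generic idea of Bloom-filter k-mer classification: every k-mer of the reference (both strands) is inserted into a Bloom filter, and a read is called as belonging to the reference when the fraction of its k-mers found in the filter reaches a threshold. The k-mer length, the 0.5 threshold, the SHA-256 double-hashing scheme and all names (`BloomFilter`, `build_reference_filter`, `classify`) are hypothetical choices for this example, not FACS's actual parameters, scoring or implementation.

```python
import hashlib
import math
from typing import Iterator

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Iterator[str]:
    """All overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


class BloomFilter:
    """Plain Bloom filter (Bloom, 1970): set membership with possible false
    positives but no false negatives. Hash positions use double hashing on a
    SHA-256 digest (an illustrative choice)."""

    def __init__(self, n_items: int, fp_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = max(1, math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_reference_filter(reference: str, k: int, fp_rate: float = 0.01) -> BloomFilter:
    """Insert the k-mers of the reference and of its reverse complement."""
    n = max(1, 2 * (len(reference) - k + 1))
    bf = BloomFilter(n, fp_rate)
    for strand in (reference, reverse_complement(reference)):
        for kmer in kmers(strand, k):
            bf.add(kmer)
    return bf


def classify(read: str, bf: BloomFilter, k: int, threshold: float = 0.5) -> bool:
    """True if the fraction of the read's k-mers present in the filter is at
    least `threshold`, i.e. the read is called as belonging to the reference."""
    read_kmers = list(kmers(read, k))
    if not read_kmers:  # read shorter than k: nothing to score
        return False
    hits = sum(1 for km in read_kmers if km in bf)
    return hits / len(read_kmers) >= threshold
```

Because a Bloom filter never gives false negatives, a read copied from the reference (on either strand) always scores 1.0; false positives can only inflate the score of unrelated reads, which is why a threshold over many k-mers rather than a single lookup is used here.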
Pages: 1595-1600
Number of pages: 6
References (17 in total)
[1]   Cloning of a human parvovirus by molecular screening of respiratory tract samples [J].
Allander, T ;
Tammi, MT ;
Eriksson, M ;
Bjerkner, A ;
Tiveljung-Lindell, A ;
Andersson, B .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2005, 102 (36) :12891-12896
[2]   A virus discovery method incorporating DNase treatment and its application to the identification of two bovine parvovirus species [J].
Allander, T ;
Emerson, SU ;
Engle, RE ;
Purcell, RH ;
Bukh, J .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2001, 98 (20) :11609-11614
[3]   Space/time trade-offs in hash coding with allowable errors [J].
BLOOM, BH .
COMMUNICATIONS OF THE ACM, 1970, 13 (07) :422-426
[4]   Broder, Andrei .
INTERNET MATHEMATICS, 2002, P636, DOI 10.1080/15427951.2004.10129096
[5]   A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis [J].
Down, Thomas A. ;
Rakyan, Vardhman K. ;
Turner, Daniel J. ;
Flicek, Paul ;
Li, Heng ;
Kulesha, Eugene ;
Graf, Stefan ;
Johnson, Nathan ;
Herrero, Javier ;
Tomazou, Eleni M. ;
Thorne, Natalie P. ;
Backdahl, Liselotte ;
Herberth, Marlis ;
Howe, Kevin L. ;
Jackson, David K. ;
Miretti, Marcos M. ;
Marioni, John C. ;
Birney, Ewan ;
Hubbard, Tim J. P. ;
Durbin, Richard ;
Tavare, Simon ;
Beck, Stephan .
NATURE BIOTECHNOLOGY, 2008, 26 (07) :779-785
[6]   Kent, WJ .
GENOME RESEARCH, 2002, 12 :656, DOI 10.1101/gr.229202
[7]   Ultrafast and memory-efficient alignment of short DNA sequences to the human genome [J].
Langmead, Ben ;
Trapnell, Cole ;
Pop, Mihai ;
Salzberg, Steven L. .
GENOME BIOLOGY, 2009, 10 (03)
[8]   The first human acute myeloid leukemia genome ever fully sequenced [J].
Falini, Brunangelo .
HAEMATOLOGICA, 2024, 109 (01) :1-2
[9]   Mapping short DNA sequencing reads and calling variants using mapping quality scores [J].
Li, Heng ;
Ruan, Jue ;
Durbin, Richard .
GENOME RESEARCH, 2008, 18 (11) :1851-1858
[10]   Fast and accurate short read alignment with Burrows-Wheeler transform [J].
Li, Heng ;
Durbin, Richard .
BIOINFORMATICS, 2009, 25 (14) :1754-1760