miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST

Cited by: 21
Authors
Kim, YJ
Boyd, A
Athey, BD
Patel, JM [1]
Affiliations
[1] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Michigan Ctr Biol Informat, Ann Arbor, MI 48109 USA
DOI
10.1093/nar/gki739
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the query set contains many sequences. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At its core, miBLAST employs q-gram filtering and an index join to detect similarity between the query sequences and the database sequences efficiently. This set-oriented technique, which indexes both the query set and the database set, yields substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned the 247,965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days for BLAST (an improvement by a factor of 22). The performance of miBLAST relative to BLAST increases with larger word sizes but decreases with longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users.
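The set-oriented idea in the abstract (index the q-grams of both the query set and the database, then join the two indexes) can be illustrated with a minimal sketch. This is not the miBLAST implementation: the function names `qgram_index` and `candidate_pairs`, the in-memory dictionaries, and the toy sequences are all assumptions made for illustration. The sketch only shows how joining two q-gram indexes yields the (query, database) pairs that share at least one exact word of length q, which are the only pairs a word-seeded alignment would need to examine further.

```python
from collections import defaultdict


def qgram_index(seqs, q):
    """Map each q-gram to the list of (sequence id, offset) where it occurs."""
    index = defaultdict(list)
    for sid, seq in seqs.items():
        for i in range(len(seq) - q + 1):
            index[seq[i:i + q]].append((sid, i))
    return index


def candidate_pairs(queries, database, q):
    """Join the query and database q-gram indexes on the q-gram itself.

    Returns {(query id, db id): sorted list of (query offset, db offset)}
    for every pair sharing at least one exact q-gram. Pairs with no shared
    q-gram never appear, which is the filtering step.
    """
    q_index = qgram_index(queries, q)
    d_index = qgram_index(database, q)
    hits = defaultdict(list)
    # Index join: visit only q-grams present in both indexes.
    for gram in q_index.keys() & d_index.keys():
        for qid, qoff in q_index[gram]:
            for did, doff in d_index[gram]:
                hits[(qid, did)].append((qoff, doff))
    return {pair: sorted(offsets) for pair, offsets in hits.items()}


# Toy data (hypothetical): only q1 shares 4-grams with d1.
queries = {"q1": "ACGTAC", "q2": "TTTTTT"}
database = {"d1": "GGACGTACGG", "d2": "CCCCCCCC"}
pairs = candidate_pairs(queries, database, q=4)
print(pairs)
```

In this toy run, `q2` and `d2` are filtered out without any pairwise comparison, because neither contributes a q-gram found in the other set. A real system would keep the indexes on disk and pass the surviving seed positions to the usual extension and scoring stages.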
Pages: 4335-4344
Page count: 10