A deterministic finite automaton for faster protein hit detection in BLAST

被引:25
作者
Cameron, Michael
Williams, Hugh E.
Cannane, Adam
机构
[1] Microsoft Corp, Redmond, WA 98052 USA
[2] RMIT Univ, Sch Comp Sci & Informat Technol, Melbourne, Vic 3001, Australia
关键词
BLAST; homology search; sequence alignment;
D O I
10.1089/cmb.2006.13.965
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
BLAST is the most popular bioinformatics tool and is used to run millions of queries each day. However, evaluating such queries is slow, taking typically minutes on modern workstations. Therefore, continuing evolution of BLAST-by improving its algorithms and optimizations-is essential to improve search times in the face of exponentially increasing collection sizes. We present an optimization to the first stage of the BLAST algorithm specifically designed for protein search. It produces the same results as NCBI-BLAST but in around 59% of the time on Intel-based platforms; we also present results for other popular architectures. Overall, this is a saving of around 15% of the total typical BLAST search time. Our approach uses a deterministic finite automaton (DFA), inspired by the original scheme used in the 1990 BLAST algorithm. The techniques are optimized for modern hardware, making careful use of cache-conscious approaches to improve speed. Our optimized DFA approach has been integrated into a new version of BLAST that is freely available for download at http://www.fsa-blast.org/.
引用
收藏
页码:965 / 978
页数:14
相关论文
共 21 条
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[3]   Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships [J].
Brenner, SE ;
Chothia, C ;
Hubbard, TJP .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (11) :6073-6078
[4]   Optimizing multiple seeds for protein homology search [J].
Brown, DG .
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2005, 2 (01) :29-38
[5]   Improved gapped alignment in BLAST [J].
Cameron, M ;
Williams, HE ;
Cannane, A .
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2004, 1 (03) :116-129
[6]   The ASTRAL Compendium in 2004 [J].
Chandonia, JM ;
Hon, G ;
Walker, NS ;
Lo Conte, L ;
Koehl, P ;
Levitt, M ;
Brenner, SE .
NUCLEIC ACIDS RESEARCH, 2004, 32 :D189-D192
[7]   Assessing sequence comparison methods with the average precision criterion [J].
Chen, ZR .
BIOINFORMATICS, 2003, 19 (18) :2456-2460
[8]   Mastering seeds for genomic size nucleotide BLAST searches [J].
Gotea, V ;
Veeramachaneni, V ;
Makalowski, W .
NUCLEIC ACIDS RESEARCH, 2003, 31 (23) :6935-6941
[9]  
Kent WJ, 2002, GENOME RES, V12, P656, DOI [10.1101/gr.229202, 10.1101/gr.229202. Article published online before March 2002]
[10]  
Li Ming, 2004, J Bioinform Comput Biol, V2, P417, DOI 10.1142/S0219720004000661