ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches

被引:41
作者
Rognes, T [1 ]
机构
[1] Univ Oslo, Natl Hosp, Inst Med Microbiol, Dept Mol Biol, NO-0027 Oslo, Norway
关键词
D O I
10.1093/nar/29.7.1647
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised, The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used, In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach, Online searches are available at http://dna.uio.no/search/.
引用
收藏
页码:1647 / 1652
页数:6
相关论文
共 26 条
[1]  
ALPERN B, 1995, P 1995 ACM IEEE SUP
[2]  
Altschul SF, 1996, METHOD ENZYMOL, V266, P460
[3]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[4]  
ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
[5]   The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 [J].
Bairoch, A ;
Apweiler, R .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :45-48
[6]   Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships [J].
Brenner, SE ;
Chothia, C ;
Hubbard, TJP .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (11) :6073-6078
[7]   BLAZE (TM) - AN IMPLEMENTATION OF THE SMITH-WATERMAN SEQUENCE COMPARISON ALGORITHM ON A MASSIVELY-PARALLEL COMPUTER [J].
BRUTLAG, DL ;
DAUTRICOURT, JP ;
DIAZ, R ;
FIER, J ;
MOXON, B ;
STAMM, R .
COMPUTERS & CHEMISTRY, 1993, 17 (02) :203-207
[8]   AN IMPROVED ALGORITHM FOR MATCHING BIOLOGICAL SEQUENCES [J].
GOTOH, O .
JOURNAL OF MOLECULAR BIOLOGY, 1982, 162 (03) :705-708
[9]   AMINO-ACID SUBSTITUTION MATRICES FROM PROTEIN BLOCKS [J].
HENIKOFF, S ;
HENIKOFF, JG .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1992, 89 (22) :10915-10919
[10]  
Hughey R, 1996, COMPUT APPL BIOSCI, V12, P473