Query-dependent banding (QDB) for faster RNA similarity searches

被引：236

作者：

Nawrocki, Eric P. ^{[1
]}

Eddy, Sean R. ^{[1
]}

机构：

[1] Howard Hughes Med Inst, Ashburn, VA USA

来源：

PLOS COMPUTATIONAL BIOLOGY | 2007年 / 3卷 / 03期

关键词：

D O I：

10.1371/journal.pcbi.0030056

中图分类号：

Q5 [生物化学];

学科分类号：

071010 ; 081704 ;

摘要：

When searching sequence databases for RNAs, it is desirable to score both primary sequence and RNA secondary structure similarity. Covariance models (CMs) are probabilistic models well-suited for RNA similarity search applications. However, the computational complexity of CM dynamic programming alignment algorithms has limited their practical application. Here we describe an acceleration method called query-dependent banding (QDB), which uses the probabilistic query CM to precalculate regions of the dynamic programming lattice that have negligible probability, independently of the target database. We have implemented QDB in the freely available Infernal software package. QDB reduces the average case time complexity of CM alignment from LN2.4 to LN1.3 for a query RNA of N residues and a target database of L residues, resulting in a 4-fold speedup for typical RNA queries. Combined with other improvements to Infernal, including informative mixture Dirichlet priors on model parameters, benchmarks also show increased sensitivity and specificity resulting from improved parameterization.

引用

页码：540 / 554

页数：15

共 45 条

[1] AMINO-ACID SUBSTITUTION MATRICES FROM AN INFORMATION THEORETIC PERSPECTIVE
ALTSCHUL, SF
[J]. JOURNAL OF MOLECULAR BIOLOGY, 1991, 219 (03) : 555 - 565
[2] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Altschul, SF
Madden, TL
Schaffer, AA
Zhang, JH
Zhang, Z
Miller, W
Lipman, DJ
[J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
[3] [Anonymous], AFCRL65758
[4] BROWN M, 1993, P 1 INT C INT SYST M, P47
[5] Brown M. P. S., 2000, Proceedings. Eighth International Conference on Intelligent Systems for Molecular Biology, P57
[6] LAGAN and Multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA
Brudno, M
Do, CB
Cooper, GM
Kim, MF
Davydov, E
Green, ED
Sidow, A
Batzoglou, S
[J]. GENOME RESEARCH, 2003, 13 (04) : 721 - 731
[7] DURBIN R, 1998, BIOL SEQUENCE ANAL B
[8] A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure
Eddy, SR
[J]. BMC BIOINFORMATICS, 2002, 3 (1)
[9] RNA SEQUENCE-ANALYSIS USING COVARIANCE-MODELS
EDDY, SR
DURBIN, R
[J]. NUCLEIC ACIDS RESEARCH, 1994, 22 (11) : 2079 - 2088
[10] EDDY SR, 2003, INFERNEL USERS GUIDE

← 1 2 3 4 5 →