Fast filtering for RNA homology search

被引:32
作者
Kolbe, Diana L. [1 ]
Eddy, Sean R. [1 ]
机构
[1] Howard Hughes Med Inst, Ashburn, VA 20147 USA
关键词
SECONDARY STRUCTURE; NONCODING RNA; SPEED-UP; SEQUENCE; ALGORITHM; PREDICTION; ALIGNMENT; COMMON;
D O I
10.1093/bioinformatics/btr545
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Homology search for RNAs can use secondary structure information to increase power by modeling base pairs, as in covariance models, but the resulting computational costs are high. Typical acceleration strategies rely on at least one filtering stage using sequence-only search. Results: Here we present the multi-segment CYK (MSCYK) filter, which implements a heuristic of ungapped structural alignment for RNA homology search. Compared to gapped alignment, this approximation has lower computation time requirements (O(N-4) reduced to O(N-3)), and space requirements (O(N-3) reduced to O(N-2)). A vector-parallel implementation of this method gives up to 100-fold speed-up; vector-parallel implementations of standard gapped alignment at two levels of precision give 3- and 6-fold speed-ups. These approaches are combined to create a filtering pipeline that scores RNA secondary structure at all stages, with results that are synergistic with existing methods.
引用
收藏
页码:3102 / 3109
页数:8
相关论文
共 22 条
[1]  
Bafna V, 2004, 2004 IEEE COMPUTATIONAL SYSTEMS BIOINFORMATICS CONFERENCE, PROCEEDINGS, P52
[2]  
Brown M P, 2000, Proc Int Conf Intell Syst Mol Biol, V8, P57
[3]   Prediction of complete gene structures in human genomic DNA [J].
Burge, C ;
Karlin, S .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) :78-94
[4]  
Durbin R., 1998, Biological sequence analysis: probabilistic models of proteins and nucleic acids
[5]   A probabilistic model of local sequence alignment that simplifies statistical significance estimation [J].
Eddy, Sean R. .
PLOS COMPUTATIONAL BIOLOGY, 2008, 4 (05)
[6]   A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure [J].
Eddy, SR .
BMC BIOINFORMATICS, 2002, 3 (1)
[7]   Striped Smith-Waterman speeds database searches six times over other SIMD implementations [J].
Farrar, Michael .
BIOINFORMATICS, 2007, 23 (02) :156-161
[8]   Exploring genomic dark matter: A critical assessment of the performance of homology search methods on noncoding RNA [J].
Freyhult, Eva K. ;
Bollback, Jonathan P. ;
Gardner, Paul P. .
GENOME RESEARCH, 2007, 17 (01) :117-125
[9]   Rfam: Wikipedia, clans and the "decimal" release [J].
Gardner, Paul P. ;
Daub, Jennifer ;
Tate, John ;
Moore, Benjamin L. ;
Osuch, Isabelle H. ;
Griffiths-Jones, Sam ;
Finn, Robert D. ;
Nawrocki, Eric P. ;
Kolbe, Diana L. ;
Eddy, Sean R. ;
Bateman, Alex .
NUCLEIC ACIDS RESEARCH, 2011, 39 :D141-D145
[10]   Pairwise local structural alignment of RNA sequences with sequence similarity less than 40% [J].
Havgaard, JH ;
Lyngso, RB ;
Stormo, GD ;
Gorodkin, J .
BIOINFORMATICS, 2005, 21 (09) :1815-1824