Lightweight comparison of RNAs based on exact sequence-structure matches

Cited by: 26
Authors
Heyne, Steffen [1]
Will, Sebastian [1]
Beckstette, Michael [1]
Backofen, Rolf [1]
Affiliation
[1] Univ Freiburg, Bioinformat Grp, D-79110 Freiburg, Germany
Keywords
SECONDARY STRUCTURE; ALIGNMENT PROGRAMS; BENCHMARK; DISTANCE; DATABASE; COMMON; TREES; EDIT
DOI
10.1093/bioinformatics/btp065
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Specific functions of ribonucleic acid (RNA) molecules are often associated with different motifs in the RNA structure. The key feature that forms such an RNA motif is the combination of sequence and structure properties. In this article, we introduce a new RNA sequence-structure comparison method which maintains exact matching substructures. Existing common substructures are treated as whole units, while variability is allowed between such structural motifs. Based on a set of overlapping and crossing substructure matches for two nested RNA secondary structures that can be detected quickly, our method ExpaRNA (exact pattern of alignment of RNA) computes the longest collinear sequence of substructures common to two RNAs in O(H·nm) time and O(nm) space, where n and m are the lengths of the two RNAs and H ≪ n·m for real RNA structures. Applied to different RNAs, our method correctly identifies sequence-structure similarities between two RNAs.

Results: We have compared ExpaRNA with two other alignment methods that work with given RNA structures, namely RNAforester and RNA_align. The results are in good agreement, but ExpaRNA obtains them in a fraction of the running time, in particular for larger RNAs. We have also used ExpaRNA to speed up state-of-the-art Sankoff-style alignment tools like LocARNA, and observe a tradeoff between quality and speed. However, we get a 4.25-fold speedup even in the highest quality setting, where the quality of the produced alignment is comparable to that of LocARNA alone.
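The abstract describes selecting, from a set of overlapping and crossing matches, the longest collinear sequence of common substructures. The sketch below illustrates only the chaining idea behind that step, under strong simplifying assumptions: each match is treated as one contiguous block of equal length in each RNA (real sequence-structure matches need not be contiguous), nesting of secondary structure is ignored, and the classic quadratic dynamic program for collinear fragment chaining is used instead of ExpaRNA's O(H·nm) algorithm. The names `Match` and `best_chain` are hypothetical and not part of ExpaRNA.

```python
from typing import List, NamedTuple, Tuple


class Match(NamedTuple):
    """Hypothetical exact match: positions a_start..a_end in RNA A correspond
    one-to-one to b_start..b_end in RNA B (0-based, inclusive)."""
    a_start: int
    a_end: int
    b_start: int
    b_end: int

    @property
    def size(self) -> int:
        # Number of matched positions (same in A and B for an exact match).
        return self.a_end - self.a_start + 1


def best_chain(matches: List[Match]) -> Tuple[int, List[Match]]:
    """Return (total size, chain) of a maximum-size collinear chain in which
    every match ends before the next one starts, in both RNAs.
    Simplified sequence-only chaining; not ExpaRNA's structural algorithm."""
    if not matches:
        return 0, []
    # Sorting by end position guarantees every valid predecessor comes first.
    order = sorted(matches, key=lambda m: (m.a_end, m.b_end))
    score = [m.size for m in order]  # best chain size ending at order[k]
    prev = [-1] * len(order)         # back-pointers for reconstruction
    for k, cur in enumerate(order):
        for p in range(k):
            cand = order[p]
            # Collinear and non-overlapping: cand lies strictly left of cur in A and B.
            if cand.a_end < cur.a_start and cand.b_end < cur.b_start:
                if score[p] + cur.size > score[k]:
                    score[k] = score[p] + cur.size
                    prev[k] = p
    # Trace back from the best-scoring end point.
    k = max(range(len(order)), key=score.__getitem__)
    total = score[k]
    chain = []
    while k != -1:
        chain.append(order[k])
        k = prev[k]
    chain.reverse()
    return total, chain


if __name__ == "__main__":
    ms = [Match(0, 3, 2, 5), Match(5, 7, 8, 10), Match(2, 6, 0, 4)]
    print(best_chain(ms))
```

In this usage example the third match (size 5) overlaps the first one in RNA A, so the chain of the first two matches (total size 4 + 3 = 7) is preferred. Crossing or overlapping matches are exactly what the chaining step has to discard; ExpaRNA additionally has to respect how matches nest inside the two secondary structures, which this sketch does not model.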
Pages: 2095-2102
Page count: 8