RNAProfile: an algorithm for finding conserved secondary structure motifs in unaligned RNA sequences

Cited by: 41
Authors
Pavesi, G
Mauri, G
Stefani, M
Pesole, G
Affiliations
[1] Univ Milan, Dept Biomol Sci & Biotechnol, I-20133 Milan, Italy
[2] Univ Milan, Dept Comp Sci & Commun, I-20135 Milan, Italy
[3] Univ Milano Bicocca, Dept Comp Sci Syst & Commun, I-20126 Milan, Italy
DOI
10.1093/nar/gkh650
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The recent interest sparked by the discovery of a variety of functions for non-coding RNA molecules has highlighted the need for suitable tools for the analysis and comparison of RNA sequences. Many trans-acting non-coding RNA genes and cis-acting RNA regulatory elements present motifs, conserved in both structure and sequence, that can hardly be detected by primary sequence analysis alone. We present an algorithm that takes as input a set of unaligned RNA sequences expected to share a common motif, and outputs the regions that are most conserved throughout the sequences, according to a similarity measure that takes into account both the sequence of the regions and the secondary structure they can form according to base-pairing and thermodynamic rules. Only a single input parameter is needed: the number of distinct hairpins the motif has to contain. No further constraints on the size, number and position of the individual elements comprising the motif are required. The algorithm works in two phases. First, it extracts from each input sequence a set of candidate regions whose predicted optimal secondary structure contains the number of hairpins given as input. Then, the selected regions are compared with each other to find the groups of most similar ones, each formed by one region taken from each sequence. To avoid exhaustive enumeration of the search space and to reduce execution time, a greedy heuristic is introduced for this task. We present different experiments showing that the algorithm is capable of characterizing and discovering known regulatory motifs in mRNA, such as the iron responsive element (IRE) and selenocysteine insertion sequence (SECIS) stem-loop structures. We also show how it can be applied to corrupted datasets in which a motif does not appear in all the input sequences, as well as to the discovery of more complex motifs in non-coding RNA.
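The two-phase procedure summarized in the abstract can be illustrated with the minimal Python sketch below. It is not the authors' RNAProfile implementation, and several parts are assumptions: secondary structures come from a caller-supplied `fold` function returning a dot-bracket string (hypothetical here; with the ViennaRNA Python bindings installed, something like `lambda s: RNA.fold(s)[0]` could fill that role), candidate regions use one fixed window width (a simplification, since RNAProfile itself imposes no size constraint), and `similarity` is a toy position-by-position score rather than the paper's measure. The names `count_hairpins`, `candidate_regions`, `similarity` and `greedy_group` are all hypothetical.

```python
import re
from typing import Callable, List, Tuple

# A candidate region: (start offset, subsequence, dot-bracket structure).
Region = Tuple[int, str, str]

# A base pair enclosing only unpaired bases closes one hairpin loop.
HAIRPIN = re.compile(r"\(\.*\)")


def count_hairpins(structure: str) -> int:
    """Count hairpin loops in a dot-bracket string."""
    return len(HAIRPIN.findall(structure))


def candidate_regions(seq: str, fold: Callable[[str], str],
                      width: int, n_hairpins: int) -> List[Region]:
    """Phase 1 (sketch): fold every window of one fixed width (an assumption)
    and keep those whose predicted structure has exactly n_hairpins hairpins."""
    out: List[Region] = []
    for i in range(len(seq) - width + 1):
        sub = seq[i:i + width]
        st = fold(sub)
        if count_hairpins(st) == n_hairpins:
            out.append((i, sub, st))
    return out


def similarity(a: Region, b: Region) -> int:
    """Toy score (hypothetical): ungapped count of matching nucleotides
    plus matching structure symbols."""
    seq_hits = sum(x == y for x, y in zip(a[1], b[1]))
    struct_hits = sum(x == y for x, y in zip(a[2], b[2]))
    return seq_hits + struct_hits


def greedy_group(cands: List[List[Region]]) -> Tuple[int, List[Region]]:
    """Phase 2 (sketch): seed a group with each candidate of the first
    sequence, then add, one sequence at a time, the candidate with the
    highest summed similarity to the regions already chosen."""
    if not cands:
        return -1, []
    best_score, best_group = -1, []
    for seed in cands[0]:
        group, score = [seed], 0
        for pool in cands[1:]:
            if not pool:  # no candidate in this sequence: skip it (simplification)
                continue
            gain, pick = max(((sum(similarity(c, g) for g in group), c) for c in pool),
                             key=lambda t: t[0])
            group.append(pick)
            score += gain
        if score > best_score:
            best_score, best_group = score, group
    return best_score, best_group
```

The greedy extension examines each candidate of each later sequence once per seed instead of enumerating every combination of one region per sequence, which is the kind of saving the abstract attributes to its heuristic; the price is that the group returned is not guaranteed to be the best-scoring one.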
Pages: 3258-3269
Number of pages: 12