Discovering common stem-loop motifs in unaligned RNA sequences

Cited by: 105
Authors
Gorodkin, J
Stricklin, SL
Stormo, GD
Affiliations
[1] Washington Univ, Sch Med, Dept Genet, St Louis, MO 63110 USA
[2] Aarhus Univ, Inst Biol Sci, Dept Ecol & Genet, DK-8000 Aarhus C, Denmark
DOI
10.1093/nar/29.10.2135
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Post-transcriptional regulation of gene expression is often accomplished by proteins that bind specific sequence motifs in mRNA molecules and thereby affect their translation or stability. The motifs are often composed of a combination of sequence and structural constraints, such that the overall structure is preserved even though much of the primary sequence is variable. While several methods exist to discover transcriptional regulatory sites in the DNA sequences of coregulated genes, the RNA motif discovery problem is much more difficult because of covariation between positions. We describe the combined use of two approaches for RNA structure prediction, FOLDALIGN and COVE, that together can discover and model stem-loop RNA motifs in unaligned sequences, such as UTRs from post-transcriptionally coregulated genes. We evaluate the method on two datasets: the first is a section of rRNA genes with randomly truncated ends, so that a global alignment is not possible, and the second is a hypervariable collection of IRE-like elements inserted into randomized UTR sequences. In both cases the combined method identified the motifs correctly, and in the rRNA example we show that it is capable of determining the structure, which includes bulge and internal loops as well as a variable-length hairpin loop. These automated results are quantitatively evaluated and found to agree closely with structures contained in curated databases, with correlation coefficients up to 0.9. A basic server, Stem-Loop Align SearcH (SLASH), which will perform stem-loop searches in unaligned RNA sequences, is available at http://www.bioinf.au.dk/slash/.
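The abstract reports agreement with curated structures as a correlation coefficient but does not define it here. As a rough illustration only, the sketch below assumes a Matthews-style correlation computed over base pairs, with both structures written in dot-bracket notation. The function names `base_pairs` and `pair_correlation` are hypothetical, and this is not the authors' evaluation code.

```python
from math import sqrt


def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % j)
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def pair_correlation(predicted, reference):
    """Matthews-style correlation between two structures of equal length.

    Assumption: every unordered pair of positions counts as one candidate
    base pair, so true negatives are the candidates absent from both sets.
    """
    if len(predicted) != len(reference):
        raise ValueError("structures must have the same length")
    n = len(predicted)
    p, r = base_pairs(predicted), base_pairs(reference)
    tp = len(p & r)          # pairs present in both structures
    fp = len(p - r)          # predicted pairs missing from the reference
    fn = len(r - p)          # reference pairs that were not predicted
    tn = n * (n - 1) // 2 - tp - fp - fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0           # undefined case, reported as no correlation
    return (tp * tn - fp * fn) / denom
```

Under these assumptions, identical structures score 1.0, and the score falls as predicted pairs are missed or spurious pairs are added.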
Pages: 2135-2144
Number of pages: 10