A comparative method for finding and folding RNA secondary structures within protein-coding regions

Cited by: 67
Authors
Pedersen, JS
Meyer, IM
Forsberg, R
Simmonds, P
Hein, J
Affiliations
[1] Univ Aarhus, Bioinformat Res Ctr, Dept Ecol & Genet, Inst Biol Sci, DK-8000 Aarhus C, Denmark
[2] Univ Oxford, Oxford Ctr Gene Funct, Oxford OX1 3QB, England
[3] Univ Edinburgh, Ctr Infect Dis, Edinburgh EH9 1QH, Midlothian, Scotland
DOI
10.1093/nar/gkh839
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification codes
071010 ; 081704 ;
Abstract
Existing computational methods for RNA secondary-structure prediction tacitly assume RNA to only encode functional RNA structures. However, experimental studies have revealed that some RNA sequences, e.g. compact viral genomes, can simultaneously encode functional RNA structures as well as proteins, and evidence is accumulating that this phenomenon may also be found in Eukaryotes. We here present the first comparative method, called RNA-Decoder, which explicitly takes the known protein-coding context of an RNA-sequence alignment into account in order to predict evolutionarily conserved secondary-structure elements, which may span both coding and non-coding regions. RNA-Decoder employs a stochastic context-free grammar together with a set of carefully devised phylogenetic substitution-models, which can disentangle and evaluate the different kinds of overlapping evolutionary constraints which arise. We show that RNA-Decoder's parameters can be automatically trained to successfully fold known secondary structures within the HCV genome. We scan the genomes of HCV and polio virus for conserved secondary-structure elements, and analyze performance as a function of available evolutionary information. On known secondary structures, RNA-Decoder shows a sensitivity similar to the programs Mfold, Pfold and RNAalifold. When scanning the entire genomes of HCV and polio virus for structure elements, RNA-Decoder's results indicate a markedly higher specificity than Mfold, Pfold and RNAalifold.
Pages: 4925-4936
Page count: 12