Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures

Cited by: 45
Authors
Matsui, H [1]
Sato, K [1]
Sakakibara, Y [1]
Affiliations
[1] Keio Univ, Dept Biosci & Informat, Kohoku Ku, Yokohama, Kanagawa 2238522, Japan
DOI
10.1093/bioinformatics/bti385
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Now that the whole genome sequences of many species have been determined, computational prediction of RNA secondary structures and computational identification of non-coding RNA regions by comparative genomics have become important, and more advanced alignment methods are therefore required. Structural alignment of RNA sequences has recently been introduced to address these problems. Pair hidden Markov models on tree structures (PHMMTSs), proposed by Sakakibara, are efficient automata-theoretic models for structural alignment of RNA secondary structures, but they cannot handle pseudoknots. Tree adjoining grammars (TAGs), a subclass of context-sensitive grammars, are in contrast suitable for modeling pseudoknots. Our goal is to extend PHMMTSs by incorporating TAGs so that they can handle pseudoknots.

Results: We propose pair stochastic TAGs (PSTAGs) for aligning and predicting RNA secondary structures, including a simple type of pseudoknot that can represent most known pseudoknot structures. First, we extend PHMMTSs, which are defined on alignments of 'trees', to PSTAGs, which are defined on alignments of 'TAG trees'; TAG trees represent the derivation processes of TAGs and are functionally equivalent to the derived trees of TAGs. Then, we develop an efficient dynamic programming algorithm for PSTAGs that obtains an optimal structural alignment including pseudoknots. We implement the PSTAG algorithm and demonstrate its properties by using it to align and predict several small pseudoknot structures. We believe that our program based on PSTAGs is the first grammar-based, practically executable software for comparative analyses of RNA pseudoknot structures and, further, of non-coding RNAs.
Pages: 2611-2617
Number of pages: 7
Related papers
19 in total
[1]   PREDICTION OF RNA SECONDARY STRUCTURE, INCLUDING PSEUDOKNOTTING, BY COMPUTER-SIMULATION [J].
ABRAHAMS, JP ;
VANDENBERG, M ;
VANBATENBURG, E ;
PLEIJ, C .
NUCLEIC ACIDS RESEARCH, 1990, 18 (10) :3035-3044
[2]  
CARY RB, 1995, P 3 INT C INT SYST M, P75
[3]   RNA SEQUENCE-ANALYSIS USING COVARIANCE-MODELS [J].
EDDY, SR ;
DURBIN, R .
NUCLEIC ACIDS RESEARCH, 1994, 22 (11) :2079-2088
[4]   Finding the most significant common sequence and structure motifs in a set of RNA sequences [J].
Gorodkin, J ;
Heyer, LJ ;
Stormo, GD .
NUCLEIC ACIDS RESEARCH, 1997, 25 (18) :3724-3732
[5]   Rfam: an RNA family database [J].
Griffiths-Jones, S ;
Bateman, A ;
Marshall, M ;
Khanna, A ;
Eddy, SR .
NUCLEIC ACIDS RESEARCH, 2003, 31 (01) :439-441
[6]   THE COMPUTER-SIMULATION OF RNA FOLDING PATHWAYS USING A GENETIC ALGORITHM [J].
GULTYAEV, AP ;
VANBATENBURG, FHD ;
PLEIJ, CWA .
JOURNAL OF MOLECULAR BIOLOGY, 1995, 250 (01) :37-51
[7]   Predicting RNA secondary structures with arbitrary pseudoknots by maximizing the number of stacking pairs [J].
Ieong, S ;
Kao, MY ;
Lam, TW ;
Sung, WK ;
Yiu, SM .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2003, 10 (06) :981-995
[8]   TREE ADJUNCT GRAMMARS [J].
JOSHI, AK ;
LEVY, LS ;
TAKAHASHI, M .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1975, 10 (01) :136-163
[9]   RSEARCH: Finding homologs of single structured RNA sequences [J].
Klein, RJ ;
Eddy, SR .
BMC BIOINFORMATICS, 2003, 4 (1)
[10]   RNA pseudoknot prediction in energy-based models [J].
Lyngso, RB ;
Pedersen, CNS .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2000, 7 (3-4) :409-427