Pair hidden Markov models on tree structures

被引:28
作者
Sakakibara, Yasubumi [1 ]
机构
[1] Keio Univ, Dept Biosci & Informat, Kohoku Ku, Yokohama, Kanagawa 2238522, Japan
关键词
hidden Markov model; tree automaton; structural alignment; stochastic context-free grammar; RNA; secondary structure;
D O I
10.1093/bioinformatics/btg1032
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Computationally identifying non-coding RNA regions on the genome has much scope for investigation and is essentially harder than gene-finding problems for protein-coding regions. Since comparative sequence analysis is effective for non-coding RNA detection, efficient computational methods are expected for structural alignments of RNA sequences. On the other hand, Hidden Markov Models (HMMs) have played important roles for modeling and analysing biological sequences. Especially, the concept of Pair HMMs (PHMMs) have been examined extensively as mathematical models for alignments and gene finding. Results: We propose the pair HMMs on tree structures (PHMMTSs), which is an extension of PHMMs defined on alignments of trees and provides a unifying framework and an automata-theoretic model for alignments of trees, structural alignments and pair stochastic context-free grammars. By structural alignment, we mean a pairwise alignment to align an unfolded RNA sequence into an RNA sequence of known secondary structure. First, we extend the notion of PHMMs defined on alignments of 'linear' sequences to pair stochastic tree automata, called PHMMTSs, defined on alignments of 'trees'. The PHMMTSs provide various types of alignments of trees such as affine-gap alignments of trees and an automata-theoretic model for alignment of trees. Second, based on the observation that a secondary structure of RNA can be represented by a tree, we apply PHMMTSs to the problem of structural alignments of RNAs. We modify PHMMTSs so that it takes as input a pair of a 'linear' sequence and a 'tree' representing a secondary structure of RNA to produce a structural alignment. Further, the PHMMTSs with input of a pair of two linear sequences is mathematically equal to the pair stochastic context-free grammars. We demonstrate some computational experiments to show the effectiveness of our method for structural alignments, and discuss a complexity issue of PHMMTSs.
引用
收藏
页码:i232 / i240
页数:9
相关论文
共 23 条
[1]  
Aho Alfred V., 1972, The theory of parsing, translation, and compiling
[2]  
Durbin R., 1998, BIOL SEQUENCE ANAL
[3]   Profile hidden Markov models [J].
Eddy, SR .
BIOINFORMATICS, 1998, 14 (09) :755-763
[4]   RNA SEQUENCE-ANALYSIS USING COVARIANCE-MODELS [J].
EDDY, SR ;
DURBIN, R .
NUCLEIC ACIDS RESEARCH, 1994, 22 (11) :2079-2088
[5]   Non-coding RNA genes and the modern RNA world [J].
Eddy, SR .
NATURE REVIEWS GENETICS, 2001, 2 (12) :919-929
[6]   Finding the most significant common sequence and structure motifs in a set of RNA sequences [J].
Gorodkin, J ;
Heyer, LJ ;
Stormo, GD .
NUCLEIC ACIDS RESEARCH, 1997, 25 (18) :3724-3732
[7]   Rfam: an RNA family database [J].
Griffiths-Jones, S ;
Bateman, A ;
Marshall, M ;
Khanna, A ;
Eddy, SR .
NUCLEIC ACIDS RESEARCH, 2003, 31 (01) :439-441
[8]  
Holmes I, 2002, Pac Symp Biocomput, P163
[9]   ALIGNMENT OF TREES - AN ALTERNATIVE TO TREE EDIT [J].
JIANG, T ;
WANG, LS ;
ZHANG, KZ .
THEORETICAL COMPUTER SCIENCE, 1995, 143 (01) :137-148
[10]   A general edit distance between RNA structures [J].
Jiang, T ;
Lin, GH ;
Ma, B ;
Zhang, KZ .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2002, 9 (02) :371-388