Pair hidden Markov models on tree structures

被引：28

作者：

Sakakibara, Yasubumi ^{[1
]}

机构：

[1] Keio Univ, Dept Biosci & Informat, Kohoku Ku, Yokohama, Kanagawa 2238522, Japan

来源：

BIOINFORMATICS | 2003年 / 19卷

关键词：

hidden Markov model; tree automaton; structural alignment; stochastic context-free grammar; RNA; secondary structure;

D O I：

10.1093/bioinformatics/btg1032

中图分类号：

Q5 [生物化学];

学科分类号：

071010 ; 081704 ;

摘要：

Motivation: Computationally identifying non-coding RNA regions on the genome has much scope for investigation and is essentially harder than gene-finding problems for protein-coding regions. Since comparative sequence analysis is effective for non-coding RNA detection, efficient computational methods are expected for structural alignments of RNA sequences. On the other hand, Hidden Markov Models (HMMs) have played important roles for modeling and analysing biological sequences. Especially, the concept of Pair HMMs (PHMMs) have been examined extensively as mathematical models for alignments and gene finding. Results: We propose the pair HMMs on tree structures (PHMMTSs), which is an extension of PHMMs defined on alignments of trees and provides a unifying framework and an automata-theoretic model for alignments of trees, structural alignments and pair stochastic context-free grammars. By structural alignment, we mean a pairwise alignment to align an unfolded RNA sequence into an RNA sequence of known secondary structure. First, we extend the notion of PHMMs defined on alignments of 'linear' sequences to pair stochastic tree automata, called PHMMTSs, defined on alignments of 'trees'. The PHMMTSs provide various types of alignments of trees such as affine-gap alignments of trees and an automata-theoretic model for alignment of trees. Second, based on the observation that a secondary structure of RNA can be represented by a tree, we apply PHMMTSs to the problem of structural alignments of RNAs. We modify PHMMTSs so that it takes as input a pair of a 'linear' sequence and a 'tree' representing a secondary structure of RNA to produce a structural alignment. Further, the PHMMTSs with input of a pair of two linear sequences is mathematically equal to the pair stochastic context-free grammars. We demonstrate some computational experiments to show the effectiveness of our method for structural alignments, and discuss a complexity issue of PHMMTSs.

引用

页码：i232 / i240

页数：9

共 23 条

[1]

Aho Alfred V., 1972, The theory of parsing, translation, and compiling

[2]

Durbin R., 1998, BIOL SEQUENCE ANAL

[3] Profile hidden Markov models [J].