XRate: a fast prototyping, training and annotation tool for phylo-grammars

Cited by: 40
Authors
Klosterman, Peter S.
Uzilov, Andrew V.
Bendana, Yuri R.
Bradley, Robert K.
Chao, Sharon
Kosiol, Carolin
Goldman, Nick
Holmes, Ian [1]
Affiliations
[1] Univ Calif Berkeley, Dept Bioengn, Berkeley, CA 94720 USA
[2] European Bioinformat Inst, Hinxton, Cambs, England
[3] Cornell Univ, Dept Biol Stat & Computat Biol, Ithaca, NY USA
Funding
Wellcome Trust (UK);
Keywords
DOI
10.1186/1471-2105-7-428
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Recent years have seen the emergence of genome annotation methods based on the phylo-grammar, a probabilistic model combining continuous-time Markov chains and stochastic grammars. Previously, phylo-grammars have required considerable effort to implement, limiting their adoption by computational biologists.
Results: We have developed an open source software tool, xrate, for working with reversible, irreversible or parametric substitution models combined with stochastic context-free grammars. xrate efficiently estimates maximum-likelihood parameters and phylogenetic trees using a novel "phylo-EM" algorithm that we describe. The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source. We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures.
Conclusion: Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools.
Pages: 25