Multiple-sequence functional annotation and the generalized hidden Markov phylogeny

Cited by: 28
Authors
McAuliffe, JD
Pachter, L
Jordan, MI
Affiliations
[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Math, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720 USA
DOI
10.1093/bioinformatics/bth153
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Phylogenetic shadowing is a comparative genomics principle that allows for the discovery of conserved regions in sequences from multiple closely related organisms. We develop a formal probabilistic framework for combining phylogenetic shadowing with feature-based functional annotation methods. The resulting model, a generalized hidden Markov phylogeny (GHMP), applies to a variety of situations where functional regions are to be inferred from evolutionary constraints.
Results: We show how GHMPs can be used to predict complete shared gene structures in multiple primate sequences. We also describe shadower, our implementation of such a prediction system. We find that shadower outperforms previously reported ab initio gene finders, including comparative human-mouse approaches, on a small sample of diverse exonic regions. Finally, we report on an empirical analysis of shadower's performance which reveals that as few as five well-chosen species may suffice to attain maximal sensitivity and specificity in exon demarcation.
Pages: 1850-1860
Page count: 11