Combining phylogenetic and hidden Markov models in biosequence analysis

Cited by: 153
Authors
Siepel, A
Haussler, D
Affiliations
[1] Univ Calif Santa Cruz, Ctr Biomol Sci & Engn, Santa Cruz, CA 95064 USA
[2] Univ Calif Santa Cruz, Howard Hughes Med Inst, Santa Cruz, CA 95064 USA
Keywords
phylogenetic models; hidden Markov models; gene prediction; maximum likelihood; context-dependent substitution
DOI
10.1089/1066527041410472
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
A few models have appeared in recent years that consider not only the way substitutions occur through evolutionary history at each site of a genome, but also the way the process changes from one site to the next. These models combine phylogenetic models of molecular evolution, which apply to individual sites, and hidden Markov models, which allow for changes from site to site. Besides improving the realism of ordinary phylogenetic models, they are potentially very powerful tools for inference and prediction, for example, for gene finding or prediction of secondary structure. In this paper, we review progress on combined phylogenetic and hidden Markov models and present some extensions to previous work. Our main result is a simple and efficient method for accommodating higher-order states in the HMM, which allows for context-dependent models of substitution, that is, models that consider the effects of neighboring bases on the pattern of substitution. We present experimental results indicating that higher-order states, autocorrelated rates, and multiple functional categories all lead to significant improvements in the fit of a combined phylogenetic and hidden Markov model, with the effect of higher-order states being particularly pronounced.
Pages: 413-428
Number of pages: 16