Bayesian segmental models with multiple sequence alignment profiles for protein secondary structure and contact map prediction

Times cited: 26
Authors
Chu, W [1]
Ghahramani, Z
Podtelezhnikov, A
Wild, DL
Affiliations
[1] UCL, Gatsby Computat Neurosci Unit, London WC1N 3AR, England
[2] Keck Grad Inst Life Sci, Claremont, CA 91711 USA
Keywords
Bayesian segmental semi-Markov models; generative models; protein secondary structure; contact maps; multiple sequence alignment profiles; parametric models
DOI
10.1109/TCBB.2006.17
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
In this paper, we develop a segmental semi-Markov model (SSMM) for protein secondary structure prediction that incorporates multiple sequence alignment profiles to improve predictive performance. The segmental model is a generalization of the hidden Markov model in which a hidden state generates segments of variable length and secondary structure type. A novel parameterized model is proposed for the likelihood function that explicitly represents multiple sequence alignment profiles to capture the segmental conformation. Numerical results on benchmark data sets show that incorporating the profiles yields substantial improvements and that the generalization performance is promising. By incorporating information from long-range interactions in beta-sheets, the model can also carry out inference on contact maps. This is an important advantage of probabilistic generative models over the traditional discriminative approach to protein secondary structure prediction. The Web server for our algorithm and supplementary materials are available at http://public.kgi.edu/~wild/bsm.html.
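The abstract describes the segmental model only in words: each hidden state emits a whole segment, with its own length and secondary structure type, rather than a single residue. The sketch below illustrates the standard semi-Markov forward recursion, which sums the likelihood over every possible segmentation and labeling, and checks it against brute-force enumeration on a toy sequence. It is not the authors' implementation. The parameters `INIT`, `TRANS`, `LENGTH_PMF`, `EMIT`, the three-letter alphabet, and the independent-residue `segment_likelihood` are hypothetical placeholders standing in for the paper's profile-based likelihood, and the zero-diagonal transition matrix (adjacent segments must differ in type) is an assumption of this sketch.

```python
import itertools
import math
from typing import Sequence

STATES = ("H", "E", "C")  # segment types: helix, strand, coil

# Hypothetical toy parameters (not taken from the paper).
INIT = {"H": 0.3, "E": 0.2, "C": 0.5}
# Zero diagonal: adjacent segments must differ in type (an assumption of this sketch).
TRANS = {
    "H": {"H": 0.0, "E": 0.3, "C": 0.7},
    "E": {"H": 0.2, "E": 0.0, "C": 0.8},
    "C": {"H": 0.5, "E": 0.5, "C": 0.0},
}
# Segment-length distributions P(length | type), truncated at length 3.
LENGTH_PMF = {
    "H": {1: 0.1, 2: 0.3, 3: 0.6},
    "E": {1: 0.3, 2: 0.5, 3: 0.2},
    "C": {1: 0.5, 2: 0.3, 3: 0.2},
}
# Per-residue emission probabilities over a three-letter toy alphabet.
EMIT = {
    "H": {"A": 0.6, "G": 0.1, "V": 0.3},
    "E": {"A": 0.2, "G": 0.1, "V": 0.7},
    "C": {"A": 0.2, "G": 0.6, "V": 0.2},
}


def segment_likelihood(segment: Sequence[str], state: str) -> float:
    """Hypothetical placeholder P(segment | type): residues emitted independently."""
    p = 1.0
    for residue in segment:
        p *= EMIT[state][residue]
    return p


def forward_likelihood(x: str) -> float:
    """P(x), summed over all segmentations and segment-type labelings."""
    n = len(x)
    # alpha[t][s]: joint probability of x[:t] and a type-s segment ending exactly at t.
    alpha = [dict.fromkeys(STATES, 0.0) for _ in range(n + 1)]
    for t in range(1, n + 1):
        for s in STATES:
            total = 0.0
            for length, p_len in LENGTH_PMF[s].items():
                start = t - length
                if start < 0:
                    continue  # segment would extend past the sequence start
                if start == 0:
                    prev = INIT[s]  # this is the first segment
                else:
                    prev = sum(alpha[start][r] * TRANS[r][s] for r in STATES)
                total += prev * p_len * segment_likelihood(x[start:t], s)
            alpha[t][s] = total
    return sum(alpha[n].values())


def brute_force_likelihood(x: str) -> float:
    """Same quantity by explicit enumeration (feasible only for tiny inputs)."""
    n = len(x)
    total = 0.0
    # Every subset of the n - 1 interior cut points defines one segmentation.
    for k in range(n):
        for cuts in itertools.combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            segments = [x[a:b] for a, b in zip(bounds, bounds[1:])]
            for types in itertools.product(STATES, repeat=len(segments)):
                p = INIT[types[0]]
                for i, (seg, s) in enumerate(zip(segments, types)):
                    if i > 0:
                        p *= TRANS[types[i - 1]][s]
                    p *= LENGTH_PMF[s].get(len(seg), 0.0)
                    p *= segment_likelihood(seg, s)
                total += p
    return total


if __name__ == "__main__":
    x = "AVGAV"
    fwd, brute = forward_likelihood(x), brute_force_likelihood(x)
    print(f"forward={fwd:.6e} brute_force={brute:.6e}")
    print("agree:", math.isclose(fwd, brute, rel_tol=1e-9))
```

The enumeration grows exponentially with sequence length and is included only to check the recursion; the forward pass costs O(n · L_max · |S|²) for n residues, maximum segment length L_max, and |S| segment types. A practical implementation would also work in log space to avoid underflow on full-length proteins.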
Pages: 98-113
Page count: 16