How reliably can we predict the reliability of protein structure predictions?

Cited by: 9
Authors
Miklos, Istvan [1 ]
Novak, Adam [1 ]
Dombai, Balazs [2 ]
Hein, Jotun [1 ]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[2] Eotvos Lorand Univ, E Sci Reg Knowledge Ctr, H-1117 Budapest, Hungary
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC);
Keywords
DOI
10.1186/1471-2105-9-137
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Comparative methods have been the standard techniques for in silico protein structure prediction. The prediction is based on a multiple alignment that contains both reference sequences with known structures and the target sequence whose structure is to be predicted. Intensive research has gone into improving the quality of multiple alignments, since misaligned regions yield misleading predictions. Sometimes, however, all methods fail to find the correct alignment, because the large number of mutations separating the sequences leaves too weak an evolutionary signal to identify the homologous parts.
Results: Stochastic sequence alignment methods define a posterior distribution over possible multiple alignments. They can highlight the most likely alignment and, in addition, give a posterior probability for each alignment column. We carried out a comprehensive study on the HOMSTRAD database of structural alignments, predicting secondary structures in four different ways. We showed that alignment posterior probabilities correlate with the reliability of secondary structure predictions, though the strength of the correlation differs between protocols. The correspondence between the reliability of secondary structure predictions and alignment posterior probabilities is closest to the identity function when the secondary structure posterior probabilities are calculated from the posterior distribution of multiple alignments. The largest deviation from the identity function was obtained when predicting secondary structures from a single optimal pairwise alignment. We also showed that alignment posterior probabilities correlate with the 3D distances between the C-alpha atoms of amino acids in superimposed tertiary structures.
Conclusion: Alignment posterior probabilities can be used to detect errors in comparative models a priori, at the sequence alignment level.
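The idea of computing secondary structure posteriors from a posterior distribution of alignments, rather than from one optimal alignment, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the posterior is approximated by equally weighted sampled pairwise alignments, and the names `pair_posteriors`, `structure_posteriors`, `samples`, and `ref_structure`, as well as the toy data, are hypothetical.

```python
from collections import Counter, defaultdict


def pair_posteriors(samples):
    """Estimate P(reference residue i aligned to target residue j)
    as the fraction of sampled alignments containing the pair (i, j)."""
    n = len(samples)
    counts = Counter(pair for sample in samples for pair in set(sample))
    return {pair: c / n for pair, c in counts.items()}


def structure_posteriors(samples, ref_structure):
    """For each target position j, estimate the posterior of each
    secondary-structure label by transferring the label of the aligned
    reference residue in every sample and averaging over samples.
    Samples in which j is unaligned contribute no label, so the
    probabilities for j may sum to less than 1."""
    n = len(samples)
    votes = defaultdict(Counter)
    for sample in samples:
        for i, j in set(sample):
            votes[j][ref_structure[i]] += 1
    return {j: {label: c / n for label, c in labels.items()}
            for j, labels in votes.items()}


# Hypothetical toy data: the reference has a known structure
# (H = helix, E = strand); each sample lists aligned (ref, target) pairs.
ref_structure = "HHEE"
samples = [
    [(0, 0), (1, 1), (2, 2)],
    [(0, 0), (1, 1), (3, 2)],
    [(0, 0), (2, 1), (3, 2)],
    [(0, 0), (1, 1), (2, 2)],
]

pairs = pair_posteriors(samples)
ss = structure_posteriors(samples, ref_structure)
print(pairs[(1, 1)], ss[1], ss[2])
```

In this toy case, target position 2 is aligned to reference residue 2 in half of the samples and to residue 3 in the other half, so each individual aligned pair has posterior 0.5, yet both candidates are strand residues and the structure posterior for E is 1.0. Summing over alignments can therefore give a confident structure prediction even where the alignment itself is uncertain.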
Pages: 14