A hidden Markov model derived structural alphabet for proteins

Cited by: 120
Authors
Camproux, AC [1]
Gautier, R [1]
Tufféry, P [1]
Affiliations
[1] Univ Paris 07, INSERM, E0436, Equipe Bioinformat Genom & Mol, FR-75251 Paris, France
Keywords
protein structure; structural alphabet; hidden Markov models; protein structural organization
DOI
10.1016/j.jmb.2004.04.005
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Understanding and predicting protein structures depends on the complexity and the accuracy of the models used to represent them. We have set up a hidden Markov model that discretizes protein backbone conformation as a series of overlapping fragments (states) four residues in length. This approach simultaneously learns the geometry of the states and their connections. Using a statistical criterion, we obtain an optimal systematic decomposition of the conformational variability of the protein peptidic chain into 27 states with strong connection logic. This result is stable over different protein sets. Our model fits well with previous knowledge of protein architecture organisation and seems able to capture some subtle details of protein organisation, such as helix sub-level organisation schemes. Taking into account the dependence between the states results in a description of local protein structure of low complexity. On average, the model makes use of only 8.3 states among 27 to describe each position of a protein structure. Although we use short fragments, the learning process on entire protein conformations captures the logic of the assembly on a larger scale. Using such a model, the structure of proteins can be reconstructed with an average accuracy close to 1.1 Å root-mean-square deviation and for a complexity of only 3. Finally, we also observe that sequence specificity increases with the number of states of the structural alphabet. Such models can constitute a very relevant approach to the analysis of protein architecture, in particular for protein structure prediction. © 2004 Elsevier Ltd. All rights reserved.
Pages: 591-605
Page count: 15
Related papers
47 in total
[1]   HELIX GEOMETRY IN PROTEINS [J].
BARLOW, DJ ;
THORNTON, JM .
JOURNAL OF MOLECULAR BIOLOGY, 1988, 201 (03) :601-619
[2]   A MAXIMIZATION TECHNIQUE OCCURRING IN STATISTICAL ANALYSIS OF PROBABILISTIC FUNCTIONS OF MARKOV CHAINS [J].
BAUM, LE ;
PETRIE, T ;
SOULES, G ;
WEISS, N .
ANNALS OF MATHEMATICAL STATISTICS, 1970, 41 (01) :164-&
[3]  
Bonneau R, 2001, PROTEINS, P119
[4]   Prediction of local structure in proteins using a library of sequence-structure motifs [J].
Bystroff, C ;
Baker, D .
JOURNAL OF MOLECULAR BIOLOGY, 1998, 281 (03) :565-577
[5]   HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins [J].
Bystroff, C ;
Thorsson, V ;
Baker, D .
JOURNAL OF MOLECULAR BIOLOGY, 2000, 301 (01) :173-190
[6]  
Bystroff Christopher, 2002, Bioinformatics, V18 Suppl 1, pS54
[7]   Analyzing patterns between regular secondary structures using short structural building blocks defined by a hidden Markov model [J].
Camproux, AC ;
Tuffery, P ;
Buffat, L ;
André, C ;
Boisvieux, JF ;
Hazout, S .
THEORETICAL CHEMISTRY ACCOUNTS, 1999, 101 (1-3) :33-40
[8]   Hidden Markov model approach for identifying the modular framework of the protein backbone [J].
Camproux, AC ;
Tuffery, P ;
Chevrolat, JP ;
Boisvieux, JF ;
Hazout, S .
PROTEIN ENGINEERING, 1999, 12 (12) :1063-1073
[9]   Exploring the use of a structural alphabet for structural prediction of protein loops [J].
Camproux, AC ;
Brevern, AG ;
Hazout, S ;
Tufféry, P .
THEORETICAL CHEMISTRY ACCOUNTS, 2001, 106 (1-2) :28-35
[10]  
Celeux G., 1992, STOCHASTICS STOCHAST, V41, P119, DOI 10.1080/17442509208833797