Using multi-data hidden Markov models trained on local neighborhoods of protein structure to predict residue-residue contacts

Cited by: 36
Authors
Bjorkholm, Patrik [1,2]
Daniluk, Pawel [3]
Kryshtafovych, Andriy [4]
Fidelis, Krzysztof [4]
Andersson, Robin [1]
Hvidsten, Torgeir R. [1,5]
Affiliations
[1] Uppsala Univ, Linnaeus Ctr Bioinformat, Uppsala, Sweden
[2] Stockholm Univ, Stockholm Bioinformat Ctr, S-10691 Stockholm, Sweden
[3] Univ Warsaw, Fac Phys, Dept Biophys, Warsaw, Poland
[4] UC Davis, UC Davis Genome Ctr, Davis, CA USA
[5] Umea Univ, Dept Plant Physiol, Umea Plant Sci Ctr, S-90187 Umea, Sweden
Keywords
CORRELATED MUTATIONS; INFORMATION; MACHINES; MATRICES; CASP7
DOI
10.1093/bioinformatics/btp149
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Correct prediction of residue-residue contacts in proteins that lack good templates with known structure would take ab initio protein structure prediction a large step forward. The lack of correct contacts, and in particular long-range contacts, is considered the main reason why these methods often fail. Results: We propose a novel hidden Markov model (HMM)-based method for predicting residue-residue contacts from protein sequences, using as training data homologous sequences, predicted secondary structure and a library of local neighborhoods (local descriptors of protein structure). The library consists of recurring structural entities incorporating short-, medium- and long-range interactions and is general enough to reassemble the cores of nearly all proteins in the PDB. The method is tested on an external test set of 606 domains with no significant sequence similarity to the training set, as well as 151 domains with SCOP folds not present in the training set. Considering the top 0.2 · L predictions (L = sequence length), our HMMs obtained an accuracy of 22.8% for long-range interactions in new fold targets, and an average accuracy of 28.6% for long-, medium- and short-range contacts. This is a significant performance increase over currently available methods when comparing against results published in the literature.
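The accuracy figures above are computed over the top 0.2 L ranked predictions per target. A minimal sketch of how such a score might be computed is given here; it is not the authors' evaluation code. The function name, the input layout (a score per residue pair) and the default long-range cutoff of 24 residues of sequence separation are assumptions for illustration, since the abstract does not define them.

```python
def top_fraction_accuracy(scores, true_contacts, seq_len, fraction=0.2, min_sep=24):
    """Fraction of the top (fraction * L) predicted pairs that are true contacts.

    scores: dict mapping residue index pairs (i, j), i < j, to a predicted score.
    true_contacts: set of (i, j) pairs, i < j, in contact in the known structure.
    min_sep: minimum sequence separation j - i (24 is an assumed long-range cutoff).
    """
    # Keep only pairs within the sequence-separation range being evaluated.
    candidates = [(pair, s) for pair, s in scores.items()
                  if pair[1] - pair[0] >= min_sep]
    # Rank by descending score and keep at most fraction * L predictions.
    n_top = max(1, int(fraction * seq_len))
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)[:n_top]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if pair in true_contacts)
    # Assumption: divide by the number of predictions actually evaluated.
    return hits / len(ranked)


# Hypothetical toy data: four scored pairs for a 50-residue domain.
toy_scores = {(0, 30): 0.9, (1, 40): 0.8, (2, 35): 0.7, (0, 5): 0.99}
toy_truth = {(0, 30), (2, 35), (0, 5)}
long_range_acc = top_fraction_accuracy(toy_scores, toy_truth, seq_len=50)
```

Passing a smaller `min_sep` would include medium- and short-range pairs in the same ranking; how the published averages combine the three ranges is not specified in the abstract.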
Pages: 1264-1270
Page count: 7