β Edge strands in protein structure prediction and aggregation

Cited by: 48
Authors
Siepen, JA [1]
Radford, SE [1]
Westhead, DR [1]
Affiliation
[1] Univ Leeds, Sch Biochem & Mol Biol, Leeds LS2 9JT, W Yorkshire, England
Funding
Biotechnology and Biological Sciences Research Council (UK)
Keywords
decision tree; support vector machine; secondary structure prediction; machine learning
DOI
10.1110/ps.03234503
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
It is well established that recognition between exposed edges of beta-sheets is an important mode of protein-protein interaction and can have pathological consequences; for instance, it has been linked to the aggregation of proteins into a fibrillar structure, which is associated with a number of predominantly neurodegenerative disorders. A number of protective mechanisms have evolved in the edge strands of beta-sheets, preventing the aggregation and insolubility of most natural beta-sheet proteins. Such mechanisms are unfavorable in the interior of a beta-sheet. The problem of distinguishing edge strands from central strands based on sequence information alone is important in predicting residues and mutations likely to be involved in aggregation, and is also a first step in predicting folding topology. Here we report support vector machine (SVM) and decision tree methods developed to classify edge strands from central strands in a representative set of protein domains. Interestingly, rules generated by the decision tree method are in close agreement with our knowledge of protein structure and are potentially useful in a number of different biological applications. When trained on strands from proteins of known structure, using structure-based (Dictionary of Secondary Structure in Proteins) strand assignments, both methods achieved mean cross-validated prediction accuracies of approximately 78%. These accuracies were reduced when strand assignments from secondary structure prediction were used. Further investigation of this effect revealed that it could be explained by a significant reduction in the accuracy of standard secondary structure prediction methods for edge strands, in comparison with central strands.
Pages: 2348-2359
Number of pages: 12