HSEpred: predict half-sphere exposure from protein sequences

Cited by: 51
Authors
Song, Jiangning [1]
Tan, Hao [2]
Takemoto, Kazuhiro [1]
Akutsu, Tatsuya [1]
Affiliations
[1] Kyoto Univ, Bioinformat Ctr, Inst Chem Res, Kyoto 6110011, Japan
[2] Monash Univ, Caulfield Sch Informat Technol, Caulfield, E Vic 3145, Australia
Funding
Japan Society for the Promotion of Science
DOI
10.1093/bioinformatics/btn222
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Half-sphere exposure (HSE) is a newly developed two-dimensional solvent exposure measure. By conceptually separating an amino acid's sphere in a protein structure into two half spheres that represent its distinct spatial neighborhoods in the upward and downward directions, the HSE-up and HSE-down measures show superior performance compared with other measures such as accessible surface area, residue depth and contact number. However, no method currently exists for predicting HSE measures from sequence data.
Results: In this article, we propose a novel approach to predict the HSE measures and to infer residue contact numbers from the predicted HSE values, based on a well-prepared non-homologous protein structure dataset. In particular, we employ support vector regression (SVR) to quantify the relationship between HSE measures and protein sequences and evaluate its prediction performance. We extensively explore five sequence-encoding schemes to examine their effects on the prediction performance. Our method achieves correlation coefficients of 0.72 and 0.68 between the predicted and observed HSE-up and HSE-down measures, respectively. Moreover, contact number can be accurately predicted by summing the predicted HSE-up and HSE-down values, which further broadens the application of this method. The successful application of the SVR approach in this study suggests that it should be useful more generally for quantifying the protein sequence-structure relationship and for predicting structural property profiles from protein sequences.
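The quantities named in the abstract can be illustrated with a minimal Python sketch. It counts neighboring Cα atoms in the upward and downward half spheres of each residue, sums the two counts to obtain a contact number, and computes the Pearson correlation coefficient used to compare predicted and observed values. The 13 Å radius, the per-residue `up_vectors` input, and all function names are assumptions made for this example and are not taken from the record above. The sketch computes observed values from coordinates only; it does not reproduce the SVR predictor or the sequence-encoding schemes.

```python
import math


def half_sphere_exposure(ca_coords, up_vectors, radius=13.0):
    """Return one (hse_up, hse_down) pair of neighbor counts per residue.

    ca_coords  : list of (x, y, z) C-alpha coordinates in angstroms
    up_vectors : list of (x, y, z) vectors giving each residue's "up"
                 direction (assumed to be supplied by the caller)
    radius     : sphere radius in angstroms (13.0 is an assumed default)
    """
    results = []
    for i, (center, up) in enumerate(zip(ca_coords, up_vectors)):
        n_up = n_down = 0
        for j, other in enumerate(ca_coords):
            if i == j:
                continue
            diff = tuple(o - c for o, c in zip(other, center))
            if math.sqrt(sum(d * d for d in diff)) > radius:
                continue  # outside the sphere, not a neighbor
            # The sign of the dot product selects the half sphere.
            if sum(d * u for d, u in zip(diff, up)) >= 0:
                n_up += 1
            else:
                n_down += 1
        results.append((n_up, n_down))
    return results


def contact_numbers(hse_pairs):
    """Contact number as the sum of the HSE-up and HSE-down counts."""
    return [up + down for up, down in hse_pairs]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy example: four atoms on the z axis, every "up" vector pointing along +z.
coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (0.0, 0.0, -5.0), (0.0, 0.0, 20.0)]
ups = [(0.0, 0.0, 1.0)] * 4
hse = half_sphere_exposure(coords, ups)
cn = contact_numbers(hse)
```

In practice the "up" direction would be derived from the structure itself (for example from the side-chain direction), and `pearson` would be applied to predicted versus observed HSE-up or HSE-down profiles.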
Pages: 1489-1497
Page count: 9