A novel method for protein secondary structure prediction using dual-layer SVM and profiles

被引:134
作者
Guo, J
Chen, H
Sun, ZR [1 ]
Lin, YL
机构
[1] Tsing Hua Univ, Dept Biol Sci & Biotechnol, State Key Lab Biomembrane & Membrane Biotechnol, Inst Bioinformat, Beijing 10084, Peoples R China
[2] Tsing Hua Univ, Dept Math Sci, Beijing 10084, Peoples R China
关键词
protein structure prediction; protein secondary structure; support vector machine; position-specific scoring matrices; PSI-BLAST;
D O I
10.1002/prot.10634
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
A high-performance method was developed for protein secondary structure prediction based on the dual-layer support vector machine (SVM) and position-specific scoring matrices (PSSMs). SVM is a new machine learning technology that has been successfully applied in solving problems in the field of bioinformatics. The SVM's performance is usually better than that of traditional machine learning approaches. The performance was further improved by combining PSSM profiles with the SVM analysis. The PSSMs were generated from PSI-BLAST profiles, which contain important evolution information. The final prediction results were generated from the second SVM layer output. On the CB513 data set, the three-state overall per-residue accuracy, Q3, reached 75.2%, while segment overlap (SOV) accuracy increased to 80.0%. On the CB396 data set, the Q3 of our method reached 74.0% and the SOV reached 78.1%. A web server utilizing the method has been constructed and is available at http://www.bioinfo.tsinghua.edu.cn/pmsvm. (C)2004 Wiley-Liss, Inc.
引用
收藏
页码:738 / 743
页数:6
相关论文
共 25 条
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 [J].
Bairoch, A ;
Apweiler, R .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :45-48
[3]   Exploiting the past and the future in protein secondary structure prediction [J].
Baldi, P ;
Brunak, S ;
Frasconi, P ;
Soda, G ;
Pollastri, G .
BIOINFORMATICS, 1999, 15 (11) :937-946
[4]   The Protein Data Bank [J].
Berman, HM ;
Westbrook, J ;
Feng, Z ;
Gilliland, G ;
Bhat, TN ;
Weissig, H ;
Shindyalov, IN ;
Bourne, PE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :235-242
[5]  
Chandonia JM, 1999, PROTEINS, V35, P293
[6]   CONFORMATIONAL PARAMETERS FOR AMINO-ACIDS IN HELICAL, BETA-SHEET, AND RANDOM COIL REGIONS CALCULATED FROM PROTEINS [J].
CHOU, PY ;
FASMAN, GD .
BIOCHEMISTRY, 1974, 13 (02) :211-222
[7]  
Cuff JA, 1999, PROTEINS, V34, P508, DOI 10.1002/(SICI)1097-0134(19990301)34:4<508::AID-PROT10>3.0.CO
[8]  
2-4
[9]   Support vector machines for spam categorization [J].
Drucker, H ;
Wu, DH ;
Vapnik, VN .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1999, 10 (05) :1048-1054
[10]   Knowledge-based protein secondary structure assignment [J].
Frishman, D ;
Argos, P .
PROTEINS-STRUCTURE FUNCTION AND GENETICS, 1995, 23 (04) :566-579