Combining protein secondary structure prediction models with ensemble methods of optimal complexity

Cited by: 22
Authors
Guermeur, Y
Pollastri, G
Elisseeff, A
Zelus, D
Paugam-Moisy, H
Baldi, P
Affiliations
[1] Univ Nancy 1, LORIA, F-54506 Vandoeuvre Les Nancy, France
[2] Univ Calif Irvine, Inst Genom & Bioinformat, Dept Informat & Comp Sci, Irvine, CA 92697 USA
[3] Max Planck Inst Biol Cybernet, D-72076 Tubingen, Germany
[4] CIBIO, Wiener Lab, RA-2000 Rosario, Santa Fe, Argentina
[5] Univ Lyon 2, UMR CNRS 5015, ISC, F-69675 Bron, France
Keywords
protein secondary structure prediction; multi-class support vector machines (M-SVMs); ensemble methods; hierarchical sequence processing systems;
DOI
10.1016/j.neucom.2003.10.004
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many sophisticated methods are currently available to perform protein secondary structure prediction. Since they are frequently based on different principles, and different knowledge sources, significant benefits can be expected from combining them. However, the choice of an appropriate combiner appears to be an issue in its own right. The first difficulty to overcome when combining prediction methods is overfitting. This is the reason why we investigate the implementation of Support Vector Machines to perform the task. A family of multi-class SVMs is introduced. Two of these machines are used to combine some of the current best protein secondary structure prediction methods. Their performance is consistently superior to the performance of the ensemble methods traditionally used in the field. They also outperform the decomposition approaches based on bi-class SVMs. Furthermore, initial experimental evidence suggests that their outputs could be processed by the biologist to perform higher-level treatments. (C) 2003 Elsevier B.V. All rights reserved.
Pages: 305-327
Page count: 23