Protein function prediction via graph kernels

Cited by: 681
Authors
Borgwardt, KM
Ong, CS
Schönauer, S
Vishwanathan, SVN
Smola, AJ
Kriegel, HP
Affiliations
[1] Univ Munich, Inst Comp Sci, D-80538 Munich, Germany
[2] Natl ICT Australia, Canberra, ACT 0200, Australia
Funding
Australian Research Council
DOI
10.1093/bioinformatics/bti1007
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Computational approaches to protein function prediction infer protein function by finding proteins with similar sequence, structure, surface clefts, chemical properties, amino acid motifs, interaction partners or phylogenetic profiles. We present a new approach that combines sequential, structural and chemical information into one graph model of proteins. We predict functional class membership of enzymes and non-enzymes using graph kernels and support vector machine classification on these protein graphs.
Results: Our graph model, derivable from protein sequence and structure only, is competitive with vector models that require additional protein information, such as the size of surface pockets. If we include this extra information in our graph model, our classifier yields significantly higher accuracy levels than the vector models. Hyperkernels allow us to select and to optimally combine the most relevant node attributes in our protein graphs. We have laid the foundation for a protein function prediction system that integrates protein information from various sources efficiently and effectively.
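To make the idea of a graph kernel on labeled protein graphs concrete, here is a minimal sketch of a label-matching walk kernel. It is an illustration under stated assumptions, not the kernel defined in the paper: the node labels ("H", "E"), the `make_graph` and `walk_kernel` helpers, and the `max_len` and `decay` parameters are all hypothetical, and the paper's node attributes and hyperkernel-based attribute selection are not modeled.

```python
from itertools import product


def make_graph(labels, edges):
    """Build an undirected graph as (node labels, adjacency lists)."""
    adj = [[] for _ in labels]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return labels, adj


def walk_kernel(g1, g2, max_len=3, decay=0.5):
    """Weighted count of walks common to g1 and g2 whose node labels match.

    Assumption: a simplified walk kernel on the direct product graph,
    truncated at max_len steps with geometric decay per step.
    """
    labels1, adj1 = g1
    labels2, adj2 = g2
    # Product-graph nodes: pairs of nodes that carry the same label.
    nodes = [(i, j)
             for i, j in product(range(len(labels1)), range(len(labels2)))
             if labels1[i] == labels2[j]]
    # counts[(i, j)] = number of matching walks of the current length
    # that end at the node pair (i, j).
    counts = {n: 1 for n in nodes}
    total = 0.0
    for step in range(max_len + 1):
        total += (decay ** step) * sum(counts.values())
        # Extend every walk by one step along edges present in both graphs.
        counts = {(i, j): sum(counts.get((k, l), 0)
                              for k in adj1[i] for l in adj2[j])
                  for (i, j) in nodes}
    return total


def gram_matrix(graphs, **kw):
    """Pairwise kernel values for a list of graphs."""
    return [[walk_kernel(a, b, **kw) for b in graphs] for a in graphs]


# Toy graphs with hypothetical labels (H = helix, E = strand).
g1 = make_graph(["H", "E", "H"], [(0, 1), (1, 2)])
g2 = make_graph(["H", "E"], [(0, 1)])
K = gram_matrix([g1, g2], max_len=2, decay=0.5)
```

A Gram matrix like `K` could then be handed to any support vector machine implementation that accepts a precomputed kernel, which is one way to realize the "graph kernels plus SVM classification" pipeline the abstract describes.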
Pages: i47-i56
Page count: 10