Fast protein classification with multiple networks

Cited by: 136
Authors
Tsuda, K
Shin, HJ
Schölkopf, B
Affiliations
[1] Max Planck Inst Biol Cybernet, D-72076 Tubingen, Germany
[2] Natl Inst Adv Ind Sci & Technol, Computat Biol Res Ctr, Tokyo, Japan
[3] Max Planck Gesell, Friedrich Miescher Lab, D-72076 Tubingen, Germany
DOI
10.1093/bioinformatics/bti1110
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Support vector machines (SVMs) have been successfully used to classify proteins into functional categories. Recently, a semidefinite programming (SDP) based SVM method was introduced to integrate multiple data sources. In SDP/SVM, the kernel matrices corresponding to each of the data sources are combined with weights obtained by solving an SDP. However, when SDP/SVM is applied to large problems, the computational cost can become prohibitive, since both converting the data to a kernel matrix for the SVM and solving the SDP are demanding in time and memory. Another, application-specific drawback arises when some of the data sources are protein networks. A common method of converting a network to a kernel matrix is the diffusion kernel method, which has time complexity O(n^3) and produces a dense matrix of size n × n.
Results: We propose an efficient method of protein classification using multiple protein networks. Available protein networks, such as a physical interaction network or a metabolic network, can be incorporated directly. Vectorial data can also be incorporated after conversion into a network by connecting neighboring points. As in the SDP/SVM method, the combination weights are obtained by convex optimization. Because the network edges are sparse, the computation time is nearly linear in the number of edges of the combined network. Additionally, the combination weights provide information useful for discarding noisy or irrelevant networks. Experiments on function prediction of 3588 yeast proteins show promising results: the computation time is greatly reduced, while the accuracy remains comparable to that of the SDP/SVM method.
Availability: Software and data will be available on request.
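The pipeline described in the abstract (turn vectorial data into a neighbor graph, combine several networks with weights, classify on the sparse combined network) can be illustrated with a small Python sketch. This is not the authors' algorithm. The abstract does not give the learning formulation, so the sketch assumes a regularized-Laplacian label propagation, and the combination weights are fixed by hand rather than obtained by convex optimization. The function names `knn_edges`, `combine` and `propagate`, the parameter `c`, and the toy data are all hypothetical.

```python
from collections import defaultdict
from math import dist


def knn_edges(points, k):
    """Connect each point to its k nearest neighbours (undirected, unit weight)."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return {e: 1.0 for e in edges}


def combine(networks, weights):
    """Weighted sum of edge-weight dictionaries (weights chosen by hand here)."""
    combined = defaultdict(float)
    for net, w in zip(networks, weights):
        for edge, value in net.items():
            combined[edge] += w * value
    return dict(combined)


def propagate(n, edges, labels, c=1.0, sweeps=200):
    """Approximately solve (I + c*L) f = y by Jacobi iteration, with L = D - W.

    labels maps a node index to +1 or -1; unlabeled nodes get y = 0.
    Each sweep visits every edge twice, so its cost is linear in the edge count.
    """
    y = [float(labels.get(i, 0.0)) for i in range(n)]
    neighbours = defaultdict(list)
    for (i, j), w in edges.items():
        neighbours[i].append((j, w))
        neighbours[j].append((i, w))
    degree = [sum(w for _, w in neighbours[i]) for i in range(n)]
    f = y[:]
    for _ in range(sweeps):
        f = [(y[i] + c * sum(w * f[j] for j, w in neighbours[i]))
             / (1.0 + c * degree[i])
             for i in range(n)]
    return f


# Toy data (assumed): two well-separated groups of three "proteins".
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
feature_net = knn_edges(points, k=2)            # network built from vectors
interaction_net = {(0, 1): 1.0, (1, 2): 1.0,    # a given network, e.g. interactions
                   (3, 4): 1.0, (4, 5): 1.0}
combined = combine([feature_net, interaction_net], [0.5, 0.5])
scores = propagate(len(points), combined, labels={0: +1, 3: -1})
predicted = [1 if s > 0 else -1 for s in scores]
```

Only the edges that exist are ever stored or visited, which is the property the abstract relies on when it contrasts sparse networks with the dense n × n matrix produced by a diffusion kernel.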
Pages: 59-65
Page count: 7