Collaborative SVM classification in scale-free peer-to-peer networks

Times cited: 14
Authors
Khan, Umer [1]
Schmidt-Thieme, Lars [1]
Nanopoulos, Alexandros [2]
Affiliations
[1] Univ Hildesheim, Univ Pl 1, D-31141 Hildesheim, Germany
[2] Univ Eichstatt, Ingolstadt, Germany
Keywords
Distributed; Classification; SVM; P2P; Skew
DOI
10.1016/j.eswa.2016.10.008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Distributed classification in large-scale P2P networks has gained relevance in recent years and supports applications such as distributed intrusion detection in P2P monitoring environments, online match-making, personalized information retrieval, distributed document classification in P2P media repositories, and P2P recommender systems. However, classification in a P2P network is challenging because of several constraints: centralizing the data is not feasible, communication bandwidth is scarce, and any method must cope with scalability, synchronization, and peer dynamism. Moreover, because they do not take into account the data distributions and topologies of real-world P2P systems, most existing distributed classification approaches fall short in both predictive accuracy and network cost. In this paper, we investigate a collaborative classification method (TRedSVM) based on Support Vector Machines (SVM) in scale-free P2P networks. In particular, we demonstrate how to construct SVM classifiers in real-world P2P networks, which exhibit an inherently skewed distribution of node links and, consequently, of data. The proposed method propagates the most influential instances of the SVM models to the vast majority of sparsely connected peers in a controlled way that improves their local classification accuracy while keeping the communication cost low throughout the network. In addition to extensive experimental evaluations on benchmark machine learning data sets, we evaluate the proposed method on music genre classification to demonstrate its performance in a real application scenario. We further analyze performance with respect to centralized approaches, data replication in P2P networks, and the cost-accuracy trade-off. TRedSVM outperforms baseline model-propagation approaches, substantially improving overall classification performance at the cost of a tolerable increase in communication. (C) 2016 Elsevier Ltd. All rights reserved.
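The abstract describes the approach only at a high level, so the following Python sketch illustrates the general idea under explicit assumptions; it is not TRedSVM as published. It grows a scale-free topology with the Barabási-Albert preferential-attachment model, gives each peer a local data set whose size follows its degree (a stand-in for "skewed" data), and trains a local linear SVM with the Pegasos subgradient method (a swapped-in solver, chosen only so the sketch needs no third-party library). Two further choices are assumptions, not the paper's rules: the "most influential" instances are taken to be the `k` instances with the smallest functional margin, and propagation is a single hop restricted to neighbors whose degree is at most `low_degree`. All function names (`barabasi_albert`, `train_linear_svm`, `most_influential`, `propagate`) are hypothetical.

```python
import random

# Hypothetical sketch: none of these function names or rules come from the paper.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def barabasi_albert(n, m, seed=0):
    """Scale-free topology via Barabasi-Albert preferential attachment.
    Returns an adjacency dict {peer: set of neighbor peers}."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(m + 1):  # seed graph: a clique of m + 1 peers
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
    # Each peer appears once per incident edge, so a uniform draw from this
    # list picks a peer with probability proportional to its degree.
    targets = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            targets.extend([new, t])
    return adj


def train_linear_svm(data, lam=0.01, epochs=50, seed=0):
    """Linear SVM trained with Pegasos (stochastic subgradient on the hinge loss).
    data: list of (x, y); x is a tuple ending in a constant 1.0 (bias), y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1  # margin is checked before the update
            w = [(1 - eta * lam) * wi for wi in w]
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def most_influential(data, w, k):
    """Assumed selection rule: the k instances with the smallest functional margin
    y * (w . x), i.e. those closest to, or on the wrong side of, the boundary."""
    return sorted(data, key=lambda xy: xy[1] * dot(w, xy[0]))[:k]


def propagate(adj, local_data, models, k=3, low_degree=2):
    """Assumed one-hop, budgeted propagation: every peer sends its k most influential
    instances only to neighbors whose degree is <= low_degree.
    Returns the augmented training sets and the number of instances transmitted."""
    augmented = {p: list(local_data[p]) for p in adj}
    sent = 0
    for p in adj:
        top = most_influential(local_data[p], models[p], k)
        for q in adj[p]:
            if len(adj[q]) <= low_degree:
                augmented[q].extend(top)
                sent += len(top)
    return augmented, sent


if __name__ == "__main__":
    rng = random.Random(1)
    adj = barabasi_albert(200, 2)

    def sample(n):
        """Hypothetical separable toy data: label is the sign of a + b, with a gap."""
        pts = []
        while len(pts) < n:
            a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
            if abs(a + b) > 0.2:
                pts.append(((a, b, 1.0), 1 if a + b > 0 else -1))
        return pts

    # Skewed data: each peer holds as many instances as it has links.
    local = {p: sample(len(adj[p])) for p in adj}
    models = {p: train_linear_svm(local[p]) for p in adj}
    augmented, sent = propagate(adj, local, models)
    degrees = sorted(len(adj[p]) for p in adj)
    print("max degree:", degrees[-1], "median degree:", degrees[len(degrees) // 2])
    print("instances sent:", sent)
```

In a Barabási-Albert graph a large fraction of peers stay at the minimum degree `m`, so restricting propagation to low-degree peers still reaches much of the network, while each sender transmits at most `k` instances per qualifying neighbor; raising `k` or `low_degree` trades extra communication for more received training data.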
Pages: 74-86
Number of pages: 13