Fast SVM training using edge detection on very large datasets

Cited by: 19
Authors
Li, Boyang [1 ]
Wang, Qiangwei [2 ]
Hu, Jinglu [2 ]
Affiliations
[1] Dongbei Univ Finance & Econ, Sch Management Sci & Engn, Dalian, Peoples R China
[2] Waseda Univ, Sch Informat Prod & Syst, Wakamatsu Ku, Kitakyushu, Fukuoka, Japan
Keywords
support vector machine; fast SVM training; edge detection; training data reduction
KeyWords Plus: VECTOR MACHINES; SPACE
DOI
10.1002/tee.21844
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Subject Classification Codes
080906 [Electromagnetic Information Functional Materials and Structures]; 082806 [Agricultural Information and Electrical Engineering]
Abstract
In a standard support vector machine (SVM), the training process has O(n^3) time and O(n^2) space complexity, where n is the size of the training dataset. Training is thus computationally infeasible for very large datasets. Reducing the size of the training dataset is therefore a natural way to address this problem. SVM classifiers are constructed from the training samples, called support vectors (SVs), that lie close to the separation boundary, so removing the other samples, which are not relevant to the SVs, should have no effect on building the separation boundary. In other words, we need to retain the samples that are likely to be SVs. Therefore, a method based on edge detection techniques is proposed to extract such samples near the separation boundary. To avoid overfitting, we also use a clustering algorithm to preserve the distribution properties of the training dataset. The samples selected by the edge detector and the centroids of the clusters are used to reconstruct the training dataset. In the proposed approach, the edge detection technique extracts the local properties around the separation boundary, while the clustering algorithm preserves the properties of the entire dataset. The reconstructed training dataset, with a much smaller number of samples, makes the training process very fast without degrading classification accuracy. (c) 2013 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
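The abstract's pipeline can be sketched in a few lines. This is a minimal illustration, not the paper's actual method: the abstract does not specify the edge detector, so a k-nearest-neighbour heuristic (a point whose neighbourhood contains both classes is treated as a boundary sample) stands in for it, and per-class k-means centroids stand in for the clustering step. All function and parameter names here are illustrative assumptions.

```python
# Hypothetical sketch of the reduce-then-train idea, using scikit-learn.
# Assumption: boundary samples are approximated by a kNN mixed-neighbourhood
# test, since the abstract does not describe the actual edge detector.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def reduce_training_set(X, y, k=10, n_clusters=20, random_state=0):
    # Boundary samples: points whose k nearest neighbours are not all of
    # the same class are likely to lie near the separation boundary.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    neighbour_labels = y[idx[:, 1:]]              # drop self (column 0)
    mixed = (neighbour_labels != y[:, None]).any(axis=1)

    # Centroids of per-class clusters preserve the global distribution,
    # guarding against overfitting to the boundary region alone.
    cent_X, cent_y = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        km = KMeans(n_clusters=min(n_clusters, len(Xc)),
                    n_init=10, random_state=random_state).fit(Xc)
        cent_X.append(km.cluster_centers_)
        cent_y.append(np.full(km.cluster_centers_.shape[0], label))

    X_red = np.vstack([X[mixed]] + cent_X)
    y_red = np.concatenate([y[mixed]] + cent_y)
    return X_red, y_red

# Demo on synthetic data: train the SVM on the reduced set only.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4,
                           n_clusters_per_class=1, class_sep=2.0,
                           random_state=0)
X_red, y_red = reduce_training_set(X, y)
clf = SVC(kernel="rbf").fit(X_red, y_red)        # far fewer samples to train on
print(X_red.shape[0], "of", X.shape[0], "samples kept")
```

Because SVM training cost grows superlinearly in n, shrinking the training set this way is where the speedup comes from; the boundary samples keep the decision surface intact while the centroids keep the reduced set representative of the full distribution.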
Pages: 229-237 (9 pages)