Arbitrary Category Classification of Websites Based on Image Content

Cited by: 21
Authors
Akusok, Anton [1 ,2 ]
Miche, Yoan [3 ,4 ]
Karhunen, Juha [3 ]
Bjork, Kaj-Mikael [5 ]
Nian, Rui [6 ]
Lendasse, Amaury [1 ,2 ]
Affiliations
[1] Univ Iowa, Dept Mech & Ind Engn, Iowa City, IA 52242 USA
[2] Univ Iowa, Iowa Informat Initiat, Iowa City, IA 52242 USA
[3] Aalto Univ Sch Sci, Dept Informat & Comp Sci, Espoo, Finland
[4] Nokia Solut & Networks Grp, Espoo, Finland
[5] Arcada Univ Appl Sci, Helsinki, Finland
[6] Ocean Univ China, Coll Informat Sci & Engn, Qingdao, Peoples R China
Keywords
PERFORMANCE EVALUATION; SCALE; NETWORKS; FEATURES; RECOGNITION; MACHINES; OBJECT; ELM;
DOI
10.1109/MCI.2015.2405317
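Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
140502 [Artificial intelligence]
Abstract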
This paper presents a comprehensive methodology for general large-scale image-based classification tasks. It addresses the Big Data challenge in arbitrary image classification and more specifically, filtering of millions of websites with abstract target classes and high levels of label noise. Our approach uses local image features and their color descriptors to build image representations with the help of a modified k-NN algorithm. Image representations are refined into image and website class predictions by a two-stage classifier method suitable for a very large-scale real dataset. A modification of an Extreme Learning Machine is found to be a suitable classifier technique. The methodology is robust to noise and can learn abstract target categories; website classification accuracy surpasses 97% for the most important categories considered in this study.
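The abstract names a modified Extreme Learning Machine as the classifier but does not describe the modification. As a rough illustration only, the sketch below shows a plain, unmodified ELM: a random, untrained hidden layer followed by a ridge-regularized least-squares readout. It is not the authors' method. The class name `BasicELM`, the helper `make_toy_data`, the hyperparameters, and the two-cluster toy data are all assumptions introduced here, and NumPy is assumed to be available.

```python
import numpy as np


class BasicELM:
    """Plain single-hidden-layer ELM (hypothetical sketch, not the paper's variant)."""

    def __init__(self, n_hidden=50, ridge=1e-3, seed=0):
        self.n_hidden = n_hidden  # number of random hidden neurons (assumed value)
        self.ridge = ridge        # L2 regularization strength (assumed value)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Nonlinear random projection of the inputs.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        n_classes = int(y.max()) + 1
        # Input weights and biases are drawn once and never trained.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        T = np.eye(n_classes)[y]  # one-hot targets
        # Output weights solve the regularized least-squares problem.
        A = H.T @ H + self.ridge * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        # Predicted class is the index of the largest output.
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


def make_toy_data(n_per_class=100, seed=1):
    """Two well-separated Gaussian clusters standing in for image representations."""
    rng = np.random.default_rng(seed)
    X0 = rng.normal(loc=-2.0, size=(n_per_class, 2))
    X1 = rng.normal(loc=2.0, size=(n_per_class, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


X_toy, y_toy = make_toy_data()
model = BasicELM().fit(X_toy, y_toy)
train_accuracy = float(np.mean(model.predict(X_toy) == y_toy))
```

Only the output weights are fitted, through a single linear solve, which is the property that usually makes ELMs attractive for very large datasets. The feature extraction, the modified k-NN step, and the two-stage image-to-website aggregation described in the abstract are not represented here.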
Pages: 30-41
Page count: 12
Related papers
59 in total
[1]
Amato G., 2010, P 3 INT C SIMILARITY, P101, DOI 10.1145/1862344.1862360
[2]
[Anonymous], 1987, TECHNOMETRICS, DOI DOI 10.1080/00401706.1987.10488247
[3]
[Anonymous], 2008, P 1 ACM INT C MULTIM
[4]
Ballan L, 2012, INT C PATT RECOG, P1731
[5]
Speeded-Up Robust Features (SURF) [J].
Bay, Herbert ;
Ess, Andreas ;
Tuytelaars, Tinne ;
Van Gool, Luc .
COMPUTER VISION AND IMAGE UNDERSTANDING, 2008, 110 (03) :346-359
[6]
Benenson R, 2012, PROC CVPR IEEE, P2903, DOI 10.1109/CVPR.2012.6248017
[7]
RECOGNITION-BY-COMPONENTS - A THEORY OF HUMAN IMAGE UNDERSTANDING [J].
BIEDERMAN, I .
PSYCHOLOGICAL REVIEW, 1987, 94 (02) :115-147
[8]
Latent Dirichlet allocation [J].
Blei, DM ;
Ng, AY ;
Jordan, MI .
JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) :993-1022
[9]
In defense of Nearest-Neighbor based image classification [J].
Boiman, Oren ;
Shechtman, Eli ;
Irani, Michal .
2008 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-12, 2008, :1992-+
[10]
Performance evaluation of local colour invariants [J].
Burghouts, Gertjan J. ;
Geusebroek, Jan-Mark .
COMPUTER VISION AND IMAGE UNDERSTANDING, 2009, 113 (01) :48-62