Large Linear Classification When Data Cannot Fit in Memory

Cited by: 49
Authors
Yu, Hsiang-Fu [1]
Hsieh, Cho-Jui [1]
Chang, Kai-Wei [1]
Lin, Chih-Jen [1]
Affiliations
[1] Natl Taiwan Univ, Dept Comp Sci, Taipei 106, Taiwan
Keywords
Block minimization methods; large-scale learning; linear classification; support vector machines; LIBRARY;
DOI
10.1145/2086737.2086743
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Recent advances in linear classification have shown that for applications such as document classification, the training process can be extremely efficient. However, most of the existing training methods are designed by assuming that data can be stored in the computer memory. These methods cannot be easily applied to data larger than the memory capacity due to the random access to the disk. We propose and analyze a block minimization framework for data larger than the memory size. At each step a block of data is loaded from the disk and handled by certain learning methods. We investigate two implementations of the proposed framework for primal and dual SVMs, respectively. Because data cannot fit in memory, many design considerations are very different from those for traditional algorithms. We discuss and compare with existing approaches that are able to handle data larger than memory. Experiments using data sets 20 times larger than the memory demonstrate the effectiveness of the proposed method.
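The block-by-block loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `write_blocks` and `train_block_minimization`, the `.npz` block files, and the parameters `outer_iters` and `inner_iters` are all assumptions made for illustration. The per-block solver here is a standard dual coordinate descent update for the L1-loss linear SVM without a bias term, chosen as one plausible "learning method" for the dual case; for simplicity the dual variables are kept in memory, whereas only one block of training data is resident at a time.

```python
import os
import tempfile

import numpy as np


def write_blocks(X, y, n_blocks, directory):
    """Split (X, y) into n_blocks files on disk and return their paths."""
    paths = []
    for k, idx in enumerate(np.array_split(np.arange(len(y)), n_blocks)):
        path = os.path.join(directory, f"block_{k}.npz")
        np.savez(path, X=X[idx], y=y[idx])
        paths.append(path)
    return paths


def train_block_minimization(paths, n_features, C=1.0, outer_iters=10, inner_iters=5):
    """Hypothetical block minimization loop for a dual L1-loss linear SVM.

    Each outer iteration visits every block once: the block is loaded from
    disk, a few passes of dual coordinate descent are run on its dual
    variables, and the shared weight vector w is updated in place.
    """
    w = np.zeros(n_features)
    alphas = [None] * len(paths)  # assumption: dual variables fit in memory
    for _ in range(outer_iters):
        for k, path in enumerate(paths):
            # Sequential read of one block; only this block's data is in memory.
            with np.load(path) as data:
                Xb, yb = data["X"], data["y"]
            if alphas[k] is None:
                alphas[k] = np.zeros(len(yb))
            a = alphas[k]
            q = np.einsum("ij,ij->i", Xb, Xb)  # squared norms x_i . x_i
            for _ in range(inner_iters):
                for i in range(len(yb)):
                    if q[i] == 0.0:
                        continue
                    grad = yb[i] * Xb[i].dot(w) - 1.0
                    # Newton step on one dual variable, clipped to [0, C].
                    new_alpha = min(max(a[i] - grad / q[i], 0.0), C)
                    w += (new_alpha - a[i]) * yb[i] * Xb[i]
                    a[i] = new_alpha
    return w
```

A possible usage is to call `write_blocks` once with a temporary directory, then pass the returned paths to `train_block_minimization`; the block count would in practice be chosen so that a single block fits comfortably in memory.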
Pages: 23