What is the Best Multi-Stage Architecture for Object Recognition?

Cited by: 1246
Authors
Jarrett, Kevin [1 ]
Kavukcuoglu, Koray [1 ]
Ranzato, Marc'Aurelio [1 ]
LeCun, Yann [1 ]
Institutions
[1] NYU, Courant Inst Math Sci, New York, NY 10003 USA
Source
2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2009
DOI
10.1109/ICCV.2009.5459469
CLC number
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
In many recent object recognition systems, the feature extraction stages are generally composed of a filter bank, a non-linear transformation, and some sort of feature pooling layer. Most systems use only one stage of feature extraction in which the filters are hard-wired, or two stages where the filters in one or both stages are learned in supervised or unsupervised mode. This paper addresses three questions: 1. How do the non-linearities that follow the filter banks influence the recognition accuracy? 2. Does learning the filter banks in an unsupervised or supervised manner improve the performance over random filters or hard-wired filters? 3. Is there any advantage to using an architecture with two stages of feature extraction, rather than one? We show that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks. We show that two stages of feature extraction yield better accuracy than one. Most surprisingly, we show that a two-stage system with random filters can yield almost 63% recognition rate on Caltech-101, provided that the proper non-linearities and pooling layers are used. Finally, we show that with supervised refinement, the system achieves state-of-the-art performance on the NORB dataset (5.6% error), and that unsupervised pre-training followed by supervised refinement produces good accuracy on Caltech-101 (> 65%), and the lowest known error rate on the undistorted, unprocessed MNIST dataset (0.53%).
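To make the stage structure described in the abstract concrete, the following is a minimal sketch of one feature-extraction stage: a filter bank convolution, absolute-value rectification, a simplified local contrast normalization, and average pooling. The filters are random, mirroring the paper's finding that random filters can work surprisingly well when the right non-linearities are used. The function name `stage` and the simplified across-map normalization (the paper uses a Gaussian-weighted local neighborhood) are this sketch's own assumptions, not the authors' exact implementation.

```python
import numpy as np

def stage(image, filters, pool=2, eps=1e-6):
    """One stage: filter bank -> abs rectification -> contrast norm -> avg pool.

    Note: the contrast normalization here is a simplified across-feature-map
    version; the paper's version uses a Gaussian-weighted spatial neighborhood.
    """
    k = filters.shape[-1]
    H, W = image.shape
    out_h, out_w = H - k + 1, W - k + 1

    # Filter bank: "valid" sliding-window correlation with each filter.
    maps = np.empty((len(filters), out_h, out_w))
    for i, f in enumerate(filters):
        for y in range(out_h):
            for x in range(out_w):
                maps[i, y, x] = np.sum(image[y:y + k, x:x + k] * f)

    # Rectification: absolute value of the filter responses.
    maps = np.abs(maps)

    # Simplified local contrast normalization: subtractive, then divisive.
    maps = maps - maps.mean(axis=0, keepdims=True)
    std = np.sqrt((maps ** 2).mean(axis=0, keepdims=True))
    maps = maps / np.maximum(std, eps)

    # Average pooling over non-overlapping pool x pool windows.
    ph, pw = out_h // pool, out_w // pool
    maps = maps[:, :ph * pool, :pw * pool]
    return maps.reshape(len(filters), ph, pool, pw, pool).mean(axis=(2, 4))

# Usage: a 12x12 image through 4 random 5x5 filters with 2x2 pooling.
rng = np.random.default_rng(0)
features = stage(rng.standard_normal((12, 12)),
                 rng.standard_normal((4, 5, 5)))
print(features.shape)  # (4, 4, 4): 4 maps of (12-5+1)//2 = 4 per side
```

Stacking two such stages (the second applied per feature map) gives the two-stage architecture the paper evaluates.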
Pages: 2146 / 2153
Number of pages: 8