Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition

Cited by: 397
Authors
Cadieu, Charles F. [1, 2]
Hong, Ha [1, 2, 3]
Yamins, Daniel L. K. [1, 2]
Pinto, Nicolas [1, 2]
Ardila, Diego [1, 2]
Solomon, Ethan A. [1, 2]
Majaj, Najib J. [1, 2]
DiCarlo, James J. [1, 2]
Affiliations
[1] MIT, Dept Brain & Cognit Sci, Cambridge, MA 02139 USA
[2] MIT, McGovern Inst Brain Res, Cambridge, MA 02139 USA
[3] MIT, Harvard MIT Div Hlth Sci & Technol, Inst Med Engn & Sci, Cambridge, MA 02139 USA
Funding
US National Science Foundation
Keywords
FUNCTIONAL ARCHITECTURE; HIERARCHICAL-MODELS; RECEPTIVE-FIELDS; SELECTIVITY; RESPONSES; NEURONS; SHAPE; CATEGORIZATION; FEATURES; SYSTEM;
DOI
10.1371/journal.pcbi.1003963
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The primate visual system achieves remarkable visual object recognition performance even in brief presentations, and under changes to object exemplar, geometric transformations, and background variation (a.k.a. core visual object recognition). This remarkable performance is mediated by the representation formed in inferior temporal (IT) cortex. In parallel, recent advances in machine learning have led to ever higher performing models of object recognition using artificial deep neural networks (DNNs). It remains unclear, however, whether the representational performance of DNNs rivals that of the brain. To accurately produce such a comparison, a major difficulty has been a unifying metric that accounts for experimental limitations, such as the amount of noise, the number of neural recording sites, and the number of trials, and computational limitations, such as the complexity of the decoding classifier and the number of classifier training examples. In this work, we perform a direct comparison that corrects for these experimental limitations and computational considerations. As part of our methodology, we propose an extension of "kernel analysis" that measures the generalization accuracy as a function of representational complexity. Our evaluations show that, unlike previous bio-inspired models, the latest DNNs rival the representational performance of IT cortex on this visual object recognition task. Furthermore, we show that models that perform well on measures of representational performance also perform well on measures of representational similarity to IT, and on measures of predicting individual IT multi-unit responses. Whether these DNNs rely on computational mechanisms similar to the primate visual system is yet to be determined, but, unlike all previous bio-inspired models, that possibility cannot be ruled out merely on representational performance grounds.
Pages: 18
Related papers
69 in total
[31]   Gradient-based learning applied to document recognition [J].
LeCun, Y ;
Bottou, L ;
Bengio, Y ;
Haffner, P .
PROCEEDINGS OF THE IEEE, 1998, 86 (11) :2278-2324
[32]   Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal Cortex [J].
Li, Nuo ;
DiCarlo, James J. .
NEURON, 2010, 67 (06) :1062-1075
[33]  
Majaj N, 2012, COS 2012 SALT LAK CI
[34]   Group Invariant Scattering [J].
Mallat, Stephane .
COMMUNICATIONS ON PURE AND APPLIED MATHEMATICS, 2012, 65 (10) :1331-1398
[35]   SEEMORE: Combining color, shape, and texture histogramming in a neurally inspired approach to visual object recognition [J].
Mel, BW .
NEURAL COMPUTATION, 1997, 9 (04) :777-804
[36]  
Montavon Gregoire, 2012, Neural Networks: Tricks of the Trade. Second Edition: LNCS 7700, P621, DOI 10.1007/978-3-642-35289-8_33
[37]  
Montavon G, 2011, J MACH LEARN RES, V12, P2563
[38]   Categorical, Yet Graded - Single-Image Activation Profiles of Human Category-Selective Cortical Regions [J].
Mur, Marieke ;
Ruff, Douglas A. ;
Bodurka, Jerzy ;
De Weerd, Peter ;
Bandettini, Peter A. ;
Kriegeskorte, Nikolaus .
JOURNAL OF NEUROSCIENCE, 2012, 32 (25) :8649-8662
[39]   Object class recognition and localization using sparse features with limited receptive fields [J].
Mutch, Jim ;
Lowe, David G. .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2008, 80 (01) :45-57
[40]   The role of context in object recognition [J].
Oliva, Aude ;
Torralba, Antonio .
TRENDS IN COGNITIVE SCIENCES, 2007, 11 (12) :520-527