Why is real-world visual object recognition hard?

Cited by: 328

Authors:

Pinto, Nicolas [1,2]

Cox, David D. [1,2,3]

DiCarlo, James J. [1,2]

Affiliations:

[1] MIT, McGovern Inst Brain Res, Cambridge, MA 02139 USA

[2] MIT, Dept Brain & Cognit Sci, Cambridge, MA 02139 USA

[3] Rowland Inst Harvard, Cambridge, MA USA

Source:

PLOS COMPUTATIONAL BIOLOGY | 2008 / Volume 4 / Issue 01

Keywords:

DOI:

10.1371/journal.pcbi.0040027

Chinese Library Classification (CLC) code:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Progress in understanding the brain mechanisms underlying vision requires the construction of computational models that not only emulate the brain's anatomy and physiology, but ultimately match its performance on visual tasks. In recent years, "natural" images have become popular in the study of vision and have been used to show apparently impressive progress in building such models. Here, we challenge the use of uncontrolled "natural" images in guiding that progress. In particular, we show that a simple V1-like model - a neuroscientist's "null" model, which should perform poorly at real-world visual object recognition tasks - outperforms state-of-the-art object recognition systems (biologically inspired and otherwise) on a standard, ostensibly natural image recognition test. As a counterpoint, we designed a "simpler" recognition test to better span the real-world variation in object pose, position, and scale, and we show that this test correctly exposes the inadequacy of the V1-like model. Taken together, these results demonstrate that tests based on uncontrolled natural images can be seriously misleading, potentially guiding progress in the wrong direction. Instead, we reexamine what it means for images to be natural and argue for a renewed focus on the core problem of object recognition - real-world image variation.


Pages: 0151-0156

Page count: 6

References: 32 in total
