Building the gist of a scene: the role of global image features in recognition

Cited by: 942
Authors
Oliva, Aude
Torralba, Antonio
Affiliations
[1] MIT, Dept Brain & Cognit Sci, Cambridge, MA 02139 USA
[2] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
Source
VISUAL PERCEPTION, PT 2: FUNDAMENTALS OF AWARENESS: MULTI-SENSORY INTEGRATION AND HIGH-ORDER PERCEPTION | 2006 / Vol. 155
Keywords
scene recognition; gist; spatial envelope; global image feature; spatial frequency; natural image
DOI
10.1016/S0079-6123(06)55002-2
Chinese Library Classification (CLC) number
Q189 [Neuroscience]
Discipline classification code
071006
Abstract
Humans can recognize the gist of a novel image in a single glance, independent of its complexity. How is this remarkable feat accomplished? On the basis of behavioral and computational evidence, this paper describes a formal approach to the representation and the mechanism of scene gist understanding, based on scene-centered, rather than object-centered primitives. We show that the structure of a scene image can be estimated by the mean of global image features, providing a statistical summary of the spatial layout properties (Spatial Envelope representation) of the scene. Global features are based on configurations of spatial scales and are estimated without invoking segmentation or grouping operations. The scene-centered approach is not an alternative to local image analysis but would serve as a feed-forward and parallel pathway of visual processing, able to quickly constrain local feature analysis and enhance object recognition in cluttered natural scenes.
Pages: 23-36
Page count: 14