Modeling the shape of the scene: A holistic representation of the spatial envelope

Cited by: 4493
Authors
Oliva, A
Torralba, A
Affiliations
[1] Harvard Univ, Sch Med, Boston, MA 02115 USA
[2] Brigham & Womens Hosp, Boston, MA 02115 USA
[3] MIT, Dept Brain & Cognit Sci, Cambridge, MA 02139 USA
Keywords
scene recognition; natural images; energy spectrum; principal components; spatial layout
DOI
10.1023/A:1011139631724
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a computational model of the recognition of real-world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low-dimensional representation of the scene, which we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. We then show that these dimensions can be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes belonging to the same semantic category (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not required for scene categorization, and that a holistic representation of the scene is informative about its probable semantic category.
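The abstract says the perceptual dimensions are estimated from "spectral and coarsely localized information" and the keywords mention the energy spectrum and principal components. The sketch below is a simplified illustration of that kind of holistic, segmentation-free descriptor, not the authors' pipeline. Every name here (`energy_spectrum_features`, `localized_features`, `pca_project`) is hypothetical. The binning scheme (log-spaced radial bands crossed with orientation bins) and the use of NumPy's FFT and SVD are assumptions made for illustration.

```python
import numpy as np

def energy_spectrum_features(image, n_orient=4, n_scale=4):
    """Hypothetical holistic descriptor: pool the energy spectrum |F|^2 into
    n_scale log-spaced radial bands x n_orient orientation bins.
    An illustrative assumption, not the paper's exact filter bank."""
    img = np.asarray(image, dtype=float)
    img = img - img.mean()  # drop the DC term so mean luminance does not dominate
    h, w = img.shape
    # Energy spectrum with zero frequency shifted to the centre
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    fy = np.fft.fftshift(np.fft.fftfreq(h))  # cycles/pixel along rows
    fx = np.fft.fftshift(np.fft.fftfreq(w))  # cycles/pixel along columns
    FX, FY = np.meshgrid(fx, fy)
    radius = np.hypot(FX, FY)
    theta = np.mod(np.arctan2(FY, FX), np.pi)  # orientation folded into [0, pi)
    # Log-spaced radial band edges from the lowest nonzero frequency to the corner
    edges = np.geomspace(1.0 / max(h, w), radius.max() * (1 + 1e-9), n_scale + 1)
    scale_idx = np.digitize(radius, edges) - 1  # -1 marks DC / below lowest band
    orient_idx = np.minimum((theta / (np.pi / n_orient)).astype(int), n_orient - 1)
    feats = np.zeros((n_scale, n_orient))
    valid = (scale_idx >= 0) & (scale_idx < n_scale)
    np.add.at(feats, (scale_idx[valid], orient_idx[valid]), spectrum[valid])
    total = feats.sum()
    # Normalise to unit total energy (contrast invariance); flat images give zeros
    return (feats / total if total > 0 else feats).ravel()

def localized_features(image, grid=2, **kw):
    """Coarsely localized variant: concatenate spectral features of grid x grid tiles."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    blocks = []
    for i in range(grid):
        for j in range(grid):
            tile = img[i * h // grid:(i + 1) * h // grid,
                       j * w // grid:(j + 1) * w // grid]
            blocks.append(energy_spectrum_features(tile, **kw))
    return np.concatenate(blocks)

def pca_project(X, k):
    """Project rows of X (one descriptor per image) onto the top-k principal
    components, computed from the SVD of the mean-centred data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T
```

Consider an image of vertical stripes, whose intensity varies only along x. Its energy concentrates at frequencies with `fy = 0`, so nearly all of it lands in orientation bin 0. This dominant-orientation signature is the kind of structural cue such a descriptor captures without segmenting objects.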
Pages: 145-175
Number of pages: 31