Modeling the influence of task on attention

Cited by: 434
Authors
Navalpakkam, V
Itti, L
Affiliations
[1] Univ So Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
[2] Univ So Calif, Dept Psychol, Los Angeles, CA 90089 USA
[3] Univ So Calif, Grad Program Neurosci, Los Angeles, CA 90089 USA
Keywords
attention; top-down; bottom-up; object detection; recognition; task-relevance; scene analysis
DOI
10.1016/j.visres.2004.07.042
Chinese Library Classification (CLC) number
Q189 [Neuroscience]
Discipline classification code
071006
Abstract
We propose a computational model for the task-specific guidance of visual attention in real-world scenes. Our model emphasizes four aspects that are important in biological vision: determining task-relevance of an entity, biasing attention for the low-level visual features of desired targets, recognizing these targets using the same low-level features, and incrementally building a visual map of task-relevance at every scene location. Given a task definition in the form of keywords, the model first determines and stores the task-relevant entities in working memory, using prior knowledge stored in long-term memory. It attempts to detect the most relevant entity by biasing its visual attention system with the entity's learned low-level features. It attends to the most salient location in the scene and attempts to recognize the attended object through hierarchical matching against object representations stored in long-term memory. It updates its working memory with the task-relevance of the recognized entity and updates a topographic task-relevance map with the location and relevance of the recognized entity. The model is tested on three types of tasks: single-target detection in 343 natural and synthetic images, where biasing for the target accelerates target detection over twofold on average; sequential multiple-target detection in 28 natural images, where biasing, recognition, working memory and long-term memory contribute to rapidly finding all targets; and learning a map of likely locations of cars from a video clip filmed while driving on a highway. The model's performance on search for single features and feature conjunctions is consistent with existing psychophysical data. These results of our biologically-motivated architecture suggest that the model may provide a reasonable approximation to many brain processes involved in complex task-driven visual behaviors. (C) 2004 Elsevier Ltd. All rights reserved.
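The attend, recognize, and update cycle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the grid of feature maps, the per-feature weights standing in for learned top-down biases, the `labels` lookup standing in for recognition, and the `relevance` table standing in for long-term memory are all hypothetical simplifications chosen for illustration.

```python
def biased_saliency(feature_maps, weights):
    """Weighted sum of feature maps; weights stand in for top-down biases."""
    names = list(feature_maps)
    rows = len(feature_maps[names[0]])
    cols = len(feature_maps[names[0]][0])
    return [[sum(weights.get(n, 1.0) * feature_maps[n][r][c] for n in names)
             for c in range(cols)] for r in range(rows)]


def most_salient(saliency, visited):
    """Return the (row, col) of the highest saliency value not yet attended."""
    best, best_loc = None, None
    for r, row in enumerate(saliency):
        for c, value in enumerate(row):
            if (r, c) not in visited and (best is None or value > best):
                best, best_loc = value, (r, c)
    return best_loc


def search(feature_maps, weights, labels, relevance, target, max_fixations=10):
    """Attend to salient locations in turn until the target is recognized.

    Returns (fixations needed or None, task-relevance map).
    """
    saliency = biased_saliency(feature_maps, weights)
    rows, cols = len(saliency), len(saliency[0])
    trm = [[0.0] * cols for _ in range(rows)]  # toy task-relevance map
    visited = set()
    for fixation in range(1, max_fixations + 1):
        loc = most_salient(saliency, visited)
        if loc is None:
            break
        visited.add(loc)
        entity = labels.get(loc)  # hypothetical stand-in for recognition
        trm[loc[0]][loc[1]] = relevance.get(entity, 0.0)
        if entity == target:
            return fixation, trm
    return None, trm


# Hypothetical 2x3 scene with two low-level feature maps.
feature_maps = {
    "red": [[0.2, 0.0, 0.9], [0.0, 0.1, 0.0]],
    "intensity": [[1.0, 0.0, 0.1], [0.0, 0.8, 0.0]],
}
labels = {(0, 0): "sign", (0, 2): "car", (1, 1): "tree"}
relevance = {"car": 1.0, "sign": 0.5}  # assumed prior knowledge

unbiased_fixations, _ = search(feature_maps, {}, labels, relevance, "car")
biased_fixations, trm = search(
    feature_maps, {"red": 3.0, "intensity": 0.5}, labels, relevance, "car"
)
print(unbiased_fixations, biased_fixations)
```

In this toy scene, weighting the feature assumed to characterize the target moves the target to the top of the saliency ranking, so it is found on an earlier fixation than without biasing. The numbers are illustrative only and say nothing about the speedups reported in the paper.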
Pages: 205-231
Page count: 27
Related papers
93 entries in total
[1]  
Andrews, 1996, ATTENTION PERFORM, P125
[2]   MODEL-BASED OBJECT RECOGNITION IN DENSE-RANGE IMAGES - A REVIEW [J].
ARMAN, F ;
AGGARWAL, JK .
COMPUTING SURVEYS, 1993, 25 (01) :5-43
[3]   Visual search for colour targets that are or are not linearly separable from distractors [J].
Bauer, B ;
Jolicoeur, P ;
Cowan, WB .
VISION RESEARCH, 1996, 36 (10) :1439-1466
[4]  
BECK J, 1983, THEORY TEXTURAL SEGM
[5]   RECOGNITION-BY-COMPONENTS - A THEORY OF HUMAN IMAGE UNDERSTANDING [J].
BIEDERMAN, I .
PSYCHOLOGICAL REVIEW, 1987, 94 (02) :115-147
[6]   SCENE PERCEPTION - DETECTING AND JUDGING OBJECTS UNDERGOING RELATIONAL VIOLATIONS [J].
BIEDERMAN, I ;
MEZZANOTTE, RJ ;
RABINOWITZ, JC .
COGNITIVE PSYCHOLOGY, 1982, 14 (02) :143-177
[7]   PART-WHOLE INFORMATION IS USEFUL IN VISUAL-SEARCH FOR SIZE X SIZE BUT NOT ORIENTATION X ORIENTATION CONJUNCTIONS [J].
BILSKY, AB ;
WOLFE, JM .
PERCEPTION & PSYCHOPHYSICS, 1995, 57 (06) :749-760
[8]   Measuring the amplification of attention [J].
Blaser, E ;
Sperling, G ;
Lu, ZL .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (20) :11681-11686
[9]  
BURACAS GT, 1996, I NEUR COMP P 3 JOIN, V6, P11
[10]   THE LAPLACIAN PYRAMID AS A COMPACT IMAGE CODE [J].
BURT, PJ ;
ADELSON, EH .
IEEE TRANSACTIONS ON COMMUNICATIONS, 1983, 31 (04) :532-540