Modeling the influence of task on attention

Cited by: 438
Authors
Navalpakkam, V
Itti, L
Affiliations
[1] Univ So Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
[2] Univ So Calif, Dept Psychol, Los Angeles, CA 90089 USA
[3] Univ So Calif, Grad Program Neurosci, Los Angeles, CA 90089 USA
Keywords
attention; top-down; bottom-up; object detection; recognition; task-relevance; scene analysis
DOI
10.1016/j.visres.2004.07.042
Chinese Library Classification (CLC) number
Q189 [Neuroscience]
Discipline classification code
071006
Abstract
We propose a computational model for the task-specific guidance of visual attention in real-world scenes. Our model emphasizes four aspects that are important in biological vision: determining task-relevance of an entity, biasing attention for the low-level visual features of desired targets, recognizing these targets using the same low-level features, and incrementally building a visual map of task-relevance at every scene location. Given a task definition in the form of keywords, the model first determines and stores the task-relevant entities in working memory, using prior knowledge stored in long-term memory. It attempts to detect the most relevant entity by biasing its visual attention system with the entity's learned low-level features. It attends to the most salient location in the scene and attempts to recognize the attended object through hierarchical matching against object representations stored in long-term memory. It updates its working memory with the task-relevance of the recognized entity and updates a topographic task-relevance map with the location and relevance of the recognized entity. The model is tested on three types of tasks: single-target detection in 343 natural and synthetic images, where biasing for the target accelerates target detection over twofold on average; sequential multiple-target detection in 28 natural images, where biasing, recognition, working memory and long-term memory contribute to rapidly finding all targets; and learning a map of likely locations of cars from a video clip filmed while driving on a highway. The model's performance on search for single features and feature conjunctions is consistent with existing psychophysical data. These results of our biologically-motivated architecture suggest that the model may provide a reasonable approximation to many brain processes involved in complex task-driven visual behaviors. (C) 2004 Elsevier Ltd. All rights reserved.
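The processing loop described in the abstract (bias the feature maps toward the target, attend to the most salient location, recognize what is there, update the task-relevance map) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the function names (`biased_saliency`, `most_salient`, `search`), the feature names, the weight values, the dictionary lookup standing in for recognition, and the suppression of already-attended locations are all hypothetical choices made for the example.

```python
def biased_saliency(feature_maps, weights):
    """Weighted sum of low-level feature maps (assumed combination rule).

    feature_maps: dict of feature name -> 2D list of floats, all the same shape.
    weights: dict of feature name -> top-down gain; missing features default to 1.0.
    """
    first = next(iter(feature_maps.values()))
    rows, cols = len(first), len(first[0])
    sal = [[0.0] * cols for _ in range(rows)]
    for name, fmap in feature_maps.items():
        gain = weights.get(name, 1.0)
        for r in range(rows):
            for c in range(cols):
                sal[r][c] += gain * fmap[r][c]
    return sal


def most_salient(sal):
    """Return the (row, col) of the largest saliency value."""
    cells = ((r, c) for r in range(len(sal)) for c in range(len(sal[0])))
    return max(cells, key=lambda rc: sal[rc[0]][rc[1]])


def search(feature_maps, weights, recognize, relevance, target, max_fixations=10):
    """Attend, recognize, and update a task-relevance map until the target is found.

    recognize: callable mapping a location to a label or None (stand-in for
               hierarchical matching against long-term memory).
    relevance: dict of label -> task-relevance (stand-in for working memory).
    """
    sal = biased_saliency(feature_maps, weights)
    trm = [[0.0] * len(sal[0]) for _ in sal]  # topographic task-relevance map
    fixations = []
    for _ in range(max_fixations):
        r, c = most_salient(sal)
        fixations.append((r, c))
        label = recognize((r, c))
        trm[r][c] = relevance.get(label, 0.0)
        if label == target:
            break
        # Assumption: suppress the attended location so attention moves on.
        sal[r][c] = float("-inf")
    return fixations, trm


# Hypothetical 2x3 scene: the car is strongly "vertical", the sign strongly "red".
feature_maps = {
    "red":      [[0.1, 0.9, 0.2], [0.3, 0.1, 0.7]],
    "vertical": [[0.8, 0.1, 0.1], [0.2, 0.1, 0.1]],
}
objects = {(0, 0): "car", (0, 1): "sign", (1, 2): "tree"}
relevance = {"car": 1.0, "sign": 0.5}
car_weights = {"red": 0.2, "vertical": 2.0}  # assumed learned gains for "car"

unbiased_fix, _ = search(feature_maps, {}, objects.get, relevance, "car")
biased_fix, biased_trm = search(feature_maps, car_weights, objects.get, relevance, "car")
print(unbiased_fix)  # [(0, 1), (0, 0)]
print(biased_fix)    # [(0, 0)]
```

In this toy scene, biasing toward the target's features brings the car into the first fixation instead of the second, which mirrors in miniature the kind of speed-up the abstract reports for single-target detection.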
Pages: 205-231
Page count: 27