Modeling the influence of task on attention

Cited by: 438

Authors:

Navalpakkam, V

Itti, L

Affiliations:

[1] Univ So Calif, Dept Comp Sci, Los Angeles, CA 90089 USA

[2] Univ So Calif, Dept Psychol, Los Angeles, CA 90089 USA

[3] Univ So Calif, Grad Program Neurosci, Los Angeles, CA 90089 USA

Source:

VISION RESEARCH | 2005 / Vol. 45 / Issue 02

Keywords:

attention; top-down; bottom-up; object detection; recognition; task-relevance; scene analysis;

DOI:

10.1016/j.visres.2004.07.042

Chinese Library Classification (CLC) number:

Q189 [Neuroscience];

Discipline classification code:

071006 ;

Abstract:

We propose a computational model for the task-specific guidance of visual attention in real-world scenes. Our model emphasizes four aspects that are important in biological vision: determining task-relevance of an entity, biasing attention for the low-level visual features of desired targets, recognizing these targets using the same low-level features, and incrementally building a visual map of task-relevance at every scene location. Given a task definition in the form of keywords, the model first determines and stores the task-relevant entities in working memory, using prior knowledge stored in long-term memory. It attempts to detect the most relevant entity by biasing its visual attention system with the entity's learned low-level features. It attends to the most salient location in the scene and attempts to recognize the attended object through hierarchical matching against object representations stored in long-term memory. It updates its working memory with the task-relevance of the recognized entity and updates a topographic task-relevance map with the location and relevance of the recognized entity. The model is tested on three types of tasks: single-target detection in 343 natural and synthetic images, where biasing for the target accelerates target detection over twofold on average; sequential multiple-target detection in 28 natural images, where biasing, recognition, working memory and long-term memory contribute to rapidly finding all targets; and learning a map of likely locations of cars from a video clip filmed while driving on a highway. The model's performance on search for single features and feature conjunctions is consistent with existing psychophysical data. These results of our biologically-motivated architecture suggest that the model may provide a reasonable approximation to many brain processes involved in complex task-driven visual behaviors. (C) 2004 Elsevier Ltd. All rights reserved.
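The attend-and-update cycle described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`biased_saliency`, `attend`, `update_relevance_map`), the weighted-sum combination of feature maps, the feature names, and all numeric values are assumptions made for illustration, and the recognition step is omitted.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]  # one value per scene location


def biased_saliency(feature_maps: Dict[str, Grid], weights: Dict[str, float]) -> Grid:
    """Sum the feature maps, scaling each by a (hypothetical) learned target weight."""
    first = next(iter(feature_maps.values()))
    rows, cols = len(first), len(first[0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for name, fmap in feature_maps.items():
        w = weights.get(name, 1.0)  # features without a weight stay unbiased
        for r in range(rows):
            for c in range(cols):
                saliency[r][c] += w * fmap[r][c]
    return saliency


def attend(saliency: Grid) -> Tuple[int, int]:
    """Return the (row, col) of the most salient location."""
    best = (0, 0)
    for r, row in enumerate(saliency):
        for c, value in enumerate(row):
            if value > saliency[best[0]][best[1]]:
                best = (r, c)
    return best


def update_relevance_map(relevance_map: Grid, loc: Tuple[int, int], relevance: float) -> None:
    """Record the task-relevance of the entity recognized at loc."""
    relevance_map[loc[0]][loc[1]] = relevance


# Toy scene: a red item at (0, 1) and a stronger vertical item at (1, 2).
feature_maps = {
    "red": [[0.1, 0.8, 0.0], [0.0, 0.2, 0.0]],
    "vertical": [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
}
target_weights = {"red": 2.0, "vertical": 0.5}  # assumed weights for a red target

unbiased_focus = attend(biased_saliency(feature_maps, {}))
biased_focus = attend(biased_saliency(feature_maps, target_weights))

relevance_map = [[0.0] * 3 for _ in range(2)]
update_relevance_map(relevance_map, biased_focus, 1.0)  # assumed relevance score
```

In this toy scene the unbiased map favors the vertical distractor, while biasing toward the target's features moves the first fixation onto the red item, which is the kind of speed-up the single-target experiments report.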

Pages: 205-231

Page count: 27