The use of attention and spatial information for rapid facial recognition in video

Times cited: 10
Authors
Bonaiuto, J. [1]
Itti, L. [1]
Affiliations
[1] Univ So Calif, Dept Neurosci, Los Angeles, CA 90089 USA
Keywords
visual attention; bottom-up; face recognition; video processing;
DOI
10.1016/j.imavis.2005.09.008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Bottom-up visual attention allows primates to quickly select regions of an image that contain salient objects. In artificial systems, restricting the task of object recognition to these regions allows faster recognition and unsupervised learning of multiple objects in cluttered scenes. A problem with this approach is that objects superficially dissimilar to the target are given the same consideration in recognition as similar objects. In video, objects recognized in previous frames at locations distant from the current fixation point are given the same consideration in recognition as objects previously recognized at locations closer to the current target of attention. Due to the continuity of smooth motion, objects recently recognized in previous frames at locations close to the current focus of attention have a high probability of matching the current target. Here we investigate rapid pruning of the facial recognition search space using the already-computed low-level features that guide attention and spatial information derived from previous video frames. For each video frame, Itti & Koch's bottom-up visual attention algorithm is used to select salient locations based on low-level features such as contrast, orientation, color, intensity, flicker and motion. This algorithm has been shown to be highly effective in selecting faces as salient objects. Lowe's SIFT object recognition algorithm then extracts a signature of the attended object for comparison with the facial database. The database search is prioritized for faces that better match the low-level features used to guide attention to the current candidate for recognition, or that were previously recognized near the current candidate's location. The SIFT signatures of the prioritized faces are then checked against the attended candidate for a match. By comparing the performance of Lowe's recognition algorithm and Itti & Koch's bottom-up attention model with and without search space pruning, we demonstrate that our pruning approach improves the speed of facial recognition in video footage. (c) 2005 Elsevier B.V. All rights reserved.
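The prioritized database search described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the scoring function, its weights, the Gaussian spatial falloff, and all names (`FaceEntry`, `priority`, `prioritized_search`, `matches`) are assumptions introduced for illustration only. The expensive SIFT signature comparison is abstracted as a caller-supplied `matches` callback.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class FaceEntry:
    """Hypothetical database record for one known face."""
    name: str
    features: Sequence[float]  # low-level feature vector (assumed representation)
    last_seen: Optional[Tuple[float, float]] = None  # (x, y) of last recognition


def priority(entry: FaceEntry,
             cand_features: Sequence[float],
             cand_xy: Tuple[float, float],
             w_feat: float = 1.0,
             w_space: float = 1.0,
             sigma: float = 50.0) -> float:
    """Score an entry: higher means it is checked earlier (assumed formula)."""
    # Feature similarity: inverse of Euclidean distance between feature vectors.
    feat_score = 1.0 / (1.0 + math.dist(entry.features, cand_features))
    # Spatial proximity: Gaussian falloff with distance from the last sighting.
    if entry.last_seen is None:
        space_score = 0.0
    else:
        d = math.dist(entry.last_seen, cand_xy)
        space_score = math.exp(-(d * d) / (2.0 * sigma * sigma))
    return w_feat * feat_score + w_space * space_score


def prioritized_search(db: List[FaceEntry],
                       cand_features: Sequence[float],
                       cand_xy: Tuple[float, float],
                       matches: Callable[[FaceEntry], bool]
                       ) -> Tuple[Optional[FaceEntry], int]:
    """Check entries in priority order; return (match or None, checks made)."""
    ranked = sorted(db,
                    key=lambda e: priority(e, cand_features, cand_xy),
                    reverse=True)
    for checks, entry in enumerate(ranked, start=1):
        if matches(entry):  # stands in for the costly signature comparison
            entry.last_seen = cand_xy  # remember the location for later frames
            return entry, checks
    return None, len(ranked)
```

In this sketch the speed-up comes only from reordering: a face seen near the attended location in a recent frame, or one whose low-level features resemble the candidate, is compared first, so fewer expensive comparisons are needed before a match is found.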
Pages: 557-563
Number of pages: 7
Related papers
15 in total
  • [1] Dynamic cell structures for the evaluation of keypoints in facial images
    Herpers, R
    Witta, L
    Bruske, J
    Sommer, G
    [J]. INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 1997, 8 (01) : 27 - 39
  • [2] HERPERS R, 1996, P 13 INT C PATT REC, V2, P23
  • [3] A model of saliency-based visual attention for rapid scene analysis
    Itti, L
    Koch, C
    Niebur, E
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1998, 20 (11) : 1254 - 1259
  • [4] Computational modelling of visual attention
    Itti, L
    Koch, C
    [J]. NATURE REVIEWS NEUROSCIENCE, 2001, 2 (03) : 194 - 203
  • [5] Realistic avatar eye and head animation using a neurobiological model of visual attention
    Itti, L
    Dhavale, N
    Pighin, F
    [J]. APPLICATIONS AND SCIENCE OF NEURAL NETWORKS, FUZZY SYSTEMS, AND EVOLUTIONARY COMPUTATION VI, 2003, 5200 : 64 - 78
  • [6] Lowe D.G., 1999, P IEEE INT C COMP VI, P1150, DOI 10.1109/ICCV.1999.790410
  • [7] Distinctive image features from scale-invariant keypoints
    Lowe, DG
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2004, 60 (02) : 91 - 110
  • [8] Lowe DG, 2000, LECT NOTES COMPUT SC, V1811, P20
  • [9] NAVALPAKKAM V, MODELING INFLUENCE T
  • [10] Rutishauser U., 2004, INT C COMP VIS PATT