Spatiotemporal saliency for video classification

Cited by: 35
Authors
Rapantzikos, Konstantinos [1 ]
Tsapatsoulis, Nicolas [2 ]
Avrithis, Yannis [1 ]
Kollias, Stefanos [1 ]
Affiliations
[1] Natl Tech Univ Athens, Sch Elect & Comp Engn, GR-10682 Athens, Greece
[2] Cyprus Univ Technol, Dept Commun & Internet Studies, Limassol, Cyprus
Keywords
Spatiotemporal visual saliency; Video classification; Visual attention; Model; Shifts
DOI
10.1016/j.image.2009.03.002
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Classification Codes
080906 [Electromagnetic Information Functional Materials and Structures]; 082806 [Agricultural Information and Electrical Engineering];
Abstract
Computer vision applications often need to process only a representative part of the visual input rather than the whole image/sequence. Considerable research has been carried out into salient region detection methods, based either on models emulating human visual attention (VA) mechanisms or on computational approximations. Most of the proposed methods are bottom-up, and their major goal is to filter out redundant visual information. In this paper, we propose and elaborate on a saliency detection model that treats a video sequence as a spatiotemporal volume and generates a local saliency measure for each visual unit (voxel). This computation involves an optimization process incorporating inter- and intra-feature competition at the voxel level. Perceptual decomposition of the input, spatiotemporal center-surround interactions, and the integration of heterogeneous feature conspicuity values are described, and an experimental framework for video classification is set up. This framework consists of a series of experiments that show the effect of saliency on classification performance and let us draw conclusions on how well the detected salient regions represent the visual input. A comparison is presented that shows the potential of the proposed method. (C) 2009 Elsevier B.V. All rights reserved.
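To make the abstract's pipeline concrete, the following is a minimal sketch of spatiotemporal center-surround saliency on a video treated as a (t, y, x) volume. This is an illustrative approximation only, not the authors' optimization-based model: the paper's inter-/intra-feature competition and conspicuity integration are replaced here by a simple difference of fine ("center") and coarse ("surround") Gaussian-smoothed volumes on a single intensity feature, and all function and parameter names are our own.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround_saliency(volume, center_sigma=1.0, surround_sigma=4.0):
    """Per-voxel center-surround response on a (t, y, x) intensity volume.

    A coarse stand-in for spatiotemporal center-surround interactions:
    saliency is the absolute difference between a finely smoothed
    ("center") and a coarsely smoothed ("surround") copy of the volume.
    """
    volume = volume.astype(np.float64)
    center = gaussian_filter(volume, sigma=center_sigma)
    surround = gaussian_filter(volume, sigma=surround_sigma)
    saliency = np.abs(center - surround)
    # Normalize to [0, 1] so heterogeneous feature maps could be combined.
    rng = saliency.max() - saliency.min()
    return (saliency - saliency.min()) / rng if rng > 0 else saliency

# Example: a bright voxel moving diagonally through an otherwise flat
# volume stands out, while the flat background stays near zero.
T, H, W = 8, 32, 32
video = np.zeros((T, H, W))
for t in range(T):
    video[t, 10 + t, 10 + t] = 1.0  # single moving bright voxel
S = center_surround_saliency(video)
```

In the paper's full model, saliency values like `S` would additionally compete across features and neighborhoods before being integrated into a final conspicuity volume; here the single-feature map is only meant to show what a per-voxel saliency measure looks like.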
Pages: 557-571
Page count: 15