Automatic foveation for video compression using a neurobiological model of visual attention

Cited by: 562
Authors
Itti, L [1]
Affiliations
[1] Univ So Calif, Dept Comp Sci, Psychol & Neurosci Grad Program, Los Angeles, CA 90089 USA
Keywords
bottom up; eye movements; foveated; saliency; video compression; visual attention
DOI
10.1109/TIP.2004.834657
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We evaluate the applicability of a biologically-motivated algorithm to select visually-salient regions of interest in video streams for multiply-foveated video compression. Regions are selected based on a nonlinear integration of low-level visual cues, mimicking processing in primate occipital and posterior parietal cortex. A dynamic foveation filter then blurs every frame, increasingly with distance from salient locations. Sixty-three variants of the algorithm (varying number and shape of virtual foveas, maximum blur, and saliency competition) are evaluated against an outdoor video scene, using MPEG-1 and constant-quality MPEG-4 (DivX) encoding. Additional compression ratios of 1.1 to 8.5 are achieved by foveation. Two variants of the algorithm are validated against eye fixations recorded from four to six human observers on a heterogeneous collection of 50 video clips (over 45 000 frames in total). Significantly higher overlap than expected by chance is found between human and algorithmic foveations. With both variants, foveated clips are, on average, approximately half the size of unfoveated clips, for both MPEG-1 and MPEG-4. These results suggest a general-purpose usefulness of the algorithm in improving compression ratios of unconstrained video.
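The foveation step described in the abstract (blur that increases with distance from salient locations) can be illustrated with a minimal sketch. This is not the paper's filter: the box blur, the linear falloff, and the names `box_blur`, `foveate`, `max_radius`, and `falloff` are all assumptions chosen for illustration, and the fovea positions are supplied by hand rather than computed by a saliency model.

```python
import numpy as np

def box_blur(img, radius):
    """Separable box blur with edge padding; radius 0 returns a float copy."""
    if radius == 0:
        return img.astype(float).copy()
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    padded = np.pad(img.astype(float), radius, mode="edge")
    # Blur along rows, then along columns ("valid" undoes the padding).
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def foveate(frame, foveas, max_radius=4, falloff=20.0):
    """Blur a grayscale frame more strongly the farther a pixel lies
    from its nearest fovea centre (assumed linear falloff, hypothetical)."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Distance from every pixel to the nearest (row, col) fovea centre.
    dist = np.min([np.hypot(ys - fy, xs - fx) for fy, fx in foveas], axis=0)
    # Fractional blur level, capped at the maximum blur radius.
    level = np.minimum(dist / falloff, max_radius)
    lo = np.floor(level).astype(int)
    hi = np.minimum(lo + 1, max_radius)
    frac = level - lo
    # Precompute one blurred copy per integer radius, then interpolate
    # per pixel between the two nearest blur levels.
    stack = np.stack([box_blur(frame, r) for r in range(max_radius + 1)])
    return (1 - frac) * stack[lo, ys, xs] + frac * stack[hi, ys, xs]
```

Passing several `(row, col)` centres in `foveas` gives a multiply-foveated frame. Pixels at a fovea centre are left unchanged, and the low-detail periphery is what a standard encoder can then compress more aggressively.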
Pages: 1304-1318
Page count: 15