AUTOMATIC FACE LOCATION DETECTION FOR MODEL-ASSISTED RATE CONTROL IN H.261-COMPATIBLE CODING OF VIDEO

Cited by: 31
Authors
ELEFTHERIADIS, A [1]
JACQUIN, A [1]
Affiliations
[1] COLUMBIA UNIV,DEPT ELECT ENGN,NEW YORK,NY 10027
Keywords
VERY LOW BIT-RATE CODING; FACIAL FEATURE DETECTION; MODEL-BASED CODING; H.261; VIDEO COMPRESSION;
DOI
10.1016/0923-5965(95)00017-8
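Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808 ; 0809 ;
Abstract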
We present a novel and practical way to integrate techniques from computer vision into low bit-rate coding systems for video teleconferencing applications. Our focus is to locate and track the faces and selected facial features of persons in typical head-and-shoulders video sequences, and to exploit the location information in a 'classical' video coding/decoding system. The motivation is to enable the system to encode selectively various image areas and to produce perceptually pleasing coded images where faces are sharper. We refer to this approach, a mix of classical waveform coding and model-based coding, as model-assisted coding. We propose two totally automatic algorithms which, respectively, perform the detection of a head outline, and identify an 'eyes-nose-mouth' region, both from downsampled binary thresholded edge images. The algorithms operate accurately and robustly, even in cases of significant head rotation or partial occlusion by moving objects. We show how the information about face and facial feature location can be advantageously exploited by low bit-rate waveform-based video coders. In particular, we describe a method of object-selective quantizer control in a standard coding system based on motion-compensated discrete cosine transform (CCITT's recommendation H.261). The approach is based on two novel algorithms, namely buffer rate modulation and buffer size modulation. By forcing the rate control algorithm to transfer a fraction of the total available bit-rate from the coding of the non-facial to that of the facial area, the coder produces images with better-rendered facial features, i.e. coding artefacts in the facial area are less pronounced and eye contact is preserved. The improvement was found to be perceptually significant on video sequences coded at the ISDN rate of 64 kbps, with 48 kbps for the input (color) video signal in QCIF format.
Pages: 435-455
Page count: 21