Model-aided coding: A new approach to incorporate facial animation into motion-compensated video coding

Cited by: 39
Authors
Eisert, P [1]
Wiegand, T [1]
Girod, B [1]
Affiliation
[1] Univ Erlangen Nurnberg, Telecommun Lab, D-91058 Erlangen, Germany
Keywords
facial animation; model-aided coding; model-based coding; multiframe prediction
DOI
10.1109/76.836279
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communications Technology]
Discipline Classification Codes
0808; 0809
Abstract
We show that traditional waveform coding and 3-D model-based coding are not competing alternatives but should be combined to support and complement each other. Both approaches are combined such that the generality of waveform coding and the efficiency of 3-D model-based coding are available where needed. The combination is achieved by providing the block-based video coder with a second reference frame for prediction, which is synthesized by the model-based coder. The model-based coder uses a parameterized 3-D head model that specifies the shape and color of a person. We therefore restrict our investigations to typical videotelephony scenarios showing head-and-shoulder scenes. Motion and deformation of the 3-D head model constitute facial expressions, which are represented by facial animation parameters (FAPs) based on the MPEG-4 standard. An intensity-gradient-based approach that exploits the 3-D model information is used to estimate the FAPs as well as illumination parameters that describe brightness changes in the scene. Model failures and objects that are unknown at the decoder are handled by standard block-based motion-compensated prediction, which is not restricted to a specific scene content but results in lower coding efficiency. A Lagrangian approach is employed to determine the most efficient prediction for each block, using either the synthesized model frame or the previous decoded frame. Experiments on five video sequences show that bit-rate savings of about 35% are achieved at equal average peak signal-to-noise ratio (PSNR) when comparing the model-aided codec to TMN-10, the state-of-the-art test model of the H.263 standard. This corresponds to a gain of 2-3 dB in PSNR when encoding at the same average bit rate.
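The per-block Lagrangian mode decision mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 16x16 block size, the SSD distortion measure, the per-block rate estimates, and the lambda value in the usage example are illustrative assumptions. The sketch only shows the core idea of picking, for each block, the reference (synthesized model frame or previous decoded frame) with the lower cost J = D + lambda * R.

import numpy as np

def ssd(block, pred):
    # Sum of squared differences between an original block and its prediction.
    diff = block.astype(np.int64) - pred.astype(np.int64)
    return int(np.sum(diff * diff))

def choose_reference(orig_block, model_pred, waveform_pred,
                     rate_model_bits, rate_waveform_bits, lam):
    # Return "model" or "waveform" for the reference with the lower
    # Lagrangian cost J = D + lambda * R.
    j_model = ssd(orig_block, model_pred) + lam * rate_model_bits
    j_wave = ssd(orig_block, waveform_pred) + lam * rate_waveform_bits
    return "model" if j_model <= j_wave else "waveform"

# Usage example with random 16x16 blocks and assumed side-information rates.
rng = np.random.default_rng(0)
orig = rng.integers(0, 256, (16, 16), dtype=np.uint8)
pred_from_model = rng.integers(0, 256, (16, 16), dtype=np.uint8)
pred_from_prev = rng.integers(0, 256, (16, 16), dtype=np.uint8)
print(choose_reference(orig, pred_from_model, pred_from_prev,
                       rate_model_bits=2, rate_waveform_bits=10, lam=100.0))

In this hypothetical setup, the rate terms stand in for the bits needed to signal the chosen mode and motion information; trading them off against distortion via lambda is what lets the coder fall back to standard block-based prediction wherever the model frame fails.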
Pages: 344-358
Number of pages: 15