Audiovisual Speech Synthesis

Cited by: 33
Authors
G. Bailly
M. Bérar
F. Elisei
M. Odisio
Affiliation
[1] Institut de la Communication Parlée UMR CNRS no 5009 INPG/Univ. Stendhal 46,
Keywords
text-to-speech synthesis; audiovisual synthesis; facial animation; talking faces;
DOI
10.1023/A:1025700715107
Abstract
This paper presents the main approaches used to synthesize talking faces and describes a handful of them in greater detail. An attempt is made to distinguish between facial synthesis itself (i.e., the manner in which facial movements are rendered on a computer screen) and the way these movements may be controlled and predicted from phonetic input. The two main synthesis techniques (model-based vs. image-based) are contrasted, each illustrated by brief descriptions of the most representative existing systems. The challenging issues that may drive future models (evaluation, data acquisition, and modeling) are also discussed and illustrated by our current work at ICP.
Pages: 331-346
Number of pages: 15