Visual speech synthesis by morphing visemes

Cited by: 68
Authors
Ezzat, T [1]
Poggio, T
Affiliations
[1] MIT, Artificial Intelligence Lab, Ctr Biol & Computat Learning, Cambridge, MA 02139 USA
[2] MIT, Dept Brain & Cognit Sci, Ctr Biol & Computat Learning, Cambridge, MA 02139 USA
Funding
National Science Foundation (USA);
Keywords
computer vision; machine learning; facial modelling; facial animation; morphing; optical flow; speech synthesis; lip synchronization;
DOI
10.1023/A:1008166717597
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present MikeTalk, a text-to-audiovisual speech synthesizer which converts input text into an audiovisual speech stream. MikeTalk is built using visemes, which are a small set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject which is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images may be generated. A complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer is exploited to determine which viseme transitions to use, and the rate at which the morphing process should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream, and hence give the impression of a photorealistic talking face.
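The timing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the phoneme-to-viseme table, the function name `schedule_transitions`, and the choice to place each viseme keyframe at the midpoint of its phoneme are all assumptions made for illustration. Given phoneme durations from a text-to-speech front end, it returns, for every video frame, which viseme pair to morph between and how far along the morph should be.

```python
from typing import List, Tuple

# Hypothetical phoneme-to-viseme table (assumption; not taken from the paper).
PHONEME_TO_VISEME = {"sil": "rest", "m": "closed", "ay": "open"}


def schedule_transitions(
    phonemes: List[Tuple[str, int]], fps: int = 30
) -> List[Tuple[str, str, float]]:
    """Return one (source_viseme, target_viseme, alpha) triple per video frame.

    phonemes: (phoneme label, duration in milliseconds) pairs, durations > 0.
    alpha in [0, 1) is the morph position between the two visemes.
    Assumption: each viseme keyframe sits at the midpoint of its phoneme.
    """
    key_times = []  # keyframe time of each viseme, in milliseconds
    elapsed = 0
    for _, duration in phonemes:
        key_times.append(elapsed + duration / 2.0)
        elapsed += duration
    visemes = [PHONEME_TO_VISEME[label] for label, _ in phonemes]

    frames = []
    segment = 0  # index of the transition currently being played
    for frame in range(elapsed * fps // 1000):
        t = frame * 1000 / fps
        if t <= key_times[0]:
            # Before the first keyframe: hold the first viseme.
            frames.append((visemes[0], visemes[0], 0.0))
        elif t >= key_times[-1]:
            # After the last keyframe: hold the last viseme.
            frames.append((visemes[-1], visemes[-1], 0.0))
        else:
            # Advance to the transition whose time span contains t.
            while t >= key_times[segment + 1]:
                segment += 1
            span = key_times[segment + 1] - key_times[segment]
            alpha = (t - key_times[segment]) / span
            frames.append((visemes[segment], visemes[segment + 1], alpha))
    return frames


if __name__ == "__main__":
    utterance = [("sil", 200), ("m", 200), ("ay", 400)]
    for triple in schedule_transitions(utterance, fps=10):
        print(triple)
```

In a full system each triple would select a precomputed viseme-to-viseme correspondence and the morph would be rendered at position `alpha`; shorter phonemes yield fewer frames per transition, which is how the morphing rate follows the audio timing.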
Pages: 45-57
Page count: 13