Audio-visual integration in multimodal communication

Cited by: 130
Authors
Chen, T [1]
Rao, RR [1]
Affiliations
[1] AT&T Bell Labs, Holmdel, NJ 07733 USA
Keywords
image analysis; multimedia communication; speech communication; speech processing; video signal processing
DOI
10.1109/5.664274
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
In this paper, we review recent research that examines audio-visual integration in multimodal communication. The topics include bimodality in human speech, human and automated lip reading, facial animation, lip synchronization, joint audio-video coding, and bimodal speaker verification. We also study the enabling technologies for these research topics, including automatic facial-feature tracking and audio-to-visual mapping. Recent progress in audio-visual research shows that joint processing of audio and video provides advantages that are not available when the audio and video are processed independently.
Pages: 837-852
Page count: 16