Audio-visual integration in multimodal communication

Cited by: 130
Authors
Chen, T [1]
Rao, RR [1]
Affiliation
[1] AT&T Bell Labs, Holmdel, NJ 07733 USA
Keywords
image analysis; multimedia communication; speech communication; speech processing; video signal processing
DOI
10.1109/5.664274
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
In this paper, we review recent research that examines audio-visual integration in multimodal communication. The topics include bimodality in human speech, human and automated lip reading, facial animation, lip synchronization, joint audio-video coding, and bimodal speaker verification. We also study the enabling technologies for these research topics, including automatic facial-feature tracking and audio-to-visual mapping. Recent progress in audio-visual research shows that joint processing of audio and video provides advantages that are not available when the audio and video are processed independently.
Pages: 837-852
Number of pages: 16