Noise adaptive stream weighting in audio-visual speech recognition

Cited by: 70
Authors
Heckmann, M
Berthommier, F
Kroschel, K
Affiliations
[1] Univ Karlsruhe, Inst Nachrichtentech, D-76128 Karlsruhe, Germany
[2] Inst Natl Polytech Grenoble, ICP, F-38031 Grenoble, France
Keywords
audio-visual speech recognition; adaptive weighting; robust recognition; multistream recognition; ANN/HMM;
DOI
10.1155/S1110865702206150
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract
It has been shown that integrating acoustic and visual information yields improved speech recognition results, especially in noisy conditions. This raises the question of how to weight the two modalities in different noise conditions. In this paper, we develop a weighting process that adapts to various background noise situations. In the presented recognition system, audio and video data are combined following a Separate Integration (SI) architecture. A hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) system is used for the experiments. The neural networks were in all cases trained on clean data. First, we evaluate the performance of different weighting schemes in a manually controlled recognition task with different types of noise. Next, we compare different criteria for estimating the reliability of the audio stream. Based on this, a mapping between the measurements and the free parameter of the fusion process is derived and its applicability is demonstrated. Finally, the possibilities and limitations of adaptive weighting are compared and discussed.
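To make the idea of a single free fusion parameter concrete, the following is a minimal sketch, not the authors' implementation. It assumes a common geometric (log-linear) weighting of per-class posteriors, P ∝ P_A^λ · P_V^(1−λ), and a purely hypothetical sigmoid that maps an audio reliability measurement (here an SNR estimate in dB) to λ. The function names, the sigmoid form, and its parameters `midpoint_db` and `slope` are illustrative assumptions; the abstract does not specify the weighting scheme, the reliability criterion, or the mapping actually used.

```python
import math


def fuse_posteriors(audio_post, video_post, lam):
    """Geometric stream weighting (assumed scheme, not taken from the paper).

    Combines per-class posteriors as P_A^lam * P_V^(1 - lam) and renormalises
    so the result sums to 1. lam = 1 trusts audio only, lam = 0 video only.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(audio_post) != len(video_post):
        raise ValueError("both streams must cover the same classes")
    eps = 1e-12  # floor to avoid log(0)
    log_scores = [
        lam * math.log(max(a, eps)) + (1.0 - lam) * math.log(max(v, eps))
        for a, v in zip(audio_post, video_post)
    ]
    peak = max(log_scores)  # subtract the maximum for numerical stability
    unnorm = [math.exp(s - peak) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def weight_from_snr(snr_db, midpoint_db=5.0, slope=0.3):
    """Hypothetical mapping from an SNR estimate (dB) to the audio weight.

    A sigmoid is used only for illustration: high SNR pushes the weight
    towards 1 (audio), low SNR towards 0 (video).
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


if __name__ == "__main__":
    audio = [0.7, 0.2, 0.1]  # made-up class posteriors from the audio stream
    video = [0.3, 0.5, 0.2]  # made-up class posteriors from the video stream
    for snr in (-5.0, 5.0, 20.0):
        lam = weight_from_snr(snr)
        print(snr, round(lam, 3), fuse_posteriors(audio, video, lam))
```

With this kind of weighting, the recognizer's behaviour moves smoothly between video-only and audio-only decisions as the estimated audio reliability changes, which is the trade-off the abstract describes.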
Pages: 1260-1273
Page count: 14