Audio-visual enhancement of speech in noise

Cited by: 77
Authors
Girin, L [1]
Schwartz, JL [1]
Feng, G [1]
Affiliations
[1] Univ Grenoble 3, CNRS UMR 5009, Inst Commun Parlee, INPG, F-38040 Grenoble, France
DOI
10.1121/1.1358887
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A key problem for telecommunication or human-machine communication systems concerns speech enhancement in noise. In this domain, a certain number of techniques exist, all of them based on an acoustic-only approach, that is, the processing of the corrupted audio signal using audio information (from the corrupted signal only or additive audio information). In this paper, an audio-visual approach to the problem is considered, since it has been demonstrated in several studies that viewing the speaker's face improves message intelligibility, especially in noisy environments. A speech enhancement prototype system that takes advantage of visual inputs is developed. A filtering process approach is proposed that uses enhancement filters estimated with the help of lip shape information. The estimation process is based on linear regression or simple neural networks using a training corpus. A set of experiments assessed by Gaussian classification and perceptual tests demonstrates that it is indeed possible to enhance simple stimuli (vowel-plosive-vowel sequences) embedded in white Gaussian noise. © 2001 Acoustical Society of America.
Pages: 3007-3020
Page count: 14