Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization)

Cited by: 8
Authors
Deligne, S [1]
Potamianos, G [1]
Neti, C [1]
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
SAM2002: IEEE SENSOR ARRAY AND MULTICHANNEL SIGNAL PROCESSING WORKSHOP PROCEEDINGS | 2002
DOI
10.1109/SAM.2002.1191001
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
In this paper, we introduce a non-linear enhancement technique called Audio-Visual Codebook Dependent Cepstral Normalization (AVCDCN) and consider its use with both audio-only and audio-visual speech recognition. AVCDCN is inspired by CDCN [1] [2], an audio-only enhancement technique that approximates the non-linear effect of noise on speech with a piecewise constant function. Our experiments show that the use of visual information in AVCDCN allows significant performance gains over CDCN.
Pages: 68-71
Number of pages: 4
Related papers
6 items in total
[1] ACERO A, 1990, INT CONF ACOUST SPEE, P849, DOI 10.1109/ICASSP.1990.115971
[2] Deng L, 2001, INT CONF ACOUST SPEE, P301, DOI 10.1109/ICASSP.2001.940827
[3] Girin, L; Schwartz, JL; Feng, G. Audio-visual enhancement of speech in noise [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2001, 109 (06): 3007-3020
[4] GOECKE R, 2002, IN PRESS P ICASSP 02
[5] Neti C., 2000, Tech. Report
[6] Potamianos G, 2001, INT CONF ACOUST SPEE, P165, DOI 10.1109/ICASSP.2001.940793