Speech recognition in noisy environments using first-order vector Taylor series

Cited by: 88

Authors:

Kim, DY

Un, CK

Kim, NS

Affiliations:

[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Yusong Gu, Taejon 305701, South Korea

[2] Samsung Adv Inst Technol, Human & Comp Interact Lab, Suwon 440600, South Korea

[3] Seoul Natl Univ, Sch Elect Engn, Kwanak Gu, Seoul 151742, South Korea

Source:

SPEECH COMMUNICATION | 1998 / Vol. 24 / No. 01

Keywords:

speech recognition; noise-robust; Taylor series;

DOI:

10.1016/S0167-6393(97)00061-7

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In this paper, we generalize the relations between clean and noisy speech signals using a vector Taylor series (VTS) expansion for noise-robust speech recognition. We use it for both noisy data compensation and hidden Markov model (HMM) parameter adaptation, and apply it directly in the cepstral domain, whereas Moreno used it to estimate log-spectral parameters. We also develop a detailed procedure for estimating environmental variables in the cepstral domain using the expectation-maximization (EM) algorithm in the maximum likelihood (ML) sense. To evaluate the developed method, we conduct speaker-independent isolated word and continuous speech recognition experiments. White Gaussian and driving car noises added to clean speech at various SNRs are used as disturbing sources. Using only noise statistics obtained from three frames of silence and the noisy speech to be recognized, we achieve significant performance improvement. In particular, HMM parameter adaptation with VTS is more effective than parallel model combination (PMC) based on the log-normal assumption. (C) 1998 Elsevier Science B.V. All rights reserved.
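
As a rough illustration of the idea behind first-order VTS model adaptation, the sketch below linearizes the commonly used additive-noise relation y = x + log(1 + exp(n - x)) for a single log-spectral channel and uses it to adapt one Gaussian. It is a minimal sketch under stated assumptions, not the procedure of the paper: the paper works in the cepstral domain and estimates the environmental variables with EM, whereas this example is scalar, log-spectral, ignores channel distortion, and takes the noise statistics as given. All function names and numeric values are hypothetical.

```python
import math


def noisy_log_spectrum(x, n):
    """Assumed mismatch function for additive noise: y = x + log(1 + exp(n - x))."""
    return x + math.log1p(math.exp(n - x))


def vts_first_order(x0, n0):
    """Return (y0, dy_dx, dy_dn) of the first-order Taylor expansion around (x0, n0)."""
    y0 = noisy_log_spectrum(x0, n0)
    dy_dx = 1.0 / (1.0 + math.exp(n0 - x0))  # sensitivity to the clean-speech term
    dy_dn = 1.0 - dy_dx                      # sensitivity to the noise term
    return y0, dy_dx, dy_dn


def adapt_gaussian(mu_x, var_x, mu_n, var_n):
    """Adapt a clean-speech Gaussian to noise, expanding around the two means."""
    mu_y, a, b = vts_first_order(mu_x, mu_n)
    # Linearized variance; assumes speech and noise are independent.
    var_y = a * a * var_x + b * b * var_n
    return mu_y, var_y


# Hypothetical clean-speech and noise statistics for one channel.
mu_y, var_y = adapt_gaussian(mu_x=5.0, var_x=1.0, mu_n=3.0, var_n=0.25)
print(mu_y, var_y)
```

With speech well above the noise level, the adapted mean stays close to the clean mean and the variance shrinks slightly; as the noise mean approaches or exceeds the speech mean, the adapted Gaussian moves toward the noise statistics.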


Pages: 39-49

Number of pages: 11

References (14 in total):

[1]

[Anonymous], 1993, ACOUSTICAL ENV ROBUS

[2]

[Anonymous], 2012, ROBUSTNESS AUTOMATIC

[3] COMPARISON OF PARAMETRIC REPRESENTATIONS FOR MONOSYLLABIC WORD RECOGNITION IN CONTINUOUSLY SPOKEN SENTENCES [J].

DAVIS, SB ;

MERMELSTEIN, P .

IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (04) :357-366

[4] Filterbank-Energy Estimation Using Mixture and Markov Models for Recognition of Noisy Speech [J].

Erell, Adoram ;

Weintraub, Mitchel .

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1993, 1 (01) :68-76

[5] ROBUST SPEECH RECOGNITION IN ADDITIVE AND CONVOLUTIONAL NOISE USING PARALLEL MODEL COMBINATION [J].

GALES, MJF ;

YOUNG, SJ .

COMPUTER SPEECH AND LANGUAGE, 1995, 9 (04) :289-307

[6] Probabilistic vector mapping with trajectory information for noise-robust speech recognition [J].

Kim, DY ;

Un, CK .

ELECTRONICS LETTERS, 1996, 32 (17) :1550-1551

[7]

KIM NS, 1997, P ESCA WORKSH ROB SP, P99

[8] Performance of HMM-based speech recognizers with discriminative state-weights [J].

Kwon, OW ;

Un, CK .

SPEECH COMMUNICATION, 1996, 19 (03) :197-205

[9]

MORENO PJ, 1996, THESIS CARNEGIE MELO

[10] Integrated Models of Signal and Background with Application to Speaker Identification in Noise [J].

Rose, R. C. ;

Hofstetter, E. M. ;

Reynolds, D. A. .

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (02) :245-257
