A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions

Cited by: 71

Authors:

Li, Jinyu [1]

Deng, Li [1]

Yu, Dong [1]

Gong, Yifan [1]

Acero, Alex [1]

Affiliations:

[1] Microsoft Corp, Redmond, WA 98052 USA

Source:

COMPUTER SPEECH AND LANGUAGE | 2009 / Vol. 23 / Issue 03

Keywords:

Phase-sensitive distortion model; Vector Taylor series; Joint compensation; Additive and convolutive distortions; Robust ASR; MAXIMUM-LIKELIHOOD; SPEECH RECOGNITION; NOISY ENVIRONMENTS; REGRESSION;

DOI:

10.1016/j.csl.2009.02.001

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we present our recent development of a model-domain environment-robust adaptation algorithm, which demonstrates high performance in the standard Aurora 2 speech recognition task. The algorithm consists of two main steps. First, the noise and channel parameters are estimated using multiple sources of information, including a nonlinear environment-distortion model in the cepstral domain, the posterior probabilities of all the Gaussians in the speech recognizer, and a truncated vector Taylor series (VTS) approximation. Second, the estimated noise and channel parameters are used to adapt the static and dynamic portions (delta and delta-delta) of the HMM means and variances. This two-step algorithm enables joint compensation of both additive and convolutive distortions (JAC). The hallmark of our new approach is the use of a nonlinear, phase-sensitive model of acoustic distortion that captures phase asynchrony between clean speech and the mixing noise. In the experimental evaluation using the standard Aurora 2 task, the proposed Phase-JAC/VTS algorithm achieves 93.32% word accuracy using the clean-trained complex HMM backend as the baseline system for the unsupervised model adaptation. This represents high recognition performance on this task without discriminative training of the HMM system. The experimental results show that the phase term, which was missing in all previous HMM adaptation work, contributes significantly to the achieved high recognition accuracy. (C) 2009 Elsevier Ltd. All rights reserved.
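
The abstract does not state the distortion model itself. As a rough illustration only, the sketch below assumes the commonly used phase-sensitive relation between clean speech, noise, and channel, written per filter-bank channel in the log-energy domain rather than the cepstral domain the paper works in (so no DCT matrix appears). The function names `distorted_log_energy` and `adapt_means`, the scalar phase factor `alpha`, and the zeroth-order treatment of the means are assumptions made for this example, not the authors' implementation.

```python
import math

def distorted_log_energy(x, n, h, alpha=0.0):
    """Assumed phase-sensitive mismatch function for one filter-bank channel.

    x, n, h are log energies of clean speech, additive noise, and channel.
    alpha is a hypothetical phase factor in (-1, 1]; alpha = 0 gives the
    familiar phase-insensitive form y = x + h + log(1 + exp(n - x - h)).
    """
    d = n - x - h
    return x + h + math.log(1.0 + math.exp(d) + 2.0 * alpha * math.exp(0.5 * d))

def adapt_means(mu_x, mu_n, mu_h, alpha=0.0):
    """Zeroth-order VTS step: evaluate the mismatch function at the means.

    This covers only the static mean of one Gaussian; variance and
    delta/delta-delta adaptation are not shown.
    """
    return [distorted_log_energy(x, n, h, alpha)
            for x, n, h in zip(mu_x, mu_n, mu_h)]

# Example call with made-up log energies for three channels.
adapted = adapt_means([2.0, 1.0, 0.5], [0.0, 0.0, 0.0], [0.1, 0.1, 0.1], alpha=0.5)
```

When the noise mean is far below the speech mean, the adapted mean approaches `x + h`, i.e. only the channel offset remains.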

Pages: 389-405

Page count: 17
