An N-Best Candidates-Based Discriminative Training for Speech Recognition Applications

Cited by: 25
Authors
Chen, Jung-Kuei [1 ]
Soong, Frank K. [2 ]
Affiliations
[1] Minist Commun, Telecommun Labs, Chungli, Taiwan
[2] AT&T Bell Labs, Murray Hill, NJ 07974 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1994, Vol. 2, No. 1
DOI
10.1109/89.260363
Chinese Library Classification
O42 [Acoustics];
Discipline Classification Codes
070206; 082403
Abstract
In this paper, we propose an N-best candidates-based discriminative training procedure for constructing high-performance HMM speech recognizers. The algorithm has two distinct features: 1) N-best hypotheses are used for training discriminative models, and 2) a new frame-level loss function is minimized to improve the separation between the correct and incorrect hypotheses. The N-best candidates are decoded with our recently proposed tree-trellis fast search algorithm. The new frame-level loss function, defined as a half-wave rectified log-likelihood difference between the correct and competing hypotheses, is minimized over all training tokens. The minimization is carried out by adjusting the HMM parameters along a gradient-descent direction. Two speech recognition applications have been tested: speaker-independent, small-vocabulary (ten Mandarin Chinese digits) continuous speech recognition, and speaker-trained, large-vocabulary (5000 commonly used Chinese words) isolated-word recognition. Significant performance improvement over traditional maximum-likelihood-trained HMMs has been obtained. In the connected Chinese digit recognition experiment, the string error rate is reduced from 17.0% to 10.8% for unknown-length decoding and from 8.2% to 5.2% for known-length decoding. In the large-vocabulary, isolated-word recognition experiment, the recognition error rate is reduced from 7.2% to 3.8%. Additionally, we have found that using more relaxed decoding constraints in preparing the N-best hypotheses yields better recognition results.
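As a rough illustrative sketch only (the notation below is not taken from the paper itself), the half-wave rectified frame-level loss described in the abstract can be written as

\[
\ell_t = \max\Bigl(0,\; \log p(\mathbf{o}_t \mid \lambda_{\mathrm{comp}}) - \log p(\mathbf{o}_t \mid \lambda_{\mathrm{corr}})\Bigr),
\qquad
L = \sum_{\text{tokens}} \sum_{t} \ell_t ,
\]

where \(\lambda_{\mathrm{corr}}\) and \(\lambda_{\mathrm{comp}}\) denote the HMM state sequences aligned with the correct and the competing N-best hypotheses at frame \(t\). Under this reading, minimizing \(L\) by gradient descent on the HMM parameters increases the frame-level separation between the correct hypothesis and its N-best competitors, which is the effect the abstract attributes to the proposed training procedure.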
Pages: 206-216 (11 pages)