Hierarchical large-margin Gaussian mixture models for phonetic classification

Cited by: 14
Authors
Chang, Hung-An [1]
Glass, James R. [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
Source
2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2 | 2007
Keywords
hierarchical classifier; committee classifier; large margin GMM; phonetic classification
DOI
10.1109/ASRU.2007.4430123
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a hierarchical large-margin Gaussian mixture modeling framework and evaluate it on the task of phonetic classification. A two-stage hierarchical classifier is trained by alternately updating parameters at different levels in the tree to maximize the joint margin of the overall classification. Since the loss function required in the training is convex in the parameter space, the problem of spurious local minima is avoided. The model achieves good performance with fewer parameters than single-level classifiers. In the TIMIT benchmark task of context-independent phonetic classification, the proposed modeling scheme achieves a state-of-the-art phonetic classification error of 16.7% on the core test set. This is an absolute reduction of 1.6% from the best previously reported result on this task, and 4-5% lower than a variety of classifiers that have been recently examined on this task.
Pages: 272-277
Number of pages: 6