PHONEME CLASSIFICATION USING SEMICONTINUOUS HIDDEN MARKOV-MODELS

Cited by: 36

Authors:

HUANG, XD

Affiliation:

[1] School of Computer Science, Carnegie-Mellon University, Pittsburgh

Source:

IEEE TRANSACTIONS ON SIGNAL PROCESSING | 1992 / Vol. 40 / No. 5

DOI:

10.1109/78.134469

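Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809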

Abstract:

Hidden Markov models (HMM's) have been demonstrated as one of the most powerful statistical tools available for automatic speech recognition. The semicontinuous HMM (SCHMM) is a very general model including both discrete and continuous mixture HMM's as its special forms. In comparison to the conventional discrete HMM, robustness can be enhanced by using multiple codewords in deriving the semicontinuous output probability; and the VQ codebook itself can be optimized together with the HMM parameters under the assumption that each codeword is represented by a continuous probability density function (pdf). In comparison to the conventional continuous mixture HMM, the SCHMM can maintain the modeling ability of large-mixture pdf functions by using a universal set of pdf's. In addition, the number of free parameters and the computational complexity can be reduced because all of the pdf's are shared across different models. The SCHMM thus provides a good solution to the conflict between detailed acoustic modeling and insufficient training data. In this paper, SCHMM's with explicit state duration modeling are carried out for speaker-dependent phoneme classification in comparison with both the discrete HMM and the continuous HMM. Results have clearly demonstrated that the SCHMM with state duration offers significantly improved phoneme classification accuracy. In comparison with the benchmark work of the discrete HMM and the continuous HMM, the error rate was reduced by more than 30% and 20%, respectively.
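The semicontinuous output probability described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used SCHMM form, in which the output probability of a state is the sum, over a shared codebook, of each codeword's continuous density weighted by that state's discrete codeword probability, optionally truncated to the best-matching codewords. The abstract itself gives no formula, so the diagonal-covariance Gaussians, the `top_m` truncation, and all function and variable names below are assumptions made for illustration only.

```python
import numpy as np

def gaussian_densities(x, means, variances):
    """Density of x under each codeword pdf (assumed diagonal-covariance Gaussians).

    x: (D,) observation; means, variances: (K, D) shared codebook parameters.
    """
    diff = x - means
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_expo = -0.5 * np.sum(diff * diff / variances, axis=1)
    return np.exp(log_norm + log_expo)  # shape (K,)

def schmm_output_prob(x, means, variances, state_weights, top_m=None):
    """Semicontinuous output probability for one state (illustrative sketch).

    state_weights: (K,) discrete codeword probabilities of the state.
    top_m: if given, only the top_m best-matching codewords contribute.
    """
    dens = gaussian_densities(x, means, variances)
    if top_m is not None and top_m < dens.size:
        keep = np.argsort(dens)[-top_m:]  # indices of the largest densities
        return float(np.dot(dens[keep], state_weights[keep]))
    return float(np.dot(dens, state_weights))

# Toy codebook shared by every state: three codewords in two dimensions.
means = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
variances = np.ones((3, 2))

# Two hypothetical states differ only in their codeword weights.
weights_state_a = np.array([0.7, 0.2, 0.1])
weights_state_b = np.array([0.1, 0.2, 0.7])

x = np.array([0.2, -0.1])  # observation close to the first codeword
p_a = schmm_output_prob(x, means, variances, weights_state_a, top_m=2)
p_b = schmm_output_prob(x, means, variances, weights_state_b, top_m=2)
p_a_full = schmm_output_prob(x, means, variances, weights_state_a)
```

Because the codebook densities are shared, they are evaluated once per observation and reused by every state; only the weight vectors are state-specific. In this sketch, setting `top_m=1` keeps just the single best-matching codeword, which resembles the discrete HMM with hard vector quantization, while omitting `top_m` sums over the whole codebook.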

Pages: 1062-1067

Page count: 6
