An Improved K-Means Clustering Method with Equalization

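Cited by: 13
Authors
Wang Hongrui (王红睿)
Zhao Liming (赵黎明)
Pei Jian (裴剑)
Affiliation
[1] College of Communication Engineering, Jilin University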
Keywords
vector quantization; K-means clustering; speech recognition; initial values of continuous Markov models
DOI
Not available
Chinese Library Classification number
TN912.31 [speech waveform coding]
Discipline classification code
0711
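Abstract
To extract initial values for continuous Markov models, a K-means clustering method is proposed in which the classes are distributed approximately evenly over the training sample space. A penalty factor is introduced into the clustering process to keep too many training vectors from concentrating in one or a few classes, so that the partition of the sample space is approximately uniform. Initial-value extraction experiments for continuous Markov models show that, compared with standard K-means clustering and LBG (Linde-Buzo-Gray) clustering, the method reduces the global distortion produced by vector quantization, distributes the classes more evenly over the sample space, and improves vector quantization performance. When the method is applied to initial-value extraction for continuous Markov models in isolated-word recognition, the parameter estimates of the individual Gaussian probability density functions come closer to their unbiased estimates, which improves the reliability of the Markov model initial values.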
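The abstract does not give the form of the penalty factor, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a multiplicative penalty: each squared distance is scaled by `1 + penalty * n_j / (N / K)`, where `n_j` is the size of class `j` in the previous iteration. The function name `balanced_kmeans` and the `penalty` parameter are hypothetical. With `penalty=0` the loop reduces to standard K-means.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balanced_kmeans(points, k, penalty=1.0, iters=20, seed=0):
    """K-means with a size penalty (assumed form, not taken from the paper).

    Classes that were large in the previous iteration have their distances
    inflated, which pushes boundary vectors toward smaller classes.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    target = len(points) / k      # class size under perfectly even occupancy
    counts = [0] * k              # previous-iteration class sizes
    labels = [0] * len(points)
    for _ in range(iters):
        # Penalty multipliers are fixed for the whole assignment pass.
        weights = [1.0 + penalty * counts[j] / target for j in range(k)]
        for i, p in enumerate(points):
            costs = [sq_dist(p, centers[j]) * weights[j] for j in range(k)]
            labels[i] = min(range(k), key=costs.__getitem__)
        counts = [labels.count(j) for j in range(k)]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:           # an empty class keeps its old center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels
```

Calling `balanced_kmeans(vectors, k=8, penalty=1.0)` on a list of feature vectors returns the class centers and a label per vector. Larger `penalty` values trade some distortion for more even class occupancy; the value that works well for a given data set is something to determine experimentally.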
Pages: 172-176
Number of pages: 5