Principal component analysis of speech spectrogram images

Cited by: 33
Author
Pinkowski, B
Affiliation
[1] Computer Science Department, Western Michigan University, Kalamazoo
Funding
National Institutes of Health (USA);
Keywords
principal components; Karhunen-Loeve transform; Fourier descriptors; cluster analysis; speech spectrogram;
DOI
10.1016/S0031-3203(96)00103-3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent research has demonstrated that spectrograms containing human speech utterances can be analyzed using image processing techniques to yield a high recognition rate. In particular, Fourier descriptors (FDs) have proved very useful for characterizing the boundary of segmented isolated words containing the English semivowels /w/, /y/, /l/, and /r/. This study examines the appropriateness of FDs combined with 17 other general features for classifying objects contained in binary spectrogram images. Principal components (PCs) are used for feature reduction on a speaker-dependent data set consisting of 80 sounds representing 20 speaker-dependent words containing English semivowels. With only eight features, including four 32-point FDs and four general features obtained from principal component analysis, a 97.5% recognition rate was obtained. © 1997 Pattern Recognition Society.
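The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `fourier_descriptors` and `pca_reduce`, the index-based resampling of the boundary, and the normalization by the largest magnitude are all assumptions made for illustration, and the abstract does not say how the 32-point FDs or the general features were actually computed.

```python
import numpy as np


def fourier_descriptors(boundary, n_points=32):
    """Magnitude Fourier descriptors of a closed boundary.

    `boundary` is an (N, 2) array of ordered (x, y) points (hypothetical input
    format). Returns n_points - 1 normalized magnitudes.
    """
    pts = np.asarray(boundary, dtype=float)
    # Resample the closed curve to n_points by linear interpolation over the
    # point index (an assumption; arc-length resampling is another option).
    idx = np.linspace(0, len(pts), n_points, endpoint=False)
    closed = np.vstack([pts, pts[:1]])
    src = np.arange(len(pts) + 1)
    x = np.interp(idx, src, closed[:, 0])
    y = np.interp(idx, src, closed[:, 1])
    # Treat each boundary point as a complex number and take its DFT.
    coeffs = np.fft.fft(x + 1j * y)
    # Dropping the DC term removes dependence on position; taking magnitudes
    # removes dependence on rotation and starting point.
    mags = np.abs(coeffs[1:])
    # Dividing by the largest magnitude removes dependence on scale
    # (assumes the boundary is not a single repeated point).
    return mags / mags.max()


def pca_reduce(features, n_components):
    """Project an (n_samples, n_features) matrix onto its leading principal components."""
    X = np.asarray(features, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]
```

Under these assumptions, a feature matrix with one row per sound and 17 general-feature columns could be passed to `pca_reduce(features, 4)`, and the resulting scores concatenated with a handful of FD magnitudes to form a short feature vector for a classifier.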
Pages: 777-787
Number of pages: 11