A new optimum feature extraction and classification method for speaker recognition: GWPNN

Cited by: 33
Author
Avci, Engin [1]
Affiliation
[1] Firat Univ, Tech Educ Fac, Dept Elect & Comp Sci, TR-23119 Elazig, Turkey
Keywords
English speech signal; adaptive feature extraction; wavelet packet decomposition; entropy; genetic algorithm; wavelet packet-neural networks; expert system
DOI
10.1016/j.eswa.2005.12.004
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech and speaker recognition are important tasks for computer systems. In this paper, an expert speaker recognition system based on optimum wavelet packet entropy is proposed and applied to real speech/voice signals. The study combines a new feature extraction approach and a new classification approach, both built on optimum wavelet packet entropy parameter values. These values are obtained from real English speech/voice signal waveforms measured with a speech experimental set. A genetic-wavelet packet-neural network (GWPNN) model is developed in this study. GWPNN consists of three layers: a genetic algorithm, a wavelet packet layer, and a multi-layer perceptron. The genetic algorithm layer selects the feature extraction method and obtains the optimum wavelet entropy parameter values; specifically, it selects one of four alternative feature extraction methods: wavelet packet decomposition; wavelet packet decomposition with the short-time Fourier transform; wavelet packet decomposition with the Born-Jordan time-frequency representation; and wavelet packet decomposition with the Choi-Williams time-frequency representation. The wavelet packet layer performs optimum feature extraction in the time-frequency domain and is composed of wavelet packet decomposition and wavelet packet entropies. The multi-layer perceptron of GWPNN, a feed-forward neural network, evaluates the fitness function of the genetic algorithm and classifies the speakers. The performance of the developed system was evaluated using noisy English speech/voice signals. The test results showed that the system was effective in detecting real speech signals, with a correct classification rate of about 85% for speaker classification. (C) 2005 Elsevier Ltd. All rights reserved.
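The abstract describes the wavelet packet layer as wavelet packet decomposition followed by wavelet packet entropies, but it does not state the wavelet family, the decomposition depth, or the entropy definition. The following is a minimal sketch under loudly stated assumptions: a Haar wavelet, a full three-level packet tree, and Shannon entropy of each terminal node's normalised energy. All function names are hypothetical, and GWPNN's "optimum" entropy parameters, which its genetic algorithm selects, are not reproduced here.

```python
import math
from typing import List, Tuple

# Assumptions (not taken from the paper): Haar wavelet, full packet tree,
# Shannon entropy per terminal node. Names are hypothetical.

def haar_split(x: List[float]) -> Tuple[List[float], List[float]]:
    """One orthonormal Haar filter-bank step: (approximation, detail)."""
    if len(x) % 2:
        x = x + [x[-1]]  # pad odd-length nodes by repeating the last sample
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def wavelet_packet_nodes(x: List[float], level: int) -> List[List[float]]:
    """Full wavelet packet tree: unlike the plain DWT, every node is split again."""
    nodes = [list(x)]
    for _ in range(level):
        next_nodes: List[List[float]] = []
        for node in nodes:
            a, d = haar_split(node)
            next_nodes.extend([a, d])
        nodes = next_nodes
    return nodes  # 2**level terminal nodes, in natural (not frequency) order

def shannon_entropy(coeffs: List[float]) -> float:
    """Shannon entropy of the normalised energy distribution of one node."""
    energy = sum(c * c for c in coeffs)
    if energy == 0.0:
        return 0.0
    h = 0.0
    for c in coeffs:
        p = c * c / energy
        if p > 0.0:
            h -= p * math.log(p)
    return h

def wavelet_packet_entropy(x: List[float], level: int = 3) -> List[float]:
    """Feature vector: one entropy value per terminal wavelet packet node."""
    return [shannon_entropy(node) for node in wavelet_packet_nodes(x, level)]

# Usage on a synthetic frame (a stand-in for one frame of a speech signal).
frame = [math.sin(0.3 * n) + 0.5 * math.sin(2.1 * n) for n in range(256)]
features = wavelet_packet_entropy(frame, level=3)
print(len(features))  # 8 = 2**3 terminal nodes
```

Because the Haar step is orthonormal, the terminal nodes together carry the same energy as the input frame, so the entropies describe how that energy is spread within each time-frequency band rather than changing its total.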
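The genetic algorithm layer chooses one of the four feature extraction methods and tunes the entropy parameter values, with the multi-layer perceptron supplying the fitness. As a rough sketch of that search loop only, the chromosome here is assumed to be a pair (method index, one entropy parameter in an assumed range [1, 2]), and `toy_fitness` is a hypothetical stand-in for the MLP's classification rate. Tournament selection, blend crossover, and elitism are generic GA choices, not details taken from the paper.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical labels for the four alternatives listed in the abstract.
METHODS = ["WPD", "WPD+STFT", "WPD+Born-Jordan", "WPD+Choi-Williams"]
P_MIN, P_MAX = 1.0, 2.0  # assumed search range for one entropy parameter

Chromosome = Tuple[int, float]  # (feature-method index, entropy parameter)

def run_ga(fitness: Callable[[int, float], float], pop_size: int = 20,
           generations: int = 30, mutation_rate: float = 0.2,
           seed: int = 0) -> Tuple[Chromosome, List[float]]:
    rng = random.Random(seed)
    # Cycle through the methods so every alternative is present at the start.
    pop: List[Chromosome] = [(i % len(METHODS), rng.uniform(P_MIN, P_MAX))
                             for i in range(pop_size)]
    history: List[float] = []  # best score per generation

    def score(c: Chromosome) -> float:
        return fitness(c[0], c[1])

    def tournament() -> Chromosome:
        a, b = rng.sample(pop, 2)  # binary tournament on the current population
        return a if score(a) >= score(b) else b

    for _ in range(generations):
        best = max(pop, key=score)
        history.append(score(best))
        children: List[Chromosome] = [best]  # elitism: keep the best unchanged
        while len(children) < pop_size:
            (m1, p1), (m2, p2) = tournament(), tournament()
            # Crossover: method from one parent, blended parameter from both.
            m = m1 if rng.random() < 0.5 else m2
            w = rng.random()
            p = w * p1 + (1 - w) * p2
            # Mutation: occasionally switch method or perturb the parameter.
            if rng.random() < mutation_rate:
                m = rng.randrange(len(METHODS))
            if rng.random() < mutation_rate:
                p = min(P_MAX, max(P_MIN, p + rng.gauss(0.0, 0.1)))
            children.append((m, p))
        pop = children
    best = max(pop, key=score)
    history.append(score(best))
    return best, history

def toy_fitness(method: int, p: float) -> float:
    # Hypothetical stand-in for the MLP's correct classification rate;
    # it peaks at method index 2 with p = 1.5.
    return (1.0 if method == 2 else 0.0) - (p - 1.5) ** 2

best, history = run_ga(toy_fitness)
print(METHODS[best[0]], round(best[1], 3))
```

Elitism keeps the best score non-decreasing from one generation to the next. In the real system each fitness call would involve extracting features with the candidate method and training or evaluating the MLP, so the population size and generation count dominate the cost far more than they do in this toy.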
Pages: 485-498
Number of pages: 14
References
34 in total