Phoneme recognition using ICA-based feature extraction and transformation

Cited by: 54
Authors
Kwon, OW
Lee, TW
Affiliations
[1] Chungbuk Natl Univ, Sch Elect & Comp Engn, Cheongju 361763, Chungbuk, South Korea
[2] Univ Calif San Diego, Inst Neural Computat, La Jolla, CA 92093 USA
Keywords
speech recognition; independent component analysis; feature extraction
DOI
10.1016/j.sigpro.2004.03.004
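Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract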
We investigate the use of independent component analysis (ICA) for speech feature extraction in speech recognition systems. Although initial research suggested that learning basis functions by ICA to encode the speech signal efficiently improved recognition accuracy, we observe that this may be true for recognition tasks with little training data. However, when compared on a large training database with standard speech recognition features such as the mel frequency cepstral coefficients (MFCCs), the ICA-adapted basis functions perform poorly. This is mainly due to the phase sensitivity of the learned speech basis functions and their time-shift variance. In contrast to image processing, phase information is not essential for speech recognition. We therefore propose a new scheme that shows how the phase sensitivity can be removed by using an analytical description of the ICA-adapted basis functions via the Hilbert transform. Furthermore, since the basis functions are not shift invariant, we extend the method to include a frequency-based ICA stage that removes redundant time-shift information. The performance of the new feature is evaluated for phoneme recognition using the TIMIT speech database and compared with the standard MFCC feature. The phoneme recognition results show promising accuracy, which is comparable to that of the well-optimized MFCC features. © 2004 Elsevier B.V. All rights reserved.
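The phase-removal idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: a Gabor-like function stands in for an ICA-adapted basis function, and the names `analytic_signal` and `phase_responses`, the frame length, and the frequency are all assumptions made for illustration. It compares the real projection of a frame onto the basis function with the magnitude of the projection onto its analytic (Hilbert-transform) counterpart as the phase of the frame changes.

```python
import numpy as np

def analytic_signal(x):
    """Return x + j*H{x} for a 1-D real array, using the FFT method."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist), double positive frequencies, zero negative ones.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def phase_responses(phases, n=256, freq=0.1):
    """Project phase-shifted sinusoidal frames onto a basis function.

    Returns (real projections, analytic-magnitude projections).
    The Gabor-like basis is a hypothetical stand-in for an ICA-learned one.
    """
    t = np.arange(n)
    envelope = np.exp(-0.5 * ((t - n / 2) / (n / 8)) ** 2)
    basis = envelope * np.cos(2 * np.pi * freq * t)
    basis_analytic = analytic_signal(basis)
    real_out, mag_out = [], []
    for phi in phases:
        frame = np.cos(2 * np.pi * freq * t + phi)
        # Plain projection: depends on the phase of the frame.
        real_out.append(float(np.dot(basis, frame)))
        # Magnitude of the complex projection: nearly phase independent.
        mag_out.append(float(np.abs(np.vdot(basis_analytic, frame))))
    return np.array(real_out), np.array(mag_out)

if __name__ == "__main__":
    real_out, mag_out = phase_responses(np.linspace(0.0, np.pi, 9))
    print("real projection:", np.round(real_out, 2))
    print("analytic magnitude:", np.round(mag_out, 2))
```

In this toy setting the real projection swings with the phase of the frame while the analytic magnitude stays almost constant, which is the property the abstract attributes to the Hilbert-transform description. It does not cover the frequency-based ICA stage that the abstract proposes for time-shift redundancy.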
Pages: 1005-1019
Page count: 15