Phoneme recognition using ICA-based feature extraction and transformation

Cited by: 54

Authors:

Kwon, OW

Lee, TW

Affiliations:

[1] Chungbuk Natl Univ, Sch Elect & Comp Engn, Cheongju 361763, Chungbuk, South Korea

[2] Univ Calif San Diego, Inst Neural Computat, La Jolla, CA 92093 USA

Source:

SIGNAL PROCESSING | 2004 / Vol. 84 / Issue 06

Keywords:

speech recognition; independent component analysis; feature extraction

DOI:

10.1016/j.sigpro.2004.03.004

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

We investigate the use of independent component analysis (ICA) for speech feature extraction in speech recognition systems. Although initial research suggested that learning basis functions by ICA to encode the speech signal efficiently improved recognition accuracy, we observe that this may be true only for recognition tasks with little training data. When compared on a large training database with standard speech recognition features such as mel frequency cepstral coefficients (MFCCs), the ICA-adapted basis functions perform poorly. This is mainly due to the phase sensitivity of the learned speech basis functions and their time-shift variance. In contrast to image processing, phase information is not essential for speech recognition. We therefore propose a new scheme in which the phase sensitivity is removed by using an analytical description of the ICA-adapted basis functions via the Hilbert transform. Furthermore, since the basis functions are not shift invariant, we extend the method to include a frequency-based ICA stage that removes redundant time-shift information. The performance of the new feature is evaluated for phoneme recognition using the TIMIT speech database and compared with the standard MFCC feature. The phoneme recognition results show promising accuracy, comparable to that of the well-optimized MFCC features. © 2004 Elsevier B.V. All rights reserved.
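The phase-sensitivity point in the abstract can be illustrated with a small sketch. This is not the authors' implementation: a plain cosine stands in for a learned ICA basis function, the analytic signal is built with an FFT-based Hilbert transform, and the names `analytic_signal` and `phase_insensitive_coefficient` are hypothetical. It only shows why the magnitude of a projection onto an analytic basis function does not depend on the phase of the input, whereas the plain real projection does.

```python
import numpy as np

def analytic_signal(b):
    """Return b + j*Hilbert{b} for a real 1-D array, computed via the FFT."""
    n = len(b)
    spectrum = np.fft.fft(b)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)

def phase_insensitive_coefficient(frame, basis):
    """Magnitude of the projection of `frame` onto the analytic version of `basis`."""
    # np.vdot conjugates its first argument, giving a complex inner product.
    return float(np.abs(np.vdot(analytic_signal(basis), frame)))

# Assumption: a cosine stands in for one learned (phase-sensitive) basis function.
frame_len, k = 64, 5
n = np.arange(frame_len)
basis = np.cos(2 * np.pi * k * n / frame_len)

for phase in (0.0, np.pi / 4, np.pi / 2):
    frame = np.cos(2 * np.pi * k * n / frame_len + phase)
    real_coeff = float(np.dot(basis, frame))              # changes with phase
    magnitude = phase_insensitive_coefficient(frame, basis)  # does not
    print(f"phase={phase:.2f}  real={real_coeff:7.2f}  magnitude={magnitude:7.2f}")
```

In this toy setup the real projection shrinks toward zero as the phase offset approaches a quarter period, while the magnitude stays constant. The frequency-based ICA stage mentioned in the abstract is not covered by this sketch.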

Pages: 1005-1019

Page count: 15
