Spoken Arabic digits recognizer using recurrent neural networks

Cited by: 9

Authors:

Alotaibi, YA [1]

Affiliations:

[1] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Engn, Riyadh 11574, Saudi Arabia

Source:

PROCEEDINGS OF THE FOURTH IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY | 2004

Keywords:

DOI:

10.1109/ISSPIT.2004.1433720

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Arabic is a Semitic language that differs in many ways from European languages such as English. One of these differences is how the ten digits, zero through nine, are pronounced. All Arabic digits are polysyllabic words (except zero, which is monosyllabic), and most of them contain phonemes unique to Arabic, namely the pharyngeal and emphatic subsets. In this paper, Arabic digits were investigated from the speech recognition point of view. A speech recognition system based on recurrent neural networks was designed and tested on automatic Arabic digit recognition. The system is an isolated whole-word speech recognizer, and it was implemented in both multi-speaker mode (i.e., the same set of speakers was used in both the training and testing phases) and speaker-independent mode (i.e., the speakers used for training are different from those used for testing). During the recognition process, the digitized speech is cleaned of noise by means of band-pass filters; the signal is also pre-emphasized, then blocked into frames and windowed with a Hamming window; a time alignment algorithm is used to compensate for differences in the utterances' lengths and misalignments between phonemes; frame features are extracted as mel-frequency cepstral coefficients (MFCCs) to reduce the amount of information in the input signal; and finally the neural network classifies the unknown digit. The recognition system achieved 99.5% correct digit recognition in multi-speaker mode and 94.5% in speaker-independent mode.

Pages: 195-199

Page count: 5

References: 17 in total

[11] HAGOS E, 1985, IMPLEMENTATION ISOLA

[12] Haykin S., 1999, Neural Networks: A Comprehensive Foundation, 2nd ed.

[13] Review of Neural Networks for Speech Recognition [J]. Lippmann, Richard P. NEURAL COMPUTATION, 1989, 1 (01): 1-38

[14] COMPARATIVE-STUDY OF SEVERAL DISTORTION MEASURES FOR SPEECH RECOGNITION [J]. NOCERINO, N; SOONG, FK; RABINER, LR; KLATT, DH. SPEECH COMMUNICATION, 1985, 4 (04): 317-331

[15] ALGORITHM FOR DETERMINING ENDPOINTS OF ISOLATED UTTERANCES [J]. RABINER, LR; SAMBUR, MR. BELL SYSTEM TECHNICAL JOURNAL, 1975, 54 (02): 297-315

[16] SUGIYAMA M, 1991, CIRC SYST 1991 IEEE, V1, P581

[17] PHONEME RECOGNITION USING TIME-DELAY NEURAL NETWORKS [J]. WAIBEL, A; HANAZAWA, T; HINTON, G; SHIKANO, K; LANG, KJ. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1989, 37 (03): 328-339
