A novel approach to isolated word recognition

Cited by: 70
Authors
Gülmezoglu, MB [1]
Dzhafarov, V
Keskin, M
Barkana, A
Affiliations
[1] Osmangazi Univ, Dept Elect & Elect Engn, Eskisehir, Turkey
[2] Anadolu Univ, Dept Math, Eskisehir, Turkey
[3] Oregon State Univ, Dept Elect & Comp Engn, Corvallis, OR 97331 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1999 / Vol. 7 / Issue 06
Keywords
common vector approach; speech recognition; subspace methods;
DOI
10.1109/89.799687
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
A voice signal contains the psychological and physiological properties of the speaker as well as dialect differences, acoustical environment effects, and phase differences. For these reasons, the same word uttered by different speakers can be very different. In this paper, two theories are developed by considering two optimization criteria applied to both the training set and the test set. The first theory is well known; it uses what is called Criterion 1 here and ends up with the average of all vectors belonging to the words in the training set. The second theory is a novel approach; it uses what is called Criterion 2 here and extracts the common properties of all vectors belonging to the words in the training set. It is shown that Criterion 2 is superior to Criterion 1 on the training set. In Criterion 2, the individual differences are obtained by subtracting a reference vector from the other vectors, and these individual difference vectors are used to obtain an orthogonal vector basis by the Gram-Schmidt orthogonalization method. The common vector is obtained by taking any vector of the training set and subtracting from it its projections onto the orthogonal vectors. It is proved that this common vector is unique for any word class in the training set and independent of the chosen reference vector. This common vector is used in isolated word recognition, and it is also shown that Criterion 2 is superior to Criterion 1 on the test set. From the theoretical and experimental study, it is seen that the recognition rates increase as the number of speakers in the training set increases. This means that the common vector obtained from Criterion 2 represents the common properties of a spoken word better than the common or average vector obtained from Criterion 1.
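The Criterion 2 construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`orthonormal_basis`, `common_vector`), the tolerance `tol`, and the three-dimensional toy vectors are assumptions made for illustration; real feature vectors for a word class would be much longer and would come from a speech front end that the abstract does not specify.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def sub(u: Vector, v: Vector) -> Vector:
    return [a - b for a, b in zip(u, v)]


def scale(c: float, u: Vector) -> Vector:
    return [c * a for a in u]


def orthonormal_basis(vectors: List[Vector], tol: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt orthonormalization (hypothetical helper).

    Residuals with norm below `tol` are skipped, so linearly dependent
    inputs do not add basis vectors.
    """
    basis: List[Vector] = []
    for v in vectors:
        r = list(v)
        for q in basis:
            r = sub(r, scale(dot(r, q), q))  # remove the component along q
        norm = dot(r, r) ** 0.5
        if norm > tol:
            basis.append(scale(1.0 / norm, r))
    return basis


def common_vector(training: List[Vector], ref_index: int = 0) -> Vector:
    """Common vector of one word class, following the steps in the abstract."""
    ref = training[ref_index]
    # Individual differences: subtract the reference from every other vector.
    diffs = [sub(v, ref) for i, v in enumerate(training) if i != ref_index]
    basis = orthonormal_basis(diffs)
    # Subtract the projections of a training vector (here the reference)
    # onto the orthonormal difference basis.
    common = list(ref)
    for q in basis:
        common = sub(common, scale(dot(common, q), q))
    return common


# Toy training set for one word class (made-up numbers).
training = [[1.0, 2.0, 3.0], [2.0, 2.0, 4.0], [1.0, 3.0, 5.0]]
candidates = [common_vector(training, ref_index=i) for i in range(len(training))]
for i, c in enumerate(candidates):
    print(f"reference {i}: {c}")
```

On this toy set, each choice of reference vector gives the same common vector up to floating-point rounding, which matches the uniqueness claim in the abstract. Note that the construction is only informative when the vector dimension exceeds the number of independent difference vectors; otherwise the difference subspace fills the whole space and the common vector collapses to zero.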
Pages: 620-628
Page count: 9