CONSIDERATIONS IN APPLYING CLUSTERING TECHNIQUES TO SPEAKER-INDEPENDENT WORD RECOGNITION

Cited by: 31

Authors:

RABINER, LR

WILPON, JG

Affiliation:

[1] Acoustics Research Department, Bell Laboratories, Murray Hill, New Jersey

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 1979 / Vol. 66 / No. 3

DOI:

10.1121/1.383693

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline codes:

070206; 082403

Abstract:

Recent work at Bell Laboratories has demonstrated the utility of applying sophisticated pattern recognition techniques to obtain a set of speaker-independent word templates for an isolated word recognition system [Levinson et al., IEEE Trans. Acoust. Speech Signal Process. ASSP-27 (2), 134–141 (1979); Rabiner et al., IEEE Trans. Acoust. Speech Signal Process. (in press)]. In these studies, it was shown that a careful experimenter could guide the clustering algorithms to choose a small set of templates that were representative of a large number of replications of each word in the vocabulary. Subsequent word recognition tests verified that the chosen templates were indeed representative of a fairly large population of talkers. Given the success of this approach, the next important step is to investigate fully automatic techniques for clustering multiple versions of a single word into a set of speaker-independent word templates. Two such techniques are described in this paper. The first method uses distance data (between replications of a word) to segment the population into stable clusters. The word template is obtained either as the cluster minimax or as an averaged version of all the elements in the cluster. The second method is a variation of the one described by Rabiner [IEEE Trans. Acoust. Speech Signal Process. ASSP-26 (1), 34–42 (1978)], in which averaging techniques are directly combined with the nearest neighbor rule to simultaneously define both the word template (i.e., the cluster center) and the elements in the cluster. Experimental data show the first method to be superior to the second when three or more clusters per word are used in the recognition task. © 1979, American Association of Physics Teachers. All rights reserved.
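The abstract contrasts two ways of deriving a template from a cluster: picking its minimax element, or averaging its members, with the second method alternating nearest-neighbor assignment and averaging. Below is a minimal Python sketch of both ideas under stated assumptions. The helper names (`minimax_center`, `sq_dist`, `nearest_neighbor_averaging`) are hypothetical. The minimax step assumes a precomputed symmetric distance matrix between replications (for example, time-warped distances, as an assumption). The averaging loop treats each replication as a fixed-length vector compared with squared Euclidean distance, which sidesteps the time alignment that real word templates require. It is a generic k-means-style illustration, not the authors' procedure.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def minimax_center(dist: Sequence[Sequence[float]], members: Sequence[int]) -> int:
    """Return the member whose largest distance to the other members is smallest."""
    if not members:
        raise ValueError("a cluster needs at least one member")
    best, best_radius = members[0], float("inf")
    for i in members:
        # Worst-case (maximum) distance from candidate i to any other member.
        radius = max((dist[i][j] for j in members if j != i), default=0.0)
        if radius < best_radius:
            best, best_radius = i, radius
    return best


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_neighbor_averaging(
    vectors: Sequence[Sequence[float]], k: int, iters: int = 10
) -> Tuple[List[Vector], List[List[Vector]]]:
    """k-means-style loop: assign each token to its nearest center, then
    replace each center by the average of the tokens assigned to it."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of tokens")
    # Naive initialization (an assumption): the first k tokens act as centers.
    centers: List[Vector] = [[float(x) for x in v] for v in vectors[:k]]
    clusters: List[List[Vector]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: sq_dist(v, centers[j]))
            clusters[nearest].append([float(x) for x in v])
        # Average each cluster; keep the old center if a cluster empties out.
        new_centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: the averaged centers stopped moving
            break
        centers = new_centers
    return centers, clusters
```

The minimax element is always one of the recorded replications, whereas an averaged center is a synthetic token; for a toy input such as `[[0, 0], [0, 1], [10, 10], [10, 11]]` with `k=2`, the averaging loop settles on one center per pair of nearby tokens.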

Pages: 663–673

Page count: 11

References (15 in total):

[1] SPEAKER-INDEPENDENT SPEECH-RECOGNITION SYSTEM BASED ON LINEAR PREDICTION.

GUPTA, VN; BRYAN, JK; GOWDY, JN.

IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1978, 26 (01): 27-33

[2] MINIMUM PREDICTION RESIDUAL PRINCIPLE APPLIED TO SPEECH RECOGNITION.

ITAKURA, F.

IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1975, 23 (01): 67-72

[3] INTERACTIVE CLUSTERING TECHNIQUES FOR SELECTING SPEAKER-INDEPENDENT REFERENCE TEMPLATES FOR ISOLATED WORD RECOGNITION.

LEVINSON, SE; RABINER, LR; ROSENBERG, AE; WILPON, JG.

IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (02): 134-141

[4] EVALUATION OF A WORD RECOGNITION SYSTEM USING SYNTAX ANALYSIS.

LEVINSON, SE; ROSENBERG, AE; FLANAGAN, JL.

BELL SYSTEM TECHNICAL JOURNAL, 1978, 57 (05): 1619-1626

[5] MARKEL JD, 1975, LINEAR PREDICTION SP

[6] CONSIDERATIONS IN DYNAMIC TIME WARPING ALGORITHMS FOR DISCRETE WORD RECOGNITION.

RABINER, LR; ROSENBERG, AE; LEVINSON, SE.

IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1978, 26 (06): 575-582

[7] CREATING REFERENCE TEMPLATES FOR SPEAKER INDEPENDENT RECOGNITION OF ISOLATED WORDS.

RABINER, LR.

IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1978, 26 (01): 34-42

[8] RABINER LR, IEEE T ACOUST SPEECH

[9] MENTAL DECLINE IN ELDERLY - PHARMACOTHERAPY (ERGOT ALKALOIDS VERSUS PAPAVERINE).

ROSEN, HJ.

JOURNAL OF THE AMERICAN GERIATRICS SOCIETY, 1975, 23 (04): 169-174

[10] ROSENBERG AE, 1972, J ACOUST SOC AM A S1, V62, pS563
