Detection of phonological features in continuous speech using neural networks

Cited by: 115

Authors:

King, S [1]

Taylor, P [1]

Affiliation:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9LW, Midlothian, Scotland

Source:

COMPUTER SPEECH AND LANGUAGE | 2000, Vol. 14, Issue 4

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC)

Keywords:

DOI:

10.1006/csla.2000.0148

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We report work on the first component of a two-stage speech recognition architecture based on phonological features rather than phones. This paper reports experiments on three phonological feature systems: (1) the Sound Pattern of English (SPE) system, which uses binary features, (2) a multi-valued (MV) feature system, which uses traditional phonetic categories such as manner, place, etc., and (3) Government Phonology (GP), which uses a set of structured primes. All experiments used recurrent neural networks to perform feature detection. In these networks the input layer is a standard framewise cepstral representation, and the output layer represents the values of the features. The system effectively produces a representation of the most likely phonological features for each input frame. All experiments were carried out on the TIMIT speaker-independent database. The networks performed well in all cases, with the average accuracy for a single feature ranging from 86% to 93%. We describe these experiments in detail, and discuss the justification and potential advantages of using phonological features rather than phones as the basis of speech recognition. (C) 2000 Academic Press.

Citation

Pages: 333-353

Page count: 21

