MODELING OF CONTEXTUAL EFFECTS BASED ON SPECTRAL PEAK INTERACTION

Cited by: 2

Authors:

AKAGI, M [1]

Affiliations:

[1] NIPPON TELEGRAPH & TEL PUBL CORP, MUSASHINO ELECT COMMUN LAB, BASIC RES LABS, MUSASHINO, TOKYO 180, JAPAN

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 1993 / Vol. 93 / No. 02


DOI:

10.1121/1.405556

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper presents a model of contextual effects able to cope with coarticulation problems, especially vowel neutralization. This model is designed to model the superior recognition ability mechanisms of humans and apply these mechanisms to automatic speech recognition and synthesis. It predicts target spectral peaks in reduced vowels, based on interactions between spectral peak pairs. To construct and substantiate the model, psychoacoustic experiments were carried out to measure the extent of phoneme boundary shift with a single formant stimulus as a preceding anchor. The results of the experiments were compared with the spectral peak interaction results obtained from real speech data using the model. This comparison showed that the obtained spectral peak interactions, measured through perceptual boundary shifts with a single formant anchor, are similar to the spectral peak interactions estimated by the model. Additionally, recovery simulations of reduced spectral peak trajectories with real speech data showed that the spectral peak interactions obtained from the psychoacoustic experiments can be used to predict target spectral peaks from reduced spectral peak trajectories in the same manner as the spectral peak interaction function estimated by the model. These results suggest that the model may be emulating aspects of the human mechanisms, that the contextual effects resulting from the interactions between single formant stimuli can play an important role in improving phoneme neutralization recovery, and that the neutralization recovery model can be formulated as the sum of the interactions between spectral peaks. Furthermore, the model can be implemented as a speech recognition preprocessor to reduce recognition error rates because it can overshoot spectral peak trajectories, shift spectral peaks toward their targets, and increase distances among category centers and Bhattacharyya distances between vowel categories.
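The abstract reports that the model increases Bhattacharyya distances between vowel categories. As a rough illustration of that measure only, the sketch below computes the standard closed-form Bhattacharyya distance between two univariate Gaussian classes. The abstract does not state the feature space or the distributional assumption used in the paper, so the Gaussian assumption, the function name `bhattacharyya_gaussian_1d`, and all numeric values are hypothetical.

```python
import math


def bhattacharyya_gaussian_1d(mean_a, var_a, mean_b, var_b):
    """Bhattacharyya distance between two univariate Gaussians.

    Assumption: each vowel category is modeled as a Gaussian over a single
    spectral peak frequency. This is an illustrative choice, not the
    paper's stated setup.
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    var_sum = var_a + var_b
    # Term driven by the separation of the category centers.
    mean_term = 0.25 * (mean_a - mean_b) ** 2 / var_sum
    # Term driven by the mismatch between the two variances.
    var_term = 0.5 * math.log(var_sum / (2.0 * math.sqrt(var_a * var_b)))
    return mean_term + var_term


# Hypothetical peak frequencies (Hz) for two vowel categories.
# "reduced": category centers pulled toward each other by neutralization.
# "recovered": centers shifted back toward their targets.
reduced = bhattacharyya_gaussian_1d(500.0, 60.0 ** 2, 600.0, 60.0 ** 2)
recovered = bhattacharyya_gaussian_1d(400.0, 60.0 ** 2, 700.0, 60.0 ** 2)

print(f"reduced:   {reduced:.4f}")
print(f"recovered: {recovered:.4f}")
```

With equal variances the variance term vanishes, so in this toy case the distance grows with the squared separation of the category centers, which is the sense in which moving peaks toward their targets makes categories easier to separate.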


Pages: 1076-1086

Page count: 11

References (10 in total):

[1] AKAGI, M. EVALUATION OF A SPECTRUM TARGET PREDICTION MODEL IN SPEECH-PERCEPTION [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1990, 87 (02): 858-865

[2] Akagi M., 1990, Computer Speech and Language, V4, P325, DOI 10.1016/0885-2308(90)90014-W

[3] FOX, RA. WITHIN-SERIES AND BETWEEN-SERIES CONTRAST IN VOWEL IDENTIFICATION - FULL-VOWEL VERSUS SINGLE-FORMANT ANCHORS [J]. PERCEPTION & PSYCHOPHYSICS, 1985, 38 (03): 223-226

[4] Hirahara T., 1988, J ACOUST SOC AM, V84, pS156

[5] HUANG CB, 1986, IEEE ICASSP, V86, P893

[6] KATAGIRI S, 1987, AUT P M ACOUST SOC J, P95

[7] KUWABARA, H. AN APPROACH TO NORMALIZATION OF COARTICULATION EFFECTS FOR VOWELS IN CONNECTED SPEECH [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1985, 77 (02): 686-694

[8] KUWAHARA H, 1975, J ACOUST SOC JPN, V31, P18

[9] LINDBLOM BEF, 1967, J ACOUST SOC AM, V42, P686

[10] ZWICKER, E; TERHARDT, E. ANALYTICAL EXPRESSIONS FOR CRITICAL-BAND RATE AND CRITICAL BANDWIDTH AS A FUNCTION OF FREQUENCY [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1980, 68 (05): 1523-1525
