On the perceptual distance between speech segments

Times cited: 2

Authors:

Ghitza, O

Sondhi, MM

Affiliation:

[1] Acoust. and Aud. Commun. Research, Bell Laboratories, Murray Hill

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 1997 / Vol. 101 / No. 1

DOI:

10.1121/1.418115

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

For many tasks in speech signal processing, it is of interest to develop an objective measure that correlates well with the perceptual distance between speech segments. (Speech segments are defined as pieces of a speech signal 50–150 ms in duration. For concreteness, a segment is taken to be a diphone, i.e., a segment extending from the midpoint of one phoneme to the midpoint of the adjacent phoneme.) Such a distance metric would be useful for low-bit-rate speech coding, where saving bits relies on perceptual tolerance to acoustic perturbations of the original speech, perturbations whose effects typically last for several tens of milliseconds. It would also be useful for automatic speech recognition, on the assumption that perceptual invariance to adverse signal conditions (e.g., noise, microphone and channel distortions, and room reverberation) and to phonemic variability (due to the nonuniqueness of articulatory gestures) may provide a basis for robust performance. This paper describes attempts at defining such a metric. The approach is twofold. First, psychoacoustical experiments relevant to speech perception are conducted to measure the relative importance of various time-frequency "tiles", one at a time, when all other time-frequency information is present. Second, the psychophysical data are used to derive rules for integrating the output of a model of auditory-nerve activity over time and frequency. © 1997 Acoustical Society of America.
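The abstract defines a segment concretely (a diphone running from the midpoint of one phoneme to the midpoint of the next) and describes integrating auditory-model output over time-frequency tiles according to their measured importance. The Python sketch below illustrates both ideas under stated assumptions: phoneme boundaries are given in milliseconds; `x` and `y` stand in for the output of some auditory-nerve model, laid out as [frequency band][time frame] grids; and the per-tile weights `w` and the weighted root-mean-square rule are hypothetical placeholders, not the integration rules derived in the paper. The names `diphones` and `weighted_tile_distance` are illustrative only.

```python
from typing import List, Sequence, Tuple

# A phoneme is (label, start_ms, end_ms); times are assumed to be in milliseconds.
Phoneme = Tuple[str, float, float]


def diphones(phonemes: Sequence[Phoneme]) -> List[Tuple[str, float, float]]:
    """Return (label, start_ms, end_ms) for each diphone, spanning the
    midpoint of one phoneme to the midpoint of the adjacent phoneme."""
    result = []
    for (a, a_start, a_end), (b, b_start, b_end) in zip(phonemes, phonemes[1:]):
        start = (a_start + a_end) / 2.0  # midpoint of the first phoneme
        end = (b_start + b_end) / 2.0    # midpoint of the following phoneme
        result.append((f"{a}-{b}", start, end))
    return result


def weighted_tile_distance(
    x: Sequence[Sequence[float]],
    y: Sequence[Sequence[float]],
    w: Sequence[Sequence[float]],
) -> float:
    """Hypothetical distance between two auditory-model outputs x and y:
    the square root of the weighted mean squared difference over all
    time-frequency tiles. w[band][frame] is an assumed importance weight."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same number of bands")
    num = 0.0  # weighted sum of squared tile differences
    den = 0.0  # total weight, used for normalisation
    for xb, yb, wb in zip(x, y, w):
        if not (len(xb) == len(yb) == len(wb)):
            raise ValueError("x, y and w must have the same number of frames")
        for xi, yi, wi in zip(xb, yb, wb):
            num += wi * (xi - yi) ** 2
            den += wi
    return (num / den) ** 0.5 if den > 0 else 0.0


if __name__ == "__main__":
    phones = [("s", 0, 100), ("a", 100, 220), ("t", 220, 300)]
    for label, start, end in diphones(phones):
        # Check each diphone against the 50-150 ms segment range in the abstract.
        print(label, start, end, "in 50-150 ms range:", 50 <= end - start <= 150)

    x = [[1.0, 2.0], [0.0, 0.0]]
    y = [[1.0, 0.0], [0.0, 0.0]]
    print(weighted_tile_distance(x, y, [[1.0, 1.0], [1.0, 1.0]]))
    print(weighted_tile_distance(x, y, [[1.0, 0.0], [1.0, 1.0]]))
```

In the example, both diphones (110 ms and 100 ms) fall inside the 50–150 ms range. Setting a tile's weight to zero removes its contribution to the distance, so in a scheme like this the weights `w` are where measured tile importances would enter.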


Pages: 522–529

Number of pages: 8