Transform Representation of the Spectra of Acoustic Speech Segments with Applications-I: General Approach and Application to Speech Recognition

Cited by: 13
Authors
Algazi, V. Ralph [1, 2]
Brown, Kathy L. [1]
Ready, Michael J.
Irvine, David H. [1]
Cadwell, Christie L.
Chung, Sang
Affiliations
[1] Univ Calif Davis, Ctr Image Proc & Integrated Computing, Speech Res Lab, Davis, CA 95616 USA
[2] Univ Calif Davis, Dept Elect Engn & Comp Sci, Davis, CA 95616 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1993 / Vol. 1 / No. 2
DOI
10.1109/89.222877
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
We present in this series of two papers a new approach for modeling and capturing the time-varying structure of the spectral envelope of speech. In this approach, we use an acoustic subword decomposition and the Karhunen-Loeve transform (KLT) to extract and efficiently represent the highly correlated structure of the spectral envelope. Integrating the KLT with acoustic subword modeling is a novel approach that concisely represents both the steady-state and the dynamic features of the spectra in a unified framework that effectively captures acoustic-phonetic patterns. The two papers are organized as follows. The first paper, Part I, presents the physiological and perceptual basis for the approach, the frame-based and acoustic-subword-based spectral representations, and applications to speaker-dependent recognition. The performance of the recognition algorithm based on this approach compares favorably with other existing techniques. Part II will present a frequency-domain coding technique based on analysis/synthesis. This application of the new method produces good-quality speech at low bit rates.
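The abstract does not give the details of how the KLT is applied, so the following is only a generic sketch of a frame-based KLT (equivalent to principal component analysis) on spectral-envelope vectors, not the authors' method. It assumes each row of `frames` is one spectral envelope (for example, log-magnitude values over a fixed set of frequency bins) and that `k` basis vectors are kept; the function names `klt_basis`, `klt_encode`, and `klt_decode` are hypothetical, and the acoustic subword decomposition is not modeled at all.

```python
import numpy as np


def klt_basis(frames: np.ndarray, k: int):
    """Estimate a k-term KLT basis from spectral-envelope frames (hypothetical helper).

    frames: array of shape (num_frames, num_bins); each row is one envelope.
    Returns (mean, basis), where basis has shape (k, num_bins) and
    orthonormal rows ordered by decreasing variance.
    """
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Sample covariance between frequency bins, shape (num_bins, num_bins).
    cov = centered.T @ centered / (frames.shape[0] - 1)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the k largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:k]
    return mean, eigvecs[:, order].T


def klt_encode(frames: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    # Project each mean-removed frame onto the k basis vectors.
    return (frames - mean) @ basis.T


def klt_decode(coeffs: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    # Rebuild approximate frames from their k KLT coefficients.
    return coeffs @ basis + mean


if __name__ == "__main__":
    # Synthetic "envelopes": 200 frames of 32 bins driven by 3 latent factors.
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 32)) + 5.0
    mean, basis = klt_basis(frames, k=3)
    rec = klt_decode(klt_encode(frames, mean, basis), mean, basis)
    print(np.max(np.abs(rec - frames)))
```

Because the synthetic frames are strongly correlated across bins, three coefficients per frame reconstruct all 32 bins almost exactly; this illustrates, under the stated assumptions, why a KLT can represent highly correlated spectral structure compactly.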
Pages: 180-195
Number of pages: 16