Hybrid coding: Combined harmonic and waveform coding of speech at 4 kb/s

Cited by: 9
Authors
Shlomot, E [1]
Cuperman, V [1]
Gersho, A [1]
Affiliations
[1] Mindspeed Technol Conexant Syst, Newport Beach, CA 92660 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2001 / Vol. 9 / No. 6
Funding
U.S. National Science Foundation;
Keywords
harmonic spectral quantization; hybrid coding of speech; low bit-rate speech coding; sinusoidal speech coding;
DOI
10.1109/89.943341
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
A new hybrid speech coding technique is presented in this paper, which combines a frequency-domain parametric coder (for stationary voiced and stationary unvoiced speech) with a time-domain waveform coder (for transition speech). Our hybrid coder uses a parametric representation for the excitation of a linear-prediction filter. The excitation of stationary voiced speech is a sum of harmonic cosines with interpolated magnitudes and a synthetic phase model, the excitation for stationary unvoiced speech is a spectrally shaped noise, and the excitation for transition speech is a set of signed pulses. Signal alignment when switching between the harmonic excitation of stationary voiced speech and the pulse model used for transition speech is required, and achieved by special alignment procedures. A 4 kb/s hybrid coder, which achieves high-quality reconstructed speech, is described in this paper. The 4 kb/s hybrid coder employs a neural network classifier, and a novel pitch detection and harmonic bandwidth estimation algorithm. The locations of excitation pulses for coding transitions are determined by analysis-by-synthesis. A simple and efficient dimension conversion and quantization of the harmonic spectral magnitudes of voiced speech was devised, combining the general nonsquare transform (NST) or dimension conversion and a weighted vector quantization (VQ) approach. Subjective listening tests demonstrate that the 4 kb/s hybrid coding scheme competes favorably with CELP coders at low bit-rates.
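The three excitation types named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 8 kHz sampling rate, and all parameter values are assumptions made for illustration. Magnitude interpolation, the synthetic phase model, spectral shaping of the noise, and the analysis-by-synthesis pulse search are all omitted; zero phases and plain white Gaussian noise stand in as placeholders.

```python
import math
import random


def harmonic_excitation(f0_hz, magnitudes, n_samples, fs_hz=8000.0, phases=None):
    """Voiced frame: e[n] = sum_k A_k * cos(2*pi*k*f0*n/fs + phi_k), k = 1..K."""
    if phases is None:
        # Placeholder only: the paper uses a synthetic phase model (not shown).
        phases = [0.0] * len(magnitudes)
    out = []
    for n in range(n_samples):
        sample = 0.0
        for k, (a, phi) in enumerate(zip(magnitudes, phases), start=1):
            sample += a * math.cos(2.0 * math.pi * k * f0_hz * n / fs_hz + phi)
        out.append(sample)
    return out


def noise_excitation(n_samples, gain=1.0, seed=0):
    """Unvoiced frame: white Gaussian noise (spectral shaping omitted here)."""
    rng = random.Random(seed)
    return [gain * rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def pulse_excitation(n_samples, positions, signs, gain=1.0):
    """Transition frame: a set of signed pulses at given sample positions."""
    out = [0.0] * n_samples
    for pos, sign in zip(positions, signs):
        out[pos] += gain * sign
    return out


if __name__ == "__main__":
    voiced = harmonic_excitation(100.0, [1.0, 0.5], n_samples=160)
    unvoiced = noise_excitation(160, gain=0.1)
    transition = pulse_excitation(160, positions=[10, 57, 120], signs=[1, -1, 1])
    print(len(voiced), len(unvoiced), len(transition))
```

In a coder of this kind, each of these signals would drive the linear-prediction synthesis filter, with a per-frame classifier selecting which model applies.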
Pages: 632-646
Page count: 15
Related papers
38 in total
[11]   MULTIBAND EXCITATION VOCODER [J].
GRIFFIN, DW ;
LIM, JS .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1988, 36 (08) :1223-1235
[12]  
GRIFFIN DW, P IEEE ICASSP 85
[13]  
HEDELIN P, P IEEE ICASSP 81, P205
[14]  
*ITU T TEL STAND S, 1995, DUAL RAT SPEECH COD
[15]  
*ITU T TEL STAND S, 1998, DRAFT DESCR ANN REC
[16]   Encoding Speech Using Prototype Waveforms [J].
Kleijn, W. Bastiaan .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1993, 1 (04) :386-399
[17]   Efficient Search and Design Procedures for Robust Multi-Stage VQ of LPC Parameters for 4 kb/s Speech Coding [J].
LeBlanc, W. P. ;
Bhattacharya, B. ;
Mahmoud, S. A. ;
Cuperman, V. .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1993, 1 (04) :373-385
[18]  
LI C, 2001, IEEE T SPEECH AUDI P, P622
[19]  
LUPINI P, 1995, P IEEE SPEECH COD WO, P87
[20]   TIME-DOMAIN ALGORITHMS FOR HARMONIC BANDWIDTH REDUCTION AND TIME SCALING OF SPEECH SIGNALS [J].
MALAH, D .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (02) :121-133