Hybrid coding: Combined harmonic and waveform coding of speech at 4 kb/s

Cited by: 9
Authors
Shlomot, E [1]
Cuperman, V [1]
Gersho, A [1]
Affiliations
[1] Mindspeed Technol, Conexant Syst, Newport Beach, CA 92660 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2001 / Vol. 9 / No. 6
Funding
National Science Foundation (USA);
Keywords
harmonic spectral quantization; hybrid coding of speech; low bit-rate speech coding; sinusoidal speech coding;
DOI
10.1109/89.943341
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
A new hybrid speech coding technique is presented in this paper, which combines a frequency-domain parametric coder (for stationary voiced and stationary unvoiced speech) with a time-domain waveform coder (for transition speech). Our hybrid coder uses a parametric representation for the excitation of a linear-prediction filter. The excitation of stationary voiced speech is a sum of harmonic cosines with interpolated magnitudes and a synthetic phase model, the excitation for stationary unvoiced speech is a spectrally shaped noise, and the excitation for transition speech is a set of signed pulses. Signal alignment when switching between the harmonic excitation of stationary voiced speech and the pulse model used for transition speech is required, and achieved by special alignment procedures. A 4 kb/s hybrid coder, which achieves high-quality reconstructed speech, is described in this paper. The 4 kb/s hybrid coder employs a neural network classifier, and a novel pitch detection and harmonic bandwidth estimation algorithm. The locations of excitation pulses for coding transitions are determined by analysis-by-synthesis. A simple and efficient dimension conversion and quantization of the harmonic spectral magnitudes of voiced speech was devised, combining the general nonsquare transform (NST) or dimension conversion and a weighted vector quantization (VQ) approach. Subjective listening tests demonstrate that the 4 kb/s hybrid coding scheme competes favorably with CELP coders at low bit-rates.
Pages: 632-646
Page count: 15
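
The abstract names three excitation models, one per frame class. As a rough illustration only, the Python sketch below generates one frame of each. It is not the authors' implementation: the 8 kHz sampling rate, all function and parameter names, the zero-phase placeholder (used in place of the paper's synthetic phase model), and the flat white noise (used in place of spectrally shaped noise) are assumptions made for this example. Classification, pulse search by analysis-by-synthesis, alignment, and quantization are not shown.

```python
import math
import random


def harmonic_excitation(f0_hz, mags_start, mags_end, n_samples, fs_hz=8000.0):
    """Voiced frame: sum of harmonic cosines with interpolated magnitudes.

    Magnitudes are linearly interpolated from mags_start to mags_end across
    the frame. Phase is fixed at zero here (an assumption; the paper uses a
    synthetic phase model that is not reproduced in this sketch).
    """
    out = []
    for n in range(n_samples):
        w = n / n_samples  # interpolation weight, 0 at the frame start
        sample = 0.0
        for k, (a0, a1) in enumerate(zip(mags_start, mags_end), start=1):
            amp = (1.0 - w) * a0 + w * a1
            sample += amp * math.cos(2.0 * math.pi * k * f0_hz * n / fs_hz)
        out.append(sample)
    return out


def noise_excitation(n_samples, gain=1.0, seed=0):
    """Unvoiced frame: white Gaussian noise (spectral shaping omitted)."""
    rng = random.Random(seed)
    return [gain * rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def pulse_excitation(n_samples, positions, signs, gain=1.0):
    """Transition frame: a set of signed unit pulses at given positions."""
    out = [0.0] * n_samples
    for pos, sign in zip(positions, signs):
        out[pos] += gain if sign >= 0 else -gain
    return out


def make_excitation(frame_class, n_samples, **params):
    """Pick the excitation model from a (hypothetical) frame-class label."""
    if frame_class == "voiced":
        return harmonic_excitation(
            params["f0_hz"], params["mags_start"], params["mags_end"], n_samples
        )
    if frame_class == "unvoiced":
        return noise_excitation(n_samples, params.get("gain", 1.0))
    if frame_class == "transition":
        return pulse_excitation(n_samples, params["positions"], params["signs"])
    raise ValueError(f"unknown frame class: {frame_class!r}")
```

In a complete coder, each excitation frame would then drive the linear-prediction synthesis filter mentioned in the abstract; that filtering step is left out here.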