Matching pursuits sinusoidal speech coding

Cited by: 7
Authors
Etemoglu, ÇÖ [1]
Cuperman, V [1]
Affiliation
[1] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2003 / Vol. 11 / No. 05
Keywords
matching pursuits; sinusoidal speech coding
DOI
10.1109/TSA.2003.815520
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper introduces a sinusoidal modeling technique for low bit rate speech coding wherein the parameters for each sinusoidal component are sequentially extracted by a closed-loop analysis. The sinusoidal modeling of the speech linear prediction (LP) residual is performed within the general framework of matching pursuits with a dictionary of sinusoids. The frequency space of sinusoids is restricted to sets of frequency intervals or bins, which, in conjunction with the closed-loop analysis, allows us to map the frequencies of the sinusoids into a frequency vector that is efficiently quantized. In voiced frames, two sets of frequency vectors are generated: one represents the harmonically related components of the voiced segment and the other the nonharmonically related components. This approach eliminates the need for a voicing-dependent cutoff frequency, which is difficult to estimate correctly and to quantize at low bit rates. In transition frames, to efficiently extract and quantize the set of frequencies needed for the sinusoidal representation of the LP residual, we introduce frequency bin vector quantization (FBVQ). FBVQ selects a vector of nonuniformly spaced frequencies from a frequency codebook in order to represent the frequency domain information in transition regions. Our use of FBVQ with closed-loop searching contributes to an improvement of speech quality in transition frames. The effectiveness of the coding scheme is enhanced by exploiting the critical band concept of auditory perception in defining the frequency bins. To demonstrate the viability and the advantages of the new models studied, we designed a 4 kbps matching pursuits sinusoidal speech coder. Subjective results indicate that the proposed coder at 4 kbps has quality exceeding that of the 6.3 kbps G.723.1 coder.
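As a rough illustration of the idea in the abstract, the sketch below runs a greedy matching pursuit over a dictionary of sinusoids whose frequencies are confined to bins. It is not the coder described in the paper: the function names `project_sinusoid` and `matching_pursuit`, the bin edges, the uniform candidate grid, the choice of one sinusoid per bin, and the synthetic test signal are all assumptions made for the example, and no LP analysis, quantization, or FBVQ codebook is included.

```python
import math


def project_sinusoid(residual, freq):
    """Least-squares fit of a*cos(freq*n) + b*sin(freq*n) to the residual.

    Returns (a, b, energy_removed). freq is in radians per sample.
    """
    c = [math.cos(freq * n) for n in range(len(residual))]
    s = [math.sin(freq * n) for n in range(len(residual))]
    cc = sum(x * x for x in c)
    ss = sum(x * x for x in s)
    cs = sum(x * y for x, y in zip(c, s))
    rc = sum(r * x for r, x in zip(residual, c))
    rs = sum(r * x for r, x in zip(residual, s))
    det = cc * ss - cs * cs
    if abs(det) < 1e-12:  # degenerate pair, e.g. freq == 0
        return 0.0, 0.0, 0.0
    a = (rc * ss - rs * cs) / det
    b = (rs * cc - rc * cs) / det
    return a, b, a * rc + b * rs  # energy of the projection


def matching_pursuit(signal, bins, candidates_per_bin=21):
    """Greedy closed-loop search: one sinusoid per bin (assumption of this sketch)."""
    residual = list(signal)
    remaining = list(bins)
    components = []
    while remaining:
        best = None
        for lo, hi in remaining:
            for k in range(candidates_per_bin):
                freq = lo + (hi - lo) * k / (candidates_per_bin - 1)
                a, b, energy = project_sinusoid(residual, freq)
                if best is None or energy > best[0]:
                    best = (energy, freq, a, b, (lo, hi))
        _, freq, a, b, used_bin = best
        remaining.remove(used_bin)
        components.append((freq, a, b))
        # Subtract the chosen component before searching for the next one.
        residual = [r - a * math.cos(freq * n) - b * math.sin(freq * n)
                    for n, r in enumerate(residual)]
    return components, residual


# Synthetic stand-in for an LP residual frame (hypothetical data).
signal = [math.cos(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(160)]
bins = [(0.2, 0.4), (1.0, 1.2)]  # hypothetical frequency bins, rad/sample
components, residual = matching_pursuit(signal, bins)
signal_energy = sum(x * x for x in signal)
residual_energy = sum(x * x for x in residual)
```

Because each component is chosen by measuring how much residual energy it removes and is subtracted before the next search, later choices account for earlier ones, which is the sense in which the analysis is closed-loop here. Restricting the search to bins means the result can be summarized as one frequency index per bin.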
Pages: 413-424
Page count: 12