A sinusoidal voice over packet coder tailored for the frame-erasure channel

Cited by: 19

Authors:

Lindblom, J [1]

Affiliations:

[1] Skype Technol, Stockholm, Sweden

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005 / Vol. 13 / No. 5

Keywords:

frame-erasure; Gaussian mixture model; harmonic analysis; packet loss concealment; packet switching; speech coding; variable-dimension; vector quantization; wide-band;

DOI:

10.1109/TSA.2005.851913

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

A speech coder tailored especially for the frame-erasure channel, the sinusoidal voice over packet coder (SVOPC), is proposed. Based on a classified approach, avoiding interframe coding techniques, and synthesizing its output from slowly varying parameters, the coder is inherently robust to packet loss. SVOPC is based on quasi-harmonic modeling of the linear prediction (LP) residual. Both the sinusoidal amplitudes and phases are explicitly encoded using new methods based on Gaussian mixture models. A wide-band (16-kHz sampling frequency) implementation of the coder provides synthesized speech of good subjective quality at around 20 kbps. SVOPC is evaluated by means of subjective listening tests, and compared to a reference system based on G.722.2 (the AMR wide-band codec). Under frame erasure conditions (5%-30% frame erasures generated according to a Gilbert model), SVOPC clearly outperforms G.722.2.
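The abstract refers to frame erasures generated according to a Gilbert model. As a rough illustration of what such a loss pattern looks like, the following is a minimal sketch of a two-state (good/bad) Gilbert channel in which every frame sent in the bad state is erased. The function names, the mean burst length of 2 frames, and the seed are assumptions made for this example; the abstract does not state the model parameters used in the paper.

```python
import random


def gilbert_params(loss_rate, mean_burst_len):
    """Return (p, q) for a two-state Gilbert model.

    p = P(good -> bad), q = P(bad -> good).
    The stationary erasure rate is p / (p + q) and the mean
    burst length is 1 / q. Both inputs are illustrative choices,
    not values taken from the paper.
    """
    if not 0.0 < loss_rate < 1.0:
        raise ValueError("loss_rate must be in (0, 1)")
    if mean_burst_len < 1.0:
        raise ValueError("mean_burst_len must be at least 1 frame")
    q = 1.0 / mean_burst_len
    p = loss_rate * q / (1.0 - loss_rate)
    if p > 1.0:
        raise ValueError("loss_rate too high for this burst length")
    return p, q


def gilbert_erasures(n_frames, loss_rate, mean_burst_len=2.0, seed=0):
    """Return a list of booleans; True marks an erased frame."""
    p, q = gilbert_params(loss_rate, mean_burst_len)
    rng = random.Random(seed)
    # Draw the initial state from the stationary distribution.
    bad = rng.random() < loss_rate
    pattern = []
    for _ in range(n_frames):
        pattern.append(bad)
        if bad:
            bad = rng.random() >= q  # stay in the bad state with prob. 1 - q
        else:
            bad = rng.random() < p   # enter the bad state with prob. p
    return pattern


if __name__ == "__main__":
    # Sweep the erasure rates mentioned in the abstract (5% to 30%).
    for rate in (0.05, 0.10, 0.20, 0.30):
        lost = gilbert_erasures(100_000, rate)
        print(f"target {rate:.2f}, observed {sum(lost) / len(lost):.3f}")
```

Compared with independent (Bernoulli) losses at the same average rate, this model produces erasures in bursts, which is the harder case for a concealment scheme.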


Pages: 787-798

Page count: 12

References: 40 in total

