A sinusoidal voice over packet coder tailored for the frame-erasure channel

Cited by: 19
Authors
Lindblom, J [1]
Affiliations
[1] Skype Technol, Stockholm, Sweden
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005 / Vol. 13 / No. 5
Keywords
frame-erasure; Gaussian mixture model; harmonic analysis; packet loss concealment; packet switching; speech coding; variable-dimension; vector quantization; wide-band;
DOI
10.1109/TSA.2005.851913
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
A speech coder tailored especially for the frame-erasure channel, the sinusoidal voice over packet coder (SVOPC), is proposed. Because it is based on a classified approach, avoids interframe coding techniques, and synthesizes its output from slowly varying parameters, the coder is inherently robust to packet loss. SVOPC is based on quasi-harmonic modeling of the linear prediction (LP) residual. Both the sinusoidal amplitudes and phases are explicitly encoded using new methods based on Gaussian mixture models. A wide-band (16-kHz sampling frequency) implementation of the coder provides synthesized speech of good subjective quality at around 20 kbps. SVOPC is evaluated by means of subjective listening tests and compared to a reference system based on G.722.2 (the AMR wide-band codec). Under frame-erasure conditions (5%-30% frame erasures generated according to a Gilbert model), SVOPC clearly outperforms G.722.2.
Pages: 787-798
Page count: 12