AN OBJECTIVE MEASURE FOR PREDICTING SUBJECTIVE QUALITY OF SPEECH CODERS

Times cited: 167

Authors:

WANG, SH [1]

SEKEY, A [1]

GERSHO, A [1]

Affiliation:

[1] UNIV CALIF SANTA BARBARA, DEPT ELECT & COMP ENGN, SANTA BARBARA, CA 93106

Source:

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS | 1992 / Vol. 10 / No. 5

Funding:

U.S. National Science Foundation


DOI:

10.1109/49.138987

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline codes:

0808; 0809

Abstract:

A perceptually motivated objective measure for evaluating speech quality is presented. The measure, computed from the original and coded versions of an utterance, statistically exhibits a monotonic relationship with the mean opinion score (MOS), a widely used criterion for speech coder assessment. For each 10 ms segment of an utterance, a weighted spectral vector is computed via 15 critical-band filters for telephone-bandwidth speech. The overall distortion, called Bark spectral distortion (BSD), is the average squared Euclidean distance between the spectral vectors of the original and coded utterances. The BSD takes into account auditory frequency warping, critical-band integration, variations in amplitude sensitivity with frequency, and subjective loudness. The effectiveness of the measure was validated by a regression analysis between the computed BSD values and actual MOS values obtained from a speech data set. In tests with speech distorted by a modulated noise reference unit (MNRU) and with speech coded at rates of 2.4-64 kb/s, a monotonic function of the BSD was found to predict MOS ratings notably better than segmental SNR or cepstral distance. The standard error in estimating MOS with the new measure was 0.2-0.3, with higher accuracy for low-rate coders in the range of 2.4-8 kb/s. The measure offers a more consistent assessment of the effect of incremental changes in the parameters of a speech coder than a designer usually obtains by relying on his/her own informal listening. Preliminary results also indicate that the measure may be effective for the excitation search in analysis-by-synthesis coders.
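The BSD pipeline summarized in the abstract (10 ms segments, 15 critical bands, loudness-like compression, then BSD = (1/N) Σ_k ||L_x(k) - L_y(k)||² over N frames) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Schroeder's Hz-to-Bark approximation, non-overlapping Hann-windowed frames, rectangular integration over 15 equal-width Bark bands spanning 0 Hz to the Nyquist frequency, a compressive power law (exponent 0.3, a hypothetical choice) in place of the paper's loudness model, and no equal-loudness weighting. The function names `hz_to_bark`, `bark_spectra`, and `bark_spectral_distortion` are hypothetical.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Schroeder's Hz-to-Bark approximation (assumption: the paper's exact warping may differ)."""
    return 7.0 * np.arcsinh(np.asarray(f_hz, dtype=float) / 650.0)


def bark_spectra(x, fs=8000, frame_ms=10.0, num_bands=15, n_fft=256, exponent=0.3):
    """Return a (num_frames, num_bands) array of compressed critical-band vectors.

    Hypothetical simplifications: non-overlapping Hann-windowed frames, rectangular
    integration over equal-width Bark bands covering 0..fs/2, no equal-loudness
    weighting, and loudness approximated as band_power ** exponent.
    """
    x = np.asarray(x, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000.0))  # 80 samples per 10 ms at 8 kHz
    num_frames = len(x) // frame_len
    frames = x[: num_frames * frame_len].reshape(num_frames, frame_len)
    frames = frames * np.hanning(frame_len)
    # Zero-padded FFT gives finer bin spacing so every Bark band receives bins.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

    bin_bark = hz_to_bark(np.fft.rfftfreq(n_fft, d=1.0 / fs))
    edges = np.linspace(0.0, hz_to_bark(fs / 2.0), num_bands + 1)
    # Map each FFT bin to a band index; clip so the Nyquist bin joins the last band.
    band_of_bin = np.clip(np.searchsorted(edges, bin_bark, side="right") - 1, 0, num_bands - 1)

    band_power = np.zeros((num_frames, num_bands))
    for b in range(num_bands):
        band_power[:, b] = power[:, band_of_bin == b].sum(axis=1)
    return band_power ** exponent


def bark_spectral_distortion(original, coded, fs=8000, **kwargs):
    """Average over frames of the squared Euclidean distance between Bark vectors."""
    lx = bark_spectra(original, fs=fs, **kwargs)
    ly = bark_spectra(coded, fs=fs, **kwargs)
    n = min(len(lx), len(ly))  # assumes the two signals are already time-aligned
    return float(np.mean(np.sum((lx[:n] - ly[:n]) ** 2, axis=1)))
```

In use, `bark_spectral_distortion(original, coded)` returns 0.0 for identical inputs and grows as the coded signal departs from the original. The paper then maps BSD to MOS through a monotonic function fitted by regression; that fitting step, and the paper's actual filter shapes and loudness model, are not reproduced here.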


Pages: 819-829

Number of pages: 11
