Exploiting nonacoustic sensors for speech encoding

Cited by: 42
Authors
Quatieri, TF [1]
Brady, K
Messing, D
Campbell, JP
Campbell, WM
Brandstein, MS
Weinstein, CJ
Tardelli, JD
Gatewood, PD
Affiliations
[1] MIT, Lincoln Lab, Lexington, MA 02420 USA
[2] ARCON Corp, Waltham, MA USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / No. 02
Keywords
intelligibility; low-rate coding; nonacoustic sensors;
DOI
10.1109/TSA.2005.855838
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The intelligibility of speech transmitted through low-rate coders is severely degraded when high levels of acoustic noise are present in the acoustic environment. Recent advances in nonacoustic sensors, including microwave radar, skin vibration, and bone conduction sensors, provide the exciting possibility of both glottal excitation and, more generally, vocal tract measurements that are relatively immune to acoustic disturbances and can supplement the acoustic speech waveform. We are currently investigating methods of combining the output of these sensors for use in low-rate encoding according to their capability in representing specific speech characteristics in different frequency bands. Nonacoustic sensors have the ability to reveal certain speech attributes lost in the noisy acoustic signal; for example, low-energy consonant voice bars, nasality, and glottalized excitation. By fusing nonacoustic low-frequency and pitch content with acoustic-microphone content, we have achieved significant intelligibility performance gains using the DRT across a variety of environments over the government standard 2400-bps MELPe coder. By fusing quantized high-band 4-to-8-kHz speech, requiring only an additional 116 bps, we obtain further DRT performance gains by exploiting the ear's insensitivity to fine spectral detail in this frequency region.
Pages: 533-544
Page count: 12
Related papers
23 in total
[1]  
Brady K, 2004, 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, P477
[2]  
BURNETT G, 1999, P 138 M AC SOC AM CO
[3]  
CAMPBELL WM, 2003, P WORKSH MULT US AUT, P215
[4]  
COHEN MF, 1965, J ACOUST SOC AM, V37
[5]   Speech articulator measurements using low power EM-wave sensors [J].
Holzrichter, JF ;
Burnett, GC ;
Ng, LC ;
Lea, WA .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1998, 103 (01) :622-625
[6]  
HOLZRICHTER JF, 2003, UCRLJRNL14775 LAWR L
[7]  
Kang G. S., 1983, Proceedings of ICASSP 83. IEEE International Conference on Acoustics, Speech and Signal Processing, P89
[8]   A noise reduction preprocessor for mobile voice communication [J].
Martin, R ;
Malah, D ;
Cox, RV ;
Accardi, AJ .
EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2004, 2004 (08) :1046-1058
[9]  
Martin R, 2000, INT CONF ACOUST SPEE, P1479, DOI 10.1109/ICASSP.2000.861909
[10]  
McCree A, 1996, INT CONF ACOUST SPEE, P200, DOI 10.1109/ICASSP.1996.540325