Speech recognition with amplitude and frequency modulations

Times cited: 295
Authors
Zeng, FG [1]
Nie, K
Stickney, GS
Kong, YY
Vongphoe, M
Bhargave, A
Wei, CG
Cao, K
Affiliations
[1] Univ Calif Irvine, Dept Anat & Neurobiol, Irvine, CA 92697 USA
[2] Univ Calif Irvine, Dept Biomed Engn, Irvine, CA 92697 USA
[3] Univ Calif Irvine, Dept Cognit Sci, Irvine, CA 92697 USA
[4] Univ Calif Irvine, Dept Otolaryngol Head & Neck Surg, Irvine, CA 92697 USA
[5] Beijing Union Med Coll Hosp, Dept Otolaryngol Head & Neck Surg, Beijing 100730, Peoples R China
Keywords
auditory analysis; cochlear implant; neural code; phase; scene analysis
DOI
10.1073/pnas.0406460102
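Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract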
Amplitude modulation (AM) and frequency modulation (FM) are commonly used in communication, but their relative contributions to speech recognition have not been fully explored. To bridge this gap, we derived slowly varying AM and FM from speech sounds and conducted listening tests using stimuli with different modulations in normal-hearing and cochlear-implant subjects. We found that although AM from a limited number of spectral bands may be sufficient for speech recognition in quiet, FM significantly enhances speech recognition in noise, as well as speaker and tone recognition. Additional speech reception threshold measures revealed that FM is particularly critical for speech recognition with a competing voice and is independent of spectral resolution and similarity. These results suggest that AM and FM provide independent yet complementary contributions to support robust speech recognition under realistic listening situations. Encoding FM may improve auditory scene analysis, cochlear-implant, and audiocoding performance.
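The abstract says only that slowly varying AM and FM were "derived" from speech. One common way to do this for a single band-pass channel is through the analytic (Hilbert) signal: its magnitude gives the AM envelope, and the derivative of its unwrapped phase gives the instantaneous frequency, whose deviation from the band centre serves as the FM. The Python sketch below assumes that approach. It uses a synthetic single-band tone in place of filtered speech and a moving average as a stand-in for a proper low-pass filter, so it is not the authors' processing chain. All function names and parameter values (`am_fm_decompose`, `resynthesize`, `fc`, `smooth`, the test-tone settings) are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real 1-D array via the FFT method
    (the same construction scipy.signal.hilbert uses)."""
    n = len(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def moving_average(x, width):
    """Crude low-pass smoother (edge-padded boxcar); a hypothetical stand-in
    for a real low-pass filter that keeps the modulations slowly varying."""
    pad = width // 2
    padded = np.pad(x, (pad, width - 1 - pad), mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")


def am_fm_decompose(band, fs, fc, smooth=32):
    """Split one band-limited signal into AM (envelope) and FM
    (instantaneous-frequency deviation from the assumed centre fc, in Hz)."""
    z = analytic_signal(band)
    am = moving_average(np.abs(z), smooth)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)  # len(band) - 1 samples
    fm = moving_average(inst_freq - fc, smooth)
    return am, fm


def resynthesize(am, fm, fs, fc):
    """Rebuild a band as am(t) * cos(2*pi*fc*t + 2*pi * running integral of fm)."""
    fm_full = np.concatenate([[fm[0]], fm])  # pad FM back to len(am)
    n = np.arange(len(am))
    phase = 2 * np.pi * (fc * n / fs + np.cumsum(fm_full) / fs)
    return am * np.cos(phase)


# Synthetic single band (hypothetical values): 1000 Hz carrier,
# 4 Hz AM and +/-50 Hz FM at a 3 Hz rate, 1 s sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
fc = 1000.0
true_am = 1 + 0.5 * np.cos(2 * np.pi * 4 * t)
true_fm = 50 * np.cos(2 * np.pi * 3 * t)
band = true_am * np.cos(2 * np.pi * fc * t + 2 * np.pi * np.cumsum(true_fm) / fs)

am, fm = am_fm_decompose(band, fs, fc)
am_fm_stimulus = resynthesize(am, fm, fs, fc)                   # keeps both cues
am_only_stimulus = resynthesize(am, np.zeros_like(fm), fs, fc)  # FM removed
```

Setting the FM track to zero, as in `am_only_stimulus`, leaves only the envelope on a fixed carrier, which loosely mirrors the contrast between AM-only and AM+FM stimuli described in the abstract. A multichannel version would presumably split the signal into several band-pass channels first, process each as above, and sum the resynthesized bands.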
Pages: 2293–2298
Number of pages: 6