DIGITAL ANALYSIS OF LARYNGEAL CONTROL IN SPEECH PRODUCTION

Times cited: 9
Authors
FLANAGAN, JL
RABINER, LR
CHRISTOPHER, D
BOCK, DE
SHIPP, T
Affiliations
[1] BELL TEL LABS INC, ACOUSTICS RES DEPT, MURRAY HILL, NJ 07974 USA
[2] VET ADM HOSP, SPEECH RES LAB, SAN FRANCISCO, CA 94121 USA
DOI
10.1121/1.381102
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Physiological measurements were made directly on human talkers to determine several dynamic laryngeal functions. These functions are control variables in a speech synthesizer that uses acoustic models of the vocal cords and vocal tract. The functions were measured simultaneously and recorded on multichannel FM tape. They were: the time variation of the vocal-cord (glottal) opening (Ag); the electromyographic (EMG) potentials of three laryngeal muscles, the posterior cricoarytenoid (PCA), interarytenoid (IA), and cricothyroid (CT); the subglottal air pressure (Ps); the sound pressure waveform of the speech output (P); and timing pulses from a digital clock. Preliminary data for 10 utterances by one male talker were digitized by a multiplexed A/D converter on a DDP-516 computer, and the results were stored in a disk file for analysis. The bandwidth of the multitrack FM playback was 2800 Hz. Each function was sampled at 6250 samples per second and quantized to 16 bits. Digital filtering was applied to remove DC offsets and to enhance information-bearing features. The acoustic functions (Ag, Ps, and P) were submitted to programmed pitch analysis. The results showed how voice periodicity can be manifested differently at the glottal and sound-output levels. A typical instance was vocal-cord vibration throughout the occluded phase of a voiced stop consonant. The EMG functions were analyzed by computing short-time energy, and the results were correlated with voicing onset and offset and with voice pitch. PCA energy was correlated with voicing offset and anticipated it by about 20-30 ms. IA energy was correlated with voicing onset and anticipated it by about 40-50 ms. CT energy was nearly directly correlated with the frequency contour of the voice pitch. Direct use of these physiological parameters for speech synthesis was suggested.
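The short-time energy analysis applied to the EMG channels can be sketched as follows. This is a minimal illustration, not the authors' program. The abstract gives only the 6250 samples-per-second rate; it does not state the window length, hop size, window shape, or the filter used to remove DC offsets. The 20 ms rectangular window, the 8 ms hop, the mean subtraction standing in for DC removal, and all function names below are therefore assumptions.

```python
SAMPLE_RATE_HZ = 6250  # sampling rate stated in the abstract


def remove_dc(signal):
    """Subtract the mean (assumed stand-in for the unspecified DC-removal filter)."""
    mean = sum(signal) / len(signal)
    return [x - mean for x in signal]


def short_time_energy(signal, window_ms=20.0, hop_ms=8.0,
                      sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (frame_start_seconds, energy) pairs using a rectangular window.

    window_ms and hop_ms are illustrative choices, not values from the source.
    """
    win = max(1, int(round(window_ms * sample_rate_hz / 1000.0)))
    hop = max(1, int(round(hop_ms * sample_rate_hz / 1000.0)))
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        # Energy of one frame: sum of squared samples inside the window.
        energy = sum(x * x for x in signal[start:start + win])
        frames.append((start / sample_rate_hz, energy))
    return frames


def synthetic_emg(n_quiet=625, n_burst=625, offset=0.5):
    """Hypothetical test signal: a DC offset, then an alternating +/-1 burst."""
    quiet = [offset] * n_quiet
    burst = [offset + (1.0 if i % 2 == 0 else -1.0)
             for i in range(n_quiet, n_quiet + n_burst)]
    return quiet + burst


emg = remove_dc(synthetic_emg())
energy_contour = short_time_energy(emg)
```

On the synthetic signal the energy contour stays near zero during the quiet segment and rises once the window overlaps the burst. A contour of this kind is what would be compared against voicing onset and offset times to estimate the anticipation intervals reported above.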
Pages: 446-455
Number of pages: 10