On the robustness of overall F0-only modifications to the perception of emotions in speech

Cited by: 34
Authors
Bulut, Murtaza [1]
Narayanan, Shrikanth [1]
Affiliations
[1] Univ So Calif, Dept Elect Engn, Signal Anal & Interpretat Lab, Los Angeles, CA 90089 USA
Keywords
DOI
10.1121/1.2909562
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Emotional information in speech is commonly described in terms of prosody features such as F0, duration, and energy. In this paper, the focus is on how F0 characteristics can be used to effectively parametrize emotional quality in speech signals. Using an analysis-by-synthesis approach, F0 mean, range, and shape properties of emotional utterances are systematically modified. The results show the aspects of the F0 parameter that can be modified without causing any significant changes in the perception of emotions. To model this behavior, the concept of emotional regions is introduced. Emotional regions represent the variability present in emotional speech and provide a new procedure for studying speech cues for judgments of emotion. The method is applied to F0 but can also be used on other aspects of prosody such as duration or loudness. Statistical analysis of the factors affecting the emotional regions and a discussion of the effects of F0 modifications on the perception of emotion and speech quality are also presented. The results show that F0 range is more important than F0 mean for emotion expression. (C) 2008 Acoustical Society of America.
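To make the idea of an "overall" F0 mean and range modification concrete, here is a minimal Python sketch. It is an illustration only, not the authors' procedure: the abstract does not specify how the modifications were computed, so the linear scaling about the voiced mean in Hz, the convention that unvoiced frames are stored as 0.0, and the names modify_f0, mean_shift_hz, and range_scale are all assumptions made for this example.

```python
from typing import List


def modify_f0(contour_hz: List[float],
              mean_shift_hz: float = 0.0,
              range_scale: float = 1.0) -> List[float]:
    """Shift the mean and scale the range of an F0 contour.

    Assumptions (hypothetical, not taken from the paper):
    - contour_hz holds one F0 value per frame, in Hz;
    - unvoiced frames are encoded as 0.0 and are left untouched;
    - the range is scaled linearly around the mean of the voiced frames.
    """
    voiced = [f for f in contour_hz if f > 0.0]
    if not voiced:
        # Nothing to modify if the utterance has no voiced frames.
        return list(contour_hz)

    mean = sum(voiced) / len(voiced)
    modified = []
    for f in contour_hz:
        if f > 0.0:
            # Scale the deviation from the mean, then shift the mean.
            modified.append(mean + mean_shift_hz + range_scale * (f - mean))
        else:
            modified.append(0.0)
    return modified


# Example: raise the mean by 10 Hz and double the range.
example = modify_f0([0.0, 100.0, 200.0, 0.0, 150.0],
                    mean_shift_hz=10.0, range_scale=2.0)
```

With range_scale=1.0 the contour shape is preserved and only the mean moves; with mean_shift_hz=0.0 the voiced mean stays fixed while the excursions around it grow or shrink. A real system would resynthesize speech from the modified contour, which this sketch does not attempt.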
Pages: 4547-4558
Number of pages: 12