Modeling stylized invariance and local variability of prosody in text-to-speech synthesis

Cited by: 11
Authors
Chu, Min [1]
Zhao, Yong [1]
Chang, Eric [1]
Affiliations
[1] Microsoft Res Asia, Beijing Sigma Ctr, Beijing 100080, Peoples R China
Keywords
prosody; stylized invariance; local variability; soft prediction; unit selection; text-to-speech
DOI
10.1016/j.specom.2005.10.003
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper investigates the stylized invariance and local variability of prosody patterns by using a speech database containing two repetitions of 1000 sentences. The two repetitions (separated by a time span of 6 months) were recorded by a single professional speaker, who was instructed to read the sentences in the same reading style. Statistical analysis shows that the two repetitions vary fairly widely in their prosodic features, and that the variations can reach 50% of the speaker's full dynamic range. This shows the inadequacy of traditional prosody models, which focus on capturing the universal invariance of prosody as precisely as possible. In this paper, we propose to model prosody by capturing its stylized invariance and retaining local variability with a soft prediction strategy, which predicts an acceptable region rather than a single fixed point in the multidimensional prosody space. A prosodic-constrained unit selection algorithm is devised under the soft prediction strategy. © 2005 Elsevier B.V. All rights reserved.
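The contrast between a fixed-point target and an acceptable region can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not say how the region is shaped or how candidates are ranked, so the box-shaped region (target plus or minus a per-dimension tolerance), the two features (pitch and duration), the `join_cost` field, and the fallback rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    pitch_hz: float      # mean F0 of the candidate unit
    duration_ms: float   # duration of the candidate unit
    join_cost: float     # hypothetical smoothness cost against the previous unit


def distance(unit, center, tolerance):
    """Largest per-dimension deviation from the target, in units of the tolerance."""
    features = (unit.pitch_hz, unit.duration_ms)
    return max(abs(f - c) / t for f, c, t in zip(features, center, tolerance))


def soft_select(candidates, center, tolerance):
    """Pick the smoothest candidate inside the acceptable region.

    Assumed fallback: if no candidate lies inside the region,
    return the one closest to it.
    """
    acceptable = [u for u in candidates if distance(u, center, tolerance) <= 1.0]
    if acceptable:
        return min(acceptable, key=lambda u: u.join_cost)
    return min(candidates, key=lambda u: distance(u, center, tolerance))


candidates = [
    Unit("a", pitch_hz=200.0, duration_ms=100.0, join_cost=0.9),
    Unit("b", pitch_hz=210.0, duration_ms=110.0, join_cost=0.2),
    Unit("c", pitch_hz=260.0, duration_ms=100.0, join_cost=0.1),
]
center = (200.0, 100.0)      # predicted prosody target (illustrative values)
tolerance = (20.0, 15.0)     # half-widths of the acceptable region (illustrative)

chosen = soft_select(candidates, center, tolerance)
print(chosen.name)  # prints "b"
```

A strict point target would choose unit `a`, which matches the target exactly. Under the region-based rule, `a` and `b` are both acceptable, so the smoother join decides in favor of `b`, while `c` is excluded despite having the lowest join cost.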
Pages: 716-726
Number of pages: 11
Related papers
33 in total
[1] [Anonymous], 1995, P 13 INT C PHON SCI
[2] [Anonymous], 2000, HTK BOOK HTK VERSION
[3] Beckman M., 1997, GUIDELINES TOBI LABE
[4] Carlson R., 1975, STRUCTURE PROCESS SP, P90
[5] Chen, SH; Hwang, SH; Wang, YR. An RNN-based prosodic information synthesizer for Mandarin text-to-speech [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (03): 226-239
[6] CHU M, 2001, P 4 ISCA WORKSH SPEE
[7] CHU M, 2001, P ICASSP 01 SALT LAK
[8] Chu M., 2001, Computational Linguistics and Chinese Language Processing, V6, P61
[9] CHU M, 2003, P ICASSP 03 HONG KON
[10] DOGIL G, 2001, P EUR 2001 COP