Voice Conversion Based on Weighted Frequency Warping

Cited by: 135
Authors
Erro, Daniel [1]
Moreno, Asuncion [1]
Bonafonte, Antonio [1]
Affiliation
[1] Univ Politecn Cataluna, TALP Res Ctr, ES-08034 Barcelona, Spain
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 05
Keywords
Gaussian mixture models (GMMs); harmonic plus stochastic model (HSM); speech synthesis; voice conversion; weighted frequency warping;
DOI
10.1109/TASL.2009.2038663
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 [Acoustics];
Abstract
Any modification applied to speech signals has an impact on their perceptual quality. In particular, voice conversion to modify a source voice so that it is perceived as a specific target voice involves prosodic and spectral transformations that produce significant quality degradation. Choosing among the current voice conversion methods represents a trade-off between the similarity of the converted voice to the target voice and the quality of the resulting converted speech, both rated by listeners. This paper presents a new voice conversion method termed Weighted Frequency Warping that has a good balance between similarity and quality. This method uses a time-varying piecewise-linear frequency warping function and an energy correction filter, and it combines typical probabilistic techniques and frequency warping transformations. Compared to standard probabilistic systems, Weighted Frequency Warping results in a significant increase in quality scores, whereas the conversion scores remain almost unaltered. This paper carefully discusses the theoretical aspects of the method and the details of its implementation, and the results of an international evaluation of the new system are also included.
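The abstract describes the warping function only in words. As a rough illustration, the sketch below shows one way a time-varying piecewise-linear warping function could be formed by mixing per-class warping curves with probabilistic weights, such as GMM posteriors for the current frame. The function names, anchor frequencies, and weights are hypothetical values chosen for illustration; they are not taken from the paper, and the energy correction filter is omitted.

```python
import numpy as np

def piecewise_linear_warp(freqs, src_anchors, tgt_anchors):
    """Map frequencies through a piecewise-linear curve defined by anchor pairs."""
    return np.interp(freqs, src_anchors, tgt_anchors)

def weighted_warp(freqs, posteriors, class_src_anchors, class_tgt_anchors):
    """Blend per-class warping curves using per-frame class weights.

    Assumption: `posteriors` are non-negative and sum to 1 for the frame,
    e.g. posterior probabilities of GMM components given the source frame.
    """
    warped = np.zeros(len(freqs), dtype=float)
    for p, src, tgt in zip(posteriors, class_src_anchors, class_tgt_anchors):
        warped += p * piecewise_linear_warp(freqs, src, tgt)
    return warped

# Hypothetical example: two acoustic classes, each with its own warping curve (Hz).
freqs = np.linspace(0.0, 8000.0, 5)          # 0, 2000, 4000, 6000, 8000
src_anchors = [[0.0, 4000.0, 8000.0], [0.0, 4000.0, 8000.0]]
tgt_anchors = [[0.0, 3000.0, 8000.0], [0.0, 5000.0, 8000.0]]

# Weights for one frame; a different frame would give a different curve,
# which is what makes the resulting warping function time-varying.
frame_posteriors = [0.75, 0.25]
warped = weighted_warp(freqs, frame_posteriors, src_anchors, tgt_anchors)
print(warped)
```

Because each per-class curve is piecewise linear and the weights are fixed within a frame, the blended curve is also piecewise linear for that frame.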
Pages: 922-931
Page count: 10