An Exemplar-Based Approach to Frequency Warping for Voice Conversion

Cited by: 43
Authors
Tian, Xiaohai [1]
Lee, Siu Wa [1]
Wu, Zhizheng [1]
Chng, Eng Siong [1]
Li, Haizhou [1]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
Funding
National Research Foundation, Singapore;
Keywords
Exemplar; frequency warping; residual compensation; sparse representation; voice conversion;
DOI
10.1109/TASLP.2017.2723721
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 [Acoustics];
Abstract
The task of voice conversion is to modify a source speaker's voice so that it sounds like that of a target speaker. A conversion method is considered successful when the produced speech sounds natural and similar to the target speaker. This paper presents a new voice conversion framework that combines frequency warping with an exemplar-based method. Our method maintains high-resolution details during conversion by applying frequency warping directly to the high-resolution spectrum to represent the target. The warping function is generated by sparse interpolation from a dictionary of exemplar warping functions. As the generated warping function depends on only a very small set of exemplars, we do away with the statistical averaging effects inherited from Gaussian mixture models. To compensate for the conversion error, we also incorporate residual exemplars into the conversion process. Both objective and subjective evaluations on the VOICES database validate the effectiveness of the proposed voice conversion framework. We observed a significant improvement in speech quality over state-of-the-art parametric methods.
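The conversion steps named in the abstract (sparse interpolation of a warping function, warping of the spectrum, residual compensation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the sparse activation weights have already been estimated by some sparse-coding step, that the weights are normalized to sum to one, that spectra are log-magnitude values so a residual can be added, and that each warping function stores fractional source-bin positions. All function and parameter names are hypothetical.

```python
def interpolate_warp(exemplar_warps, weights):
    """Blend exemplar warping functions using (mostly zero) activation weights.

    Assumption: weights are normalized to sum to 1 before blending.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    n_bins = len(exemplar_warps[0])
    return [
        sum(w * warp[k] for w, warp in zip(weights, exemplar_warps)) / total
        for k in range(n_bins)
    ]


def apply_warp(spectrum, warp):
    """Output bin k reads the source spectrum at fractional position warp[k]."""
    last = len(spectrum) - 1
    out = []
    for pos in warp:
        pos = min(max(pos, 0.0), float(last))  # clamp to the valid bin range
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Linear interpolation between the two neighbouring bins.
        out.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return out


def convert_frame(spectrum, exemplar_warps, exemplar_residuals, weights):
    """Warp one log-spectrum frame, then add the blended residual exemplars."""
    warped = apply_warp(spectrum, interpolate_warp(exemplar_warps, weights))
    total = sum(weights)
    residual = [
        sum(w * res[k] for w, res in zip(weights, exemplar_residuals)) / total
        for k in range(len(warped))
    ]
    return [s + r for s, r in zip(warped, residual)]


# Toy data: a 5-bin frame, two exemplar warps, two constant residuals.
frame = [0.0, 1.0, 2.0, 3.0, 4.0]
warps = [[0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 0.5, 1.0, 1.5, 2.0]]
residuals = [[0.1] * 5, [0.3] * 5]
converted = convert_frame(frame, warps, residuals, [1.0, 1.0])
```

Because only the exemplars with nonzero weight contribute, the blended warping function is built from a handful of stored functions rather than from an average over a whole statistical model, which is the property the abstract contrasts with Gaussian mixture models.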
Pages: 1863-1876
Number of pages: 14