INCA Algorithm for Training Voice Conversion Systems From Nonparallel Corpora

Cited by: 113
Authors
Erro, Daniel [1]
Moreno, Asuncion [1]
Bonafonte, Antonio [1]
Affiliations
[1] Univ Politecn Cataluna, TALP Res Ctr, ES-08034 Barcelona, Spain
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / Issue 05
Keywords
Frame alignment; Gaussian mixture model (GMM); nonparallel training corpus; text-independent cross-lingual voice conversion
DOI
10.1109/TASL.2009.2038669
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 [Acoustics];
Abstract
Most existing voice conversion systems, particularly those based on Gaussian mixture models, require a set of paired acoustic vectors from the source and target speakers to learn their corresponding transformation function. The alignment of phonetically equivalent source and target vectors is not problematic when the training corpus is parallel, which means that both speakers utter the same training sentences. However, in some practical situations, such as cross-lingual voice conversion, it is not possible to obtain such parallel utterances. With an aim towards increasing the versatility of current voice conversion systems, this paper proposes a new iterative alignment method that allows pairing phonetically equivalent acoustic vectors from nonparallel utterances from different speakers, even under cross-lingual conditions. This method is based on existing voice conversion techniques, and it does not require any phonetic or linguistic information. Subjective evaluation experiments show that the performance of the resulting voice conversion system is very similar to that of an equivalent system trained on a parallel corpus.
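The iterative alignment idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that frames are paired by Euclidean nearest-neighbor search in both directions (the abstract does not specify the pairing rule), and it swaps in a least-squares affine map for the GMM-based conversion function. All function names and the `n_iter` parameter are hypothetical.

```python
import numpy as np

def nearest_neighbors(a, b):
    """For each row of a, return the index of the closest row of b (Euclidean)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def fit_affine(x, y):
    """Least-squares affine map y ~ x @ W + bias.

    Assumption: a simple stand-in for a GMM-based conversion function.
    Returns a (dim + 1, dim) matrix whose last row is the bias.
    """
    x1 = np.hstack([x, np.ones((len(x), 1))])
    coef, *_ = np.linalg.lstsq(x1, y, rcond=None)
    return coef

def apply_affine(coef, x):
    """Apply the affine map returned by fit_affine to the rows of x."""
    return np.hstack([x, np.ones((len(x), 1))]) @ coef

def iterative_alignment(src, tgt, n_iter=10):
    """Alternate between pairing frames and re-estimating the conversion.

    src, tgt: (n_frames, dim) acoustic vectors from nonparallel utterances.
    Returns the final affine coefficients and, for each source frame,
    the index of its paired target frame.
    """
    converted = src.copy()
    for _ in range(n_iter):
        # Pair each (converted) source frame with its nearest target frame,
        # and each target frame with its nearest (converted) source frame.
        s2t = nearest_neighbors(converted, tgt)
        t2s = nearest_neighbors(tgt, converted)
        # Train the conversion on the ORIGINAL source frames of each pair.
        x = np.vstack([src, src[t2s]])
        y = np.vstack([tgt[s2t], tgt])
        coef = fit_affine(x, y)
        # Move the source frames toward the target speaker for the next pass.
        converted = apply_affine(coef, src)
    return coef, s2t
```

Under these assumptions, each pass should bring the converted source frames closer to the target speaker's acoustic space, so the nearest-neighbor pairs are more likely to be phonetically equivalent in the next pass. No phonetic labels are used at any step.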
Pages: 944-953
Page count: 10