Small-parallel exemplar-based voice conversion in noisy environments using affine non-negative matrix factorization

Cited by: 5
Authors
Aihara, Ryo [1]
Fujii, Takao [1]
Nakashika, Toru [2]
Takiguchi, Tetsuya [1]
Ariki, Yasuo [1]
Affiliations
[1] Kobe Univ, Grad Sch Syst Informat, Nada Ku, Kobe, Hyogo 657, Japan
[2] Univ Electrocommun, Grad Sch Informat Syst, Chofu, Tokyo 182, Japan
Keywords
Voice conversion; Speech synthesis; Speaker adaptation; Noise robustness; Small parallel corpus; Sparse representation
DOI
10.1186/s13636-015-0075-4
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 [Acoustics];
Abstract
The need for a large amount of parallel data is a major hurdle for the practical use of voice conversion (VC). This paper presents a novel exemplar-based VC framework that requires only a small number of parallel exemplars. In our previous work, we proposed a VC technique using non-negative matrix factorization (NMF) for noisy environments. That method requires parallel exemplars (source and target exemplars of the same texts uttered by the source and target speakers) for dictionary construction. Within conventional Gaussian mixture model (GMM)-based VC, some approaches that do not need parallel exemplars have been proposed. However, no such method has been proposed for exemplar-based VC in noisy environments. In this paper, an adaptation matrix is introduced into the NMF framework to adapt the source dictionary to the target dictionary. This adaptation matrix is estimated using only a small parallel speech corpus. We refer to this method as affine NMF, and we confirm its effectiveness by comparing it with a conventional NMF-based method and a GMM-based method in noisy environments.
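The idea of adapting a source dictionary to a target dictionary with a matrix estimated from a small parallel corpus can be sketched as follows. This is a minimal illustration and not the authors' algorithm: it assumes the target dictionary is modeled as `A @ D_src`, uses a squared-error objective with standard multiplicative updates, and ignores the noise dictionary and time alignment. All function and variable names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in the multiplicative updates


def estimate_activations(X, D, n_iter=200, seed=0):
    """Non-negative H that reduces ||X - D H||_F^2 with the dictionary D held fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((D.shape[1], X.shape[1])) + 0.1
    for _ in range(n_iter):
        H *= (D.T @ X) / (D.T @ D @ H + EPS)
    return H


def estimate_adaptation(Y, Z, n_iter=200):
    """Non-negative A that reduces ||Y - A Z||_F^2 with Z held fixed.

    Assumption: Z = D_src @ H is the source-side reconstruction of the small
    parallel corpus and Y is the time-aligned target spectrogram.
    """
    A = np.eye(Y.shape[0]) + 0.01  # start near the identity, strictly positive
    for _ in range(n_iter):
        A *= (Y @ Z.T) / (A @ Z @ Z.T + EPS)
    return A


def convert(X, D_src, A):
    """Convert a source spectrogram X using the adapted dictionary A @ D_src."""
    H = estimate_activations(X, D_src)
    return A @ D_src @ H
```

A typical use under these assumptions would be to estimate `H` on the source side of the small parallel corpus, fit `A` once against the aligned target frames, and then call `convert` on new source utterances, for which no target-side exemplars are needed.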
Pages: 1 / 9
Page count: 9