SPEECH SPECTRUM CONVERSION BASED ON SPEAKER INTERPOLATION AND MULTIFUNCTIONAL REPRESENTATION WITH WEIGHTING BY RADIAL BASIS FUNCTION NETWORKS

Cited by: 53

Authors:

IWAHASHI, N [1]

SAGISAKA, Y [1]

Affiliations:

[1] ATR, INTERPRETING TELECOMMUN RES LABS, KYOTO 61902, JAPAN

Source:

SPEECH COMMUNICATION | 1995 / Vol. 16 / No. 02

Keywords:

SPEECH SPECTRUM CONVERSION; SPEAKER ADAPTATION; VOICE CONVERSION; SPEAKER INTERPOLATION; MULTIPLE FUNCTIONAL REPRESENTATION; RADIAL BASIS FUNCTION

DOI:

10.1016/0167-6393(94)00051-B

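Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 [Acoustics]; 082403 [Underwater Acoustic Engineering];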

Abstract:

This paper describes a speech spectrum transformation method by interpolating multi-speakers' spectral patterns and multi-functional representation with Radial Basis Function networks. The interpolation is carried out using spectral parameters between pre-stored multiple speakers' utterance data to generate new spectrum patterns. Adaptation to a target speaker can be performed by this interpolation, which uses only a small amount of training data to generate new speech spectrum sequences close to those of the target speaker. Moreover, to obtain more precise adaptation by using a larger amount of training data, the transformation is represented by multiple interpolating functions. The multiple functions' outputs are weighted-summed, using weighting values given by RBF networks. The parameters of this multi-functional transformation are adapted by the gradient descent method. Adaptation experiments were carried out using four pre-stored speakers' data. Using only one word spoken by the target speaker for training, the distance between the target speaker's spectrum and the spectrum generated by the single interpolating function was reduced by about 35% compared with the distance between the target speaker's spectrum and the spectrum of the pre-stored speaker closest to the target. Using ten training words, the reduction rate increased to 48% by the multi-functional transformation.
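As a rough illustration of the scheme the abstract outlines, the following NumPy sketch combines several interpolating functions over pre-stored speakers' frames, weights their outputs with normalised Gaussian RBF activations, and fits the interpolation weights by gradient descent. Everything concrete here is an assumption rather than the authors' formulation: the names `rbf_gates`, `convert`, and `fit`, the use of the mean pre-stored frame as the RBF input, the fixed RBF centres and width, the unconstrained weights, the mean-squared-error distance, and the synthetic random data standing in for time-aligned spectral parameters.

```python
import numpy as np

def rbf_gates(x, centres, width):
    """Normalised Gaussian RBF activations; x is (T, D), centres is (M, D)."""
    d2 = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)  # (T, M)
    # Subtract the per-frame minimum before exponentiating for numerical stability.
    a = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * width ** 2))
    return a / a.sum(axis=1, keepdims=True)

def convert(S, W, gates):
    """Weighted sum of M interpolating functions.

    S:     (T, K, D) time-aligned frames from K pre-stored speakers
    W:     (M, K) interpolation weights, one row per interpolating function
    gates: (T, M) RBF weighting values for each frame
    """
    frame_w = gates @ W                         # (T, K) effective speaker weights
    return np.einsum("tk,tkd->td", frame_w, S)  # (T, D) generated spectrum

def fit(S, target, centres, width, steps=500, lr=0.05):
    """Gradient descent on the mean squared error with respect to W only."""
    T, K, D = S.shape
    M = centres.shape[0]
    gates = rbf_gates(S.mean(axis=1), centres, width)  # assumed RBF input
    W = np.full((M, K), 1.0 / K)                       # start from plain averaging
    for _ in range(steps):
        err = convert(S, W, gates) - target            # (T, D)
        proj = np.einsum("td,tkd->tk", err, S)         # (T, K)
        W -= lr * (2.0 * gates.T @ proj / (T * D))     # dL/dW
    return W, gates

# Hypothetical demo: four pre-stored speakers, random stand-in "spectra".
rng = np.random.default_rng(0)
T, K, D = 200, 4, 8
S = rng.normal(size=(T, K, D))
w_true = np.array([0.5, 0.3, 0.1, 0.1])          # made-up target mixture
target = np.einsum("k,tkd->td", w_true, S)
centres = S.mean(axis=1)[:2]                     # two arbitrary RBF centres

W, gates = fit(S, target, centres, width=1.0)
d_closest = min(np.mean((S[:, k] - target) ** 2) for k in range(K))
d_fit = np.mean((convert(S, W, gates) - target) ** 2)
```

Comparing `d_fit` with `d_closest` mirrors the abstract's evaluation, which reports the distance to the target relative to that of the closest pre-stored speaker. The numbers from this synthetic demo say nothing about the paper's reported reduction rates.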

Pages: 139-151

Page count: 13
