TRANSFORMATION OF FORMANTS FOR VOICE CONVERSION USING ARTIFICIAL NEURAL NETWORKS

Cited by: 186
Authors
NARENDRANATH, M [1]
MURTHY, HA [1]
RAJENDRAN, S [1]
YEGNANARAYANA, B [1]
Affiliation
[1] INDIAN INST TECHNOL, DEPT COMP SCI & ENGN, MADRAS 600036, TAMIL NADU, INDIA
Keywords
VOICE CONVERSION; SPEAKER CHARACTERISTICS; FORMANTS; MULTILAYER FEEDFORWARD NEURAL NETWORK
DOI
10.1016/0167-6393(94)00058-I
Chinese Library Classification
O42 [Acoustics]
Discipline classification code
070206 [Acoustics]; 082403 [Underwater Acoustic Engineering]
Abstract
In this paper we propose a scheme for developing a voice conversion system that converts the speech signal uttered by a source speaker to a speech signal having the voice characteristics of the target speaker. In particular, we address the issue of transformation of the vocal tract system features from one speaker to another. Formants are used to represent the vocal tract system features and a formant vocoder is used for synthesis. The scheme consists of a formant analysis phase, followed by a learning phase in which the implicit formant transformation is captured by a neural network. The transformed formants together with the pitch contour modified to suit the average pitch of the target speaker are used to synthesize speech with the desired vocal tract system characteristics.
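The two transformation steps named in the abstract (a neural network that learns a source-to-target formant mapping, and a pitch contour adjusted to the target speaker's average pitch) can be illustrated with a minimal sketch. Nothing below is taken from the paper: the network size, learning rate, the use of three formants in kHz, the synthetic formant pairs, and the multiplicative pitch scaling rule are all assumptions chosen for illustration, and the formant analysis and vocoder synthesis stages are omitted.

```python
import math
import random

# Synthetic (source, target) formant pairs in kHz: F1, F2, F3.
# These values are made up for illustration; they are NOT data from the paper.
SOURCE = [[0.27, 2.29, 3.01], [0.73, 1.09, 2.44], [0.30, 0.87, 2.24],
          [0.53, 1.84, 2.48], [0.66, 1.72, 2.41], [0.44, 1.02, 2.24]]
PAIRS = [(s, [s[0] * 1.15, s[1] * 1.10, s[2] * 1.05]) for s in SOURCE]


def init_mlp(n_in, n_hidden, n_out, seed=0):
    """Create a one-hidden-layer feedforward network with small random weights."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"W1": mat(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "W2": mat(n_out, n_hidden), "b2": [0.0] * n_out}


def forward(net, x):
    """Return (hidden activations, output): tanh hidden layer, linear output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["W1"], net["b1"])]
    y = [sum(w * hi for w, hi in zip(row, h)) + b
         for row, b in zip(net["W2"], net["b2"])]
    return h, y


def mean_loss(net, pairs):
    """Mean over pairs of half the squared error between output and target."""
    total = 0.0
    for x, t in pairs:
        _, y = forward(net, x)
        total += 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t))
    return total / len(pairs)


def train(net, pairs, lr=0.02, epochs=300):
    """Per-sample gradient descent with backpropagation (assumed training rule)."""
    for _ in range(epochs):
        for x, t in pairs:
            h, y = forward(net, x)
            err = [yi - ti for yi, ti in zip(y, t)]  # gradient of loss w.r.t. output
            # Hidden-layer gradient, computed before the output weights change.
            dh = [(1.0 - h[j] ** 2) *
                  sum(err[k] * net["W2"][k][j] for k in range(len(err)))
                  for j in range(len(h))]
            for k, e in enumerate(err):
                for j, hj in enumerate(h):
                    net["W2"][k][j] -= lr * e * hj
                net["b2"][k] -= lr * e
            for j, d in enumerate(dh):
                for i, xi in enumerate(x):
                    net["W1"][j][i] -= lr * d * xi
                net["b1"][j] -= lr * d


def shift_pitch(contour_hz, target_mean_hz):
    """Scale voiced frames so their mean equals the target speaker's mean pitch.

    Frames with value 0 are treated as unvoiced and left at 0 (an assumption).
    """
    voiced = [f for f in contour_hz if f > 0]
    scale = target_mean_hz / (sum(voiced) / len(voiced))
    return [f * scale for f in contour_hz]
```

A hypothetical use would be to call `train(net, PAIRS)` on a network from `init_mlp(3, 8, 3)`, map each frame's source formants through `forward(net, frame)[1]`, and pass the source pitch contour through `shift_pitch` before handing both to a synthesizer.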
Pages: 207-216
Page count: 10