Close shadowing natural versus synthetic speech

Cited by: 26

Author:

G. Bailly

Affiliation:

[1] UMR CNRS no 5009, Inst. de la Commun. Parlée, Univ. Stendhal, 38031 Grenoble Cedex, 46, av. Félix Viallet

Source:

International Journal of Speech Technology | 2003 / Vol. 6 / No. 1

Keywords:

Close shadowing; Evaluation; Prosody; Text-to-speech synthesis

DOI:

10.1023/A:1021091720511

Abstract:

Close shadowing experiments involving natural and synthetic stimuli are described. Preliminary results show that speakers are able to follow natural stimuli with an average delay of 70 ms whereas this delay typically exceeds 100 ms for stimuli produced by text-to-speech systems. A complementary experiment shows that this contrast is mainly due to the inappropriate or impoverished prosody generated by actual text-to-speech systems.

Pages: 11-19

Number of pages: 8
