Expressive Visual Text-To-Speech Using Active Appearance Models

Cited by: 40

Authors:

Anderson, Robert [1]

Stenger, Bjoern [2]

Wan, Vincent [2]

Cipolla, Roberto [1]

Affiliations:

[1] Univ Cambridge, Dept Engn, Cambridge, England

[2] Toshiba Res Europe, Cambridge, England

Source:

2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2013

Keywords:

SYNTHETIC TALKING FACES;

DOI:

10.1109/CVPR.2013.434

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state, which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems.


Pages: 3382-3389

Page count: 8
