Expressive Visual Text-To-Speech Using Active Appearance Models

Cited by: 40
Authors
Anderson, Robert [1]
Stenger, Bjoern [2]
Wan, Vincent [2]
Cipolla, Roberto [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge, England
[2] Toshiba Res Europe, Cambridge, England
Source
2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2013
Keywords
SYNTHETIC TALKING FACES
DOI
10.1109/CVPR.2013.434
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state, which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems.
Pages: 3382-3389
Page count: 8