Synthesizing Obama: Learning Lip Sync from Audio

Cited by: 611

Authors:

Suwajanakorn, Supasorn [1]

Seitz, Steven M. [1]

Kemelmacher-Shlizerman, Ira [1]

Affiliation:

[1] Univ Washington, Seattle, WA 98195 USA

Source:

ACM TRANSACTIONS ON GRAPHICS | 2017 / Vol. 36 / Issue 04

Keywords:

Audio; Face Synthesis; LSTM; RNN; Big Data; Videos; Audiovisual Speech; Uncanny Valley; Lip Sync; FACE; ANIMATION; VIDEO;

DOI:

10.1145/3072959.3073640

Chinese Library Classification (CLC):

TP31 [Computer Software];

Subject classification codes:

081202 ; 0835 ;

Abstract:

Given audio of President Barack Obama, we synthesize a high quality video of him speaking with accurate lip sync, composited into a target video clip. Trained on many hours of his weekly address footage, a recurrent neural network learns the mapping from raw audio features to mouth shapes. Given the mouth shape at each time instant, we synthesize high quality mouth texture, and composite it with proper 3D pose matching to change what he appears to be saying in a target video to match the input audio track. Our approach produces photorealistic results.

Pages: 13
