A Survey of Speech Synthesis, Speech Forgery, and Forgery Detection Techniques

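Cited by: 20
Authors
Yang Shuai (杨帅)
Qiao Kai (乔凯)
Chen Jian (陈健)
Wang Linyuan (王林元)
Yan Bin (闫镔)
Affiliation
[1] PLA Strategic Support Force Information Engineering University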
Keywords
speech forgery; neural network; spectrum conversion; detection techniques; speech synthesis
DOI
10.15888/j.cnki.csa.008641
Chinese Library Classification (CLC) number
TN912.33 [Speech synthesis]
Discipline classification code
081002 [Signal and Information Processing]
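Abstract
With the rise of mobile smart devices in recent years, people encounter and use speech information more and more frequently, and speech forgery and forgery detection have become increasingly important technologies in speech processing. This paper first outlines the general pipeline of a speech synthesis system and systematically summarizes the two main techniques used in speech forgery: text-to-speech (TTS) and voice conversion (VC). It then introduces and classifies the algorithms commonly used in speech forgery detection. Finally, in view of the open problems in speech forgery and forgery detection, it proposes possible future directions from the perspectives of data, models, training methods, and application scenarios.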
Pages: 12-22
Page count: 11