Research Status and Progress of Deep Learning Based Speech Separation Technology

Cited by: 71
Authors
Liu Wenju [1]
Nie Shuai [1]
Liang Shan [1]
Zhang Xueliang [2]
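Affiliations
[1] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
[2] Department of Computer Science, Inner Mongolia University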
Keywords
Neural networks; speech separation; computational auditory scene analysis; machine learning
DOI
10.16383/j.aas.2016.c150734
Chinese Library Classification
TP183 [Artificial neural networks and computing]; TP181 [Automated reasoning, machine learning]
Discipline classification codes
081104; 0812; 0835; 1405
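Abstract
Speech interaction technology is now widely used in everyday life. However, interference keeps speech interaction in real-world environments far from satisfactory. Speech separation aimed at additive noise is an effective way to improve speech interaction performance. Over the past decades, many researchers worldwide have devoted great effort to this problem and proposed many practical methods. With the recent rise of deep learning research in particular, deep learning based speech separation has attracted growing attention and shown considerable promise for applications. It has gradually become a new research trend in speech separation. Many deep learning based speech separation methods have already been proposed. However, the field has long lacked a systematic analysis and summary of these methods, and the connections and distinctions between them have rarely been studied. To address this gap, this paper analyzes and summarizes the main pipeline and overall framework of speech separation in detail. It then gives a comprehensive, in-depth review of current frontier research from three aspects: features, models, and training targets. It concludes with an outlook on speech separation technology.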
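The abstract frames separation as removing additive noise and names training targets as one of the review's three axes. As a hedged illustration only, the sketch below computes two targets that are common in the wider literature, the ideal binary mask (IBM) and the ideal ratio mask (IRM), under the additive model y = s + n. The sketch makes several assumptions that do not come from the paper: it uses NumPy and a hand-rolled magnitude STFT, a 440 Hz tone stands in for clean speech, and the parameter names `lc_db` (local criterion) and `beta` (IRM exponent) are hypothetical. The review's own definitions may differ.

```python
import numpy as np

def stft_mag(x, frame_len=512, hop=256):
    """Magnitude STFT from Hann-windowed frames and a real FFT (no padding)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # (frames, frame_len // 2 + 1)

def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """IBM: 1 where the local SNR in dB exceeds lc_db (hypothetical name), else 0."""
    eps = 1e-12
    local_snr_db = 20.0 * np.log10((speech_mag + eps) / (noise_mag + eps))
    return (local_snr_db > lc_db).astype(np.float64)

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """IRM: (S^2 / (S^2 + N^2)) ** beta, a soft mask in [0, 1]."""
    eps = 1e-12
    return (speech_mag**2 / (speech_mag**2 + noise_mag**2 + eps)) ** beta

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 440 * t)    # assumption: a tone stands in for clean speech
noise = 0.5 * rng.standard_normal(fs)   # additive white noise
mixture = speech + noise                # additive model y = s + n

S, N, Y = stft_mag(speech), stft_mag(noise), stft_mag(mixture)
ibm = ideal_binary_mask(S, N)
irm = ideal_ratio_mask(S, N)
enhanced_mag = irm * Y                  # oracle IRM applied to the mixture magnitude
print(ibm.shape, irm.min() >= 0.0, irm.max() <= 1.0)
```

In a supervised system, a model would learn to predict such a mask from features of the mixture. Here the oracle mask, computed from the separately known speech and noise, is applied directly. This only shows what the target looks like and does not show how it is learned.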
Pages: 819-833
Page count: 15