Multimedia Technology Research 2014: Deep Learning and Media Computing

Cited by: 13

Authors:

Wu Fei (吴飞) [1]

Zhu Wenwu (朱文武) [2]

Yu Junqing (于俊清) [3]

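Affiliations:

[1] School of Computer Science, Zhejiang University

[2] School of Computer Science, Tsinghua University

[3] School of Computer Science, Huazhong University of Science and Technology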

Source:

Journal of Image and Graphics (中国图象图形学报) | 2015, Issue 11

Keywords:

multimedia; massive data; retrieval and annotation; semantic understanding; deep learning

DOI:

Not available

Chinese Library Classification (CLC) number:

TP37 [Multimedia technology and multimedia computers]; TP391.1 [Text information processing]

Discipline classification code:

081201; 081203; 0835

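Abstract:

Objective: The rapid growth of massive data poses profound challenges for multimedia computing. Unlike the traditional media-computing paradigm built around hand-crafted features, data-driven deep learning (feature learning) has become the mainstream approach to media computing. Method: This survey analyzes recent progress and open challenges of deep learning in three areas of media computing: retrieval ranking and annotation, multimodal retrieval and semantic understanding, and video analysis and understanding. It also outlines future trends. Result: In retrieval ranking and annotation, deep-learning-based methods such as neural coding have achieved good results. In multimodal retrieval and semantic understanding, deep learning is used to bridge the "heterogeneity gap" between different modalities and the "semantic gap" between low-level features and high-level semantics, and deep-learning-based compositional semantic learning has become a research hotspot. In video analysis and understanding, deep neural networks are used to learn effective video representations and to recognize actions, also with good results. However, because deep learning is data-driven, it is sensitive to noise in the data and remains immature for online incremental learning; how to combine deep learning with crowdsourcing is a promising open question. Conclusion: Building on an in-depth analysis of existing methods, this survey proposes new directions for addressing the heterogeneity gap and the semantic gap within a deep learning framework.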


Pages: 1423-1433

Number of pages: 11

Number of references: 52

