A Survey of Multimodal Learning Methods

Cited by: 28
Authors
Chen Peng (陈鹏) [1, 2]
Li Qing (李擎) [2, 1]
Zhang Dezheng (张德政) [3, 4]
Yang Yuhang (杨宇航) [1]
Cai Zheng (蔡铮) [1]
Lu Ziyi (陆子怡) [1]
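Affiliations
[1] School of Automation, University of Science and Technology Beijing
[2] Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education
[3] School of Computer and Communication Engineering, University of Science and Technology Beijing
[4] Beijing Key Laboratory of Knowledge Engineering for Materials Science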
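Funding
National Key Research and Development Program
Keywords
multimodal learning; statistical learning; deep learning; adversarial learning; feature representation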
DOI
10.13374/j.issn2095-9389.2019.03.21.003
Chinese Library Classification number
TP311.13; TP18 [Artificial intelligence theory]
Discipline classification codes
1201; 081104; 0812; 0835; 1405
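Abstract
Big data is multi-source and heterogeneous. With the rapid development of information technology, multimodal data has become the main form of data resources in recent years. Studying multimodal learning methods, which give computers the ability to understand massive multi-source heterogeneous data, is therefore of great value. This paper summarizes the definition of multimodality and the basic tasks of multimodal learning, and introduces the cognitive mechanism and development history of multimodal learning. On this basis, it reviews multimodal statistical learning methods and deep learning methods in detail. In addition, it systematically summarizes the relatively novel cross-modal matching and generation techniques based on adversarial learning that have emerged in the past two years. Finally, it summarizes the main forms of multimodal learning and discusses possible future research directions.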
Pages: 557-569
Number of pages: 13