A review of affective computing: From unimodal analysis to multimodal fusion

Cited by: 918
Authors
Poria, Soujanya [1 ]
Cambria, Erik [3 ]
Bajpai, Rajiv [2 ]
Hussain, Amir [1 ]
Affiliations
[1] Univ Stirling, Sch Nat Sci, Stirling, Scotland
[2] Nanyang Technol Univ, Temasek Labs, Singapore, Singapore
[3] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Affective computing; Sentiment analysis; Multimodal affect analysis; Multimodal fusion; Audio, visual and text information fusion; FACIAL EXPRESSION RECOGNITION; SPEECH EMOTION RECOGNITION; SENTIMENT ANALYSIS; AUTOMATIC-ANALYSIS; ACTION UNITS; CLASSIFICATION; BODY; FEATURES; SYSTEM; AUDIO;
DOI
10.1016/j.inffus.2017.02.003
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
140502 [Artificial Intelligence];
Abstract
Affective computing is an emerging interdisciplinary research field bringing together researchers and practitioners from various fields, ranging from artificial intelligence and natural language processing to cognitive and social sciences. With the proliferation of videos posted online (e.g., on YouTube, Facebook, Twitter) for product reviews, movie reviews, political views, and more, affective computing research has increasingly evolved from conventional unimodal analysis to more complex forms of multimodal analysis. This is the primary motivation behind our first-of-its-kind, comprehensive literature review of the diverse field of affective computing. Furthermore, existing literature surveys lack a detailed discussion of the state of the art in multimodal affect analysis frameworks, which this review aims to address. Multimodality is defined by the presence of more than one modality or channel, e.g., visual, audio, text, gestures, and eye gaze. In this paper, we focus mainly on the use of audio, visual and text information for multimodal affect analysis, since around 90% of the relevant literature appears to cover these three modalities. Following an overview of different techniques for unimodal affect analysis, we outline existing methods for fusing information from different modalities. As part of this review, we carry out an extensive study of different categories of state-of-the-art fusion techniques, followed by a critical analysis of potential performance improvements with multimodal analysis compared to unimodal analysis. A comprehensive overview of these two complementary fields aims to form the building blocks for readers, to better understand this challenging and exciting research field. (C) 2017 Elsevier B.V. All rights reserved.
Pages: 98-125
Page count: 28
Related papers
299 in total