VQA: Visual Question Answering

Cited by: 218

Authors:

Agrawal, Aishwarya [1]

Lu, Jiasen [1]

Antol, Stanislaw [1]

Mitchell, Margaret [2]

Zitnick, C. Lawrence [3]

Parikh, Devi [4]

Batra, Dhruv [4]

Affiliations:

[1] Virginia Tech, Blacksburg, VA 24061 USA

[2] Microsoft Res, Redmond, WA USA

[3] Facebook AI Res, Menlo Pk, CA USA

[4] Georgia Inst Technol, Atlanta, GA USA

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2017 / Vol. 123 / Issue 1

Funding:

National Science Foundation (USA);

Keywords:

Visual Question Answering;

DOI:

10.1007/s11263-016-0966-6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification code:

140502 [Artificial intelligence];

Abstract:

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org) and discuss the information it provides. Numerous baselines and methods for VQA are provided and compared with human performance. Our VQA demo is available on CloudCV (http://cloudcv.org/vqa).


Pages: 4-31

Page count: 28
