VQABQ: Visual Question Answering by Basic Questions

Handle URI:
http://hdl.handle.net/10754/626564
Title:
VQABQ: Visual Question Answering by Basic Questions
Authors:
Huang, Jia-Hong (0000-0001-7943-2591); Alfadly, Modar (0000-0002-3763-3819); Ghanem, Bernard (0000-0002-5534-587X)
Abstract:
Given an image and a question as input, our method outputs a text-based answer to the question about the image, a task known as Visual Question Answering (VQA). Our algorithm has two main modules. Given a natural language question about an image, the first module takes the question as input and outputs the basic questions of the given main question. The second module takes the main question, the image, and these basic questions as input and outputs the text-based answer to the main question. We formulate basic question generation as a LASSO optimization problem and propose a criterion for exploiting these basic questions to help answer the main question. Our method is evaluated on the challenging VQA dataset and yields state-of-the-art accuracy: 60.34% on the open-ended task.
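Illustrative sketch (not part of the record): the abstract states only that basic question generation is posed as a LASSO problem. One way to read this, assuming each candidate basic question and the main question are represented by embedding vectors, is to solve min_x 0.5 * ||A x - b||^2 + lam * ||x||_1, where the columns of A are the candidate embeddings and b is the main question embedding, and then rank candidates by the size of their weights. The function names, the toy 3-dimensional embeddings, the value of lam, and the coordinate-descent solver below are all assumptions made for illustration; they are not taken from the paper.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, target, lam, sweeps=200):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by cyclic coordinate descent.

    columns: list of column vectors of A, one per candidate basic question.
    target:  the vector b, standing in for the main question embedding.
    """
    x = [0.0] * len(columns)
    residual = list(target)  # residual = b - A x, and x starts at zero
    col_sq_norms = [sum(c * c for c in col) for col in columns]
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            if col_sq_norms[j] == 0.0:
                continue  # an all-zero column cannot explain anything
            # Correlation of column j with the residual, excluding its own term.
            rho = sum(c * r for c, r in zip(col, residual)) + col_sq_norms[j] * x[j]
            new_xj = soft_threshold(rho, lam) / col_sq_norms[j]
            delta = new_xj - x[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                x[j] = new_xj
    return x


def rank_basic_questions(questions, columns, target, lam, top_k=3):
    """Return up to top_k (question, weight) pairs with nonzero LASSO weight."""
    weights = lasso_coordinate_descent(columns, target, lam)
    ranked = sorted(zip(questions, weights), key=lambda p: abs(p[1]), reverse=True)
    return [(q, w) for q, w in ranked[:top_k] if w != 0.0]


# Hypothetical toy data: three candidate questions with made-up embeddings.
candidates = ["Is there a dog?", "What color is the sky?", "Is the animal outside?"]
candidate_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
main_question_embedding = [0.9, 0.05, 0.4]

ranked = rank_basic_questions(
    candidates, candidate_embeddings, main_question_embedding, lam=0.1
)
for question, weight in ranked:
    print(f"{weight:.2f}  {question}")
```

In this toy setup the L1 penalty drives the weight of the weakly related candidate to exactly zero, so it is dropped from the ranking. That sparsity is the usual reason for choosing LASSO over plain least squares in a selection problem of this kind.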
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Electrical Engineering Program; Computer Science Program; Visual Computing Center (VCC)
Publisher:
arXiv
Issue Date:
19-Mar-2017
ARXIV:
arXiv:1703.06492
Type:
Preprint
Additional Links:
http://arxiv.org/abs/1703.06492v2; http://arxiv.org/pdf/1703.06492v2
Appears in Collections:
Other/General Submission; Computer Science Program; Electrical Engineering Program; Visual Computing Center (VCC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Huang, Jia-Hong | en
dc.contributor.author | Alfadly, Modar | en
dc.contributor.author | Ghanem, Bernard | en
dc.date.accessioned | 2017-12-28T07:32:16Z | -
dc.date.available | 2017-12-28T07:32:16Z | -
dc.date.issued | 2017-03-19 | en
dc.identifier.uri | http://hdl.handle.net/10754/626564 | -
dc.description.abstract | Given an image and a question as input, our method outputs a text-based answer to the question about the image, a task known as Visual Question Answering (VQA). Our algorithm has two main modules. Given a natural language question about an image, the first module takes the question as input and outputs the basic questions of the given main question. The second module takes the main question, the image, and these basic questions as input and outputs the text-based answer to the main question. We formulate basic question generation as a LASSO optimization problem and propose a criterion for exploiting these basic questions to help answer the main question. Our method is evaluated on the challenging VQA dataset and yields state-of-the-art accuracy: 60.34% on the open-ended task. | en
dc.publisher | arXiv | en
dc.relation.url | http://arxiv.org/abs/1703.06492v2 | en
dc.relation.url | http://arxiv.org/pdf/1703.06492v2 | en
dc.rights | Archived with thanks to arXiv | en
dc.title | VQABQ: Visual Question Answering by Basic Questions | en
dc.type | Preprint | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Electrical Engineering Program | en
dc.contributor.department | Computer Science Program | en
dc.contributor.department | Visual Computing Center (VCC) | en
dc.eprint.version | Pre-print | en
dc.identifier.arxivid | arXiv:1703.06492 | en
kaust.author | Huang, Jia-Hong | en
kaust.author | Alfadly, Modar | en
kaust.author | Ghanem, Bernard | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.