Show simple item record

dc.contributor.author: Huang, Jia-Hong
dc.contributor.author: Alfadly, Modar
dc.contributor.author: Ghanem, Bernard
dc.date.accessioned: 2017-12-28T07:32:15Z
dc.date.available: 2017-12-28T07:32:15Z
dc.date.issued: 2017-09-14
dc.identifier.uri: http://hdl.handle.net/10754/626540
dc.description.abstract: Visual Question Answering (VQA) models should have both high robustness and accuracy. Unfortunately, most of the current VQA research only focuses on accuracy because there is a lack of proper methods to measure the robustness of VQA models. There are two main modules in our algorithm. Given a natural language question about an image, the first module takes the question as input and then outputs the ranked basic questions, with similarity scores, of the main given question. The second module takes the main question, image and these basic questions as input and then outputs the text-based answer of the main question about the given image. We claim that a robust VQA model is one whose performance does not change much when related basic questions are also made available to it as input. We formulate the basic question generation problem as a LASSO optimization, and also propose a large-scale Basic Question Dataset (BQD) and Rscore (a novel robustness measure) for analyzing the robustness of VQA models. We hope our BQD will be used as a benchmark to evaluate the robustness of VQA models, so as to help the community build more robust and accurate VQA models.
dc.publisher: arXiv
dc.relation.url: http://arxiv.org/abs/1709.04625v1
dc.relation.url: http://arxiv.org/pdf/1709.04625v1
dc.rights: Archived with thanks to arXiv
dc.title: Robustness Analysis of Visual QA Models by Basic Questions
dc.type: Preprint
dc.contributor.department: Computer Science Program
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Earth Science and Engineering Program
dc.contributor.department: Electrical Engineering Program
dc.contributor.department: Visual Computing Center (VCC)
dc.eprint.version: Pre-print
dc.identifier.arxivid: arXiv:1709.04625
kaust.person: Huang, Jia-Hong
kaust.person: Alfadly, Modar
kaust.person: Ghanem, Bernard
refterms.dateFOA: 2018-06-14T09:31:33Z


Files in this item

Name: 1709.04625v1.pdf
Size: 712.4Kb
Format: PDF
Description: Preprint
