Robustness Analysis of Visual Question Answering Models by Basic Questions

Handle URI:
http://hdl.handle.net/10754/626314
Title:
Robustness Analysis of Visual Question Answering Models by Basic Questions
Authors:
Huang, Jia-Hong ( 0000-0001-7943-2591 )
Abstract:
Visual Question Answering (VQA) models should have both high robustness and accuracy. Unfortunately, most current VQA research focuses only on accuracy because there is a lack of proper methods to measure the robustness of VQA models. There are two main modules in our algorithm. Given a natural language question about an image, the first module takes the question as input and outputs basic questions of the given main question, ranked and with similarity scores. The second module takes the main question, the image and these basic questions as input and outputs the text-based answer to the main question about the given image. We claim that a robust VQA model is one whose performance does not change much when related basic questions are also made available to it as input. We formulate the basic question generation problem as a LASSO optimization, and also propose a large-scale Basic Question Dataset (BQD) and Rscore (a novel robustness measure) for analyzing the robustness of VQA models. We hope our BQD will be used as a benchmark to evaluate the robustness of VQA models, so as to help the community build more robust and accurate VQA models.
Advisors:
Ghanem, Bernard ( 0000-0002-5534-587X )
Committee Member:
Heidrich, Wolfgang ( 0000-0002-4227-8508 ) ; Michels, Dominik
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Program:
Electrical Engineering
Issue Date:
Nov-2017
Type:
Thesis
Appears in Collections:
Theses

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Ghanem, Bernard | en
dc.contributor.author | Huang, Jia-Hong | en
dc.date.accessioned | 2017-12-07T12:49:26Z | -
dc.date.available | 2017-12-07T12:49:26Z | -
dc.date.issued | 2017-11 | -
dc.identifier.uri | http://hdl.handle.net/10754/626314 | -
dc.description.abstract | Visual Question Answering (VQA) models should have both high robustness and accuracy. Unfortunately, most current VQA research focuses only on accuracy because there is a lack of proper methods to measure the robustness of VQA models. There are two main modules in our algorithm. Given a natural language question about an image, the first module takes the question as input and outputs basic questions of the given main question, ranked and with similarity scores. The second module takes the main question, the image and these basic questions as input and outputs the text-based answer to the main question about the given image. We claim that a robust VQA model is one whose performance does not change much when related basic questions are also made available to it as input. We formulate the basic question generation problem as a LASSO optimization, and also propose a large-scale Basic Question Dataset (BQD) and Rscore (a novel robustness measure) for analyzing the robustness of VQA models. We hope our BQD will be used as a benchmark to evaluate the robustness of VQA models, so as to help the community build more robust and accurate VQA models. | en
dc.language.iso | en | en
dc.subject | VQA | en
dc.subject | NLP | en
dc.subject | Basic Question | en
dc.subject | Main Question | en
dc.subject | Robustness | en
dc.subject | RScore | en
dc.title | Robustness Analysis of Visual Question Answering Models by Basic Questions | en
dc.type | Thesis | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
thesis.degree.grantor | King Abdullah University of Science and Technology | en
dc.contributor.committeemember | Heidrich, Wolfgang | en
dc.contributor.committeemember | Michels, Dominik | en
thesis.degree.discipline | Electrical Engineering | en
thesis.degree.name | Master of Science | en
dc.person.id | 146201 | en