Show simple item record

dc.contributor.author: Abdelkarim, Sherif
dc.contributor.author: Achlioptas, Panos
dc.contributor.author: Huang, Jiaji
dc.contributor.author: Li, Boyang
dc.contributor.author: Church, Kenneth
dc.contributor.author: Elhoseiny, Mohamed
dc.date.accessioned: 2020-04-12T13:24:23Z
dc.date.available: 2020-04-12T13:24:23Z
dc.date.issued: 2020-03-25
dc.identifier.uri: http://hdl.handle.net/10754/662491
dc.description.abstract: Scaling up the vocabulary and complexity of current visual understanding systems is necessary in order to bridge the gap between human and machine visual intelligence. However, a crucial impediment to this end lies in the difficulty of generalizing to data distributions that come from real-world scenarios. Typically, such distributions follow Zipf's law, under which only a small portion of the collected object classes have abundant examples (the head), while most classes contain just a few (the tail). In this paper, we propose to study a novel task concerning the generalization of visual relationships that lie on the distribution's tail; i.e., we investigate how to help AI systems better recognize rare relationships like <S:dog, P:riding, O:horse>, where the subject S, predicate P, and/or the object O come from the tail of the corresponding distributions. To achieve this goal, we first introduce two large-scale visual-relationship detection benchmarks built upon the widely used Visual Genome and GQA datasets. We also propose an intuitive evaluation protocol that gives credit to classifiers that prefer concepts semantically close to the ground-truth class according to WordNet- or word2vec-induced metrics. Finally, we introduce a visiolinguistic version of a Hubless loss and show experimentally that it consistently encourages classifiers to be more predictive of tail classes while remaining accurate on head classes. Our code and models are available at http://bit.ly/LTVRR.
dc.publisher: arXiv
dc.relation.url: https://arxiv.org/pdf/2004.00436
dc.rights: Archived with thanks to arXiv
dc.title: Long-tail Visual Relationship Recognition with a Visiolinguistic Hubless Loss
dc.type: Preprint
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: King Abdullah University of Science and Technology
dc.eprint.version: Pre-print
dc.contributor.institution: Stanford University
dc.contributor.institution: Baidu Research
dc.contributor.institution: Nanyang Technological University
dc.identifier.arxivid: 2004.00436
kaust.person: Abdelkarim, Sherif
kaust.person: Elhoseiny, Mohamed
refterms.dateFOA: 2020-04-12T13:25:49Z


Files in this item

Name: Preprintfile1.pdf
Size: 4.473 MB
Format: PDF
Description: Pre-print
