dc.contributor.author           Gao, Jialin
dc.contributor.author           Sun, Xin
dc.contributor.author           Xu, Mengmeng
dc.contributor.author           Zhou, Xi
dc.contributor.author           Ghanem, Bernard
dc.date.accessioned             2021-10-14T07:30:03Z
dc.date.available               2021-10-14T07:30:03Z
dc.date.issued                  2021-10-12
dc.identifier.uri               http://hdl.handle.net/10754/672843
dc.description.abstract         Temporal language grounding in videos aims to localize the temporal span relevant to a given query sentence. Previous methods treat it either as a boundary regression task or as a span extraction task. This paper formulates temporal language grounding as video reading comprehension and proposes a Relation-aware Network (RaNet) to address it. The framework aims to select a video moment choice from a predefined answer set with the aid of coarse-and-fine choice-query interaction and choice-choice relation construction. A choice-query interactor is proposed to match visual and textual information simultaneously at the sentence-moment and token-moment levels, yielding a coarse-and-fine cross-modal interaction. Moreover, a novel multi-choice relation constructor leverages graph convolution to capture the dependencies among video moment choices so that the best choice can be selected. Extensive experiments on ActivityNet-Captions, TACoS, and Charades-STA demonstrate the effectiveness of our solution. Code will be released soon.
dc.description.sponsorship      I would like to express my heartfelt thanks to everyone who helped me with this paper. The support of CloudWalk Technology Co., Ltd. is gratefully acknowledged. This work was also supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research through the Visual Computing Center (VCC) funding.
dc.publisher                    arXiv
dc.relation.url                 https://arxiv.org/pdf/2110.05717.pdf
dc.rights                       Archived with thanks to arXiv
dc.title                        Relation-aware Video Reading Comprehension for Temporal Language Grounding
dc.type                         Preprint
dc.contributor.department       Electrical and Computer Engineering Program
dc.contributor.department       Electrical and Computer Engineering
dc.contributor.department       Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
dc.eprint.version               Pre-print
dc.contributor.institution      Cooperative Medianet Innovation Center, Shanghai Jiao Tong University
dc.contributor.institution      CloudWalk Technology Co., Ltd., China
dc.identifier.arxivid           2110.05717
kaust.person                    Xu, Mengmeng
kaust.person                    Ghanem, Bernard
refterms.dateFOA                2021-10-14T07:31:26Z
kaust.acknowledged.supportUnit  Office of Sponsored Research
kaust.acknowledged.supportUnit  Visual Computing Center (VCC)


Files in this item

Name: Preprintfile1.pdf
Size: 1.781 MB
Format: PDF
Description: Pre-print