dc.contributor.author: Escorcia, Victor
dc.contributor.author: Soldan, Mattia
dc.contributor.author: Sivic, Josef
dc.contributor.author: Ghanem, Bernard
dc.contributor.author: Russell, Bryan
dc.date.accessioned: 2019-12-18T10:11:55Z
dc.date.available: 2019-12-18T10:11:55Z
dc.date.issued: 2019-07-30
dc.identifier.uri: http://hdl.handle.net/10754/660659
dc.description.abstract: In this paper, we introduce the task of retrieving relevant video moments from a large corpus of untrimmed, unsegmented videos given a natural language query. Our task poses unique challenges as a system must efficiently identify both the relevant videos and localize the relevant moments in the videos. This task is in contrast to prior work that localizes relevant moments in a single video or searches a large collection of already-segmented videos. For our task, we introduce Clip Alignment with Language (CAL), a model that aligns features for a natural language query to a sequence of short video clips that compose a candidate moment in a video. Our approach goes beyond prior work that aggregates video features over a candidate moment by allowing for finer clip alignment. Moreover, our approach is amenable to efficient indexing of the resulting clip-level representations, which makes it suitable for moment localization in large video collections. We evaluate our approach on three recently proposed datasets for temporal localization of moments in video with natural language extended to our video corpus moment retrieval setting: DiDeMo, Charades-STA, and ActivityNet-captions. We show that our CAL model outperforms the recently proposed Moment Context Network (MCN) on all criteria across all datasets on our proposed task, obtaining an 8%-85% and 11%-47% boost for average recall and median rank, respectively, and achieves 5x faster retrieval and 8x smaller index size with a 500K video corpus.
dc.publisher: arXiv
dc.relation.url: https://arxiv.org/pdf/1907.12763
dc.rights: Archived with thanks to arXiv
dc.title: Temporal Localization of Moments in Video Collections with Natural Language
dc.type: Preprint
dc.contributor.department: Electrical Engineering Program
dc.contributor.department: King Abdullah University of Science and Technology.
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.eprint.version: Pre-print
dc.contributor.institution: Adobe Research
dc.contributor.institution: INRIA
dc.identifier.arxivid: 1907.12763
kaust.person: Escorcia, Victor
kaust.person: Soldan, Mattia
kaust.person: Ghanem, Bernard
refterms.dateFOA: 2019-12-18T10:12:38Z


Files in this item

Name: Preprintfile1.pdf
Size: 2.839Mb
Format: PDF
Description: Pre-print
