Efficient Temporal Action Localization in Videos

Handle URI:
http://hdl.handle.net/10754/627678
Title:
Efficient Temporal Action Localization in Videos
Authors:
Alwassel, Humam ( 0000-0002-4036-7670 )
Abstract:
State-of-the-art temporal action detectors inefficiently search the entire video for specific actions. Despite the encouraging progress these methods achieve, it is crucial to design automated approaches that only explore the parts of the video that are most relevant to the actions being searched. To address this need, we propose the new problem of action spotting in videos, which we define as finding a specific action in a video while observing only a small portion of that video. Inspired by the observation that humans are extremely efficient and accurate at spotting and finding action instances in a video, we propose Action Search, a novel Recurrent Neural Network approach that mimics the way humans spot actions. Moreover, to address the absence of data recording the behavior of human annotators, we put forward the Human Searches dataset, which compiles the search sequences employed by human annotators spotting actions in the AVA and THUMOS14 datasets. We consider temporal action localization as an application of the action spotting problem. Experiments on the THUMOS14 dataset reveal that our model not only explores the video efficiently (observing on average 17.3% of the video) but also accurately finds human activities, achieving 30.8% mAP (at 0.5 tIoU) and outperforming state-of-the-art methods.
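To make the action spotting setup concrete, below is a minimal sketch (not the thesis's actual implementation) of the kind of iterative search loop the abstract describes: a recurrent model observes one temporal location, proposes the next location to inspect, and stops once it believes the current location lies inside an action instance, having observed only a fraction of the video. All names here (action_search_sketch, predict_next_location, is_inside_action, max_steps) are hypothetical placeholders, and the random-hop model in the usage example is only a stand-in.

import random

def action_search_sketch(num_frames, predict_next_location, is_inside_action,
                         max_steps=50):
    """Hypothetical spotting loop: hop between temporal locations proposed by a
    (recurrent) model until the current location is judged to lie inside the
    action, then report the fraction of the video that was observed."""
    t = num_frames // 2            # start the search in the middle of the video
    hidden = None                  # recurrent state carried across search steps
    observed = set()
    found = False
    for _ in range(max_steps):
        observed.add(t)
        if is_inside_action(t):    # spotting succeeds: stop the search early
            found = True
            break
        t, hidden = predict_next_location(t, hidden)   # model proposes next hop
        t = max(0, min(num_frames - 1, t))             # clamp to the video extent
    return found, t, len(observed) / num_frames

# Toy usage with stand-in components (random hops; the action spans frames 400-450).
if __name__ == "__main__":
    found, t, ratio = action_search_sketch(
        num_frames=1000,
        predict_next_location=lambda t, h: (t + random.randint(-200, 200), h),
        is_inside_action=lambda t: 400 <= t <= 450,
    )
    print(f"found={found} at frame {t}, observed {ratio:.1%} of the video")

In the thesis, the next-location proposals come from a trained Recurrent Neural Network supervised with the Human Searches annotations rather than the random heuristic used above.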
Advisors:
Ghanem, Bernard ( 0000-0002-5534-587X )
Committee Member:
Heidrich, Wolfgang ( 0000-0002-4227-8508 ) ; Wonka, Peter ( 0000-0003-0627-9746 )
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Program:
Computer Science
Issue Date:
17-Apr-2018
Type:
Thesis
Appears in Collections:
Theses

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Ghanem, Bernard | en
dc.contributor.author | Alwassel, Humam | en
dc.date.accessioned | 2018-04-29T06:44:10Z | -
dc.date.available | 2018-04-29T06:44:10Z | -
dc.date.issued | 2018-04-17 | -
dc.identifier.uri | http://hdl.handle.net/10754/627678 | -
dc.description.abstract | State-of-the-art temporal action detectors inefficiently search the entire video for specific actions. Despite the encouraging progress these methods achieve, it is crucial to design automated approaches that only explore the parts of the video that are most relevant to the actions being searched. To address this need, we propose the new problem of action spotting in videos, which we define as finding a specific action in a video while observing only a small portion of that video. Inspired by the observation that humans are extremely efficient and accurate at spotting and finding action instances in a video, we propose Action Search, a novel Recurrent Neural Network approach that mimics the way humans spot actions. Moreover, to address the absence of data recording the behavior of human annotators, we put forward the Human Searches dataset, which compiles the search sequences employed by human annotators spotting actions in the AVA and THUMOS14 datasets. We consider temporal action localization as an application of the action spotting problem. Experiments on the THUMOS14 dataset reveal that our model not only explores the video efficiently (observing on average 17.3% of the video) but also accurately finds human activities, achieving 30.8% mAP (at 0.5 tIoU) and outperforming state-of-the-art methods. | en
dc.language.iso | en | en
dc.subject | video understanding | en
dc.subject | action localization | en
dc.subject | action spotting | en
dc.title | Efficient Temporal Action Localization in Videos | en
dc.type | Thesis | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
thesis.degree.grantor | King Abdullah University of Science and Technology | en
dc.contributor.committeemember | Heidrich, Wolfgang | en
dc.contributor.committeemember | Wonka, Peter | en
thesis.degree.discipline | Computer Science | en
thesis.degree.name | Master of Science | en
dc.person.id | 142704 | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.