Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization
KAUST Department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Computer Science Program
Electrical Engineering Program
Visual Computing Center (VCC)
KAUST Grant Number: OSR-CRG2017-3405
Online Publication Date: 2018-10-05
Print Publication Date: 2018
Permanent link to this record: http://hdl.handle.net/10754/630233
Abstract: State-of-the-art temporal action detectors inefficiently search the entire video for specific actions. Despite the encouraging progress these methods achieve, it is crucial to design automated approaches that explore only the parts of the video most relevant to the actions being searched for. To address this need, we propose the new problem of action spotting in video, which we define as finding a specific action in a video while observing only a small portion of that video. Inspired by the observation that humans are extremely efficient and accurate at spotting and finding action instances in video, we propose Action Search, a novel recurrent neural network approach that mimics the way humans spot actions. Moreover, to address the absence of data recording the behavior of human annotators, we put forward the Human Searches dataset, which compiles the search sequences employed by human annotators spotting actions in the AVA and THUMOS14 datasets. We consider temporal action localization as an application of the action spotting problem. Experiments on the THUMOS14 dataset reveal that our model not only explores the video efficiently (observing on average 17.3% of the video) but also accurately finds human activities, achieving 30.8% mAP.
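To make the mechanism in the abstract concrete, below is a minimal sketch of a recurrent search loop in the spirit of Action Search: an LSTM cell consumes the features of the currently observed frame and predicts the next temporal location to visit, stopping once the prediction stabilizes. This is an assumed reconstruction in PyTorch, not the authors' released code; the feature dimension, the stopping rule, and all names (SearchLSTM, spot) are hypothetical.

```python
import torch
import torch.nn as nn

class SearchLSTM(nn.Module):
    """Observe features at the current temporal location and predict the
    next normalized location in [0, 1] to visit."""
    def __init__(self, feat_dim=500, hidden_dim=1024):
        super().__init__()
        self.cell = nn.LSTMCell(feat_dim, hidden_dim)
        self.next_loc = nn.Linear(hidden_dim, 1)

    def forward(self, feat, state=None):
        h, c = self.cell(feat, state)
        loc = torch.sigmoid(self.next_loc(h))  # next location to observe
        return loc, (h, c)

def spot(model, features, start=0.5, max_steps=50, eps=0.01):
    """Follow the predicted search sequence until it converges; the last
    visited frame is taken as the spotted action location."""
    T = features.shape[0]
    loc, state, visited = torch.tensor([[start]]), None, []
    for _ in range(max_steps):
        idx = int(loc.item() * (T - 1))          # map [0, 1] to a frame index
        visited.append(idx)
        new_loc, state = model(features[idx:idx + 1], state)
        if abs(new_loc.item() - loc.item()) < eps:  # search has converged
            break
        loc = new_loc
    return visited  # only these frames were observed

# e.g., spot(SearchLSTM(), torch.randn(1000, 500)) returns the visited frames.
```

In the paper's setting, such a model would be trained to imitate the annotator search sequences compiled in the Human Searches dataset; the loop above only illustrates the inference-time search pattern.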
Citation: Alwassel H, Caba Heilbron F, Ghanem B (2018) Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization. Lecture Notes in Computer Science: 253–269. Available: http://dx.doi.org/10.1007/978-3-030-01240-3_16.
Sponsors: This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. OSR-CRG2017-3405.
Conference/Event Name: 15th European Conference on Computer Vision, ECCV 2018
Related items (by title, author, creator and subject):
DAPs: Deep Action Proposals for Action Understanding. Escorcia, Victor; Caba Heilbron, Fabian; Niebles, Juan Carlos; Ghanem, Bernard (Lecture Notes in Computer Science, Springer Nature, 2016-09-17) [Conference Paper]. Object proposals have contributed significantly to recent advances in object understanding in images. Inspired by the success of this approach, we introduce Deep Action Proposals (DAPs), an effective and efficient algorithm for generating temporal action proposals from long videos. We show how to take advantage of the vast capacity of deep learning models and memory cells to retrieve from untrimmed videos temporal segments that are likely to contain actions. A comprehensive evaluation indicates that our approach outperforms previous work on a large-scale action benchmark, runs at 134 FPS making it practical for large-scale scenarios, and exhibits an appealing ability to generalize, i.e., to retrieve good-quality temporal proposals of actions unseen in training. (A hypothetical sketch of this proposal pattern appears after this list.)
Diagnosing Error in Temporal Action Detectors. Alwassel, Humam; Caba Heilbron, Fabian; Escorcia, Victor; Ghanem, Bernard (Lecture Notes in Computer Science, Springer Nature, 2018-10-07) [Conference Paper]. Despite the recent progress in video understanding and the continuous rate of improvement in temporal action localization throughout the years, it is still unclear how far (or close?) we are to solving the problem. To this end, we introduce a new diagnostic tool to analyze the performance of temporal action detectors in videos and compare different methods beyond a single scalar metric. We exemplify the use of our tool by analyzing the performance of the top rewarded entries in the latest ActivityNet action localization challenge. Our analysis shows that the most impactful areas to work on are: strategies to better handle temporal context around the instances, improving robustness to the instances' absolute and relative size, and strategies to reduce localization errors. Moreover, our experimental analysis finds that the lack of agreement among annotators is not a major roadblock to attaining progress in the field. Our diagnostic tool is publicly available to keep fueling the minds of other researchers with additional insights about their algorithms.
Vision-based Human Action Classification Using Adaptive Boosting Algorithm. Zerrouki, Nabil; Harrou, Fouzi; Sun, Ying; Houacine, Amrane (IEEE Sensors Journal, Institute of Electrical and Electronics Engineers (IEEE), 2018-05-07) [Article]. Precise recognition of human action is a key enabler for the development of many applications, including autonomous robots for medical diagnosis and surveillance of elderly people in home environments. This paper addresses human action recognition based on variation in body shape. Specifically, we divide the human body into five partitions that correspond to five partial occupancy areas. For each frame, we calculated area ratios and used them as input data for the recognition stage. Here, we consider six classes of activities, namely: walking, standing, bending, lying, squatting, and sitting. We propose an efficient human action recognition scheme that takes advantage of the superior discrimination capacity of the AdaBoost algorithm. We validated the effectiveness of this approach using experimental data from two publicly available fall detection datasets, from the University of Rzeszów and the Universidad de Málaga. We compared the proposed approach with state-of-the-art classifiers based on neural networks, K-nearest neighbors, support vector machines, and naïve Bayes, and showed that it achieves better results in discriminating human gestures.
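As noted after the first related item above, here is a hypothetical sketch of the general pattern DAPs describes: a memory cell (LSTM) encodes a temporal window of clip features and a linear head emits one confidence score per candidate anchor segment. The dimensions, anchor count, and class name are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class DAPsSketch(nn.Module):
    """Encode a temporal window of clip features with an LSTM and score a
    fixed set of anchor segments inside that window."""
    def __init__(self, feat_dim=500, hidden_dim=256, num_anchors=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, num_anchors)

    def forward(self, window):           # window: (batch, T, feat_dim)
        _, (h_n, _) = self.lstm(window)  # final hidden state summarizes the window
        return torch.sigmoid(self.score(h_n[-1]))  # (batch, num_anchors) confidences

# e.g., confidence scores for 64 anchors over a 32-step feature window:
scores = DAPsSketch()(torch.randn(2, 32, 500))
```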
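Similarly, for the last item, a minimal scikit-learn sketch of the described pipeline: per-frame area ratios over five body partitions, fed to an AdaBoost classifier over the six activity classes. The equal-horizontal-band partitioning and the random placeholder data are assumptions; the paper's exact partitioning and features may differ.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def area_ratios(mask, n_parts=5):
    """Fraction of the silhouette area in each of n_parts horizontal bands
    (an assumed stand-in for the paper's five body partitions)."""
    bands = np.array_split(mask, n_parts, axis=0)
    areas = np.array([band.sum() for band in bands], dtype=float)
    return areas / areas.sum() if areas.sum() > 0 else areas

# Placeholder data: random binary silhouette masks and labels for the six
# classes (walking, standing, bending, lying, squatting, sitting).
rng = np.random.default_rng(0)
masks = rng.random((200, 64, 48)) > 0.5
y = rng.integers(0, 6, size=200)

X = np.stack([area_ratios(m) for m in masks])  # one 5-dim vector per frame
clf = AdaBoostClassifier(n_estimators=100).fit(X, y)
print(clf.score(X, y))  # training accuracy on the placeholder data
```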