End-to-end, single-stream temporal action detection in untrimmed videos
Type
Conference Paper
KAUST Department
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Electrical Engineering Program
VCC Analytics Research Group
Date
2019-05-01
Permanent link to this record
http://hdl.handle.net/10754/663479
Abstract
In this work, we present a new intuitive, end-to-end approach for temporal action detection in untrimmed videos. We introduce our new architecture for Single-Stream Temporal Action Detection (SS-TAD), which effectively integrates joint action detection with its semantic sub-tasks in a single unifying end-to-end framework. We develop a method for training our deep recurrent architecture based on enforcing semantic constraints on intermediate modules that are gradually relaxed as learning progresses. We find that such a dynamic learning scheme enables SS-TAD to achieve higher overall detection performance, with fewer training epochs. By design, our single-pass network is very efficient and can operate at 701 frames per second, while simultaneously outperforming the state-of-the-art methods for temporal action detection on THUMOS’14.
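The training scheme the abstract describes, enforcing semantic constraints on intermediate modules and gradually relaxing them, can be read as an auxiliary loss whose weight decays as training progresses. The following is a minimal sketch of that idea, assuming a PyTorch-style setup; the function combined_loss, the linear decay schedule, and all parameter values are hypothetical illustrations, not taken from the paper.

# Illustrative sketch only (assumed PyTorch setup, not the paper's code):
# auxiliary semantic-constraint losses whose weight decays over training.
import torch

def combined_loss(det_loss: torch.Tensor,
                  semantic_losses: list,
                  epoch: int,
                  relax_epochs: int = 20,
                  init_weight: float = 1.0) -> torch.Tensor:
    # Weight on the intermediate semantic supervision shrinks linearly to
    # zero over relax_epochs, "relaxing" the constraint as learning
    # progresses; only the detection loss remains afterwards.
    w = init_weight * max(0.0, 1.0 - epoch / relax_epochs)
    return det_loss + w * sum(semantic_losses)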
Citation
Buch, S., Escorcia, V., Ghanem, B., & Niebles, J. C. (2017). End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos. Proceedings of the British Machine Vision Conference 2017. doi:10.5244/c.31.93
Conference/Event name
28th British Machine Vision Conference, BMVC 2017
ISBN
190172560X
9781901725605
DOI
10.5244/c.31.93
Additional Links
http://www.bmva.org/bmvc/2017/papers/paper093/index.html