End-to-end, single-stream temporal action detection in untrimmed videos
dc.contributor.author | Buch, Shyamal
dc.contributor.author | Escorcia, Victor
dc.contributor.author | Ghanem, Bernard
dc.contributor.author | Fei-Fei, Li
dc.contributor.author | Niebles, Juan Carlos
dc.date.accessioned | 2020-06-09T13:34:00Z
dc.date.available | 2020-06-09T13:34:00Z
dc.date.issued | 2019-05-01
dc.identifier.citation | Buch, S., Escorcia, V., Ghanem, B., & Niebles, J. C. (2017). End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos. Proceedings of the British Machine Vision Conference 2017. doi:10.5244/c.31.93
dc.identifier.isbn | 190172560X
dc.identifier.isbn | 9781901725605
dc.identifier.doi | 10.5244/c.31.93
dc.identifier.uri | http://hdl.handle.net/10754/663479
dc.description.abstract | In this work, we present a new intuitive, end-to-end approach for temporal action detection in untrimmed videos. We introduce our new architecture for Single-Stream Temporal Action Detection (SS-TAD), which effectively integrates joint action detection with its semantic sub-tasks in a single unifying end-to-end framework. We develop a method for training our deep recurrent architecture based on enforcing semantic constraints on intermediate modules that are gradually relaxed as learning progresses. We find that such a dynamic learning scheme enables SS-TAD to achieve higher overall detection performance, with fewer training epochs. By design, our single-pass network is very efficient and can operate at 701 frames per second, while simultaneously outperforming the state-of-the-art methods for temporal action detection on THUMOS'14.
dc.publisher | British Machine Vision Association and Society for Pattern Recognition
dc.relation.url | http://www.bmva.org/bmvc/2017/papers/paper093/index.html
dc.rights | Archived with thanks to British Machine Vision Association
dc.title | End-to-end, single-stream temporal action detection in untrimmed videos
dc.type | Conference Paper
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department | Electrical Engineering Program
dc.contributor.department | VCC Analytics Research Group
dc.conference.date | 2017-09-04 to 2017-09-07
dc.conference.name | 28th British Machine Vision Conference, BMVC 2017
dc.conference.location | London, GBR
dc.eprint.version | Post-print
dc.contributor.institution | Stanford Vision and Learning Lab., Dept. of Computer Science, Stanford University, United States
kaust.person | Escorcia, Victor
kaust.person | Ghanem, Bernard
dc.identifier.eid | 2-s2.0-85084013937
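The abstract above describes a training scheme in which semantic constraints on intermediate modules are enforced early and gradually relaxed as learning progresses. The sketch below illustrates one way such a dynamically weighted loss could look; it is a minimal sketch, not the authors' implementation. The function name, the linear annealing schedule, and the specific sub-task loss names (proposal and classification, inferred from the abstract's "semantic sub-tasks") are assumptions for illustration only.

    # Minimal sketch (not the authors' code) of a dynamically weighted loss:
    # intermediate sub-task losses are strongly enforced at the start of
    # training and gradually relaxed so the final detection loss dominates.
    # The linear annealing schedule and all names are illustrative assumptions.

    def sstad_style_loss(detection_loss: float,
                         proposal_loss: float,
                         classification_loss: float,
                         epoch: int,
                         total_epochs: int,
                         initial_weight: float = 1.0) -> float:
        """Combine the final detection loss with annealed intermediate losses."""
        # Weight on the semantic-constraint (sub-task) terms decays linearly
        # from `initial_weight` at epoch 0 to 0 by the final epoch.
        weight = initial_weight * max(0.0, 1.0 - epoch / total_epochs)
        return detection_loss + weight * (proposal_loss + classification_loss)

    # Example: at epoch 15 of 20 the sub-task weight is 0.25, so
    # loss = 0.8 + 0.25 * (0.5 + 0.4) = 1.025.
    loss = sstad_style_loss(0.8, 0.5, 0.4, epoch=15, total_epochs=20)

Any schedule that monotonically reduces the sub-task weight (for example, step or exponential decay) would fit the "gradually relaxed" description; consult the paper itself for the exact scheme.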
This item appears in the following Collection(s):
- Conference Papers
- Electrical and Computer Engineering Program (for more information visit: https://cemse.kaust.edu.sa/ece)
- Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division (for more information visit: https://cemse.kaust.edu.sa/)