Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

Abstract
Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content. Given limited GPU memory, training TAL end to end (i.e., from videos to predictions) on long videos is a significant challenge. Most methods can only train on pre-extracted features without optimizing them for the localization problem, consequently limiting localization performance. In this work, to extend the potential in TAL networks, we propose a novel end-to-end method, Re2TAL, which rewires pretrained video backbones for reversible TAL. Re2TAL builds a backbone with reversible modules, where the input can be recovered from the output such that the bulky intermediate activations can be cleared from memory during training. Instead of designing one single type of reversible module, we propose a network rewiring mechanism to transform any module with a residual connection into a reversible module without changing any parameters. This provides two benefits: (1) a large variety of reversible networks are easily obtained from existing and even future model designs, and (2) the reversible models require much less training effort as they reuse the pre-trained parameters of their original non-reversible versions. Re2TAL, only using the RGB modality, reaches 37.01% average mAP on ActivityNet-v1.3, a new state-of-the-art record, and mAP 64.9% at tIoU=0.5 on THUMOS-14, outperforming all other RGB-only methods.
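
Illustrative sketch (not taken from the paper or its repository): the abstract's central idea, that a reversible module lets the input be recovered from the output so that intermediate activations need not be stored, can be shown with a generic two-stream additive coupling. The coupling form, the toy tanh branches, and all names below (make_branch, reversible_forward, reversible_inverse) are assumptions chosen for illustration; the exact rewiring used by Re2TAL may differ.

```python
import math
import random


def make_branch(weights):
    """Build a toy residual branch F(x) = tanh(W x) from a fixed weight matrix.

    Hypothetical stand-in for the residual function of a pretrained block.
    """
    def branch(x):
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]
    return branch


def reversible_forward(x1, x2, f, g):
    """Additive coupling: each stream is updated using only the other stream."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def reversible_inverse(y1, y2, f, g):
    """Recover the inputs from the outputs by undoing the two updates in reverse order."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


# Small demonstration with fixed random weights and inputs.
rng = random.Random(0)
dim = 6
f = make_branch([[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)])
g = make_branch([[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)])
x1 = [rng.uniform(-1, 1) for _ in range(dim)]
x2 = [rng.uniform(-1, 1) for _ in range(dim)]

y1, y2 = reversible_forward(x1, x2, f, g)
r1, r2 = reversible_inverse(y1, y2, f, g)

# Largest absolute difference between the original and the recovered inputs.
max_error = max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2))
print(max_error)
```

Because the inverse uses the same branches f and g with unchanged weights, the inputs are reconstructed up to floating-point rounding. In a training setting this is what would allow activations to be recomputed during the backward pass instead of being kept in memory.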

Citation
Zhao, C., Liu, S., Mangalam, K., & Ghanem, B. (2023). Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr52729.2023.01025

Acknowledgements
This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research through the Visual Computing Center (VCC) funding and SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence.

Publisher
IEEE

Conference/Event Name
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

DOI
10.1109/cvpr52729.2023.01025

arXiv
2211.14053

Additional Links
https://ieeexplore.ieee.org/document/10204207/

Relations
Is Supplemented By:
  • [Software]
    Title: coolbay/Re2TAL: Repository for the CVPR23 paper Re^2TAL. Publication Date: 2023-03-22. github: coolbay/Re2TAL Handle: 10754/694077

Version History

Version  Date                 Summary
2*       2023-08-27 07:55:03  Published as conference paper
1        2022-12-01 08:49:39
* Selected version