Weakly-supervised object detection via mining pseudo ground truth bounding-boxes

Abstract
Weakly-supervised object detection has recently attracted much attention because it does not require expensive bounding-box annotations to train the network. Although significant progress has been made, a large performance gap remains between weakly-supervised and fully-supervised object detection. To narrow this gap, some works use the pseudo ground truths generated by a weakly-supervised detector to train a supervised detector. However, such approaches tend to find the most representative part of an object rather than its whole body, and they seek only one ground truth bounding-box per class even when an image contains many instances of the same class. To address these issues, we propose a weakly-supervised to fully-supervised framework (W2F), in which a weakly-supervised detector is implemented using multiple instance learning. We then propose a pseudo ground-truth excavation (PGE) algorithm to find an accurate pseudo ground truth bounding-box for each instance. Moreover, a pseudo ground-truth adaptation (PGA) algorithm is designed to further refine the pseudo ground truths mined by the PGE algorithm. Finally, the mined pseudo ground truths are used as supervision to train a fully-supervised detector. We also propose an iterative ground-truth learning (IGL) approach, which improves the quality of the pseudo ground truths by iteratively using the predictions of the fully-supervised detector. Extensive experiments on the challenging PASCAL VOC 2007 and 2012 benchmarks demonstrate the effectiveness of our method. We obtain 53.1% and 49.4% mAP on VOC2007 and VOC2012, respectively, a significant improvement over previous state-of-the-art methods.
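
The abstract does not spell out the steps of PGE, so the following is only a rough sketch of how pseudo ground truth boxes for one class might be mined from a weakly-supervised detector's output. It assumes three steps that are not confirmed by this record: keep confident detections, drop boxes fully enclosed by a larger box (to favor the whole object over its most representative part), and merge heavily overlapping boxes so that each instance keeps one box. The function name `mine_pseudo_boxes` and the thresholds `score_thresh` and `merge_iou` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def encloses(outer: Box, inner: Box) -> bool:
    """True if `inner` lies completely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def mine_pseudo_boxes(boxes: List[Box], scores: List[float],
                      score_thresh: float = 0.5,   # hypothetical value
                      merge_iou: float = 0.4       # hypothetical value
                      ) -> List[Box]:
    """Sketch of pseudo ground truth mining for a single class."""
    # Step 1 (assumed): keep only confident detections.
    kept = [b for b, s in zip(boxes, scores) if s >= score_thresh]

    # Step 2 (assumed): discard boxes enclosed by a different, larger box,
    # since they likely cover only a discriminative part of the object.
    kept = [b for b in kept
            if not any(o != b and encloses(o, b) for o in kept)]

    # Step 3 (assumed): repeatedly merge any pair of overlapping boxes
    # into their bounding hull until no pair overlaps enough.
    merged = True
    while merged:
        merged = False
        for i in range(len(kept)):
            for j in range(i + 1, len(kept)):
                if iou(kept[i], kept[j]) > merge_iou:
                    a, b = kept[i], kept[j]
                    hull = (min(a[0], b[0]), min(a[1], b[1]),
                            max(a[2], b[2]), max(a[3], b[3]))
                    kept = [k for idx, k in enumerate(kept)
                            if idx not in (i, j)] + [hull]
                    merged = True
                    break
            if merged:
                break
    return kept
```

Under these assumptions, an image with two well-separated instances of a class yields two boxes, which is the multi-instance behavior the abstract contrasts with approaches that keep one box per class. The PGA refinement and the IGL iterations are not covered by this sketch.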

Citation
Zhang Y, Bai Y, Ding M, Li Y, Ghanem B (2018) Weakly-supervised object detection via mining pseudo ground truth bounding-boxes. Pattern Recognition 84: 68–81. Available: http://dx.doi.org/10.1016/j.patcog.2018.07.005.

Acknowledgements
The authors would like to thank Peng Tang for his valuable discussions. This work was done while Yongqiang Zhang was at KAUST as a visiting Ph.D. student. Yongqiang Zhang is supported by funding from Harbin Institute of Technology (HIT) and King Abdullah University of Science and Technology (KAUST). Mingli Ding and Yongqiang Li are supported by funding from Harbin Institute of Technology (HIT). Bernard Ghanem and Yancheng Bai are supported by funding from King Abdullah University of Science and Technology (KAUST).

Publisher
Elsevier BV

Journal
Pattern Recognition

DOI
10.1016/j.patcog.2018.07.005

Additional Links
https://www.sciencedirect.com/science/article/pii/S0031320318302346
