
dc.contributor.author: Alcazar, Juan Leon
dc.contributor.author: Heilbron, Fabian Caba
dc.contributor.author: Mai, Long
dc.contributor.author: Perazzi, Federico
dc.contributor.author: Lee, Joon-Young
dc.contributor.author: Arbelaez, Pablo
dc.contributor.author: Ghanem, Bernard
dc.date.accessioned: 2021-09-13T06:18:18Z
dc.date.available: 2021-06-07T06:54:25Z
dc.date.available: 2021-09-13T06:18:18Z
dc.date.issued: 2021-06
dc.identifier.citation: Alcazar, J. L., Heilbron, F. C., Mai, L., Perazzi, F., Lee, J.-Y., Arbelaez, P., & Ghanem, B. (2021). APES: Audiovisual Person Search in Untrimmed Video. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). doi:10.1109/cvprw53098.2021.00188
dc.identifier.doi: 10.1109/cvprw53098.2021.00188
dc.identifier.uri: http://hdl.handle.net/10754/669426
dc.description.abstract: Humans are arguably one of the most important subjects in video streams, and many real-world applications such as video summarization or video editing workflows often require the automatic search and retrieval of a person of interest. Despite tremendous efforts in the person re-identification and retrieval domains, few works have developed audiovisual search strategies. In this paper, we present the Audiovisual Person Search dataset (APES), a new dataset composed of untrimmed videos whose audio (voices) and visual (faces) streams are densely annotated. APES contains over 1.9K identities labeled along 36 hours of video, making it the largest dataset available for untrimmed audiovisual person search. A key property of APES is that it includes dense temporal annotations that link faces to speech segments of the same identity. To showcase the potential of our new dataset, we propose an audiovisual baseline and benchmark for person retrieval. Our study shows that modeling audiovisual cues benefits the recognition of people’s identities.
dc.publisher: IEEE
dc.relation.url: https://ieeexplore.ieee.org/document/9523077/
dc.relation.url: https://openaccess.thecvf.com/content/CVPR2021W/MULA/html/Alcazar_APES_Audiovisual_Person_Search_in_Untrimmed_Video_CVPRW_2021_paper.html
dc.rights: Archived with thanks to IEEE
dc.title: APES: Audiovisual Person Search in Untrimmed Video
dc.type: Conference Paper
dc.contributor.department: Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
dc.contributor.department: Electrical and Computer Engineering Program
dc.contributor.department: VCC Analytics Research Group
dc.conference.date: 19-25 June 2021
dc.conference.name: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
dc.conference.location: Nashville, TN, USA
dc.eprint.version: Post-print
dc.contributor.institution: Adobe Research
dc.contributor.institution: Universidad de los Andes
dc.identifier.arxivid: 2106.01667
kaust.person: Alcazar, Juan Leon
kaust.person: Ghanem, Bernard
refterms.dateFOA: 2021-06-07T06:55:19Z
dc.date.posted: 2021-06-03


Files in this item

Name: Alcazar_APES_Audiovisual_Person_Search_in_Untrimmed_Video_CVPRW_2021_paper.pdf
Size: 2.328Mb
Format: PDF
Description: Accepted Manuscript
