
dc.contributor.author: Alcázar, Juan León
dc.contributor.author: Caba, Fabian
dc.contributor.author: Mai, Long
dc.contributor.author: Perazzi, Federico
dc.contributor.author: Lee, Joon-Young
dc.contributor.author: Arbeláez, Pablo
dc.contributor.author: Ghanem, Bernard
dc.date.accessioned: 2020-06-17T08:19:51Z
dc.date.available: 2020-06-17T08:19:51Z
dc.date.issued: 2020-08-05
dc.identifier.citation: Alcazar, J. L., Caba, F., Mai, L., Perazzi, F., Lee, J.-Y., Arbelaez, P., & Ghanem, B. (2020). Active Speakers in Context. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/cvpr42600.2020.01248
dc.identifier.isbn: 978-1-7281-7169-2
dc.identifier.issn: 1063-6919
dc.identifier.doi: 10.1109/CVPR42600.2020.01248
dc.identifier.uri: http://hdl.handle.net/10754/663635
dc.description.abstract: Current methods for active speaker detection focus on modeling audiovisual information from a single speaker. This strategy can be adequate for addressing single-speaker scenarios, but it prevents accurate detection when the task is to identify who of many candidate speakers are talking. This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons. Our new model learns pairwise and temporal relations from a structured ensemble of audiovisual observations. Our experiments show that a structured feature ensemble already benefits active speaker detection performance. We also find that the proposed Active Speaker Context improves the state-of-the-art on the AVA-ActiveSpeaker dataset achieving an mAP of 87.1%. Moreover, ablation studies verify that this result is a direct consequence of our long-term multi-speaker analysis.
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.url: https://ieeexplore.ieee.org/document/9157027/
dc.relation.url: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9157027
dc.rights: Archived with thanks to IEEE
dc.title: Active Speakers in Context
dc.type: Conference Paper
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Electrical Engineering Program
dc.contributor.department: VCC Analytics Research Group
dc.conference.date: 13-19 June 2020
dc.conference.name: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
dc.conference.location: Seattle, WA, USA
dc.eprint.version: Post-print
dc.contributor.institution: Universidad de Los Andes
dc.contributor.institution: Adobe Research
dc.identifier.arxivid: 2005.09812
kaust.person: Ghanem, Bernard
refterms.dateFOA: 2020-06-17T08:20:50Z
dc.date.published-online: 2020-08-05
dc.date.published-print: 2020-06
dc.date.posted: 2020-05-20


Files in this item

Name: Preprintfile1.pdf
Size: 6.031Mb
Format: PDF
Description: Pre-print
