Type
Preprint
KAUST Department
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Machine Intelligence & kNowledge Engineering Lab
Date
2019-12-24
Permanent link to this record
http://hdl.handle.net/10754/661042
Abstract
Crowdsourcing is a relatively economical and efficient way to collect annotations from the crowd through online platforms. Answers collected from workers with different levels of expertise may be noisy and unreliable, so the quality of the annotated data needs to be maintained. Various solutions have been proposed to obtain high-quality annotations. However, they all assume that workers' label quality is stable over time (always at the same level whenever they conduct the tasks). In practice, workers' attention level changes over time, and ignoring this can affect the reliability of the annotations. In this paper, we focus on a novel and realistic crowdsourcing scenario involving attention-aware annotations. We propose a new probabilistic model that takes workers' attention into account to estimate label quality. Expectation propagation is adopted for efficient Bayesian inference of our model, and a generalized Expectation Maximization algorithm is derived to estimate both the ground truth of all tasks and the label quality of each individual crowd worker with attention. In addition, the number of tasks best suited for a worker is estimated according to changes in attention. Experiments against related methods on three real-world datasets and one semi-simulated dataset demonstrate that our method quantifies the relationship between workers' attention and label quality on the given tasks, and improves the aggregated labels.
Citation
Tu, J., Yu, G., Wang, J., Domeniconi, C., & Zhang, X. (2020). Attention-Aware Answers of the Crowd. Proceedings of the 2020 SIAM International Conference on Data Mining, 451–459. doi:10.1137/1.9781611976236.51
Publisher
arXiv
arXiv
1912.11238
Additional Links
https://arxiv.org/pdf/1912.11238
DOI
10.1137/1.9781611976236.51