Accurately Estimating User Cardinalities and Detecting Super Spreaders over Time
KAUST DepartmentComputer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Machine Intelligence & kNowledge Engineering Lab
Permanent link to this recordhttp://hdl.handle.net/10754/661680
MetadataShow full item record
AbstractOnline monitoring user cardinalities in graph streams is fundamental for many applications such as anomaly detection. These graph streams may contain edge duplicates and have a large number of user-item pairs, which makes it infeasible to exactly compute user cardinalities due to limited computational and memory resources. Existing methods are designed to approximately estimate user cardinalities, but their accuracy highly depends on complex parameters and they cannot provide anytime-available estimation. To address these problems, we develop novel bit/register sharing algorithms, which use a bit/register array to build a compact sketch of all users' connected items. Our algorithms exploit the dynamic properties of the bit/register arrays (e.g., the fraction of zero bits in the bit array) to significantly improve the estimation accuracy, and have low time complexity O(1) to update the estimations for a new user-item pair. In addition, our algorithms are simple and easy to use, without requirements to tune any parameter. Furthermore, we extend our methods to detect super spreaders with large cardinalities in real-time. We evaluate the performance of our methods on real-world datasets. The experimental results demonstrate that our methods are several times more accurate and faster than state-of-the-art methods using the same amount of memory.
CitationJia, P., Wang, P., Zhang, Y., Zhang, X., Tao, J., Ding, J., … Towsley, D. (2020). Accurately Estimating User Cardinalities and Detecting Super Spreaders over Time. IEEE Transactions on Knowledge and Data Engineering, 1–1. doi:10.1109/tkde.2020.2975625
SponsorsThe research presented in this paper is supported in part by National Key R&D Program of China (2018YFC0830500), National Natural Science Foundation of China (U1736205, 61603290), Shenzhen Basic Research Grant (JCYJ20170816100819428), Natural Science Basic Research Plan in Shaanxi Province of China (2016JQ6034).