Utilizing Dynamic Properties of Sharing Bits and Registers to Estimate User Cardinalities Over Time
KAUST DepartmentComputer Science
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Permanent link to this recordhttp://hdl.handle.net/10754/655955
MetadataShow full item record
AbstractOnline monitoring user cardinalities (or degrees) in graph streams is fundamental for many applications. For example in a bipartite graph representing user-website visiting activities, user cardinalities (the number of distinct visited websites) are monitored to report network anomalies. These real-world graph streams may contain user-item duplicates and have a huge number of distinct user-item pairs, therefore, it is infeasible to exactly compute user cardinalities when memory and computation resources are limited. Existing methods are designed to approximately estimate user cardinalities, whose accuracy highly depends on parameters that are not easy to set. Moreover, these methods cannot provide anytime-available estimation, as the user cardinalities are computed at the end of the data stream. Realtime applications such as anomaly detection require that user cardinalities are estimated on the fly. To address these problems, we develop novel bit and register sharing algorithms, which use a bit array and a register array to build a compact sketch of all users' connected items respectively. Compared with previous bit and register sharing methods, our algorithms exploit the dynamic properties of the bit and register arrays (e.g., the fraction of zero bits in the bit array at each time) to significantly improve the estimation accuracy, and have low time complexity (O(1)) to update the estimations each time they observe a new useritem pair. In addition, our algorithms are simple and easy to use, without requirements to tune any parameter. We evaluate the performance of our methods on real-world datasets. The experimental results demonstrate that our methods are several times more accurate and faster than state-of-the-art methods using the same amount of memory.
SponsorsThe research presented in this paper is supported in part by National Key R&D Program of China (2018YFC0830500), National Natural Science Foundation of China (U1301254, 61603290, 61602371), the Ministry of Education&China Mobile Research Fund (MCM20160311), the Natural Science Foundation of Jiangsu Province (SBK2014021758), 111 International Collaboration Program of China, the Prospective Joint Research of Industry-Academia-Research Joint Innovation Funding of Jiangsu Province (BY2014074), Shenzhen Basic Research Grant (JCYJ20160229195940462, JCYJ20170816100819428), China Postdoctoral Science Foundation (2015M582663), Natural Science Basic Research Plan in Shaanxi Province of China (2016JQ6034).
Conference/Event name2019 IEEE 35th International Conference on Data Engineering (ICDE)