MOSS-5: A Fast Method of Approximating Counts of 5-Node Graphlets in Large Graphs

Handle URI:
http://hdl.handle.net/10754/625894
Title:
MOSS-5: A Fast Method of Approximating Counts of 5-Node Graphlets in Large Graphs
Authors:
Wang, Pinghui; Zhao, Junzhou; Zhang, Xiangliang (0000-0002-3574-5665); Li, Zhenguo; Cheng, Jiefeng; Lui, John C.S.; Towsley, Don; Tao, Jing; Guan, Xiaohong
Abstract:
Counting 3-, 4-, and 5-node graphlets in graphs is important for graph mining applications such as discovering abnormal/evolution patterns in social and biological networks. In addition, it has recently been widely used for computing similarities between graphs and for graph classification applications such as protein function prediction and malware detection. However, it is challenging to compute these metrics for a large graph or a large set of graphs due to the combinatorial nature of the problem. Despite recent efforts in counting triangles (a 3-node graphlet) and 4-node graphlets, little attention has been paid to characterizing 5-node graphlets. In this paper, we develop a computationally efficient sampling method to estimate 5-node graphlet counts. We not only provide fast sampling methods and unbiased estimators of graphlet counts, but also derive simple yet exact formulas for the variances of the estimators, which are of great value in practice: the variances can be used to bound the estimates' errors and determine the smallest necessary sampling budget for a desired accuracy. We conduct experiments on a variety of real-world datasets, and the results show that our method is several orders of magnitude faster than the state-of-the-art methods with the same accuracy.
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Citation:
Wang P, Zhao J, Zhang X, Li Z, Cheng J, et al. (2017) MOSS-5: A Fast Method of Approximating Counts of 5-Node Graphlets in Large Graphs. IEEE Transactions on Knowledge and Data Engineering: 1–1. Available: http://dx.doi.org/10.1109/TKDE.2017.2756836.
Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Journal:
IEEE Transactions on Knowledge and Data Engineering
Issue Date:
26-Sep-2017
DOI:
10.1109/TKDE.2017.2756836
Type:
Article
ISSN:
1041-4347
Sponsors:
The authors wish to thank the anonymous reviewers for their helpful feedback. In addition, the authors wish to thank Mr. Yiyan Qi and Miss Xiaotong Ren for discussions. This work was supported in part by Army Research Office Contract W911NF-12-1-0385, and ARL under Cooperative Agreement W911NF-09-2-0053. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the ARL or the U.S. Government. The research presented in this paper is supported in part by the National Natural Science Foundation of China (U1301254, 61603290, 61602371), the Ministry of Education & China Mobile Research Fund (MCM20160311), the Natural Science Foundation of Jiangsu Province (SBK2014021758), the 111 International Collaboration Program of China, the Prospective Joint Research of Industry-Academia-Research Joint Innovation Funding of Jiangsu Province (BY2014074), the Shenzhen Basic Research Grant (JCYJ20160229195940462), the China Postdoctoral Science Foundation (2015M582663), and the Natural Science Basic Research Plan in Shaanxi Province of China (2016JQ6034). Junzhou Zhao is the corresponding author.
Additional Links:
http://ieeexplore.ieee.org/document/8051106/
Appears in Collections:
Articles; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Wang, Pinghui | en
dc.contributor.author | Zhao, Junzhou | en
dc.contributor.author | Zhang, Xiangliang | en
dc.contributor.author | Li, Zhenguo | en
dc.contributor.author | Cheng, Jiefeng | en
dc.contributor.author | Lui, John C.S. | en
dc.contributor.author | Towsley, Don | en
dc.contributor.author | Tao, Jing | en
dc.contributor.author | Guan, Xiaohong | en
dc.date.accessioned | 2017-10-17T11:47:40Z | -
dc.date.available | 2017-10-17T11:47:40Z | -
dc.date.issued | 2017-09-26 | en
dc.identifier.citation | Wang P, Zhao J, Zhang X, Li Z, Cheng J, et al. (2017) MOSS-5: A Fast Method of Approximating Counts of 5-Node Graphlets in Large Graphs. IEEE Transactions on Knowledge and Data Engineering: 1–1. Available: http://dx.doi.org/10.1109/TKDE.2017.2756836. | en
dc.identifier.issn | 1041-4347 | en
dc.identifier.doi | 10.1109/TKDE.2017.2756836 | en
dc.identifier.uri | http://hdl.handle.net/10754/625894 | -
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en
dc.relation.url | http://ieeexplore.ieee.org/document/8051106/ | en
dc.rights | (c) 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. | en
dc.subject | Electronic mail | en
dc.subject | graph mining | en
dc.subject | graphlet kernel | en
dc.subject | Kernel | en
dc.subject | Malware | en
dc.subject | Proteins | en
dc.subject | Sampling methods | en
dc.subject | subgraph sampling | en
dc.title | MOSS-5: A Fast Method of Approximating Counts of 5-Node Graphlets in Large Graphs | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | IEEE Transactions on Knowledge and Data Engineering | en
dc.eprint.version | Post-print | en
dc.contributor.institution | MOE Key Laboratory for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, Shaanxi China | en
dc.contributor.institution | Computer Science & Engineering, The Chinese University of Hong Kong, Shatin, Shatin/NT Hong Kong 00000 | en
dc.contributor.institution | Huawei Noah's Ark Lab, Hong Kong, n/a Hong Kong | en
dc.contributor.institution | Cloud Security Lab, Tencent, Shenzhen, Shenzhen Hong Kong | en
dc.contributor.institution | Computer Science, University of Massachusetts, Amherst, Massachusetts United States | en
dc.contributor.institution | NSKEYLAB, Xi'an Jiaotong University, 12480 Xi'an, Shaanxi China | en
dc.contributor.institution | School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi China | en
kaust.author | Zhang, Xiangliang | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.