Practical characterization of large networks using neighborhood information

Handle URI:
http://hdl.handle.net/10754/627275
Title:
Practical characterization of large networks using neighborhood information
Authors:
Wang, Pinghui; Zhao, Junzhou; Ribeiro, Bruno; Lui, John C. S.; Towsley, Don; Guan, Xiaohong
Abstract:
Characterizing large complex networks, such as online social networks, through node querying is a challenging task. Network service providers often impose severe constraints on the query rate, which limits the sample size to a small fraction of the network of interest. Various ad hoc subgraph sampling methods have been proposed, but many of them give biased estimates and offer no theoretical guarantees on their accuracy. In this work, we focus on developing sampling methods for large networks where querying a node also reveals partial structural information about its neighbors. Our methods are optimized for NoSQL graph databases (if the database can be accessed directly), or use the Web APIs available on most major large networks for graph sampling. We show that our sampling method has provable convergence guarantees as an unbiased estimator and is more accurate than state-of-the-art methods. We also explore methods to uncover shortest paths between a subset of nodes and to detect high-degree nodes by sampling only a small fraction of the network of interest. Our results demonstrate that utilizing neighborhood information yields methods that are two orders of magnitude faster than state-of-the-art methods.
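To make the role of neighborhood information concrete, here is a minimal sketch of a generic technique; it is not the estimator proposed in the paper. A simple random walk on a connected undirected graph visits node v with stationary probability pi(v) = deg(v) / (2|E|), so weighting each visit by 1/deg(v) gives a consistent ratio estimate of the node average of an attribute f. Assuming that querying v also reveals f(u) and deg(u) for every neighbor u (our reading of "partial structural information about its neighbors"), each visit can instead average over those neighbors with weight 1/(deg(v) deg(u)). Summing over v gives sum_v pi(v) (1/deg(v)) sum_{u in N(v)} f(u)/deg(u) = (1/(2|E|)) sum_u f(u), so the ratio estimator still targets the node mean. All function names, the adjacency-list representation, and the toy graph are hypothetical.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical representation: undirected, connected graph as adjacency lists.
Graph = Dict[Hashable, List[Hashable]]


def neighborhood_terms(graph: Graph, f: Callable[[Hashable], float],
                       v: Hashable) -> Tuple[float, float]:
    """Per-visit numerator/denominator terms, ASSUMING a query on v reveals
    f(u) and deg(u) for every neighbor u of v."""
    dv = len(graph[v])
    x = sum(f(u) / len(graph[u]) for u in graph[v]) / dv
    y = sum(1.0 / len(graph[u]) for u in graph[v]) / dv
    return x, y


def rw_estimate_mean(graph: Graph, f, start, steps: int,
                     burn_in: int = 0, seed=None) -> float:
    """Baseline re-weighted random walk: uses only the visited node."""
    rng = random.Random(seed)
    v, num, den = start, 0.0, 0.0
    for t in range(burn_in + steps):
        v = rng.choice(graph[v])          # one step of a simple random walk
        if t >= burn_in:
            w = 1.0 / len(graph[v])       # undo the pi(v) proportional to deg(v) bias
            num += w * f(v)
            den += w
    return num / den


def rw_neighborhood_estimate_mean(graph: Graph, f, start, steps: int,
                                  burn_in: int = 0, seed=None) -> float:
    """Same walk, but each visit averages over the revealed neighbors of v."""
    rng = random.Random(seed)
    v, num, den = start, 0.0, 0.0
    for t in range(burn_in + steps):
        v = rng.choice(graph[v])
        if t >= burn_in:
            x, y = neighborhood_terms(graph, f, v)
            num += x
            den += y
    return num / den


def exact_neighborhood_ratio(graph: Graph, f) -> float:
    """E_pi[x] / E_pi[y] computed exactly; equals the node mean of f."""
    two_m = sum(len(nbrs) for nbrs in graph.values())
    num = den = 0.0
    for v, nbrs in graph.items():
        pi_v = len(nbrs) / two_m          # stationary probability of v
        x, y = neighborhood_terms(graph, f, v)
        num += pi_v * x
        den += pi_v * y
    return num / den


if __name__ == "__main__":
    # Toy graph: triangle {0, 1, 2} with a tail 1-3-4.
    G = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
    is_even = lambda v: 1.0 if v % 2 == 0 else 0.0   # true node mean is 3/5
    print(exact_neighborhood_ratio(G, is_even))
    print(rw_estimate_mean(G, is_even, 0, 100_000, burn_in=1_000, seed=7))
    print(rw_neighborhood_estimate_mean(G, is_even, 0, 50_000, burn_in=1_000, seed=7))
```

Both walks cost one query per step; the neighborhood variant simply extracts more information from each query, which is the intuition the abstract appeals to, although the paper's actual method and guarantees may differ from this sketch.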
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Citation:
Wang P, Zhao J, Ribeiro B, Lui JCS, Towsley D, et al. (2018) Practical characterization of large networks using neighborhood information. Knowledge and Information Systems. Available: http://dx.doi.org/10.1007/s10115-018-1167-0.
Publisher:
Springer Nature
Journal:
Knowledge and Information Systems
Issue Date:
14-Feb-2018
DOI:
10.1007/s10115-018-1167-0
Type:
Article
ISSN:
0219-1377; 0219-3116
Sponsors:
The authors wish to thank the anonymous reviewers for their helpful feedback. This work was supported in part by Army Research Office Contract W911NF-12-1-0385, and ARL under Cooperative Agreement W911NF-09-2-0053. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the ARL or the U.S. Government. The work was also supported in part by National Natural Science Foundation of China (61603290, 61602371, U1301254), Ministry of Education & China Mobile Research Fund (MCM20160311), China Postdoctoral Science Foundation (2015M582663), Natural Science Basic Research Plan in Zhejiang Province of China (LGG18F020016), Natural Science Basic Research Plan in Shaanxi Province of China (2016JQ6034, 2017JM6095), Shenzhen Basic Research Grant (JCYJ20160229195940462).
Additional Links:
http://link.springer.com/article/10.1007/s10115-018-1167-0
Appears in Collections:
Articles; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Wang, Pinghui | en
dc.contributor.author | Zhao, Junzhou | en
dc.contributor.author | Ribeiro, Bruno | en
dc.contributor.author | Lui, John C. S. | en
dc.contributor.author | Towsley, Don | en
dc.contributor.author | Guan, Xiaohong | en
dc.date.accessioned | 2018-03-11T06:54:14Z | -
dc.date.available | 2018-03-11T06:54:14Z | -
dc.date.issued | 2018-02-14 | en
dc.identifier.citation | Wang P, Zhao J, Ribeiro B, Lui JCS, Towsley D, et al. (2018) Practical characterization of large networks using neighborhood information. Knowledge and Information Systems. Available: http://dx.doi.org/10.1007/s10115-018-1167-0. | en
dc.identifier.issn | 0219-1377 | en
dc.identifier.issn | 0219-3116 | en
dc.identifier.doi | 10.1007/s10115-018-1167-0 | en
dc.identifier.uri | http://hdl.handle.net/10754/627275 | -
dc.publisher | Springer Nature | en
dc.relation.url | http://link.springer.com/article/10.1007/s10115-018-1167-0 | en
dc.subject | Crawling | en
dc.subject | Graph sampling | en
dc.subject | Online social network | en
dc.subject | Random walk | en
dc.title | Practical characterization of large networks using neighborhood information | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | Knowledge and Information Systems | en
dc.contributor.institution | Shenzhen Research Institute of Xi’an Jiaotong University, Shenzhen, China | en
dc.contributor.institution | MOE Key Laboratory for Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an, China | en
dc.contributor.institution | School of Computer Science, Purdue University, West Lafayette, USA | en
dc.contributor.institution | Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong | en
dc.contributor.institution | Department of Computer Science, University of Massachusetts Amherst, Amherst, USA | en
dc.contributor.institution | Center for Intelligent and Networked Systems, Tsinghua University, Beijing, China | en
kaust.author | Zhao, Junzhou | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.