Finding Community Structures In Social Activity Data

Handle URI:
http://hdl.handle.net/10754/554139
Title:
Finding Community Structures In Social Activity Data
Authors:
Peng, Chengbin ( 0000-0002-7445-2638 )
Abstract:
Social activity data sets are increasing in number and volume. Finding community structure in such data is valuable in many applications. For example, understanding the community structure of social networks may reduce the spread of epidemics or boost advertising revenue; discovering partitions in traffic networks can help to optimize routing and to reduce congestion; finding a group of users with common interests can allow a system to recommend useful items. Among many aspects, quality of inference and efficiency in finding community structures in such data sets are of paramount concern. In this thesis, we propose several approaches to improve community detection in these aspects. The first approach utilizes the concept of K-cores to reduce the size of the problem. The K-core of a graph is the largest subgraph within which each node has at least K connections. We propose a framework that accelerates community detection. It first applies a traditional algorithm that is relatively slow to the K-core, and then uses a fast heuristic to infer community labels for the remaining nodes. The second approach is to scale the algorithm to multi-processor systems. We devise a scalable community detection algorithm for large networks based on stochastic block models. It is an alternating iterative algorithm using a maximum likelihood approach. Compared with traditional inference algorithms for stochastic block models, our algorithm can scale to large networks and run on multi-processor systems. The time complexity is linear in the number of edges of the input network. The third approach is to improve the quality. We propose a framework for non-negative matrix factorization that allows the imposition of linear or approximately linear constraints on each factor. An example of the applications is to find community structures in bipartite networks, which is useful in recommender systems.
Our algorithms are compared with the results in recent papers and their quality and efficiency are verified by experiments.
Advisors:
Keyes, David Elliot ( 0000-0002-4052-7224 )
Committee Member:
Moshkov, Mikhail ( 0000-0003-0085-9483 ) ; Zhang, Xiangliang ( 0000-0002-3574-5665 ) ; Ketcheson, David I. ( 0000-0002-1212-126X ) ; Wang, Suojin
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Program:
Computer Science
Issue Date:
19-May-2015
Type:
Dissertation
Appears in Collections:
Dissertations; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
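
The K-core definition given in the abstract (the largest subgraph in which every node has at least K connections) can be illustrated with a short sketch. The code below is not taken from the thesis; it is a minimal Python illustration of the standard peeling procedure, and the function name `k_core` and the edge-list input format are assumptions made here for the example.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the node set of the k-core of an undirected graph.

    `edges` is an iterable of (u, v) pairs; this input format and the
    function name are illustrative assumptions, not the thesis's API.
    """
    # Build an adjacency map, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(neighbors) for node, neighbors in adj.items()}

    # Start by peeling every node whose degree is already below k.
    stack = [node for node, d in degree.items() if d < k]
    removed = set(stack)

    # Removing a node lowers its neighbors' degrees, which may push
    # them below k as well, so keep peeling until nothing changes.
    while stack:
        node = stack.pop()
        for neighbor in adj[node]:
            if neighbor in removed:
                continue
            degree[neighbor] -= 1
            if degree[neighbor] < k:
                removed.add(neighbor)
                stack.append(neighbor)

    return set(adj) - removed


# A triangle (a, b, c) with one pendant node d attached to c.
example_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
two_core = k_core(example_edges, 2)
```

In this example the pendant node `d` has only one connection, so it is peeled away and the 2-core is the triangle. In the framework the abstract describes, a slower community detection algorithm would then run only on such a core, with labels for the peeled nodes inferred afterwards by a faster heuristic; that second step is not sketched here because the abstract does not specify it.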

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Keyes, David Elliot | en
dc.contributor.author | Peng, Chengbin | en
dc.date.accessioned | 2015-05-19T06:55:53Z | en
dc.date.available | 2015-05-19T06:55:53Z | en
dc.date.issued | 2015-05-19 | en
dc.identifier.uri | http://hdl.handle.net/10754/554139 | en
dc.description.abstract | Social activity data sets are increasing in number and volume. Finding community structure in such data is valuable in many applications. For example, understanding the community structure of social networks may reduce the spread of epidemics or boost advertising revenue; discovering partitions in traffic networks can help to optimize routing and to reduce congestion; finding a group of users with common interests can allow a system to recommend useful items. Among many aspects, quality of inference and efficiency in finding community structures in such data sets are of paramount concern. In this thesis, we propose several approaches to improve community detection in these aspects. The first approach utilizes the concept of K-cores to reduce the size of the problem. The K-core of a graph is the largest subgraph within which each node has at least K connections. We propose a framework that accelerates community detection. It first applies a traditional algorithm that is relatively slow to the K-core, and then uses a fast heuristic to infer community labels for the remaining nodes. The second approach is to scale the algorithm to multi-processor systems. We devise a scalable community detection algorithm for large networks based on stochastic block models. It is an alternating iterative algorithm using a maximum likelihood approach. Compared with traditional inference algorithms for stochastic block models, our algorithm can scale to large networks and run on multi-processor systems. The time complexity is linear in the number of edges of the input network. The third approach is to improve the quality. We propose a framework for non-negative matrix factorization that allows the imposition of linear or approximately linear constraints on each factor. An example of the applications is to find community structures in bipartite networks, which is useful in recommender systems. Our algorithms are compared with the results in recent papers and their quality and efficiency are verified by experiments. | en
dc.language.iso | en | en
dc.subject | Community Detection | en
dc.subject | Scalability | en
dc.subject | Parallel Computing | en
dc.subject | Constrained NMF | en
dc.subject | Social Networks | en
dc.title | Finding Community Structures In Social Activity Data | en
dc.type | Dissertation | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
thesis.degree.grantor | King Abdullah University of Science and Technology | en_GB
dc.contributor.committeemember | Moshkov, Mikhail | en
dc.contributor.committeemember | Zhang, Xiangliang | en
dc.contributor.committeemember | Ketcheson, David I. | en
dc.contributor.committeemember | Wang, Suojin | en
thesis.degree.discipline | Computer Science | en
thesis.degree.name | Doctor of Philosophy | en
dc.person.id | 113325 | en