Mizan: Optimizing Graph Mining in Large Parallel Systems

Handle URI:
http://hdl.handle.net/10754/217609
Title:
Mizan: Optimizing Graph Mining in Large Parallel Systems
Authors:
Kalnis, Panos (0000-0002-5060-1360); Awara, Karim; Jamjoom, Hani; Khayyat, Zuhair
Abstract:
Extracting information from graphs, from finding shortest paths to complex graph mining, is essential for many applications. Due to the sheer size of modern graphs (e.g., social networks), processing must be done on large parallel computing infrastructures (e.g., the cloud). Earlier approaches relied on the MapReduce framework, which proved inadequate for graph algorithms. More recently, the message passing model (e.g., Pregel) has emerged. Although the Pregel model has many advantages, it is agnostic to the graph properties and the architecture of the underlying computing infrastructure, leading to suboptimal performance. In this paper, we propose Mizan, a layer between the users' code and the computing infrastructure. Mizan considers the structure of the input graph and the architecture of the infrastructure in order to: (i) decide whether it is beneficial to generate a near-optimal partitioning of the graph in a preprocessing step, and (ii) choose between typical point-to-point message passing and a novel approach that puts computing nodes in a virtual overlay ring. We deployed Mizan on a small local Linux cluster, on the cloud (256 virtual machines in Amazon EC2), and on an IBM Blue Gene/P supercomputer (1024 CPUs). We show that Mizan executes common algorithms on very large graphs 1-2 orders of magnitude faster than MapReduce-based implementations and up to one order of magnitude faster than implementations relying on Pregel-like hash-based graph partitioning.
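Illustrative note (not part of the original record): the abstract contrasts Pregel-like hash-based partitioning with a virtual overlay ring of computing nodes. The sketch below is only a toy illustration of those two ideas, not Mizan's implementation. The function names (hash_partition, cut_edges, ring_successor) and the use of "vertex id modulo worker count" as the hash are assumptions made for this example.

```python
from collections import defaultdict


def hash_partition(vertex_ids, num_workers):
    """Assign each vertex to a worker using vertex id modulo worker count.

    Assumption: this stands in for a generic hash-based placement that
    ignores the graph's structure.
    """
    parts = defaultdict(list)
    for v in vertex_ids:
        parts[v % num_workers].append(v)
    return dict(parts)


def cut_edges(edges, num_workers):
    """Count edges whose endpoints land on different workers.

    Each such edge implies a message crossing the network, which is the
    cost a structure-aware partitioning would try to reduce.
    """
    return sum(1 for u, v in edges if u % num_workers != v % num_workers)


def ring_successor(worker, num_workers):
    """Return the next worker in a virtual ring of num_workers nodes.

    In a ring overlay a worker forwards data only to its successor
    instead of opening point-to-point links to every other worker.
    """
    return (worker + 1) % num_workers


if __name__ == "__main__":
    edges = [(0, 3), (0, 1), (1, 2)]
    print(hash_partition(range(6), 3))
    print(cut_edges(edges, 3))
    print(ring_successor(2, 3))
```

In this toy setting, the number of cut edges under hash placement gives a rough sense of why a system might weigh the cost of a smarter partitioning step against the communication it saves.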
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Publisher:
King Abdullah University of Science and Technology
Issue Date:
Mar-2012
Type:
Technical Report
Appears in Collections:
Technical Reports; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Kalnis, Panos | en
dc.contributor.author | Awara, Karim | en
dc.contributor.author | Jamjoom, Hani | en
dc.contributor.author | Khayyat, Zuhair | en
dc.date.accessioned | 2012-04-04T06:33:27Z | -
dc.date.available | 2012-04-04T06:33:27Z | -
dc.date.issued | 2012-03 | en
dc.identifier.uri | http://hdl.handle.net/10754/217609 | en
dc.description.abstract | Extracting information from graphs, from finding shortest paths to complex graph mining, is essential for many applications. Due to the sheer size of modern graphs (e.g., social networks), processing must be done on large parallel computing infrastructures (e.g., the cloud). Earlier approaches relied on the MapReduce framework, which proved inadequate for graph algorithms. More recently, the message passing model (e.g., Pregel) has emerged. Although the Pregel model has many advantages, it is agnostic to the graph properties and the architecture of the underlying computing infrastructure, leading to suboptimal performance. In this paper, we propose Mizan, a layer between the users' code and the computing infrastructure. Mizan considers the structure of the input graph and the architecture of the infrastructure in order to: (i) decide whether it is beneficial to generate a near-optimal partitioning of the graph in a preprocessing step, and (ii) choose between typical point-to-point message passing and a novel approach that puts computing nodes in a virtual overlay ring. We deployed Mizan on a small local Linux cluster, on the cloud (256 virtual machines in Amazon EC2), and on an IBM Blue Gene/P supercomputer (1024 CPUs). We show that Mizan executes common algorithms on very large graphs 1-2 orders of magnitude faster than MapReduce-based implementations and up to one order of magnitude faster than implementations relying on Pregel-like hash-based graph partitioning. | en
dc.language.iso | en | en
dc.publisher | King Abdullah University of Science and Technology | en
dc.title | Mizan: Optimizing Graph Mining in Large Parallel Systems | en
dc.type | Technical Report | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en