Analysis and Modeling of Social Influence in High Performance Computing Workloads

Handle URI:
http://hdl.handle.net/10754/209388
Title:
Analysis and Modeling of Social Influence in High Performance Computing Workloads
Authors:
Zheng, Shuai
Abstract:
High Performance Computing (HPC) is becoming a common tool in many research areas. Social influence (e.g., project collaboration) among the growing number of HPC users creates bursty behavior in the underlying workloads. This bursty behavior is increasingly common with the advent of grid computing and cloud computing. Mining bursty user behavior is important for HPC workload prediction and scheduling, which have a direct impact on overall HPC performance. A representative work in this area is the Mixed User Group Model (MUGM), which clusters users according to the resource demand features of their submissions, such as duration and parallelism. However, MUGM is difficult to implement in real-world systems. First, representing user behavior by resource demand features is usually difficult. Second, these features are not always available. Third, measuring the similarity among users is not a well-defined problem. In this work, we propose a Social Influence Model (SIM) to identify, analyze, and quantify the level of social influence across HPC users. The advantage of SIM is that it finds HPC communities by analyzing users' job submission times, thereby avoiding the difficulties of MUGM. We propose an offline algorithm and a fast-converging, computationally efficient online learning algorithm for identifying social groups. Both algorithms are applied to several HPC and grid workloads, including Grid 5000, EGEE 2005 and 2007, and KAUST Supercomputing Lab (KSL) BGP data. The experimental results show the existence of a social graph characterized by a pattern of dominant users and followers. To evaluate the effectiveness of the identified user groups, we show that the pattern discovered by the offline algorithm follows a power-law distribution, which is consistent with the distributions observed in mainstream social networks. Finally, we conclude the thesis and discuss future directions of our work.
Advisors:
Keyes, David E. (0000-0002-4052-7224)
Committee Member:
Ahmadia, Aron (0000-0002-2573-2481); Zhang, Xiangliang (0000-0002-3574-5665)
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Program:
Computer Science
Issue Date:
Jun-2011
Type:
Thesis
Appears in Collections:
Theses; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Keyes, David E. | en
dc.contributor.author | Zheng, Shuai | en
dc.date.accessioned | 2012-02-04T08:36:22Z | -
dc.date.available | 2012-02-04T08:36:22Z | -
dc.date.issued | 2011-06 | en
dc.identifier.uri | http://hdl.handle.net/10754/209388 | en
dc.description.abstract | (identical to the Abstract above) | en
dc.language.iso | en | en
dc.title | Analysis and Modeling of Social Influence in High Performance Computing Workloads | en
dc.type | Thesis | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
thesis.degree.grantor | King Abdullah University of Science and Technology | en_GB
dc.contributor.committeemember | Ahmadia, Aron | en
dc.contributor.committeemember | Zhang, Xiangliang | en
thesis.degree.discipline | Computer Science | en
thesis.degree.name | Master of Science | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.