dc.contributor.advisor: Kalnis, Panos
dc.contributor.author: Abdelhamid, Ehab
dc.date.accessioned: 2017-06-19T07:47:13Z
dc.date.available: 2019-07-06T00:00:00Z
dc.date.issued: 2017-06-19
dc.identifier.citation: Abdelhamid, E. (2017). Scalable Frequent Subgraph Mining. KAUST Research Repository. https://doi.org/10.25781/KAUST-M007S
dc.identifier.doi: 10.25781/KAUST-M007S
dc.identifier.uri: http://hdl.handle.net/10754/625049
dc.description.abstract: A graph is a data structure that contains a set of nodes and a set of edges connecting these nodes. Nodes represent objects while edges model relationships among these objects. Graphs are used in various domains due to their ability to model complex relations among several objects. Given an input graph, the Frequent Subgraph Mining (FSM) task finds all subgraphs with frequencies exceeding a given threshold. FSM is crucial for graph analysis, and it is an essential building block in a variety of applications, such as graph clustering and indexing. FSM is computationally expensive, and its existing solutions are extremely slow. Consequently, these solutions are incapable of mining modern large graphs. This slowness is caused by the underlying approaches of these solutions which require finding and storing an excessive amount of subgraph matches. This dissertation proposes a scalable solution for FSM that avoids the limitations of previous work. This solution is composed of four components. The first component is a single-threaded technique which, for each candidate subgraph, needs to find only a minimal number of matches. The second component is a scalable parallel FSM technique that utilizes a novel two-phase approach. The first phase quickly builds an approximate search space, which is then used by the second phase to optimize and balance the workload of the FSM task. The third component focuses on accelerating frequency evaluation, which is a critical step in FSM. To do so, a machine learning model is employed to predict the type of each graph node, and accordingly, an optimized method is selected to evaluate that node. The fourth component focuses on mining dynamic graphs, such as social networks. To this end, an incremental index is maintained during the dynamic updates. Only this index is processed and updated for the majority of graph updates. Consequently, search space is significantly pruned and efficiency is improved.
The empirical evaluation shows that the proposed components significantly outperform existing solutions, scale to a large number of processors and process graphs that previous techniques cannot handle, such as large and dynamic graphs.
dc.language.iso: en
dc.subject: graph
dc.subject: parallel processing
dc.subject: Frequent subgraph mining
dc.subject: incremental indexing
dc.title: Scalable Frequent Subgraph Mining
dc.type: Dissertation
dc.contributor.department: Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
dc.rights.embargodate: 2019-07-06
thesis.degree.grantor: King Abdullah University of Science and Technology
dc.contributor.committeemember: Gomes, Diogo A.
dc.contributor.committeemember: Gao, Xin
dc.contributor.committeemember: Zaki, Mohammed J.
thesis.degree.discipline: Computer Science
thesis.degree.name: Doctor of Philosophy
dc.rights.accessrights: At the time of archiving, the student author of this dissertation opted to temporarily restrict access to it. The full text of this dissertation became available to the public after the expiration of the embargo on 2019-07-06.
refterms.dateFOA: 2019-07-06T00:00:00Z


Files in this item

Name: Thesis[1].pdf
Size: 2.560Mb
Format: PDF