Type
Dissertation
Authors
Abdelhamid, Ehab
Advisors
Kalnis, Panos
Committee members
Gomes, Diogo A.
Gao, Xin
Zaki, Mohammed J.
Program
Computer Science
Date
2017-06-19
Embargo End Date
2019-07-06
Permanent link to this record
http://hdl.handle.net/10754/625049
Access Restrictions
At the time of archiving, the student author of this dissertation opted to temporarily restrict access to it. The full text of this dissertation became available to the public after the expiration of the embargo on 2019-07-06.
Abstract
A graph is a data structure that contains a set of nodes and a set of edges connecting these nodes. Nodes represent objects while edges model relationships among these objects. Graphs are used in various domains due to their ability to model complex relations among several objects. Given an input graph, the Frequent Subgraph Mining (FSM) task finds all subgraphs with frequencies exceeding a given threshold. FSM is crucial for graph analysis, and it is an essential building block in a variety of applications, such as graph clustering and indexing. FSM is computationally expensive, and its existing solutions are extremely slow. Consequently, these solutions are incapable of mining modern large graphs. This slowness is caused by the underlying approaches of these solutions which require finding and storing an excessive amount of subgraph matches. This dissertation proposes a scalable solution for FSM that avoids the limitations of previous work. This solution is composed of four components. The first component is a single-threaded technique which, for each candidate subgraph, needs to find only a minimal number of matches. The second component is a scalable parallel FSM technique that utilizes a novel two-phase approach. The first phase quickly builds an approximate search space, which is then used by the second phase to optimize and balance the workload of the FSM task. The third component focuses on accelerating frequency evaluation, which is a critical step in FSM. To do so, a machine learning model is employed to predict the type of each graph node, and accordingly, an optimized method is selected to evaluate that node. The fourth component focuses on mining dynamic graphs, such as social networks. To this end, an incremental index is maintained during the dynamic updates. Only this index is processed and updated for the majority of graph updates. Consequently, search space is significantly pruned and efficiency is improved. 
The empirical evaluation shows that the proposed components significantly outperform existing solutions, scale to a large number of processors, and process graphs that previous techniques cannot handle, such as large and dynamic graphs.
Citation
Abdelhamid, E. (2017). Scalable Frequent Subgraph Mining. KAUST Research Repository. https://doi.org/10.25781/KAUST-M007S
DOI
10.25781/KAUST-M007S