Link Label Prediction in Signed Citation Network

Handle URI:
http://hdl.handle.net/10754/607760
Title:
Link Label Prediction in Signed Citation Network
Authors:
Akujuobi, Uchenna ( 0000-0002-7102-7994 )
Abstract:
Link label prediction is the problem of predicting the missing labels or signs of all the unlabeled edges in a network. For signed networks, these labels can either be positive or negative. In recent years, different algorithms have been proposed such as using regression, trust propagation and matrix factorization. These approaches have tried to solve the problem of link label prediction by using ideas from social theories, where most of them predict a single missing label given that labels of other edges are known. However, in most real-world social graphs, the number of labeled edges is usually less than that of unlabeled edges. Therefore, predicting a single edge label at a time would require multiple runs and is more computationally demanding. In this thesis, we look at link label prediction problem on a signed citation network with missing edge labels. Our citation network consists of papers from three major machine learning and data mining conferences together with their references, and edges showing the relationship between them. An edge in our network is labeled either positive (dataset relevant) if the reference is based on the dataset used in the paper or negative otherwise. We present three approaches to predict the missing labels. The first approach converts the label prediction problem into a standard classification problem. We then, generate a set of features for each edge and then adopt Support Vector Machines in solving the classification problem. For the second approach, we formalize the graph such that the edges are represented as nodes with links showing similarities between them. We then adopt a label propagation method to propagate the labels on known nodes to those with unknown labels. In the third approach, we adopt a PageRank approach where we rank the nodes according to the number of incoming positive and negative edges, after which we set a threshold. Based on the ranks, we can infer an edge would be positive if it goes a node above the threshold. Experimental results on our citation network corroborate the efficacy of these approaches. With each edge having a label, we also performed additional network analysis where we extracted a subnetwork of the dataset relevant edges and nodes in our citation network, and then detected different communities from this extracted sub-network. To understand the different detected communities, we performed a case study on several dataset communities. The study shows a relationship between the major topic areas in a dataset community and the data sources in the community.
Advisors:
Zhang, Xiangliang ( 0000-0002-3574-5665 )
Committee Member:
Moshkov, Mikhail ( 0000-0003-0085-9483 ) ; Gao, Xin ( 0000-0002-7108-3574 )
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computer Science
Program:
Computer Science
Issue Date:
12-Apr-2016
Type:
Thesis
Appears in Collections:
Theses; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC FieldValue Language
dc.contributor.advisorZhang, Xiangliangen
dc.contributor.authorAkujuobi, Uchennaen
dc.date.accessioned2016-05-03T12:40:19Zen
dc.date.available2016-05-03T12:40:19Zen
dc.date.issued2016-04-12en
dc.identifier.urihttp://hdl.handle.net/10754/607760en
dc.description.abstractLink label prediction is the problem of predicting the missing labels or signs of all the unlabeled edges in a network. For signed networks, these labels can either be positive or negative. In recent years, different algorithms have been proposed such as using regression, trust propagation and matrix factorization. These approaches have tried to solve the problem of link label prediction by using ideas from social theories, where most of them predict a single missing label given that labels of other edges are known. However, in most real-world social graphs, the number of labeled edges is usually less than that of unlabeled edges. Therefore, predicting a single edge label at a time would require multiple runs and is more computationally demanding. In this thesis, we look at link label prediction problem on a signed citation network with missing edge labels. Our citation network consists of papers from three major machine learning and data mining conferences together with their references, and edges showing the relationship between them. An edge in our network is labeled either positive (dataset relevant) if the reference is based on the dataset used in the paper or negative otherwise. We present three approaches to predict the missing labels. The first approach converts the label prediction problem into a standard classification problem. We then, generate a set of features for each edge and then adopt Support Vector Machines in solving the classification problem. For the second approach, we formalize the graph such that the edges are represented as nodes with links showing similarities between them. We then adopt a label propagation method to propagate the labels on known nodes to those with unknown labels. In the third approach, we adopt a PageRank approach where we rank the nodes according to the number of incoming positive and negative edges, after which we set a threshold. Based on the ranks, we can infer an edge would be positive if it goes a node above the threshold. Experimental results on our citation network corroborate the efficacy of these approaches. With each edge having a label, we also performed additional network analysis where we extracted a subnetwork of the dataset relevant edges and nodes in our citation network, and then detected different communities from this extracted sub-network. To understand the different detected communities, we performed a case study on several dataset communities. The study shows a relationship between the major topic areas in a dataset community and the data sources in the community.en
dc.language.isoenen
dc.subjectlink label predictionen
dc.subjectcitation networken
dc.subjectsigned networken
dc.subjectlabel propagationen
dc.subjectpageRanken
dc.subjectSVMen
dc.titleLink Label Prediction in Signed Citation Networken
dc.typeThesisen
dc.contributor.departmentComputer, Electrical and Mathematical Sciences and Engineering (CEMSE) Divisionen
dc.contributor.departmentComputer Scienceen
thesis.degree.grantorKing Abdullah University of Science and Technologyen_GB
dc.contributor.committeememberMoshkov, Mikhailen
dc.contributor.committeememberGao, Xinen
thesis.degree.disciplineComputer Scienceen
thesis.degree.nameMaster of Scienceen
dc.person.id133261en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.