Learning from Scholarly Attributed Graphs for Scientific Discovery
Type
Dissertation
Authors
Akujuobi, Uchenna Thankgod
Advisors
Zhang, Xiangliang
Committee members
Moshkov, Mikhail
Hoehndorf, Robert
Zhang, Min

Program
Computer Science
Date
2020-10-18
Permanent link to this record
http://hdl.handle.net/10754/665605
Abstract
Research and experimentation in various scientific fields build on the knowledge and ideas in scholarly literature. The advancement of research and development has thus heightened the importance of literature analysis and understanding. In recent years, however, researchers have faced a massive volume of scholarly documents published at an exponentially increasing rate; analyzing this vast number of publications is far beyond the capability of individual researchers. This dissertation is motivated by the need for large-scale analysis of this exploding body of scholarly literature for scientific knowledge discovery.

In the first part of this dissertation, the interdependencies between scholarly documents are studied. First, I develop Delve, a data-driven search engine supported by a semi-supervised edge classification method that we designed. This system enables users to search and analyze the relationship between datasets and scholarly literature. Building on the Delve system, I propose to study information extraction as a node classification problem in attributed networks. Specifically, if we can learn the research topics of documents (nodes in a network), we can aggregate documents by topic and retrieve information specific to each topic (e.g., the top-k popular datasets). Node classification in attributed networks poses several challenges: a limited number of labeled nodes, the effective fusion of topological structure with node/edge attributes, and the co-existence of multiple labels for one node. Existing node classification approaches address only some of these challenges. This dissertation addresses them by proposing semi-supervised multi-class/multi-label node classification models that integrate node/edge attributes and topological relationships.

The second part of this dissertation examines the interdependencies between terms in scholarly literature. I present two algorithms for the automatic hypothesis generation (HG) problem, which refers to the discovery of meaningful implicit connections between scientific terms, including but not limited to diseases, drugs, and genes extracted from databases of biomedical publications. The automatic HG problem is modeled as future connectivity prediction in a dynamic attributed graph, where the key is to capture the temporal evolution of node-pair (term-pair) relations. Experimental results and case-study analyses highlight the effectiveness of the proposed algorithms compared to the baselines.
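To make the first formulation concrete, the following is a minimal sketch, assuming a toy attributed graph and plain numpy/scikit-learn rather than the dissertation's proposed models: node attributes are smoothed over the normalized topology (a simple way to fuse structure with features), a classifier is fit on the few labeled nodes, and topics are predicted for the remaining nodes.

```python
# A minimal sketch of semi-supervised node classification on an attributed
# graph; an illustration only, NOT the dissertation's proposed models.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy attributed graph: 6 documents (nodes), symmetric links,
# and a 4-dimensional attribute vector per document.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
X = rng.normal(size=(6, 4))

# Fuse topology and attributes: propagate features over the symmetrically
# normalized adjacency with self-loops (GCN-style feature smoothing).
A_hat = A + np.eye(6)
d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
S = d_inv_sqrt @ A_hat @ d_inv_sqrt
H = S @ (S @ X)                      # two propagation steps

# Semi-supervised setting: only two documents carry topic labels.
labeled_idx = np.array([0, 3])
y_labeled = np.array([0, 1])         # research topics of the labeled documents

clf = LogisticRegression().fit(H[labeled_idx], y_labeled)
print(clf.predict(H))                # predicted topic for every document
```

Likewise, the hypothesis generation task can be pictured as future link prediction on a dynamic term graph. The toy sketch below is an assumed simplification that uses a single hand-crafted common-neighbor feature instead of the proposed temporal models: term pairs unconnected at an earlier snapshot are labeled by whether they become connected in a later one, and a classifier learns to score new candidate pairs.

```python
# A toy sketch of hypothesis generation framed as future connectivity
# prediction; an assumed simplification, not the dissertation's algorithms.
import numpy as np
from sklearn.linear_model import LogisticRegression

def common_neighbors(A, i, j):
    """Number of shared neighbors of terms i and j in one graph snapshot."""
    return float(A[i] @ A[j])

# Two snapshots of a 5-term co-occurrence graph (t1 earlier, t2 later).
A_t1 = np.array([[0, 1, 1, 0, 0],
                 [1, 0, 1, 0, 0],
                 [1, 1, 0, 1, 0],
                 [0, 0, 1, 0, 1],
                 [0, 0, 0, 1, 0]], dtype=float)
A_t2 = np.array([[0, 1, 1, 1, 0],
                 [1, 0, 1, 0, 0],
                 [1, 1, 0, 1, 1],
                 [1, 0, 1, 0, 1],
                 [0, 0, 1, 1, 0]], dtype=float)

# Training pairs: unconnected at t1; the label says whether they link by t2.
pairs = [(0, 3), (0, 4), (1, 3), (1, 4), (2, 4)]
feats = np.array([[common_neighbors(A_t1, i, j)] for i, j in pairs])
labels = np.array([int(A_t2[i, j] > 0) for i, j in pairs])

clf = LogisticRegression().fit(feats, labels)

# Score a candidate term pair, still unconnected at t2, for a future link.
print(clf.predict_proba([[common_neighbors(A_t2, 0, 4)]])[0, 1])
```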
Citation
Akujuobi, U. T. (2020). Learning from Scholarly Attributed Graphs for Scientific Discovery. KAUST Research Repository. https://doi.org/10.25781/KAUST-6IIR9
DOI
10.25781/KAUST-6IIR9
Related items
Showing items related by title, author, creator and subject.
-
Representation learning with deep extreme learning machines for efficient image set classification
Uzair, Muhammad; Shafait, Faisal; Ghanem, Bernard; Mian, Ajmal (Neural Computing and Applications, Springer Nature, 2016-12-09) [Article]
Efficient and accurate representation of a collection of images that belong to the same class is a major research challenge for practical image set classification. Existing methods either make prior assumptions about the data structure, or perform heavy computations to learn structure from the data itself. In this paper, we propose an efficient image set representation that does not make any prior assumptions about the structure of the underlying data. We learn the nonlinear structure of image sets with deep extreme learning machines that are very efficient and generalize well even on a limited number of training samples. Extensive experiments on a broad range of public datasets for image set classification show that the proposed algorithm consistently outperforms state-of-the-art image set classification methods both in terms of speed and accuracy.
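For readers unfamiliar with the building block named in this abstract, the following is a minimal sketch of the extreme learning machine idea on random toy data; it is not the paper's image-set representation, only an illustration of a hidden layer with fixed random weights whose output weights are solved in closed form.

```python
# Illustrative extreme learning machine sketch on synthetic data (an
# assumption for demonstration, not the paper's implementation).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))             # toy features (e.g., image descriptors)
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # toy binary labels

W = rng.normal(size=(20, 64))              # random input-to-hidden weights, never trained
H = np.tanh(X @ W)                         # hidden representation

# Output weights via regularized least squares (the only learning step).
beta = np.linalg.solve(H.T @ H + 1e-2 * np.eye(64), H.T @ y)
pred = (np.tanh(X @ W) @ beta > 0.5).astype(float)
print((pred == y).mean())                  # training accuracy on the toy data
```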
-
SupportNet: a novel incremental learning framework through deep learning and support data
Li, Yu; Li, Zhongxiao; Ding, Lizhong; Hu, Yuhui; Chen, Wei; Gao, Xin (Cold Spring Harbor Laboratory, 2018-05-08) [Preprint]
Motivation: In most biological data sets, the amount of data is regularly growing and the number of classes is continuously increasing. To deal with the new data from the new classes, one approach is to train a classification model, e.g., a deep learning model, from scratch on both the old and new data. This approach is highly computationally costly, and the extracted features are likely very different from those extracted by the model trained on the old data alone, which leads to poor model robustness. Another approach is to fine-tune the model trained on the old data using the new data. However, this approach often cannot learn new knowledge without forgetting the previously learned knowledge, which is known as the catastrophic forgetting problem. To our knowledge, this problem has not been studied in the field of bioinformatics despite its existence in many bioinformatic problems. Results: Here we propose a novel method, SupportNet, to solve the catastrophic forgetting problem efficiently and effectively. SupportNet combines the strengths of deep learning and the support vector machine (SVM): the SVM is used to identify the support data from the old data, which are fed to the deep learning model together with the new data for further training, so that the model can review the essential information of the old data when learning the new information. Two powerful consolidation regularizers are applied to ensure the robustness of the learned model. Comprehensive experiments on various tasks, including enzyme function prediction, subcellular structure classification, and breast tumor classification, show that SupportNet drastically outperforms the state-of-the-art incremental learning methods and reaches performance similar to that of a deep learning model trained from scratch on both the old and new data. Availability: Our program is accessible at https://github.com/lykaust15/SupportNet.
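The core rehearsal idea can be illustrated with a small, assumed simplification in which a linear SVM stands in for the deep network and the data are synthetic; this is not the SupportNet implementation linked above.

```python
# Minimal sketch of the support-data idea: keep the SVM support vectors of
# the old data and rehearse them alongside the new-class data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Old task: two classes of synthetic points.
X_old = rng.normal(size=(200, 10))
y_old = (X_old[:, 0] > 0).astype(int)

# Identify "support data" for the old classes with an SVM.
svm = SVC(kernel="linear").fit(X_old, y_old)
X_sup, y_sup = X_old[svm.support_], y_old[svm.support_]

# New task introduces a third class; rehearse the support data with it.
X_new = rng.normal(loc=3.0, size=(100, 10))
y_new = np.full(100, 2)

X_train = np.vstack([X_sup, X_new])
y_train = np.concatenate([y_sup, y_new])
clf = SVC(kernel="linear").fit(X_train, y_train)   # stand-in for the deep model
print(clf.score(X_old, y_old), clf.score(X_new, y_new))
```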
-
Learning To Stop While Learning To Predict
Chen, Xinshi; Dai, Hanjun; Li, Yu; Gao, Xin; Song, Le (2020) [Conference Paper]
There is a recent surge of interest in designing deep architectures based on the update steps of traditional algorithms, or in learning neural networks to improve and replace traditional algorithms. While traditional algorithms have stopping criteria for outputting results at different iterations, many algorithm-inspired deep models are restricted to a "fixed depth" for all inputs. As with algorithms, the optimal depth of a deep architecture may differ across input instances, either to avoid "over-thinking" or to compute less for inputs that have already converged. In this paper, we tackle this varying-depth problem with a steerable architecture, in which a feed-forward deep model and a variational stopping policy are learned together to sequentially determine the optimal number of layers for each input instance. Training such an architecture is very challenging. We provide a variational Bayes perspective and design a novel and effective training procedure that decomposes the task into an oracle model learning stage and an imitation stage. Experimentally, we show that the learned deep model, together with the stopping policy, improves performance on a diverse set of tasks, including sparse recovery, few-shot meta-learning, and computer vision tasks.
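As a rough illustration of input-adaptive depth (an assumed simplification with untrained random weights, not the paper's variational stopping policy), each layer in the sketch below also emits a stop probability, and inference halts at the first layer that exceeds a threshold.

```python
# Toy inference-time sketch of input-adaptive depth with a stopping head;
# weights are random here, purely for illustration.
import numpy as np

rng = np.random.default_rng(2)
DIM, DEPTH = 8, 6
layers = [rng.normal(size=(DIM, DIM)) / np.sqrt(DIM) for _ in range(DEPTH)]
stop_heads = [rng.normal(size=DIM) for _ in range(DEPTH)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_adaptive(x, threshold=0.7):
    """Run layers sequentially; return the output and the depth actually used."""
    h = x
    for t, (W, w_stop) in enumerate(zip(layers, stop_heads), start=1):
        h = np.tanh(W @ h)
        if sigmoid(w_stop @ h) > threshold:   # stopping policy says "halt here"
            return h, t
    return h, DEPTH

h, depth = forward_adaptive(rng.normal(size=DIM))
print("stopped at layer", depth)
```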