Theses and Dissertations
http://hdl.handle.net/10754/124545
2020-10-26T10:22:45Z
Learning from Scholarly Attributed Graphs for Scientific Discovery
http://hdl.handle.net/10754/665605
Akujuobi, Uchenna Thankgod
Research and experimentation in various scientific fields build on the knowledge and ideas in scholarly literature. The advancement of research and development has thus strengthened the importance of analyzing and understanding the literature. However, in recent years, researchers have faced a massive body of scholarly documents published at an exponentially increasing rate. Analyzing this vast number of publications is far beyond the capability of individual researchers.
This dissertation is motivated by the need for large-scale analysis of the rapidly growing body of scholarly literature for scientific knowledge discovery. In the first part of this dissertation, the interdependencies between scholarly documents are studied. First, I develop Delve, a data-driven search engine supported by our semi-supervised edge classification method. This system enables users to search and analyze the relationships between datasets and scholarly literature. Based on the Delve system, I propose to study information extraction as a node classification problem in attributed networks. Specifically, if we can learn the research topics of documents (nodes in a network), we can aggregate documents by topic and retrieve information specific to each topic (e.g., top-k popular datasets).
Node classification in attributed networks has several challenges: a limited number of labeled nodes, effective fusion of topological structure and node/edge attributes, and the co-existence of multiple labels for one node. Existing node classification approaches can only address or partially address a few of these challenges. This dissertation addresses these challenges by proposing semi-supervised multi-class/multi-label node classification models to integrate node/edge attributes and topological relationships.
The second part of this dissertation examines the problem of analyzing the interdependencies between terms in scholarly literature. I present two algorithms for the automatic hypothesis generation (HG) problem, which refers to the discovery of meaningful implicit connections between scientific terms, including but not limited to diseases, drugs, and genes extracted from databases of biomedical publications. The automatic hypothesis generation problem is modeled as future connectivity prediction in a dynamic attributed graph. The key is to capture the temporal evolution of node-pair (term-pair) relations. Experimental results and case study analyses highlight the effectiveness of the proposed algorithms compared to extensions of the baselines.
2020-10-18T00:00:00Z
Dynamic Programming Multi-Objective Combinatorial Optimization
http://hdl.handle.net/10754/665627
Mankowski, Michal
In this dissertation, we consider extensions of dynamic programming for combinatorial
optimization. We introduce two exact multi-objective optimization algorithms:
the multi-stage optimization algorithm that optimizes the problem relative to the
ordered sequence of objectives (lexicographic optimization) and the bi-criteria optimization
algorithm that simultaneously optimizes the problem relative to two objectives
(Pareto optimization). We also introduce a counting algorithm to count optimal
solutions before and after every optimization stage of multi-stage optimization.
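As a rough illustration of multi-stage (lexicographic) optimization combined with counting, the sketch below uses plain memoized dynamic programming on a small directed acyclic graph. It is not the circuit-based algorithm of the dissertation: the graph, the two objectives (total weight, then number of edges), and the function names are assumptions chosen for the example.

```python
from functools import lru_cache

# Hypothetical DAG: node -> list of (successor, edge weight).
GRAPH = {
    "s": [("a", 1), ("b", 2)],
    "a": [("b", 1), ("t", 3)],
    "b": [("t", 1)],
    "t": [],
}


def count_paths(graph, source, target):
    """Count all source-to-target paths (the feasible set before any optimization)."""
    @lru_cache(maxsize=None)
    def count(node):
        if node == target:
            return 1
        return sum(count(successor) for successor, _ in graph[node])
    return count(source)


def optimize_and_count(graph, source, target, edge_cost, zero):
    """Return (optimal cost, number of optimal paths).

    Costs are tuples compared lexicographically, so a cost of the form
    (weight, hops) minimizes weight first and hops second.
    """
    @lru_cache(maxsize=None)
    def solve(node):
        if node == target:
            return zero, 1
        best_cost, best_count = None, 0
        for successor, weight in graph[node]:
            sub_cost, sub_count = solve(successor)
            if sub_count == 0:
                continue  # successor cannot reach the target
            cost = tuple(a + b for a, b in zip(edge_cost(weight), sub_cost))
            if best_cost is None or cost < best_cost:
                best_cost, best_count = cost, sub_count
            elif cost == best_cost:
                best_count += sub_count
        return best_cost, best_count
    return solve(source)


if __name__ == "__main__":
    print("all paths:", count_paths(GRAPH, "s", "t"))
    print("after stage 1 (min weight):",
          optimize_and_count(GRAPH, "s", "t", lambda w: (w,), (0,)))
    print("after stage 2 (then min hops):",
          optimize_and_count(GRAPH, "s", "t", lambda w: (w, 1), (0, 0)))
```

On this graph there are three paths in total, two of them attain the minimum weight, and one of those two also uses the fewest edges, so the count shrinks after each stage.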
We propose a fairly universal approach based on so-called circuits without repetitions
in which each element is generated exactly one time. Such circuits represent the
sets of elements under consideration (the sets of feasible solutions) and are used by
counting, multi-stage, and bi-criteria optimization algorithms. For a given optimization
problem, we should describe an appropriate circuit and cost functions. Then, we
can use the designed algorithms for which we already have proofs of their correctness
and ways to evaluate the required number of operations and the time.
We construct conventional (which work directly with elements) circuits without
repetitions for matrix chain multiplication, global sequence alignment, optimal paths
in directed graphs, binary search trees, convex polygon triangulation, line breaking
(text justification), one-dimensional clustering, optimal bitonic tour, and segmented
least squares. For these problems, we evaluate the number of operations and the
time required by the optimization and counting algorithms, and consider the results
of computational experiments.
If we cannot find a conventional circuit without repetitions for a problem, we can
either create custom algorithms for optimization and counting from scratch or
transform a circuit with repetitions into a so-called syntactical circuit, which is a circuit
without repetitions that works not with elements but with formulas representing
these elements. We apply both approaches to the optimization of matchings in trees
and apply the second approach to the 0/1 knapsack problem.
We also briefly introduce our work in operations research with applications to
health care. This work extends our interest in the optimization field from developing
the new methods included in this dissertation towards practical applications.
2020-10-18T00:00:00Z
A Real-Time Monitoring of Fluids Properties in Tubular Architectures
http://hdl.handle.net/10754/665532
Nour, Maha A.
Real-time monitoring of fluid properties in tubular systems, such as viscosity, flow rate, and pressure, is essential for industries that rely on liquid media. Today, such fluid characteristics are studied off-line in laboratory facilities, which provide accurate results but are too slow to match the pace that industry demands. On the other hand, commercially available real-time monitoring sensors for fluid properties are generally large and bulky, generating considerable pressure reduction and energy loss in tubular systems. Furthermore, because of their bulkiness, they cause significant and persistent damage to the tubular systems during installation. To address these challenges, industries have shifted their attention to non-destructive testing and noninvasive methods installed on the outer tubular surface, which avoid disturbing the flow and shutting systems down for installation. Although such monitoring sensors have performed well in monitoring and inspecting pipe health, they are not effective for monitoring the properties of the fluids: they are limited to flowmeter applications and do not cover fluid characteristics such as viscosity. Therefore, developing a convenient real-time integrated sensory system for monitoring different fluid properties in a tubular system is critical.
In this dissertation, a fully compliant compact sensory system is designed, developed, examined, and optimized for monitoring fluid properties in tubular architectures. The proposed sensor system consists of a physically flexible platform attached to the inner surface of tubes so that it adapts to different diameters and curvatures with negligible flow disruption. It also uses a microchannel bridge to serve macro-scale applications inside pipe systems. An array of pressure sensors located below the microchannel is the primary measurement unit of the device. The dissertation is supported by simulation and modeling for a deeper understanding of the system behavior. In the last stage, the sensory module is integrated with electronics to form a fully compliant stand-alone system.
2020-10-01T00:00:00Z
Maximizing I/O Bandwidth for Out-of-Core HPC Applications on Homogeneous and Heterogeneous Large-Scale Systems
http://hdl.handle.net/10754/665396
Alturkestani, Tariq
Out-of-core simulation systems often produce a massive amount of data that cannot
fit in the aggregate fast memory of the compute nodes, and they also need to
read these data back for computation. As a result, I/O data movement can be a
bottleneck in large-scale simulations. Advances in memory architecture have made
it feasible and affordable to integrate hierarchical storage media on large-scale systems,
starting from the traditional Parallel File Systems (PFSs) to intermediate fast
disk technologies (e.g., node-local and remote-shared NVMe and SSD-based Burst
Buffers) and up to CPU main memory and GPU High Bandwidth Memory (HBM).
However, while adding additional and faster storage media increases I/O bandwidth,
it pressures the CPU, as it becomes responsible for managing and moving data between
these layers of storage. Simulation systems are thus vulnerable to being blocked
by I/O operations. The Multilayer Buffer System (MLBS) proposed in this research
demonstrates a general and versatile method for overlapping I/O with computation
that helps to ameliorate the strain on the processors through asynchronous access.
The main idea consists in decoupling I/O operations from computational phases using
dedicated hardware resources to perform expensive context switches. MLBS monitors
I/O traffic in each storage layer, allowing fair utilization of shared resources. By
continually prefetching up and down across all hardware layers of the memory and
storage subsystems, MLBS transforms the original I/O-bound behavior of evaluated
applications and shifts it closer to a memory-bound or compute-bound regime. The
evaluation on the Cray XC40 Shaheen-2 supercomputer for a representative I/O-bound
application, seismic inversion, shows that MLBS outperforms state-of-the-art
PFSs, i.e., Lustre, Data Elevator and DataWarp by 6.06X, 2.23X, and 1.90X, respectively.
On the IBM-built Summit supercomputer, using 2048 compute nodes equipped
with a total of 12288 GPUs, MLBS achieves up to 1.4X performance speedup compared
to the reference PFS-based implementation. MLBS is also demonstrated on
applications from cosmology, combustion, and classic out-of-core computational
physics and linear algebra routines.
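The general idea of overlapping I/O with computation through prefetching can be sketched on a single node as follows. This is only an illustration using a background thread and a bounded buffer, not the MLBS implementation; `prefetching_reader`, `slow_load`, and the `depth` parameter are hypothetical names introduced for the example.

```python
import queue
import threading
import time


def prefetching_reader(load, keys, depth=2):
    """Yield load(key) for every key, in order, while a background thread
    loads up to `depth` blocks ahead of the consumer."""
    buffer = queue.Queue(maxsize=depth)  # bounded: caps memory held by prefetched blocks

    def producer():
        try:
            for key in keys:
                buffer.put(("data", load(key)))  # blocks while the buffer is full
            buffer.put(("end", None))
        except Exception as exc:                 # forward loading errors to the consumer
            buffer.put(("error", exc))

    threading.Thread(target=producer, daemon=True).start()

    while True:
        kind, payload = buffer.get()  # waits only if prefetching has fallen behind
        if kind == "end":
            return
        if kind == "error":
            raise payload
        yield payload


def slow_load(block_id):
    """Stand-in for a read from slow storage."""
    time.sleep(0.01)
    return [block_id] * 4


if __name__ == "__main__":
    # The computation on one block runs while the next blocks are being loaded.
    total = sum(sum(block) for block in prefetching_reader(slow_load, range(5)))
    print(total)
```

The bounded queue plays the role of a single fast buffer layer: the producer stalls when it is full, and the consumer stalls only when the data it needs has not arrived yet.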
2020-09-30T00:00:00Z