### Recent Submissions

• #### Learning from Scholarly Attributed Graphs for Scientific Discovery

(2020-10-18) [Dissertation]
Committee members: Moshkov, Mikhail; Hoehndorf, Robert; Zhang, Min
Research and experimentation in various scientific fields are based on the knowledge and ideas from scholarly literature. The advancement of research and development has, thus, strengthened the importance of literature analysis and understanding. However, in recent years, researchers have been facing a massive volume of scholarly documents published at an exponentially increasing rate. Analyzing this vast number of publications is far beyond the capability of individual researchers. This dissertation is motivated by the need for large-scale analyses of the rapidly growing body of scholarly literature for scientific knowledge discovery. In the first part of this dissertation, the interdependencies between scholarly literature are studied. First, I develop Delve – a data-driven search engine supported by our designed semi-supervised edge classification method. This system enables users to search and analyze the relationship between datasets and scholarly literature. Based on the Delve system, I propose to study information extraction as a node classification problem in attributed networks. Specifically, if we can learn the research topics of documents (nodes in a network), we can aggregate documents by topics and retrieve information specific to each topic (e.g., top-k popular datasets). Node classification in attributed networks has several challenges: a limited number of labeled nodes, effective fusion of topological structure and node/edge attributes, and the co-existence of multiple labels for one node. Existing node classification approaches can only address or partially address a few of these challenges. This dissertation addresses these challenges by proposing semi-supervised multi-class/multi-label node classification models to integrate node/edge attributes and topological relationships. The second part of this dissertation examines the problem of analyzing the interdependencies between terms in scholarly literature.
I present two algorithms for the automatic hypothesis generation (HG) problem, which refers to the discovery of meaningful implicit connections between scientific terms, including but not limited to diseases, drugs, and genes extracted from databases of biomedical publications. The automatic hypothesis generation problem is modeled as future connectivity prediction in a dynamic attributed graph. The key is to capture the temporal evolution of node-pair (term-pair) relations. Experimental results and case study analyses highlight the effectiveness of the proposed algorithms compared with extensions of the baselines.
• #### Dynamic Programming Multi-Objective Combinatorial Optimization

(2020-10-18) [Dissertation]
Committee members: Keyes, David E.; Shihada, Basem; Boros, Endre
In this dissertation, we consider extensions of dynamic programming for combinatorial optimization. We introduce two exact multi-objective optimization algorithms: the multi-stage optimization algorithm that optimizes the problem relative to the ordered sequence of objectives (lexicographic optimization) and the bi-criteria optimization algorithm that simultaneously optimizes the problem relative to two objectives (Pareto optimization). We also introduce a counting algorithm to count optimal solutions before and after every optimization stage of multi-stage optimization. We propose a fairly universal approach based on so-called circuits without repetitions in which each element is generated exactly one time. Such circuits represent the sets of elements under consideration (the sets of feasible solutions) and are used by counting, multi-stage, and bi-criteria optimization algorithms. For a given optimization problem, we should describe an appropriate circuit and cost functions. Then, we can use the designed algorithms for which we already have proofs of their correctness and ways to evaluate the required number of operations and the time. We construct conventional (which work directly with elements) circuits without repetitions for matrix chain multiplication, global sequence alignment, optimal paths in directed graphs, binary search trees, convex polygon triangulation, line breaking (text justification), one-dimensional clustering, optimal bitonic tour, and segmented least squares. For these problems, we evaluate the number of operations and the time required by the optimization and counting algorithms, and consider the results of computational experiments.
If we cannot find a conventional circuit without repetitions for a problem, we can either create custom algorithms for optimization and counting from scratch or transform a circuit with repetitions into a so-called syntactical circuit, which is a circuit without repetitions that works not with elements but with formulas representing these elements. We apply both approaches to the optimization of matchings in trees and apply the second approach to the 0/1 knapsack problem. We also briefly introduce our work in operations research with applications to health care. This work extends our interest in the optimization field from developing new methods included in this dissertation toward practical application.
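The idea of optimizing and counting over the same recursive structure can be illustrated on matrix chain multiplication, one of the problems listed in the abstract. The sketch below is a plain textbook dynamic program written for illustration only; it is not the circuit-based framework of the dissertation, and the function name `matrix_chain` and its return convention are assumptions made here.

```python
from functools import lru_cache


def matrix_chain(dims):
    """Return (min_cost, num_optimal) for multiplying a chain of matrices.

    Matrix i has shape dims[i] x dims[i + 1]. The cost of a product is the
    number of scalar multiplications. num_optimal counts the distinct
    parenthesizations that reach the minimum cost.
    """
    n = len(dims) - 1
    if n < 1:
        raise ValueError("dims must describe at least one matrix")

    @lru_cache(maxsize=None)
    def solve(i, j):
        # A single matrix needs no multiplication and has one "solution".
        if i == j:
            return 0, 1
        best, count = None, 0
        for k in range(i, j):  # split into (i..k) and (k+1..j)
            left_cost, left_count = solve(i, k)
            right_cost, right_count = solve(k + 1, j)
            cost = left_cost + right_cost + dims[i] * dims[k + 1] * dims[j + 1]
            if best is None or cost < best:
                best, count = cost, left_count * right_count
            elif cost == best:
                count += left_count * right_count
        return best, count

    return solve(0, n - 1)
```

Each split point produces a different top-level parenthesization, so every solution is generated exactly once; this is what makes multiplying and summing the sub-counts valid in this simple setting.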
• #### Guest Editorial Special Issue on “Wireless Networks Empowered by Reconfigurable Intelligent Surfaces”

(IEEE Journal on Selected Areas in Communications, Institute of Electrical and Electronics Engineers (IEEE), 2020-10-16) [Article]
Future wireless networks will be as pervasive as the air we breathe, not only connecting us but embracing us through a web of systems that support personal and societal well-being. That is, the ubiquity, speed and low latency of such networks will allow currently disparate devices and services to become a distributed intelligent communications, sensing, and computing platform.
• #### Survey of energy-autonomous solar cell receivers for satellite–air–ground–ocean optical wireless communication

(Progress in Quantum Electronics, Elsevier BV, 2020-10-14) [Article]
With the advent of the Internet of Things, energy- and bandwidth-related issues are becoming increasingly prominent in the context of supporting the massive connectivity of various smart devices. To this end, we propose that solar cells with the dual functions of energy harvesting and signal acquisition are critical for alleviating energy-related issues and enabling optical wireless communication (OWC) across the satellite–air–ground–ocean (SAGO) boundaries. Moreover, we present the first comprehensive survey on solar cell-based OWC technology. First, the historical evolution of this technology is summarized, from its beginnings to recent advances, to provide the relative merits of a variety of solar cells for simultaneous energy harvesting and OWC in different application scenarios. Second, the performance metrics, circuit design, and architectural design for energy-autonomous solar cell receivers are provided to help understand the basic principles of this technology. Finally, with a view to its future application to SAGO communication networks, we note the challenges and future trends of research related to this technology in terms of channel characterization, light source development, photodetector development, modulation and multiplexing techniques, and network implementations.
• #### Interleukin-26 activates macrophages and facilitates killing of Mycobacterium tuberculosis

(Scientific Reports, Springer Science and Business Media LLC, 2020-10-14) [Article]
Tuberculosis-causing Mycobacterium tuberculosis (Mtb) is transmitted via airborne droplets followed by a primary infection of macrophages and dendritic cells. During the activation of host defence mechanisms, neutrophils, T helper 1 (TH1) cells, and TH17 cells are also recruited to the site of infection. The TH17 cell-derived interleukin (IL)-17 in turn induces the cathelicidin LL37, which shows direct antimycobacterial effects. Here, we investigated the role of IL-26, a TH1- and TH17-associated cytokine that exhibits antimicrobial activity. We found that both IL-26 mRNA and protein are strongly increased in tuberculous lymph nodes. Furthermore, IL-26 is able to directly kill Mtb and decrease the infection rate in macrophages. Binding of IL-26 to lipoarabinomannan might be one important mechanism in extracellular killing of Mtb. Macrophages and dendritic cells respond to IL-26 with secretion of tumor necrosis factor (TNF)-α and chemokines such as CCL20, CXCL2 and CXCL8. In dendritic cells but not in macrophages, cytokine induction by IL-26 is partly mediated via Toll-like receptor (TLR) 2. Taken together, IL-26 strengthens the defense against Mtb in two ways: first, directly through its antimycobacterial properties, and second, indirectly by activating innate immune mechanisms.
• #### Flexible Cross-Modal Hashing

(IEEE Transactions on Neural Networks and Learning Systems, Institute of Electrical and Electronics Engineers (IEEE), 2020-10-14) [Article]
Hashing has been widely adopted for large-scale data retrieval in many domains due to its low storage cost and high retrieval speed. Existing cross-modal hashing methods optimistically assume that the correspondence between training samples across modalities is readily available. This assumption is unrealistic in practical applications. In addition, existing methods generally require the same number of samples across different modalities, which restricts their flexibility. We propose a flexible cross-modal hashing approach (FlexCMH) to learn effective hashing codes from weakly paired data, whose correspondence across modalities is partially (or even totally) unknown. FlexCMH first introduces a clustering-based matching strategy to explore the structure of each cluster and, thus, to find the potential correspondence between clusters (and samples therein) across modalities. To reduce the impact of an incomplete correspondence, it jointly optimizes the potential correspondence, the cross-modal hashing functions derived from the correspondence, and a hashing quantitative loss in a unified objective function. An alternative optimization technique is also proposed to coordinate the correspondence and hash functions and reinforce the reciprocal effects of the two objectives. Experiments on public multimodal data sets show that FlexCMH achieves significantly better results than state-of-the-art methods and that it offers a high degree of flexibility for practical cross-modal hashing tasks.
• #### Semantic similarity and machine learning with ontologies.

(Briefings in bioinformatics, Oxford University Press (OUP), 2020-10-13) [Article]
Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge, and they are employed in almost every major biological database. Recently, ontologies have increasingly been used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview of the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.
• #### FAME: 3D Shape Generation via Functionality-Aware Model Evolution

(IEEE Transactions on Visualization and Computer Graphics, IEEE, 2020-10-12) [Article]
We introduce a modeling tool which can evolve a set of 3D objects in a functionality-aware manner. Our goal is for the evolution to generate large and diverse sets of plausible 3D objects for data augmentation, constrained modeling, as well as open-ended exploration to possibly inspire new designs. Starting with an initial population of 3D objects belonging to one or more functional categories, we evolve the shapes through part re-combination to produce generations of hybrids or crossbreeds between parents from the heterogeneous shape collection. Evolutionary selection of offspring is guided both by a functional plausibility score derived from functionality analysis of shapes in the initial population and by user preference, as in a design gallery. Since cross-category hybridization may result in offspring not belonging to any of the known functional categories, we develop a means for functionality partial matching to evaluate functional plausibility on partial shapes. We show a variety of plausible hybrid shapes generated by our functionality-aware model evolution, which can complement existing datasets as training data and boost the performance of contemporary data-driven segmentation schemes, especially in challenging cases.
• #### Impact of data preprocessing on cell-type clustering based on single-cell RNA-seq data.

(BMC bioinformatics, Springer Science and Business Media LLC, 2020-10-08) [Article]
BACKGROUND: Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods show quite different effects on clustering algorithms. Moreover, there is no specific preprocessing method that is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data. RESULTS: We designed a graph-based algorithm, SC3-e, specifically for discriminating the best data preprocessing method for SC3, which is currently the most widely used clustering algorithm for single cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always accurately selects the best data preprocessing method for SC3 and therefore greatly enhances the clustering performance of SC3. CONCLUSION: The SC3-e algorithm is practically powerful for discriminating the best data preprocessing method, and therefore largely enhances the performance of cell-type clustering of SC3. It is expected to play a crucial role in the related studies of single-cell clustering, such as the studies of human complex diseases and discoveries of new cell types.
• #### Optimal Gradient Compression for Distributed and Federated Learning

(arXiv, 2020-10-07) [Preprint]
Communicating information, like gradient vectors, between computing nodes in distributed and federated learning is typically an unavoidable burden, resulting in scalability issues. Indeed, communication might be slow and costly. Recent advances in communication-efficient training algorithms have reduced this bottleneck by using compression techniques, in the form of sparsification, quantization, or low-rank approximation. Since compression is a lossy, or inexact, process, the iteration complexity is typically worsened; but the total communication complexity can improve significantly, possibly leading to large computation time savings. In this paper, we investigate the fundamental trade-off between the number of bits needed to encode compressed vectors and the compression error. We perform both worst-case and average-case analysis, providing tight lower bounds. In the worst-case analysis, we introduce an efficient compression operator, Sparse Dithering, which is very close to the lower bound. In the average-case analysis, we design a simple compression operator, Spherical Compression, which naturally achieves the lower bound. Thus, our new compression schemes significantly outperform the state of the art. We conduct numerical experiments to illustrate this improvement.
• #### Multi-typed Objects Multi-view Multi-instance Multi-label Learning

(arXiv, 2020-10-06) [Preprint]
Multi-typed objects Multi-view Multi-instance Multi-label Learning (M4L) deals with interconnected multi-typed objects (or bags) that are made of diverse instances, represented with heterogeneous feature views and annotated with a set of non-exclusive but semantically related labels. M4L is more general and powerful than the typical Multi-view Multi-instance Multi-label Learning (M3L), which only accommodates single-typed bags and lacks the power to jointly model the naturally interconnected multi-typed objects in the physical world. To tackle this novel and challenging learning task, we develop a joint matrix factorization based solution (M4L-JMF). In particular, M4L-JMF first encodes the diverse attributes and multiple inter(intra)-associations among multi-typed bags into respective data matrices, and then jointly factorizes these matrices into low-rank ones to explore the composite latent representation of each bag and its instances (if any). In addition, it incorporates a dispatch and aggregation term to distribute the labels of bags to individual instances and, in reverse, aggregate the labels of instances to their affiliated bags in a coherent manner. Experimental results on benchmark datasets show that M4L-JMF achieves significantly better results than simple adaptations of existing M3L solutions on this novel problem.
• #### Stereo Event-Based Particle Tracking Velocimetry for 3D Fluid Flow Reconstruction

(Springer International Publishing, 2020-10-06) [Conference Paper]
Existing Particle Imaging Velocimetry techniques require the use of high-speed cameras to reconstruct time-resolved fluid flows. These cameras provide high-resolution images at high frame rates, which generates bandwidth and memory issues. By capturing only changes in the brightness with a very low latency and at low data rate, event-based cameras have the ability to tackle such issues. In this paper, we present a new framework that retrieves dense 3D measurements of the fluid velocity field using a pair of event-based cameras. First, we track particles inside the two event sequences in order to estimate their 2D velocity in the two sequences of images. A stereo-matching step is then performed to retrieve their 3D positions. These intermediate outputs are incorporated into an optimization framework that also includes physically plausible regularizers, in order to retrieve the 3D velocity field. Extensive experiments on both simulated and real data demonstrate the efficacy of our approach.
• #### Smaller generalization error derived for deep compared to shallow residual neural networks

(arXiv, 2020-10-05) [Preprint]
Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $\bar z_{\ell+1}=\bar z_\ell + \text{Re}\sum_{k=1}^K\bar b_{\ell k}e^{{\rm i}\omega_{\ell k}\bar z_\ell}+ \text{Re}\sum_{k=1}^K\bar c_{\ell k}e^{{\rm i}\omega'_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k},\omega'_{\ell k})$ of the random Fourier features $e^{{\rm i}\omega_{\ell k}\bar z_\ell}$ and $e^{{\rm i}\omega'_{\ell k}\cdot x}$ is derived. The derivation is based on the corresponding generalization error to approximate function values $f(x)$. The generalization error turns out to be smaller than the estimate ${\|\hat f\|^2_{L^1(\mathbb{R}^d)}}/{(LK)}$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $LK$, in the case the $L^\infty$-norm of $f$ is much less than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network that shows promising results.
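The layer recursion in the abstract can be made concrete with a short forward pass. This is a minimal sketch assuming a scalar state $\bar z_\ell$, NumPy arrays for the amplitudes and frequencies, and an initial value `z0` that the abstract does not specify; the function name `residual_rff_forward` and the array layout are hypothetical, and nothing here reproduces the paper's training method or frequency distribution.

```python
import numpy as np


def residual_rff_forward(x, b, c, omega, omega_p, z0=0.0):
    """Evaluate z_{l+1} = z_l + Re sum_k b_lk exp(i w_lk z_l)
                              + Re sum_k c_lk exp(i w'_lk . x)  for l = 0..L-1.

    x        : input vector, shape (d,)
    b, c     : amplitudes (real or complex), shape (L, K)
    omega    : frequencies acting on the scalar state z, shape (L, K)
    omega_p  : frequencies acting on the input x, shape (L, K, d)
    z0       : assumed initial state (not given in the abstract)
    """
    z = z0
    for layer in range(b.shape[0]):
        # Features of the current state z (scalar times K frequencies).
        state_term = np.real(np.sum(b[layer] * np.exp(1j * omega[layer] * z)))
        # Features of the input x (K dot products omega'_k . x).
        input_term = np.real(np.sum(c[layer] * np.exp(1j * (omega_p[layer] @ x))))
        z = z + state_term + input_term
    return float(z)
```

With $L$ layers of $K$ features each, the loop uses $LK$ state features in total, which is the node count the abstract compares against a single hidden layer.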
• #### Lower Bounds and Optimal Algorithms for Personalized Federated Learning

(arXiv, 2020-10-05) [Preprint]
In this work, we consider the optimization formulation of personalized federated learning recently introduced by Hanzely and Richtárik (2020), which was shown to give an alternative explanation to the workings of local SGD methods. Our first contribution is establishing the first lower bounds for this formulation, for both the communication complexity and the local oracle complexity. Our second contribution is the design of several optimal methods matching these lower bounds in almost all regimes. These are the first provably optimal methods for personalized federated learning. Our optimal methods include an accelerated variant of FedProx, and an accelerated variance-reduced version of FedAvg/Local SGD. We demonstrate the practical superiority of our methods through extensive numerical experiments.
• #### Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation

(arXiv, 2020-10-05) [Preprint]
Understanding the relationships between biomedical terms like viruses, drugs, and symptoms is essential in the fight against diseases. Many attempts have been made to introduce the use of machine learning to the scientific process of hypothesis generation (HG), which refers to the discovery of meaningful implicit connections between biomedical terms. However, most existing methods fail to truly capture the temporal dynamics of scientific term relations and also assume unobserved connections to be irrelevant (i.e., in a positive-negative (PN) learning setting). To break these limits, we formulate this HG problem as a future connectivity prediction task on a dynamic attributed graph via positive-unlabeled (PU) learning. Then, the key is to capture the temporal evolution of node pair (term pair) relations from just the positive and unlabeled data. We propose a variational inference model to estimate the positive prior, and incorporate it in the learning of node pair embeddings, which are then used for link prediction. Experimental results on real-world biomedical term relationship datasets and case study analyses on a COVID-19 dataset validate the effectiveness of the proposed model.
• #### Flexible and reconfigurable radio frequency electronics realized by high-throughput screen printing of vanadium dioxide switches

(Microsystems & Nanoengineering, Springer Science and Business Media LLC, 2020-10-04) [Article]
Smart materials that can change their properties based on an applied stimulus are in high demand due to their suitability for reconfigurable electronics, such as tunable filters or antennas. In particular, materials that undergo a metal–insulator transition (MIT), for example, vanadium dioxide (VO2) (M), are highly attractive due to their tunable electrical and optical properties at a low transition temperature of 68 °C. Although deposition of this material on a limited scale has been demonstrated through vacuum-based fabrication methods, its scalable application for large-area and high-volume processes is still challenging. Screen printing can be a viable option because of its high-throughput fabrication process on flexible substrates. In this work, we synthesize high-purity VO2 (M) microparticles and develop a screen-printable VO2 ink, enabling the large-area and high-resolution printing of VO2 switches on various substrates. The electrical properties of screen-printed VO2 switches at the microscale are thoroughly investigated under both thermal and electrical stimuli, and the switches exhibit a low ON resistance of 1.8 ohms and an ON/OFF ratio of more than 300. The electrical performance of the printed switches does not degrade even after multiple bending cycles and for bending radii as small as 1 mm. As a proof of concept, a fully printed and mechanically flexible band-pass filter is demonstrated that utilizes these printed switches as reconfigurable elements. Based on the ON and OFF conditions of the VO2 switches, the filter can reconfigure its operating frequency from 3.95 to 3.77 GHz without any degradation in performance during bending.
• #### Regulation of kinase activity by combined action of juxtamembrane and C-terminal regions of receptors

(Cold Spring Harbor Laboratory, 2020-10-01) [Preprint]
Despite the kinetically favorable, ATP-rich intracellular environment, the mechanism by which receptor tyrosine kinases (RTKs) repress activation prior to extracellular stimulation is poorly understood. RTKs are activated through a precise sequence of phosphorylation reactions starting with a tyrosine on the activation loop (A-loop) of the intracellular kinase domain (KD). This forms an essential mono-phosphorylated active intermediate state on the path to further phosphorylation of the receptor. We show that this state is subjected to stringent control imposed by the peripheral juxtamembrane (JM) and C-terminal tail (CT) regions. This entails interplay between the intermolecular interaction of JM with KD, which stabilizes the asymmetric active KD dimer, and the opposing intramolecular binding of CT to KD. A further control step is provided by the previously unobserved direct binding between JM and CT. Mutations in JM and CT sites that perturb regulation are found in numerous pathologies, revealing novel sites for potential pharmaceutical intervention.
• #### Extrapolating low-frequency prestack land data with deep learning

(Society of Exploration Geophysicists, 2020-10-01) [Conference Paper]
Missing low-frequency content in seismic data is a common challenge for seismic inversion. Long wavelengths are necessary to reveal large structures in the subsurface and to build an acceptable starting point for later iterations of full-waveform inversion (FWI). High-frequency land seismic data are particularly challenging due to the elastic nature of the Earth contrasting with acoustic air at the typically rugged free surface, which makes the use of low frequencies even more vital to the inversion. We propose a supervised deep learning framework for bandwidth extrapolation of prestack elastic data in the time domain. We utilize a Convolutional Neural Network (CNN) with a UNet-inspired architecture to convert portions of band-limited shot gathers from 5-15 Hz to 0-5 Hz band. In the synthetic experiment, we train the network on 192x192 patches of wavefields simulated for different cross-sections of the elastic SEAM Arid model with free-surface. Then, we test the network on unseen shot gathers from the same model to demonstrate the viability of the approach. The results show promise for future field data applications.
• #### A Real-Time Monitoring of Fluids Properties in Tubular Architectures

(2020-10) [Dissertation]