### Recent Submissions

• #### Optimal Gradient Compression for Distributed and Federated Learning

(arXiv, 2020-10-07) [Preprint]
Communicating information, like gradient vectors, between computing nodes in distributed and federated learning is typically an unavoidable burden, resulting in scalability issues. Indeed, communication might be slow and costly. Recent advances in communication-efficient training algorithms have reduced this bottleneck by using compression techniques, in the form of sparsification, quantization, or low-rank approximation. Since compression is a lossy, or inexact, process, the iteration complexity is typically worsened; but the total communication complexity can improve significantly, possibly leading to large computation time savings. In this paper, we investigate the fundamental trade-off between the number of bits needed to encode compressed vectors and the compression error. We perform both worst-case and average-case analysis, providing tight lower bounds. In the worst-case analysis, we introduce an efficient compression operator, Sparse Dithering, which is very close to the lower bound. In the average-case analysis, we design a simple compression operator, Spherical Compression, which naturally achieves the lower bound. Thus, our new compression schemes significantly outperform the state of the art. We conduct numerical experiments to illustrate this improvement.
• #### Multi-typed Objects Multi-view Multi-instance Multi-label Learning

(arXiv, 2020-10-06) [Preprint]
Multi-typed objects Multi-view Multi-instance Multi-label Learning (M4L) deals with interconnected multi-typed objects (or bags) that are made of diverse instances, represented with heterogeneous feature views and annotated with a set of non-exclusive but semantically related labels. M4L is more general and powerful than the typical Multi-view Multi-instance Multi-label Learning (M3L), which only accommodates single-typed bags and lacks the power to jointly model the naturally interconnected multi-typed objects in the physical world. To tackle this novel and challenging learning task, we develop a joint matrix factorization based solution (M4L-JMF). In particular, M4L-JMF first encodes the diverse attributes and multiple inter(intra)-associations among multi-typed bags into respective data matrices, and then jointly factorizes these matrices into low-rank ones to explore the composite latent representation of each bag and its instances (if any). In addition, it incorporates a dispatch and aggregation term to distribute the labels of bags to individual instances and, in reverse, aggregate the labels of instances to their affiliated bags in a coherent manner. Experimental results on benchmark datasets show that M4L-JMF achieves significantly better results than simple adaptations of existing M3L solutions on this novel problem.
• #### Smaller generalization error derived for deep compared to shallow residual neural networks

(arXiv, 2020-10-05) [Preprint]
Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $\bar z_{\ell+1}=\bar z_\ell + \text{Re}\sum_{k=1}^K\bar b_{\ell k}e^{{\rm i}\omega_{\ell k}\bar z_\ell}+ \text{Re}\sum_{k=1}^K\bar c_{\ell k}e^{{\rm i}\omega'_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k},\omega'_{\ell k})$ of the random Fourier features $e^{{\rm i}\omega_{\ell k}\bar z_\ell}$ and $e^{{\rm i}\omega'_{\ell k}\cdot x}$ is derived. The derivation is based on the corresponding generalization error to approximate function values $f(x)$. The generalization error turns out to be smaller than the estimate ${\|\hat f\|^2_{L^1(\mathbb{R}^d)}}/{(LK)}$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $LK$, in the case the $L^\infty$-norm of $f$ is much less than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network that shows promising results.
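The layer recursion in the abstract above can be written out as a short forward-pass sketch. This is only an illustration of the stated update formula, assuming a scalar state $\bar z_\ell$, complex amplitudes, and a vector input $x$; the function name `residual_rff_layer` and its argument names are hypothetical and not taken from the paper.

```python
import cmath

def residual_rff_layer(z, x, b, omega, c, omega_prime):
    """Apply one residual layer: z + Re sum_k b_k e^{i w_k z} + Re sum_k c_k e^{i w'_k . x}.

    z           -- scalar state entering the layer (assumed real)
    x           -- input vector, given as a list of floats
    b, c        -- complex amplitudes, one per feature
    omega       -- scalar frequencies acting on the state z
    omega_prime -- frequency vectors acting on the input x
    """
    # Fourier features of the current state.
    state_term = sum(b_k * cmath.exp(1j * w_k * z) for b_k, w_k in zip(b, omega))
    # Fourier features of the raw input; w'_k . x is an ordinary dot product.
    input_term = sum(
        c_k * cmath.exp(1j * sum(w * x_i for w, x_i in zip(wp_k, x)))
        for c_k, wp_k in zip(c, omega_prime)
    )
    # Only the real parts enter the residual update.
    return z + state_term.real + input_term.real

# Hypothetical parameters for a single layer with K = 1 feature of each kind.
z_next = residual_rff_layer(0.0, [0.0], [1 + 0j], [2.0], [0.5 + 0j], [[1.0]])
```

A network with $L$ layers would call this function $L$ times, each with its own amplitudes and frequencies. How the frequencies are sampled is the subject of the paper and is not reproduced here.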
• #### Lower Bounds and Optimal Algorithms for Personalized Federated Learning

(arXiv, 2020-10-05) [Preprint]
In this work, we consider the optimization formulation of personalized federated learning recently introduced by Hanzely and Richtárik (2020), which was shown to give an alternative explanation to the workings of local SGD methods. Our first contribution is establishing the first lower bounds for this formulation, for both the communication complexity and the local oracle complexity. Our second contribution is the design of several optimal methods matching these lower bounds in almost all regimes. These are the first provably optimal methods for personalized federated learning. Our optimal methods include an accelerated variant of FedProx, and an accelerated variance-reduced version of FedAvg/Local SGD. We demonstrate the practical superiority of our methods through extensive numerical experiments.
• #### Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation

(arXiv, 2020-10-05) [Preprint]
Understanding the relationships between biomedical terms like viruses, drugs, and symptoms is essential in the fight against diseases. Many attempts have been made to introduce the use of machine learning to the scientific process of hypothesis generation (HG), which refers to the discovery of meaningful implicit connections between biomedical terms. However, most existing methods fail to truly capture the temporal dynamics of scientific term relations and also assume unobserved connections to be irrelevant (i.e., in a positive-negative (PN) learning setting). To break these limits, we formulate this HG problem as a future connectivity prediction task on a dynamic attributed graph via positive-unlabeled (PU) learning. Then, the key is to capture the temporal evolution of node pair (term pair) relations from just the positive and unlabeled data. We propose a variational inference model to estimate the positive prior, and incorporate it in the learning of node pair embeddings, which are then used for link prediction. Experimental results on real-world biomedical term relationship datasets and case study analyses on a COVID-19 dataset validate the effectiveness of the proposed model.
• #### Genetic mapping of the early responses to salt stress in Arabidopsis thaliana

(Cold Spring Harbor Laboratory, 2020-10-04) [Preprint]
Salt stress decreases plant growth prior to significant ion accumulation in the shoot. However, the processes underlying this rapid reduction in growth are still unknown. To understand the changes in salt stress responses through time and at multiple physiological levels, examining different plant processes within a single setup is required. Recent advances in phenotyping have allowed the image-based estimation of plant growth, morphology, colour and photosynthetic activity. In this study, we examined the salt stress-induced responses of 191 Arabidopsis accessions from one hour to seven days after treatment using high-throughput phenotyping. Multivariate analyses and machine learning algorithms identified that quantum yield measured in the light-adapted state (Fv'/Fm') greatly affected growth maintenance in the early phase of salt stress, while maximum quantum yield (QY max) was crucial at a later stage. In addition, our genome-wide association study (GWAS) identified 770 loci that were specific to salt stress, of which two loci associated with QY max and Fv'/Fm' were selected for validation using T-DNA insertion lines. We characterised an unknown protein kinase found in the QY max locus, which reduced photosynthetic efficiency and growth maintenance under salt stress. Understanding the molecular context of the identified candidate genes will provide valuable insights into the early plant responses to salt stress. Furthermore, our work incorporates high-throughput phenotyping, multivariate analyses and GWAS, uncovering details of temporal stress responses, while identifying associations across different traits and time points, which likely constitute the genetic components of salinity tolerance.
• #### Regulation of kinase activity by combined action of juxtamembrane and C-terminal regions of receptors

(Cold Spring Harbor Laboratory, 2020-10-01) [Preprint]
Despite the kinetically favorable, ATP-rich intracellular environment, the mechanism by which receptor tyrosine kinases (RTKs) repress activation prior to extracellular stimulation is poorly understood. RTKs are activated through a precise sequence of phosphorylation reactions starting with a tyrosine on the activation loop (A-loop) of the intracellular kinase domain (KD). This forms an essential mono-phosphorylated active intermediate state on the path to further phosphorylation of the receptor. We show that this state is subjected to stringent control imposed by the peripheral juxtamembrane (JM) and C-terminal tail (CT) regions. This entails interplay between the intermolecular interaction of JM with KD, which stabilizes the asymmetric active KD dimer, and the opposing intramolecular binding of CT to KD. A further control step is provided by the previously unobserved direct binding between JM and CT. Mutations in JM and CT sites that perturb regulation are found in numerous pathologies, revealing novel sites for potential pharmaceutical intervention.
• #### Analysis of 3D localization in underwater optical wireless networks with uncertain anchor positions

(Science China Information Sciences, Springer Science and Business Media LLC, 2020-09-30) [Article]
Localization accuracy is of paramount importance for the proper operation of underwater optical wireless sensor networks (UOWSNs). However, underwater localization is prone to hostile environmental impediments such as drifts owing to the surface and deep currents. These cause uncertainty in the deployed anchor node positions and pose daunting challenges to achieving accurate location estimates. Therefore, this paper analyzes the performance of three-dimensional (3D) localization for UOWSNs and derives a closed-form expression for the Cramér-Rao lower bound (CRLB) by using time of arrival (ToA) and angle of arrival (AoA) measurements in the presence of uncertainty in anchor node positions. Numerical results validate the analytical findings by comparing the localization accuracy in scenarios with and without anchor node position uncertainty. Results are also compared with the linear least squares (LLS) method and the weighted LLS (WLLS) method.
• #### MnO6 Octahedral Tilt Control of Emergent Phenomena at LaMnO3/SrMnO3 Interfaces

(arXiv, 2020-09-30) [Preprint]
Emergent phases at the interfaces in strongly correlated oxide heterostructures display novel properties not akin to those of the constituent materials. The interfacial ferromagnetism in LaMnO3/SrMnO3 ((LMO)m/(SMO)n) superlattices (SLs) is usually considered to be a result of the interfacial charge transfer. We report a decisive role of atomic interface structure in the development of emergent magnetism and phonon transport in (LMO)m/(SMO)n SLs (m/n=1, 2). The observed common octahedral network with MnO6-tilt-free interfaces in m/n=1 SLs suppresses interfacial electron transfer and enhances thermal (phonon) conductivity. For m/n=2 SLs two distinct LMO and SMO lattices result in an MnO6 tilt mismatch, which enhances the emergent ferromagnetism and suppresses thermal conductivity. Furthermore, the interface thermal conductance increases strongly from 0.29 up to 1.75 GW/(m²K) in SLs with (m/n=2) and without (m/n=1) tilt mismatch, respectively. Experimental results, fully supported by first-principles calculations, emphasize a fundamental role of electron-spin-lattice interplay at interfaces and open new avenues of lattice engineering of emergent phases.
• #### Error Compensated Distributed SGD Can Be Accelerated

(arXiv, 2020-09-30) [Preprint]
Gradient compression is a recent and increasingly popular technique for reducing the communication cost in distributed training of large-scale machine learning models. In this work we focus on developing efficient distributed methods that can work for any compressor satisfying a certain contraction property, which includes both unbiased (after appropriate scaling) and biased compressors such as RandK and TopK. Applied naively, gradient compression introduces errors that either slow down convergence or lead to divergence. A popular technique designed to tackle this issue is error compensation/error feedback. Due to the difficulties associated with analyzing biased compressors, it is not known whether gradient compression with error compensation can be combined with Nesterov's acceleration. In this work, we show for the first time that error compensated gradient compression methods can be accelerated. In particular, we propose and study the error compensated loopless Katyusha method, and establish an accelerated linear convergence rate under standard assumptions. We show through numerical experiments that the proposed method converges with substantially fewer communication rounds than previous error compensated algorithms.
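To make the terms in the abstract above concrete, the following is a minimal sketch of the TopK and RandK compressors and of one generic error-feedback step. It is not the error compensated loopless Katyusha method from the paper; the function names and the use of plain Python lists are assumptions made for illustration.

```python
import random

def top_k(vec, k):
    """TopK: keep the k largest-magnitude entries and zero the rest (a biased compressor)."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out

def rand_k(vec, k, rng):
    """RandK without rescaling: keep k coordinates chosen uniformly at random.

    Multiplying the kept entries by len(vec) / k would give the unbiased variant.
    """
    keep = rng.sample(range(len(vec)), k)
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out

def error_feedback_step(grad, memory, k):
    """Compress grad plus the error carried over from earlier rounds.

    Returns the message to transmit and the new error memory, i.e. the part
    of the corrected gradient that the compressor dropped this round.
    """
    corrected = [g + m for g, m in zip(grad, memory)]
    message = top_k(corrected, k)
    new_memory = [c - s for c, s in zip(corrected, message)]
    return message, new_memory

# Hypothetical gradient with four coordinates; only one entry is transmitted.
grad = [0.5, -3.0, 0.1, 2.0]
message, memory = error_feedback_step(grad, [0.0] * 4, k=1)
sparse = rand_k(grad, 2, random.Random(0))
```

Here `message` carries only the entry `-3.0`, while `memory` retains the three dropped entries so that they are added back before the next compression. That carry-over is what the abstract calls error compensation.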
• #### Preventing pressure oscillations does not fix local linear stability issues of entropy-based split-form high-order schemes

(arXiv, 2020-09-28) [Preprint]
Recently, it was discovered that the entropy-conserving/dissipative high-order split-form discontinuous Galerkin discretizations have robustness issues when trying to solve the simple density wave propagation example for the compressible Euler equations. The issue is related to missing local linear stability, i.e. the stability of the discretization towards perturbations added to a stable base flow. This is strongly related to an anti-diffusion mechanism that is inherent in entropy-conserving two-point fluxes, which are a key ingredient for the high-order discontinuous Galerkin extension. In this paper, we investigate whether pressure equilibrium preservation is a remedy to these recently found local linear stability issues of entropy-conservative/dissipative high-order split-form discontinuous Galerkin methods for the compressible Euler equations. Pressure equilibrium preservation describes the property of a discretization to keep pressure and velocity constant for pure density wave propagation. We present the full theoretical derivation and analysis, and show corresponding numerical results to underline our findings. The source code to reproduce all numerical experiments presented in this article is available online (DOI: 10.5281/zenodo.4054366).
• #### Machine learning for UAV-Based networks

(arXiv, 2020-09-24) [Preprint]
Unmanned aerial vehicles (UAVs) are considered one of the promising technologies for next-generation wireless communication networks. Their mobility and their ability to establish line-of-sight (LOS) links with the users make them key solutions for many potential applications. In the same vein, artificial intelligence is growing rapidly and has been very successful, particularly due to the massive amount of available data. As a result, a significant part of the research community has started to integrate intelligence at the core of UAV networks by applying machine learning (ML) algorithms to several problems related to drones. In this article, we provide a comprehensive overview of some potential applications of ML in UAV-based networks. We also highlight the limits of the existing works and outline some potential future applications of ML for UAV networks.
• #### Towards accelerated rates for distributed optimization over time-varying networks

(arXiv, 2020-09-23) [Preprint]
We study the problem of decentralized optimization over time-varying networks with strongly convex smooth cost functions. In our approach, nodes run a multi-step gossip procedure after making each gradient update, thus ensuring approximate consensus at each iteration, while the outer loop is based on the accelerated Nesterov scheme. The algorithm achieves precision $\varepsilon > 0$ in $O(\sqrt{\kappa_g}\chi\log^2(1/\varepsilon))$ communication steps and $O(\sqrt{\kappa_g}\log(1/\varepsilon))$ gradient computations at each node, where $\kappa_g$ is the condition number of the global function and $\chi$ characterizes connectivity of the communication network. In the case of a static network, $\chi = 1/\gamma$ where $\gamma$ denotes the normalized spectral gap of the communication matrix $\mathbf{W}$. The complexity bound includes $\kappa_g$, which can be significantly better than the worst-case condition number among the nodes.
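The multi-step gossip procedure mentioned in the abstract above can be illustrated with a small averaging sketch. It assumes each communication round is described by a doubly stochastic mixing matrix that may change from round to round; the four-node network, the matrices `W_A` and `W_B`, and the function names are hypothetical, and the outer accelerated gradient loop is omitted.

```python
def gossip_step(values, weights):
    """One gossip round: node i replaces its value with a weighted average of all values."""
    n = len(values)
    return [sum(weights[i][j] * values[j] for j in range(n)) for i in range(n)]

def multi_step_gossip(values, weight_sequence):
    """Run one gossip round per mixing matrix; the matrices may differ between rounds."""
    for weights in weight_sequence:
        values = gossip_step(values, weights)
    return values

# Hypothetical time-varying network on 4 nodes.
# Round A links nodes (0, 1) and (2, 3); round B links nodes (1, 2) and (3, 0).
W_A = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
W_B = [
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
]

local_values = [4.0, 0.0, 8.0, 0.0]
mixed = multi_step_gossip(local_values, [W_A, W_B])
```

In this toy case two rounds already bring every node to the network average of 3.0. On a general time-varying network a finite number of rounds yields only approximate consensus, which is why the number of gossip steps enters the communication complexity.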
• #### Terahertz Massive MIMO with Holographic Reconfigurable Intelligent Surfaces

(arXiv, 2020-09-23) [Preprint]
We propose a holographic version of a reconfigurable intelligent surface (RIS) and investigate its application to terahertz (THz) massive multiple-input multiple-output (MIMO) systems. Capitalizing on the miniaturization of THz electronic components, RISs can be implemented by densely packing subwavelength unit cells, so as to realize continuous or quasi-continuous apertures and to enable holographic communications. In this paper, in particular, we derive the beam pattern of a holographic RIS. Our analysis reveals that the beam pattern of an ideal holographic RIS can be well approximated by that of an ultra-dense RIS, which has a more practical hardware architecture. In addition, we propose a closed-loop channel estimation (CE) scheme to effectively estimate the broadband channels that characterize THz massive MIMO systems aided by holographic RISs. The proposed CE scheme includes a downlink coarse CE stage and an uplink finer-grained CE stage. The uplink pilot signals are judiciously designed for obtaining good CE performance. Moreover, to reduce the pilot overhead, we introduce a compressive sensing-based CE algorithm, which exploits the dual sparsity of THz MIMO channels in both the angular and delay domains. Simulation results demonstrate the superiority of holographic RISs over the nonholographic ones, and the effectiveness of the proposed CE scheme.
• #### Channel Estimation for Distributed Intelligent Reflecting Surfaces Assisted Multi-User MISO Systems

(arXiv, 2020-09-22) [Preprint]
Intelligent reflecting surfaces (IRSs)-assisted wireless communication promises improved system performance, while posing new challenges in channel estimation (CE) due to the passive nature of the reflecting elements. Although a few CE protocols for IRS-assisted multiple-input single-output (MISO) systems have appeared, they either require long channel training times or are developed under channel sparsity assumptions. Moreover, existing works focus on a single IRS, whereas in practice multiple such surfaces should be installed to truly benefit from the concept of reconfiguring propagation environments. In light of these challenges, this paper tackles the CE problem for the distributed IRSs-assisted multi-user MISO system. An optimal CE protocol requiring relatively low training overhead is developed using Bayesian techniques under the practical assumption that the channels between the base station (BS) and the IRSs are dominated by the line-of-sight (LoS) components. An optimal solution for the phase-shift vectors required at all IRSs during CE is determined, and the minimum mean square error (MMSE) estimates of the BS-users direct channels and the IRSs-users channels are derived. Simulation results corroborate the normalized MSE (NMSE) analysis and establish the advantage of the proposed protocol compared to the benchmark scheme in terms of training overhead.
• #### Selection dynamics for deep neural networks

(Journal of Differential Equations, Elsevier BV, 2020-09-21) [Article]
This paper presents a partial differential equation framework for deep residual neural networks and for the associated learning problem. This is done by carrying out the continuum limits of neural networks with respect to width and depth. We study the well-posedness, the large time solution behavior, and the characterization of the steady states of the forward problem. Several useful time-uniform estimates and stability/instability conditions are presented. We state and prove optimality conditions for the inverse deep learning problem, using standard variational calculus, the Hamilton-Jacobi-Bellman equation and the Pontryagin maximum principle. This serves to establish a mathematical foundation for investigating the algorithmic and theoretical connections between neural networks, PDE theory, variational analysis, optimal control, and deep learning.
• #### Robust Beam Position Estimation with Photon Counting Detector Arrays in Free-Space Optical Communications

(Center for Open Science, 2020-09-21) [Preprint]
Optical beam center position on an array of detectors is an important parameter that is essential for estimating the angle-of-arrival of the incoming signal beam. In this paper, we have examined the beam position estimation problem for photon-counting detector arrays, and to this end, we have derived and analyzed the Cramer-Rao lower bounds on the mean-square error of the unbiased estimators of the beam position. Furthermore, we have also derived the Cramer-Rao lower bounds of other beam parameters such as peak intensity, and the intensity of background radiation on the array. In this sense, we have considered a robust estimation of the beam position in which none of the parameters are assumed to be known beforehand. Additionally, we have derived the Cramer-Rao lower bounds of beam parameters for observations based on both pilot and data symbols of a pulse position modulation (PPM) scheme. Finally, we have considered a two-step estimation problem in which the peak intensity and background radiation are estimated using a method of moments estimator, and the beam center position is estimated with the help of a maximum likelihood estimator.
• #### Optimal correlation order in super-resolution optical fluctuation microscopy

(arXiv, 2020-09-21) [Preprint]
Here, we show that, contrary to common opinion, super-resolution optical fluctuation microscopy might not lead to ideally infinite super-resolution enhancement as the order of the measured cumulants increases. Using information analysis for estimating error bounds on the determination of point-source positions, we show that the reachable precision per measurement might saturate as the order of the measured cumulants increases in the super-resolution regime. In fact, there is an optimal correlation order beyond which there is practically no improvement for objects of three or more point sources. However, for objects of just two sources, one still has the intuitively expected resolution increase with the cumulant order.
• #### On NOMA-Based mmWave Communications

(arXiv, 2020-09-15) [Preprint]
Non-orthogonal multiple access (NOMA) and millimeter-wave (mmWave) communication are two promising techniques to increase the system capacity in the fifth-generation (5G) mobile network. The former can achieve high spectral efficiency by modulating the information in the power domain and the latter can provide extremely large spectrum resources. The fluctuating two-ray (FTR) channel model has already been shown in experiments to agree accurately with the small-scale fading effects in mmWave communications. In this paper, the performance of NOMA-based communications over FTR channels in mmWave communication systems is investigated in terms of outage probability (OP) and ergodic capacity (EC). Specifically, we consider the scenario in which one base station (BS) transmits signals to two users simultaneously under the NOMA scheme. The BS and users are all equipped with a single antenna. Two power allocation strategies are considered: the first one is a general (fixed) power allocation scheme under which we derive the OP and EC of NOMA users in closed form; the other one is an optimal power allocation scheme that can achieve the maximum sum rate for the whole system. Under the second scheme, not only the closed-form OP and EC but also the upper and lower bounds of EC are derived. Furthermore, we also derive the asymptotic expression for the OP in the high average SNR region to investigate the diversity order under these two schemes. Finally, we show the correctness and accuracy of our derived expressions by Monte Carlo simulation.
• #### Arginine citrullination of proteins as a specific response mechanism in Arabidopsis thaliana

(Cold Spring Harbor Laboratory, 2020-09-13) [Preprint]
Arginine citrullination, also referred to as arginine deimination, is a post-translational modification involved in an increasing number of physiological processes in animals, including histone modifications and transcriptional regulation, and in severe diseases such as rheumatoid arthritis and neurodegenerative conditions. It occurs when arginine side chains are deiminated and converted into side chains of the amino acid citrulline, a process catalysed by a family of Ca2+-dependent peptidyl arginine deiminases (PADs). PADs have been discovered in several mammalian species and in other vertebrates, like birds and fish, but have not been observed in bacteria, lower eukaryotes or higher plants. Here we show, firstly, that the Arabidopsis thaliana proteome does contain citrullinated proteins; secondly and importantly, that the citrullination signature changes in response to cold stress. Among the citrullinated proteins are DNA- or RNA-binding proteins, thus implying a role for it in the control of the transcriptional programming in plant cells. Thirdly, through sequence and structural analysis, we identify one Arabidopsis protein, currently annotated as agmatine deiminase (At5g08170), as a candidate protein arginine deiminase. Finally, we show biochemical evidence that AT5G08170 can citrullinate peptides from LHP1-interacting factor 2 (AT4G00830), an RNA-binding protein that has been identified as citrullinated in cell suspension cultures of Arabidopsis thaliana roots. In addition, we show that, in vitro, agmatine deiminase can undergo auto-citrullination. In conclusion, our work establishes the presence of protein arginine citrullination in higher plants and assigns it a role in post-translational modifications during abiotic stress responses.