Preprints
Recent Submissions

CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression(arXiv, 20210720) [Preprint]Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular. Besides, the best theoretically and practically performing gradient-type methods invariably rely on some form of acceleration/momentum to reduce the number of communications (faster convergence), e.g., Nesterov's accelerated gradient descent (Nesterov, 2004) and Adam (Kingma and Ba, 2014). In order to combine the benefits of communication compression and convergence acceleration, we propose a compressed and accelerated gradient method for distributed optimization, which we call CANITA. Our CANITA achieves the first accelerated rate $O\bigg(\sqrt{\Big(1+\sqrt{\frac{\omega^3}{n}}\Big)\frac{L}{\epsilon}} + \omega\big(\frac{1}{\epsilon}\big)^{\frac{1}{3}}\bigg)$, which improves upon the state-of-the-art non-accelerated rate $O\left((1+\frac{\omega}{n})\frac{L}{\epsilon} + \frac{\omega^2+n}{\omega+n}\frac{1}{\epsilon}\right)$ of DIANA (Khaled et al., 2020b) for distributed general convex problems, where $\epsilon$ is the target error, $L$ is the smoothness parameter of the objective, $n$ is the number of machines/devices, and $\omega$ is the compression parameter (larger $\omega$ means more compression can be applied, and no compression implies $\omega=0$). Our results show that as long as the number of devices $n$ is large (often true in distributed/federated learning), or the compression $\omega$ is not very high, CANITA achieves the faster convergence rate $O\Big(\sqrt{\frac{L}{\epsilon}}\Big)$, i.e., the number of communication rounds is $O\Big(\sqrt{\frac{L}{\epsilon}}\Big)$ (vs. $O\big(\frac{L}{\epsilon}\big)$ achieved by previous works). As a result, CANITA enjoys the advantages of both compression (compressed communication in each round) and acceleration (far fewer communication rounds).
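The compression parameter $\omega$ in the entry above can be made concrete with a standard unbiased compressor. The sketch below is not taken from the paper: it assumes random-k sparsification (the names `rand_k` and `omega` are hypothetical), which keeps k of d coordinates and rescales them by d/k so that the output is unbiased. For this compressor $\omega = d/k - 1$, so k = d gives $\omega = 0$, i.e. no compression.

```python
import random

def rand_k(x, k, rng):
    """Random-k sparsification: keep k random coordinates, scale by d/k.

    Unbiased (E[C(x)] = x) with E||C(x) - x||^2 = (d/k - 1) * ||x||^2.
    """
    d = len(x)
    kept = set(rng.sample(range(d), k))
    scale = d / k
    return [scale * v if i in kept else 0.0 for i, v in enumerate(x)]

def omega(d, k):
    """Compression parameter of rand_k for dimension d and budget k."""
    return d / k - 1.0

rng = random.Random(0)
x = [1.0, -2.0, 3.0, 0.5]

# Averaging many compressed copies should approach x (unbiasedness).
trials = 20000
mean = [0.0] * len(x)
for _ in range(trials):
    c = rand_k(x, 2, rng)
    mean = [m + ci / trials for m, ci in zip(mean, c)]
```

Only k values (plus their indices) need to be sent per round, which is where the communication saving comes from; the price is the extra variance measured by $\omega$.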

BICNet: A Bayesian Approach for Estimating Task Effects on Intrinsic Connectivity Networks in fMRI Data(arXiv, 20210719) [Preprint]Intrinsic connectivity networks (ICNs) are specific dynamic functional brain networks that are consistently found under various conditions including rest and task. Studies have shown that some stimuli actually activate intrinsic connectivity through either suppression, excitation, moderation or modification. Nevertheless, the structure of ICNs and task-related effects on ICNs are not yet fully understood. In this paper, we propose a Bayesian Intrinsic Connectivity Network (BICNet) model to identify the ICNs and quantify the task-related effects on the ICN dynamics. Using an extended Bayesian dynamic sparse latent factor model, the proposed BICNet has the following advantages: (1) it simultaneously identifies the individual ICNs and group-level ICN spatial maps; (2) it robustly identifies ICNs by jointly modeling resting-state functional magnetic resonance imaging (rfMRI) and task-related functional magnetic resonance imaging (tfMRI); (3) compared to independent component analysis (ICA)-based methods, it can quantify the differences in ICN amplitudes across different states; (4) it automatically performs feature selection through the sparsity of the ICNs rather than ad hoc thresholding. The proposed BICNet was applied to the rfMRI and language tfMRI data from the Human Connectome Project (HCP) and the analysis identified several ICNs related to distinct language processing functions.

Nearly Unstable Integer-Valued ARCH Process and Unit Root Testing(arXiv, 20210716) [Preprint]This paper introduces a Nearly Unstable INteger-valued AutoRegressive Conditional Heteroskedasticity (NU-INARCH) process for dealing with count time series data. It is proved that a proper normalization of the NU-INARCH process endowed with a Skorohod topology weakly converges to a Cox-Ingersoll-Ross diffusion. The asymptotic distribution of the conditional least squares estimator of the correlation parameter is established as a functional of certain stochastic integrals. Numerical experiments based on Monte Carlo simulations are provided to verify the behavior of the asymptotic distribution under finite samples. These simulations reveal that the nearly unstable approach provides satisfactory and better results than those based on the stationarity assumption even when the true process is not that close to non-stationarity. A unit root test is proposed and its Type-I error and power are examined via Monte Carlo simulations. As an illustration, the proposed methodology is applied to the daily number of deaths due to COVID-19 in the United Kingdom.

Multivariate Conway-Maxwell-Poisson Distribution: Sarmanov Method and Doubly-Intractable Bayesian Inference(arXiv, 20210715) [Preprint]In this paper, a multivariate count distribution with Conway-Maxwell (COM)-Poisson marginals is proposed. To do this, we develop a modification of the Sarmanov method for constructing multivariate distributions. Our multivariate COM-Poisson (MultCOMP) model has desirable features such as (i) it admits a flexible covariance matrix allowing for both negative and positive non-diagonal entries; (ii) it overcomes the limitation of the existing bivariate COM-Poisson distributions in the literature that do not have COM-Poisson marginals; (iii) it allows for the analysis of multivariate counts and is not just limited to bivariate counts. Inferential challenges are presented by the likelihood specification as it depends on a number of intractable normalizing constants involving the model parameters. These obstacles motivate us to propose a Bayesian inferential approach where the resulting doubly-intractable posterior is dealt with via the exchange algorithm and the Grouped Independence Metropolis-Hastings algorithm. Numerical experiments based on simulations are presented to illustrate the proposed Bayesian approach. We analyze the potential of the MultCOMP model through a real data application on the numbers of goals scored by the home and away teams in the Premier League from 2018 to 2021. Here, our interest is to assess the effect of a lack of crowds during the COVID-19 pandemic on the well-known home team advantage. A MultCOMP model fit shows that there is evidence of a decreased number of goals scored by the home team, not accompanied by a reduced score from the opponent. Hence, our analysis suggests a smaller home team advantage in the absence of crowds, which agrees with the opinion of several football experts.

Voltage Controlled Domain Wall Motion based Neuron and Stochastic Magnetic Tunnel Junction Synapse for Neuromorphic Computing Applications(Institute of Electrical and Electronics Engineers (IEEE), 20210715) [Preprint]The present work proposes a spintronic neuromorphic system with a spin orbit torque driven, domain wall motion-based neuron and synapse. We propose a voltage-controlled magnetic anisotropy domain wall motion based magnetic tunnel junction neuron. We investigate how the electric field at the gate (pinning site), generated by the voltage signals from pre-neurons, modulates the domain wall motion, which is reflected in the non-linear switching behaviour of the neuron magnetization. For the implementation of synaptic weights, we propose a 3-terminal MTJ with stochastic domain wall motion in the free layer. We incorporate intrinsic pinning effects by creating triangular notches on the sides of the free layer. The pinning of the domain wall and the intrinsic thermal noise of the device lead to the stochastic behaviour of the domain wall motion. The control of this stochasticity by the spin orbit torque is shown to realize the potentiation and depression of the synaptic weight. The micromagnetics and spin transport studies in the synapse and neuron are carried out by developing a coupled micromagnetic Non-Equilibrium Green’s Function (MuMag-NEGF) model. The minimization of the writing current pulse width by leveraging the thermal noise and demagnetization energy is also presented. Finally, we discuss the implementation of digit recognition by the proposed system using a spike time dependent algorithm.

A Field Guide to Federated Optimization(arXiv, 20210714) [Preprint]Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and other constraints that are not primary considerations in other problem settings. This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms through concrete examples and practical implementation, with a focus on conducting effective simulations to infer real-world performance. The goal of this work is not to survey the current literature, but to inspire researchers and practitioners to design federated learning algorithms that can be used in various practical applications.

Disentangling marine microbial networks across space(Cold Spring Harbor Laboratory, 20210713) [Preprint]Although microbial interactions underpin ocean ecosystem functions, they remain barely known. Different studies have analyzed microbial interactions using static association networks based on omics data. However, microbial associations are dynamic and can change across physicochemical gradients and spatial scales, which needs to be considered to understand the ocean ecosystem better. We explored associations between archaea, bacteria, and picoeukaryotes along the water column from the surface to the deep ocean across the northern subtropical to the southern temperate ocean and the Mediterranean Sea by defining sample-specific subnetworks. Quantifying spatial association recurrence, we found the lowest fraction of global associations in the bathypelagic zone, while associations endemic to certain regions increased with depth. Overall, our results highlight the need to study the dynamic nature of plankton networks, and our approach represents a step towards a better comprehension of the biogeography of microbial interactions across ocean regions and depth layers.

Optimal Decentralized Algorithms for Saddle Point Problems over Time-Varying Networks(arXiv, 20210713) [Preprint]Decentralized optimization methods have been a focus of the optimization community due to their scalability, the increasing popularity of parallel algorithms, and their many applications. In this work, we study saddle point problems of sum type, where the summands are held by separate computational entities connected by a network. The network topology may change from time to time, which models real-world network malfunctions. We obtain lower complexity bounds for algorithms in this setup and develop optimal methods which meet the lower bounds.

Narrow Precursor Mass Range for DIA-MS Enhances Protein Identification and Quantification(MDPI AG, 20210713) [Preprint]Data independent acquisition-mass spectrometry (DIA-MS) is becoming widely utilised for robust and accurate quantification of samples in quantitative proteomics. Here, we describe the systematic evaluation of the effects of DIA precursor mass range on total protein identification and quantification. We show that a narrow mass range of precursors (~250 m/z) for DIA-MS enables a higher number of protein identifications. Subsequent application of DIA with a narrow precursor range (from 400 to 650 m/z) to an Arabidopsis sample with a spike-in of known proteins identified 34.7% more proteins than conventional DIA (cDIA) with a wide precursor range of 400-1200 m/z. When combining several DIA-MS analyses with narrow precursor ranges (i.e., 400-650, 650-900 and 900-1200 m/z), we were able to quantify 10,099 protein groups with a median coefficient of variation of <6%. These findings represent a 59.4% increase in the number of proteins quantified compared with cDIA analysis. This is particularly important for low abundance proteins, as exemplified by the 6-protein mix spike-in. In cDIA only 5 of the 6 proteins in the mix were quantified, while our approach allowed accurate quantitation of all six proteins.

Details Preserving Deep Collaborative Filtering-Based Method for Image Denoising(arXiv, 20210711) [Preprint]In spite of the improvements achieved by several denoising algorithms over the years, many of them still fail to preserve the fine details of the image after denoising. This is a result of the smooth-out effect they have on the images. Most neural network-based algorithms have achieved better quantitative performance than the classical denoising algorithms. However, they also suffer in qualitative (visual) performance as a result of the smooth-out effect. In this paper, we propose an algorithm to address this shortcoming. We propose a deep collaborative filtering-based (Deep-CoFiB) algorithm for image denoising. This algorithm performs collaborative denoising of image patches in the sparse domain using a set of optimized neural network models. This results in a fast algorithm that achieves an excellent trade-off between noise removal and details preservation. Extensive experiments show that Deep-CoFiB performed quantitatively (in terms of PSNR and SSIM) and qualitatively (visually) better than many of the state-of-the-art denoising algorithms.

Collaborative Filtering-Based Method for Low-Resolution and Details Preserving Image Denoising(arXiv, 20210710) [Preprint]Over the years, the image denoising algorithms that have been proposed have achieved progressive improvements in denoising performance. Despite this, many of these state-of-the-art algorithms tend to smooth out the denoised image, resulting in the loss of some image details after denoising. Many also distort images of lower resolution, resulting in a partial or complete structural loss. In this paper, we address these shortcomings by proposing a collaborative filtering-based (CoFiB) denoising algorithm. Our proposed algorithm performs weighted sparse domain collaborative denoising by taking advantage of the fact that similar patches tend to have similar sparse representations in the sparse domain. This allows our algorithm to strike a balance between image detail preservation and noise removal. Our extensive experiments showed that our proposed CoFiB algorithm not only preserves the image details but also performs excellently for images of any given resolution, including the low resolutions where many denoising algorithms tend to struggle.

Dense-Sparse Deep CNN Training for Image Denoising(arXiv, 20210710) [Preprint]Recently, deep learning (DL) methods such as convolutional neural networks (CNNs) have gained prominence in the area of image denoising. This is owing to their proven ability to surpass state-of-the-art classical image denoising algorithms such as BM3D. Deep denoising CNNs (DnCNNs) use many feedforward convolution layers with added regularization methods of batch normalization and residual learning to improve denoising performance significantly. However, this comes at the expense of a huge number of trainable parameters. In this paper, we address this issue by reducing the number of parameters while achieving a comparable level of performance. We derive motivation from the improved performance obtained by training networks using the dense-sparse-dense (DSD) training approach. We extend this training approach to a reduced DnCNN (RDnCNN) network resulting in a faster denoising network with significantly reduced parameters and comparable performance to the DnCNN.

The Bayesian Learning Rule(arXiv, 20210709) [Preprint]We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and the Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.

ANCER: Anisotropic Certification via Sample-wise Volume Maximization(arXiv, 20210709) [Preprint]Randomized smoothing has recently emerged as an effective tool that enables certification of deep neural network classifiers at scale. All prior art on randomized smoothing has focused on isotropic $\ell_p$ certification, which has the advantage of yielding certificates that can be easily compared among isotropic methods via the $\ell_p$-norm radius. However, isotropic certification limits the region that can be certified around an input to worst-case adversaries, i.e., it cannot reason about other "close", potentially large, constant prediction safe regions. To alleviate this issue, (i) we theoretically extend the isotropic randomized smoothing $\ell_1$ and $\ell_2$ certificates to their generalized anisotropic counterparts following a simplified analysis. Moreover, (ii) we propose evaluation metrics allowing for the comparison of general certificates (a certificate is superior to another if it certifies a superset region) with the quantification of each certificate through the volume of the certified region. We introduce ANCER, a practical framework for obtaining anisotropic certificates for a given test set sample via volume maximization. Our empirical results demonstrate that ANCER achieves state-of-the-art $\ell_1$ and $\ell_2$ certified accuracy on both CIFAR-10 and ImageNet at multiple radii, while certifying substantially larger regions in terms of volume, thus highlighting the benefits of moving away from isotropic analysis. Code used in our experiments is available at https://github.com/MotasemAlfarra/ANCER.
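The smoothing step that underlies the entry above can be illustrated with a toy example. This is a minimal sketch of the prediction side of randomized smoothing only; it does not compute a certificate and is not the ANCER code. It assumes a hypothetical two-class `base_classifier` and a diagonal Gaussian with one standard deviation per coordinate (`sigmas`), so that equal entries correspond to the isotropic case and unequal entries to an anisotropic one.

```python
import random
from collections import Counter

def base_classifier(x):
    """Hypothetical base classifier: class 1 if the coordinates sum to > 0."""
    return 1 if sum(x) > 0 else 0

def smoothed_predict(f, x, sigmas, n_samples, rng):
    """Majority vote of f over x + noise, with noise_i ~ N(0, sigmas[i]^2).

    Equal entries in `sigmas` give isotropic smoothing; unequal entries
    give a diagonal anisotropic Gaussian.
    """
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, s) for xi, s in zip(x, sigmas)]
        votes[f(noisy)] += 1
    return votes.most_common(1)[0][0]

rng = random.Random(0)
x = [2.0, 1.0]
pred = smoothed_predict(base_classifier, x, sigmas=[0.25, 1.0],
                        n_samples=1000, rng=rng)
```

In a certification method, the vote counts would additionally feed a statistical bound on the top-class probability, from which the certified region is derived; that part is omitted here.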

Thymosin β4 is an Endogenous Iron Chelator and Molecular Switcher of Ferroptosis(Research Square Platform LLC, 20210709) [Preprint]Thymosin β4 (Tβ4) was extracted forty years ago $^{1}$ from calf thymus. Since then, it has been identified as a G-actin binding protein involved in blood clotting, tissue regeneration, angiogenesis, and anti-inflammatory processes. Tβ4 has also been implicated in tumor metastasis and neurodegeneration. However, the precise roles and mechanism(s) of action of Tβ4 in these processes remain largely unknown, with the binding of the G-actin protein being insufficient to explain these multi-actions. Here we identify for the first time an important part of the Tβ4 mechanism in ferroptosis, an iron-dependent form of cell death, which leads to neurodegeneration and somehow protects cancer cells against cell death. Specifically, we demonstrate four iron$^{2+}$ and iron$^{3+}$ binding regions along the peptide and show that the presence of Tβ4 in cell growing medium inhibits erastin and glutamate-induced ferroptosis in a macrophage cell line. Moreover, Tβ4 increases the expression of oxidative stress-related genes, namely BAX, heme oxygenase-1, Heat shock protein 70 and Thioredoxin reductase 1, which are downregulated during ferroptosis. We state the hypothesis that Tβ4 is an endogenous iron chelator and takes part in iron homeostasis in the ferroptosis process. We discuss the literature data on the parallel involvement of Tβ4 and ferroptosis in different human pathologies, mainly cancer and neurodegeneration. Our findings, confronted with literature data, show that controlled Tβ4 release could command on/off switching of ferroptosis, and may provide novel therapeutic opportunities in pathologies of cancer and tissue degeneration.

Towards Robust General Medical Image Segmentation(arXiv, 20210709) [Preprint]The reliability of Deep Learning systems depends on their accuracy but also on their robustness against adversarial perturbations to the input data. Several attacks and defenses have been proposed to improve the performance of Deep Neural Networks under the presence of adversarial noise in the natural image domain. However, robustness in computer-aided diagnosis for volumetric data has only been explored for specific tasks and with limited attacks. We propose a new framework to assess the robustness of general medical image segmentation systems. Our contributions are two-fold: (i) we propose a new benchmark to evaluate robustness in the context of the Medical Segmentation Decathlon (MSD) by extending the recent AutoAttack natural image classification framework to the domain of volumetric data segmentation, and (ii) we present a novel lattice architecture for RObust Generic medical image segmentation (ROG). Our results show that ROG is capable of generalizing across different tasks of the MSD and largely surpasses the state-of-the-art under sophisticated adversarial attacks.

Accelerating Seismic Redatuming Using Tile Low-Rank Approximations on NEC SX-Aurora TSUBASA(Submitted to SUPERCOMPUTING FRONTIERS AND INNOVATIONS, Submitted to South Ural State University (Chelyabinsk, Russia), 20210707) [Preprint]With the aim of imaging subsurface discontinuities, seismic data recorded at the surface of the Earth must be numerically repositioned at locations in the subsurface where reflections have originated, a process generally referred to as redatuming by the geophysical community. Historically, this process has been carried out by numerically time-reversing the data recorded along an open boundary of surface receivers into the subsurface. Despite its simplicity, such an approach is only able to handle seismic energy from primary arrivals (i.e., waves that interact only once with the medium discontinuities), failing to explain multi-scattering in the subsurface. As a result, seismic images are contaminated by artificial reflectors if data are not preprocessed prior to imaging such that multiples are removed from the data. In the last decade, a novel family of methods has emerged under the name of Marchenko redatuming; such methods allow for accurate redatuming of the full-wavefield recorded seismic data including multiple arrivals. This is achieved by solving an inverse problem, whose adjoint modeling can be shown to be equivalent to the standard single-scattering redatuming method for primary-only data. A downside of this application is that the so-called multidimensional convolution operator must be repeatedly evaluated as part of the inversion. Such an operator requires the application of multiple dense matrix-vector multiplications (MVM), which represent the most time-consuming operations in the forward and adjoint processes. We identify and leverage the data sparsity structure for each of the frequency matrices during the MVM operation, and propose to accelerate the MVM step using tile low-rank (TLR) matrix approximations.
We study the TLR impact on time-to-solution for the MVM using different accuracy thresholds whilst at the same time assessing the quality of the resulting subsurface seismic wavefields and show that TLR leads to a minimal degradation in terms of signal-to-noise ratio on a 3D synthetic dataset. We mitigate the load imbalance overhead and provide performance evaluation on two distributed-memory systems. Our MPI+OpenMP TLR-MVM implementation reaches up to 3X performance speedup against the dense MVM counterpart from NEC scientific library on 128 NEC SX-Aurora TSUBASA cards. Thanks to the second generation of high bandwidth memory technology, it further attains up to 67X performance speedup (i.e., 110 TB/s) compared to the dense MVM from Intel MKL when running on 128 dual-socket 20-core Intel Cascade Lake nodes with DDR4 memory, without deteriorating the quality of the reconstructed seismic wavefields.

Randomized multilevel Monte Carlo for embarrassingly parallel inference(arXiv, 20210705) [Preprint]This position paper summarizes a recently developed research program focused on inference in the context of data centric science and engineering applications, and forecasts its trajectory forward over the next decade. Often one endeavours in this context to learn complex systems in order to make more informed predictions and high stakes decisions under uncertainty. Some key challenges which must be met in this context are robustness, generalizability, and interpretability. The Bayesian framework addresses these three challenges, while bringing with it a fourth, undesirable feature: it is typically far more expensive than its deterministic counterparts. In the 21st century, and increasingly over the past decade, a growing number of methods have emerged which allow one to leverage cheap low-fidelity models in order to precondition algorithms for performing inference with more expensive models and make Bayesian inference tractable in the context of high-dimensional and expensive models. Notable examples are multilevel Monte Carlo (MLMC), multi-index Monte Carlo (MIMC), and their randomized counterparts (rMLMC), which are able to provably achieve a dimension-independent (including $\infty$-dimension) canonical complexity rate with respect to mean squared error (MSE) of $1/$MSE. Some parallelizability is typically lost in an inference context, but recently this has been largely recovered via novel double randomization approaches. Such an approach delivers i.i.d. samples of quantities of interest which are unbiased with respect to the infinite resolution target distribution. Over the coming decade, this family of algorithms has the potential to transform data centric science and engineering, as well as classical machine learning applications such as deep learning, by scaling up and scaling out fully Bayesian inference.
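The idea of an estimator that is unbiased with respect to the infinite-resolution limit can be shown on a toy problem. The sketch below is not from the paper: it assumes a deterministic sequence `level_estimate(l)` standing in for an increasingly expensive model hierarchy, and uses the single-term randomized estimator, which draws a random level from a geometric distribution (an assumed choice) and reweights the level increment by the probability of that level. All names are hypothetical.

```python
import random

def level_estimate(l):
    """Hypothetical level-l approximation; converges to 1 as l grows."""
    return 1.0 - 4.0 ** -(l + 1) if l >= 0 else 0.0

def single_term_estimator(rng, r=0.5):
    """Draw a random level with P(level = l) = (1 - r) * r**l and return
    (Y_l - Y_{l-1}) / P(level = l), which is unbiased for lim Y_l."""
    level = 0
    while rng.random() < r:
        level += 1
    prob = (1.0 - r) * r ** level
    delta = level_estimate(level) - level_estimate(level - 1)
    return delta / prob

rng = random.Random(0)
n = 20000
# Each sample is independent, so this loop could be split across workers.
samples = [single_term_estimator(rng) for _ in range(n)]
mean = sum(samples) / n
```

No single sample ever evaluates the limit itself, yet the sample mean targets it without discretization bias. The level distribution must decay more slowly than the squared increments for the variance to stay finite, which holds here because the increments shrink by a factor of 4 per level while the probabilities shrink by a factor of 2.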

Reference Tracking and Observer Design for Space-Fractional Partial Differential Equation Modeling Gas Pressures in Fractured Media(arXiv, 20210705) [Preprint]This paper considers a class of space fractional partial differential equations (FPDEs) that describe gas pressures in fractured media. First, the well-posedness, uniqueness, and the stability in $L_\infty(\mathbb{R})$ of the considered FPDEs are investigated. Then, the reference tracking problem is studied to track the pressure gradient at a downstream location of a channel. This requires manipulation of gas pressure at the downstream location and the use of pressure measurements at an upstream location. To achieve this, the backstepping approach is adapted to the space FPDEs. The key challenge in this adaptation is the non-applicability of the Lyapunov theory, which is typically used to prove the stability of the target system, since the obtained target system is fractional in space. In addition, a backstepping adaptive observer is designed to jointly estimate both the system's state and the disturbance. The stability of the closed loop (reference tracking controller/observer) is also investigated. Finally, numerical simulations are given to evaluate the efficiency of the proposed method.

DeformRS: Certifying Input Deformations with Randomized Smoothing(arXiv, 20210702) [Preprint]Deep neural networks are vulnerable to input deformations in the form of vector fields of pixel displacements and to other parameterized geometric deformations, e.g. translations, rotations, etc. Current input deformation certification methods either (i) do not scale to deep networks on large input datasets, or (ii) can only certify a specific class of deformations, e.g. only rotations. We reformulate certification in the randomized smoothing setting for both general vector field and parameterized deformations and propose DeformRS-VF and DeformRS-Par, respectively. Our new formulation scales to large networks on large input datasets. For instance, DeformRS-Par certifies rich deformations, covering translations, rotations, scaling, affine deformations, and other visually aligned deformations such as ones parameterized by a Discrete-Cosine-Transform basis. Extensive experiments on MNIST, CIFAR-10 and ImageNet show that DeformRS-Par outperforms the existing state-of-the-art in certified accuracy, e.g. improved certified accuracy of 6% against perturbed rotations in the set [-10,10] degrees on ImageNet.