    • ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks

      Kovalev, Dmitry; Shulgin, Egor; Richtarik, Peter; Rogozin, Alexander; Gasnikov, Alexander (arXiv, 2021-02-18) [Preprint]
      We propose ADOM - an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks. ADOM uses a dual oracle, i.e., we assume access to the gradient of the Fenchel conjugate of the individual loss functions. Up to a constant factor, which depends on the network structure only, its communication complexity is the same as that of the accelerated Nesterov gradient method (Nesterov, 2003). To the best of our knowledge, only the algorithm of Rogozin et al. (2019) has a convergence rate with similar properties. However, their algorithm converges under the very restrictive assumption that the number of network changes cannot be greater than a tiny percentage of the number of iterations. This assumption is hard to satisfy in practice, as changes in the network topology usually cannot be controlled. In contrast, ADOM merely requires the network to stay connected throughout time.
    • Delineating the molecular and phenotypic spectrum of the SETD1B-related syndrome

      Weerts, Marjolein J.A.; Lanko, Kristina; Guzmán-Vega, Francisco J.; Jackson, Adam; Ramakrishnan, Reshmi; Cardona-Londoño, Kelly J.; Peña-Guerra, Karla A.; van Bever, Yolande; van Paassen, Barbara W.; Kievit, Anneke; van Slegtenhorst, Marjon; Allen, Nicholas M.; Kehoe, Caroline M.; Robinson, Hannah K.; Pang, Lewis; Banu, Selina H.; Zaman, Mashaya; Efthymiou, Stephanie; Houlden, Henry; Järvelä, Irma; Lauronen, Leena; Määttä, Tuomo; Schrauwen, Isabelle; Leal, Suzanne M; Ruivenkamp, Claudia A.L; Barge-Schaapveld, Daniela Q.C.M.; Peeters-Scholte, Cacha M.P.C.D.; Galehdari, Hamid; Mazaheri, Neda; Sisodiya, Sanjay M; Harrison, Victoria; Sun, Angela; Thies, Jenny; Pedroza, Luis Alberto; Lara-Taranchenko, Yana; Chinn, Ivan K.; Lupski, James R.; Garza-Flores, Alexandra; McGlothlin, Jefferey; Yang, Lin; Huang, Shaoping; Wang, Xiaodong; Jewett, Tamison; Rosso, Gretchen; Lin, Xi; Mohammed, Shehla; Merritt, J. Lawrence; Mirzaa, Ghayda M.; Timms, Andrew E.; Scheck, Joshua; Elting, Mariet; Polstra, Abeltje M.; Schenck, Lauren; Ruzhnikov, Maura R.Z.; Vetro, Annalisa; Montomoli, Martino; Guerrini, Renzo; Koboldt, Daniel C.; Mosher, Theresa Mihalic; Pastore, Matthew T.; McBride, Kim L.; Peng, Jing; Pan, Zou; Willemsen, Marjolein; Koning, Susanne; Turnpenny, Peter D.; de Vries, Bert B.A.; Gilissen, Christian; Pfundt, Rolph; Lees, Melissa; Braddock, Stephen R.; Klemp, Kara C.; Vansenne, Fleur; van Gijn, Marielle; Quindipan, Catherine; Deardorff, Matthew A.; Austin Hamm, J.; Putnam, Abbey M.; Baud, Rebecca; Walsh, Laurence; Lynch, Sally A.; Baptista, Julia; Person, Richard E.; Monaghan, Kristin G.; Crunk, Amy; Keller-Ramey, Jennifer; Reich, Adi; Elloumi, Houda Zghal; Alders, Marielle; Kerkhof, Jennifer; McConkey, Haley; Haghshenas, Sadegheh; Maroofian, Reza; Sadikovic, Bekim; Banka, Siddharth; Arold, Stefan T.; Barakat, Tahsin Stefan; Genomics England Research Consortium (Cold Spring Harbor Laboratory, 2021-02-18) [Preprint]
      Pathogenic variants in SETD1B have been associated with a syndromic neurodevelopmental disorder including intellectual disability, language delay and seizures. To date, clinical features have been described for eleven patients with (likely) pathogenic SETD1B sequence variants. We perform an in-depth clinical characterization of a cohort of 36 unpublished individuals with SETD1B sequence variants, describing their molecular and phenotypic spectrum. Selected variants were functionally tested using in vitro and genome-wide methylation assays. Our data present evidence for a loss-of-function mechanism of SETD1B variants, resulting in a core clinical phenotype of global developmental delay, language delay including regression, intellectual disability, autism and other behavioral issues, and variable epilepsy phenotypes. Developmental delay appeared to precede seizure onset, suggesting SETD1B dysfunction impacts physiological neurodevelopment even in the absence of epileptic activity. Interestingly, males are significantly overrepresented and more severely affected, and we speculate that sex-linked traits could affect susceptibility to penetrance and the clinical spectrum of SETD1B variants. Finally, despite the possibility of non-redundant contributions of SETD1B and its paralogue SETD1A to epigenetic control, the clinical phenotypes of the related disorders share many similarities, indicating that elucidating shared and divergent downstream targets of both genes will help to understand the mechanism leading to the neurobehavioral phenotypes. Insights from this extensive cohort will facilitate the counseling regarding the molecular and phenotypic landscape of newly diagnosed patients with the SETD1B-related syndrome.
    • Multi-dimensional wave steering with higher-order topological phononic crystal

      Xu, Changqing; Chen, Zeguo; Zhang, Guanqing; Ma, Guancong; Wu, Ying (arXiv, 2021-02-17) [Preprint]
      The recent discovery and realizations of higher-order topological insulators enrich the fundamental studies on topological phases. Here, we report three-dimensional (3D) wave-steering capabilities enabled by topological boundary states at three different orders in a 3D phononic crystal with nontrivial bulk topology originating from the synergy of mirror symmetry of the unit cell and a non-symmorphic glide symmetry of the lattice. The multitude of topological states brings diverse possibilities for wave manipulation. Through judicious engineering of the boundary modes, we experimentally demonstrate two functionalities at different dimensions: 2D negative refraction of sound waves enabled by a first-order topological surface state with negative dispersion, and a 3D acoustic interferometer leveraging second-order topological hinge states. Our work showcases that topological modes at different orders promise diverse wave steering applications across different dimensions.
    • Shape-Tailored Deep Neural Networks

      Khan, Naeemullah; Sharma, Angira; Sundaramoorthi, Ganesh; Torr, Philip H. S. (arXiv, 2021-02-16) [Preprint]
      We present Shape-Tailored Deep Neural Networks (ST-DNN). ST-DNN extend convolutional networks (CNN), which aggregate data from fixed shape (square) neighborhoods, to compute descriptors defined on arbitrarily shaped regions. This is natural for segmentation, where descriptors should describe regions (e.g., of objects) that have diverse shape. We formulate these descriptors through the Poisson partial differential equation (PDE), which can be used to generalize convolution to arbitrary regions. We stack multiple PDE layers to generalize a deep CNN to arbitrary regions, and apply it to segmentation. We show that ST-DNN are covariant to translations and rotations and robust to domain deformations, natural for segmentation, which existing CNN-based methods lack. ST-DNN are 3-4 orders of magnitude smaller than CNNs used for segmentation. We show that they exceed segmentation performance compared to state-of-the-art CNN-based descriptors using 2-3 orders smaller training sets on the texture segmentation problem.
    • Ultrafast photo-induced enhancement of electron-phonon coupling in metal-halide perovskites

      Laquai, Frédéric; Wang, Mingcong; Gao, Yajun; Wang, Kai; De Wolf, Stefaan (Research Square, 2021-02-15) [Preprint]
      In metal-halide perovskites (MHPs), the nature of organic cations affects both the perovskite’s structure and its optoelectronic properties. Using ultrafast pump-probe spectroscopy, we demonstrate that in state-of-the-art mixed-cation MHPs ultrafast photo-induced bandgap narrowing occurs, and linearly depends on the excited carrier density in the range from 10$^{16}$ cm$^{-3}$ to above 10$^{18}$ cm$^{-3}$. Furthermore, time-domain terahertz (td-THz) photoconductivity measurements reveal that the majority of carriers are localized and that the localization increases with the carrier density. Both observations, the bandgap narrowing and carrier localization, can be rationalized by ultrafast (sub-2 ps) photo-induced enhancement of electron-phonon coupling, originating from dynamic disorder, as clearly evidenced by the presence of a Debye relaxation component in the terahertz photoconductivity spectra. The observation of photo-induced enhancement of electron-phonon coupling and dynamic disorder not only provides specific insight into the polaron-strain distribution of excited states in MHPs, but also adds to the development of a concise picture of the ultrafast physics of this important class of semiconductors.
    • Insertion of PATC-rich C. elegans introns into synthetic transgenes by golden-gate-based cloning

      Frøkjær-Jensen, Christian (Research Square, 2021-02-15) [Preprint]
      Transgenes are particularly prone to epigenetic silencing in the C. elegans germline. Here, we describe a protocol to insert introns containing a class of non-coding DNA named Periodic An/Tn Clusters (PATCs) into synthetic transgenes. PATCs can protect transgenes from position-dependent silencing (Position Effect Variegation, PEV) and from silencing in simple extra-chromosomal arrays. Using a set of simple design rules, it is possible to routinely insert up to three PATC-rich introns into a synthetic transgene in a single reaction.
    • On the Impact of Device and Behavioral Heterogeneity in Federated Learning

      Abdelmoniem, Ahmed M.; Ho, Chen-Yu; Papageorgiou, Pantelis; Bilal, Muhammad; Canini, Marco (arXiv, 2021-02-15) [Preprint]
      Federated learning (FL) is becoming a popular paradigm for collaborative learning over distributed, private datasets owned by non-trusting entities. FL has seen successful deployment in production environments, and it has been adopted in services such as virtual keyboards, auto-completion, item recommendation, and several IoT applications. However, FL comes with the challenge of performing training over largely heterogeneous datasets, devices, and networks that are out of the control of the centralized FL server. Motivated by this inherent setting, we make a first step towards characterizing the impact of device and behavioral heterogeneity on the trained model. We conduct an extensive empirical study spanning close to 1.5K unique configurations on five popular FL benchmarks. Our analysis shows that these sources of heterogeneity have a major impact on both model performance and fairness, thus shedding light on the importance of considering heterogeneity in FL system design.
    • Decentralized Distributed Optimization for Saddle Point Problems

      Rogozin, Alexander; Beznosikov, Alexander; Dvinskikh, Darina; Kovalev, Dmitry; Dvurechensky, Pavel; Gasnikov, Alexander (arXiv, 2021-02-15) [Preprint]
      We consider distributed convex-concave saddle point problems over arbitrary connected undirected networks and propose a decentralized distributed algorithm for their solution. The local functions distributed across the nodes are assumed to have global and local groups of variables. For the proposed algorithm we prove non-asymptotic convergence rate estimates with explicit dependence on the network characteristics. To supplement the convergence rate analysis, we propose lower bounds for strongly-convex-strongly-concave and convex-concave saddle-point problems over arbitrary connected undirected networks. We illustrate the considered problem setting by a particular application to distributed calculation of non-regularized Wasserstein barycenters.
    • Smoothness Matrices Beat Smoothness Constants: Better Communication Compression Techniques for Distributed Optimization

      Safaryan, Mher; Hanzely, Filip; Richtarik, Peter (arXiv, 2021-02-14) [Preprint]
      Large scale distributed optimization has become the default tool for the training of supervised machine learning models with a large number of parameters and training data. Recent advancements in the field provide several mechanisms for speeding up the training, including compressed communication, variance reduction and acceleration. However, none of these methods is capable of exploiting the inherently rich data-dependent smoothness structure of the local losses beyond standard smoothness constants. In this paper, we argue that when training supervised models, smoothness matrices -- information-rich generalizations of the ubiquitous smoothness constants -- can and should be exploited for further dramatic gains, both in theory and practice. In order to further alleviate the communication burden inherent in distributed optimization, we propose a novel communication sparsification strategy that can take full advantage of the smoothness matrices associated with local losses. To showcase the power of this tool, we describe how our sparsification technique can be adapted to three distributed optimization algorithms -- DCGD, DIANA and ADIANA -- yielding significant savings in terms of communication complexity. The new methods always outperform the baselines, often dramatically so.
    • Distributed Second Order Methods with Fast Rates and Compressed Communication

      Islamov, Rustem; Qian, Xun; Richtarik, Peter (arXiv, 2021-02-14) [Preprint]
      We develop several new communication-efficient second-order methods for distributed optimization. Our first method, NEWTON-STAR, is a variant of Newton's method from which it inherits its fast local quadratic rate. However, unlike Newton's method, NEWTON-STAR enjoys the same per-iteration communication cost as gradient descent. While this method is impractical as it relies on the use of certain unknown parameters characterizing the Hessian of the objective function at the optimum, it serves as the starting point which enables us to design practical variants thereof with strong theoretical guarantees. In particular, we design a stochastic sparsification strategy for learning the unknown parameters in an iterative fashion in a communication-efficient manner. Applying this strategy to NEWTON-STAR leads to our next method, NEWTON-LEARN, for which we prove local linear and superlinear rates independent of the condition number. When applicable, this method can have dramatically superior convergence behavior when compared to state-of-the-art methods. Finally, we develop a globalization strategy using cubic regularization which leads to our next method, CUBIC-NEWTON-LEARN, for which we prove global sublinear and linear convergence rates, and a fast superlinear rate. Our results are supported with experimental results on real datasets, and show several orders of magnitude improvement on baseline and state-of-the-art methods in terms of communication complexity.
    • Modeling Spatial Data with Cauchy Convolution Processes

      Krupskii, Pavel; Huser, Raphaël (arXiv, 2021-02-14) [Preprint]
      We study the class of models for spatial data obtained from Cauchy convolution processes based on different types of kernel functions. We show that the resulting spatial processes have some appealing tail dependence properties, such as tail dependence at short distances and independence at long distances with suitable kernel functions. We derive the extreme-value limits of these processes, study their smoothness properties, and consider some interesting special cases, including Marshall-Olkin and Hüsler-Reiss processes. We further consider mixtures between such Cauchy processes and Gaussian processes, in order to have a separate control over the bulk and the tail dependence behaviors. Our proposed approach for estimating model parameters relies on matching model-based and empirical summary statistics, while the corresponding extreme-value limit models may be fitted using a pairwise likelihood approach. We show with a simulation study that our proposed inference approach yields accurate estimates. Moreover, the proposed class of models allows for a wide range of flexible dependence structures, and we demonstrate our new methodology by application to a temperature dataset. Our results indicate that our proposed model provides a very good fit to the data, and that it captures both the bulk and the tail dependence structures accurately.
    • A General Framework for Liquid Marbles

      Gallo Jr., Adair; Tavares, Fernanda; Das, Ratul; Mishra, Himanshu (arXiv, 2021-02-12) [Preprint]
      Liquid marbles refer to liquid droplets that are covered with a layer of non-wetting particles. They are observed in nature and have practical significance. However, a generalized framework for analyzing liquid marbles as they inflate or deflate is unavailable. The present study fills this gap by developing an analytical framework based on liquid-particle and particle-particle interactions. We demonstrate that the potential final states of evaporating liquid marbles are characterized by one of the following: (I) constant surface area, (II) particle ejection, or (III) multilayering. Based on these insights, a single-parameter evaporation model for liquid marbles is developed. Model predictions are in excellent agreement with experimental evaporation data for water liquid marbles of particle sizes ranging from 7 nanometers to 300 micrometers (over four orders of magnitude) and chemical compositions ranging from hydrophilic to superhydrophobic. These findings lay the groundwork for the rational design of liquid marble applications.
    • Material absorption-based carrier generation model for modeling optoelectronic devices

      Chen, Liang; Bagci, Hakan (arXiv, 2021-02-12) [Preprint]
      The generation rate of photocarriers in optoelectronic materials is commonly calculated using the Poynting vector in the frequency domain. In time-domain approaches where the nonlinear coupling between electromagnetic (EM) waves and photocarriers can be accounted for, the Poynting vector model is no longer applicable. One main reason is that the photocurrent radiates low-frequency EM waves out of the spectrum of the source, e.g., terahertz (THz) waves are generated in THz photoconductive antennas. These frequency components do not contribute to the photocarrier generation since the corresponding photon energy is smaller than the optoelectronic material's bandgap energy. However, the instantaneous Poynting vector does not distinguish the power flux of different frequency components. This work proposes a material absorption-based model capable of calculating the carrier generation rate accurately in the time domain. Using the Lorentz dispersion model with poles residing in the optical frequency region, the instantaneous optical absorption, which corresponds to the power dissipation in the polarization, is calculated and used to calculate the generation rate. The Lorentz model is formulated with an auxiliary differential equation method that updates the polarization current density, from which the absorbed optical power corresponding to each Lorentz pole is directly calculated in the time domain. Examples show that the proposed model is more accurate than the Poynting vector-based model and is stable even when the generated low-frequency component is strong.
    • Proximal and Federated Random Reshuffling

      Mishchenko, Konstantin; Khaled, Ahmed; Richtarik, Peter (arXiv, 2021-02-12) [Preprint]
      Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD) without replacement, is a popular and theoretically grounded method for finite-sum minimization. We propose two new algorithms: Proximal and Federated Random Reshuffling (ProxRR and FedRR). The first algorithm, ProxRR, solves composite convex finite-sum minimization problems in which the objective is the sum of a (potentially non-smooth) convex regularizer and an average of $n$ smooth objectives. We obtain the second algorithm, FedRR, as a special case of ProxRR applied to a reformulation of distributed problems with either homogeneous or heterogeneous data. We study the algorithms' convergence properties with constant and decreasing stepsizes, and show that they have considerable advantages over Proximal and Local SGD. In particular, our methods have superior complexities and ProxRR evaluates the proximal operator once per epoch only. When the proximal operator is expensive to compute, this small difference makes ProxRR up to $n$ times faster than algorithms that evaluate the proximal operator in every iteration. We give examples of practical optimization tasks where the proximal operator is difficult to compute and ProxRR has a clear advantage. Finally, we corroborate our results with experiments on real data sets.
    • Genome-Wide Association Study Reveals Genetic Architecture of Septoria Tritici Blotch Resistance in a Historic Landrace Collection

      Dutta, Anik; Croll, Daniel; McDonald, Bruce A.; Krattinger, Simon G. (Research Square, 2021-02-09) [Preprint]
      Septoria tritici blotch (STB), caused by the fungus Zymoseptoria tritici, is a major constraint in global wheat production. The lack of genetic diversity in modern elite wheat cultivars largely hinders the improvement of STB resistance. Wheat landraces are reservoirs of untapped genetic diversity, which can be exploited to find novel STB resistance genes or alleles. Here, we characterized 188 Swiss wheat landraces for resistance to STB using four Swiss Z. tritici isolates. We used a genome-wide association study (GWAS) to identify genetic variants associated with the amount of lesion and pycnidia production by the fungus. The majority of the landraces were highly resistant for both traits to the isolate 1E4, indicating a gene-for-gene relationship, while higher phenotypic variability was observed against other isolates. GWAS detected a significant SNP on chromosome 3A that was associated with both traits in the isolate 1E4. The resistance response against 1E4 was likely controlled by the Stb6 gene. Sanger sequencing revealed that the majority of these ~100-year-old landraces carry the Stb6 resistance allele. This indicates the importance of this gene in Switzerland during the early 1900s for disease control in the field. Our study demonstrates the importance of characterizing historic landrace collections for STB resistance to provide valuable information on resistance variability and contributing alleles. This will help breeders in the future to make decisions on integrating such germplasms in STB resistance breeding.
    • Reduction of the Beam Pointing Error for Improved Free-Space Optical Communication Link Performance

      N'Doye, I.; Cai, W.; Al-Alwan, Asem Ibrahim Alwan; Sun, X.; Headary, W. G.; Alouini, M. -S.; Ooi, B. -S.; Laleg-Kirati, T. -M. (arXiv, 2021-02-09) [Preprint]
      Free-space optical communication is emerging as a low-power, low-cost, and high data rate alternative to radio-frequency communication in short- to medium-range applications. However, it requires a close-to-line-of-sight link between the transmitter and the receiver. This paper proposes a robust $\mathcal{H}_\infty$ control law for free-space optical (FSO) beam pointing error systems under controlled weak turbulence conditions. The objective is to maintain the transmitter-receiver line, that is, to keep the center of the optical beam as close as possible to the center of the receiving aperture, within a prescribed disturbance attenuation level. First, we derive an augmented nonlinear discrete-time model for pointing error loss due to misalignment caused by weak atmospheric turbulence. We then investigate the $\mathcal{H}_\infty$-norm optimization problem that guarantees the closed-loop pointing error is stable and ensures the prescribed weak disturbance attenuation. Furthermore, we evaluate the closed-loop outage probability error and bit error rate (BER) that quantify the free-space optical communication performance in fading channels. Finally, the paper concludes with a numerical simulation of the proposed approach to the FSO link's error performance.
    • Establishing and Maintaining a Reliable Optical Wireless Communication in Underwater Environment

      Ndoye, Ibrahima; Zhang, Ding; Alouini, Mohamed-Slim; Laleg-Kirati, Taous-Meriem (arXiv, 2021-02-09) [Preprint]
      This paper addresses the trajectory tracking problem between an autonomous underwater vehicle (AUV) and a mobile surface ship, both equipped with optical communication transceivers. The challenging issue is to maintain stable connectivity between the two autonomous vehicles within an optical communication range. We define a directed optical line-of-sight (LoS) link between the two-vehicle systems. The transmitter is mounted on the AUV while the surface ship is equipped with an optical receiver. However, this optical communication channel needs to preserve a stable transmitter-receiver position to reinforce service quality, which typically includes the bit rate and the bit error rate. A cone-shaped beam region of the optical receiver is approximated based on the channel model; then, a minimum bit rate is ensured if the AUV transmitter remains inside this region. Additionally, we design two control algorithms for the transmitter to drive the AUV and maintain it in the cone-shaped beam region under an uncertain oceanic environment. Lyapunov function-based analysis that ensures asymptotic stability of the resulting closed-loop tracking error is used to design the proposed NLPD controller. Numerical simulations are performed using MATLAB/Simulink to show the controllers' ability to achieve favorable tracking in the presence of the solar background noise within competitive times. Finally, results demonstrate the proposed NLPD controller improves the tracking error performance by more than $70\%$ under nominal conditions and $35\%$ with model uncertainties and disturbances compared to the original PD strategy.
    • A Joint Inversion-Segmentation approach to Assisted Seismic Interpretation

      Ravasi, Matteo; Birnie, Claire Emma (arXiv, 2021-02-07) [Preprint]
      Structural seismic interpretation and quantitative characterization are historically intertwined processes. The latter provides estimates of properties of the subsurface which can be used to aid structural interpretation alongside the original seismic data and a number of other seismic attributes. In this work, we redefine this process as an inverse problem which tries to jointly estimate subsurface properties (i.e., acoustic impedance) and a piece-wise segmented representation of the subsurface based on user-defined macro-classes. By inverting for the quantities simultaneously, the inversion is primed with prior knowledge about the regions of interest, whilst at the same time it constrains this belief with the actual seismic measurements. As the proposed functional is separable in the two quantities, these are optimized in an alternating fashion, where each subproblem is solved using a Primal-Dual algorithm. Subsequently, each class is used as input to a workflow which aims to extract the perimeter of the detected shapes and to produce unique horizons. The effectiveness of the proposed method is illustrated through numerical examples on synthetic and field datasets.
    • DOA Estimation with Non-Uniform Linear Arrays: A Phase-Difference Projection Approach

      Chen, Hui; Ballal, Tarig; Al-Naffouri, Tareq Y. (arXiv, 2021-02-06) [Preprint]
      Phase wrapping is a major problem in direction-of-arrival (DOA) estimation using phase-difference observations. For a sensor pair with an inter-sensor spacing greater than half of the wavelength ($\lambda/2$) of the signal, phase wrapping occurs at certain DOA angles leading to phase-difference ambiguities. Existing phase unwrapping methods exploit either frequency or spatial diversity. These techniques work by imposing restrictions on the utilized frequencies or the receiver array geometry. In addition to sensitivity to noise and calibration errors, these methods may also have high computational complexity. We propose a grid-less phase-difference projection (PDP) DOA algorithm to overcome these issues. The concept of wrapped phased-difference pattern (WPDP) is introduced, which allows the proposed algorithm to compute most of the parameters required for DOA estimation in an offline manner, hence resulting in a superior computational speed in real time. Simulation results demonstrate the excellent performance of the proposed algorithm, both in terms of accuracy and speed.
    • DeepReduce: A Sparse-tensor Communication Framework for Distributed Deep Learning

      Kostopoulou, Kelly; Xu, Hang; Dutta, Aritra; Li, Xin; Ntoulas, Alexandros; Kalnis, Panos (arXiv, 2021-02-05) [Preprint]
      Sparse tensors appear frequently in distributed deep learning, either as a direct artifact of the deep neural network's gradients, or as a result of an explicit sparsification process. Existing communication primitives are agnostic to the peculiarities of deep learning; consequently, they impose unnecessary communication overhead. This paper introduces DeepReduce, a versatile framework for the compressed communication of sparse tensors, tailored for distributed deep learning. DeepReduce decomposes sparse tensors into two sets, values and indices, and allows both independent and combined compression of these sets. We support a variety of common compressors, such as Deflate for values, or run-length encoding for indices. We also propose two novel compression schemes that achieve superior results: curve fitting-based for values and bloom filter-based for indices. DeepReduce is orthogonal to existing gradient sparsifiers and can be applied in conjunction with them, transparently to the end-user, to significantly lower the communication overhead. As proof of concept, we implement our approach on TensorFlow and PyTorch. Our experiments with large real models demonstrate that DeepReduce transmits less data and imposes lower computational overhead than existing methods, without affecting the training accuracy.