Recent Submissions

  • Quaternion Factorization Machines: A Lightweight Solution to Intricate Feature Interaction Modelling

    Chen, Tong; Yin, Hongzhi; Zhang, Xiangliang; Huang, Zi; Wang, Yang; Wang, Meng (arXiv, 2021-04-05) [Preprint]
    As a well-established approach, factorization machine (FM) is capable of automatically learning high-order interactions among features to make predictions without the need for manual feature engineering. With the prominent development of deep neural networks (DNNs), there is a recent and ongoing trend of enhancing the expressiveness of FM-based models with DNNs. However, though better results are obtained with DNN-based FM variants, this performance gain comes at the cost of an enormous number (usually millions) of extra model parameters on top of the plain FM. Consequently, the heavy parameterization impedes the real-life practicality of those deep models, especially efficient deployment on resource-constrained IoT and edge devices. In this paper, we move beyond the traditional real space where most deep FM-based models are defined, and seek solutions from quaternion representations within the hypercomplex space. Specifically, we propose the quaternion factorization machine (QFM) and quaternion neural factorization machine (QNFM), which are two novel lightweight and memory-efficient quaternion-valued models for sparse predictive analytics. By introducing a brand new take on FM-based models with the notion of quaternion algebra, our models not only enable expressive inter-component feature interactions, but also significantly reduce the parameter size due to lower degrees of freedom in the hypercomplex Hamilton product compared with real-valued matrix multiplication. Extensive experimental results on three large-scale datasets demonstrate that QFM achieves 4.36% performance improvement over the plain FM without introducing any extra parameters, while QNFM outperforms all baselines with up to two orders of magnitude reduction in parameter size in comparison to state-of-the-art peer methods.
  • Finding Nano-Ötzi: Semi-Supervised Volume Visualization for Cryo-Electron Tomography

    Nguyen, Ngan; Bohak, Ciril; Engel, Dominik; Mindek, Peter; Strnad, Ondrej; Wonka, Peter; Li, Sai; Ropinski, Timo; Viola, Ivan (arXiv, 2021-04-04) [Preprint]
    Cryo-Electron Tomography (cryo-ET) is a new 3D imaging technique with unprecedented potential for resolving submicron structural detail. Existing volume visualization methods, however, cannot cope with its very low signal-to-noise ratio. In order to design more powerful transfer functions, we propose to leverage soft segmentation as an explicit component of visualization for noisy volumes. Our technical realization is based on semi-supervised learning where we combine the advantages of two segmentation algorithms. A first weak segmentation algorithm provides good results for propagating sparse user provided labels to other voxels in the same volume. This weak segmentation algorithm is used to generate dense pseudo labels. A second powerful deep-learning based segmentation algorithm can learn from these pseudo labels to generalize the segmentation to other unseen volumes, a task that the weak segmentation algorithm fails at completely. The proposed volume visualization uses the deep-learning based segmentation as a component for segmentation-aware transfer function design. Appropriate ramp parameters can be suggested automatically through histogram analysis. Finally, our visualization uses gradient-free ambient occlusion shading to further suppress visual presence of noise, and to give structural detail desired prominence. The cryo-ET data studied throughout our technical experiments is based on the highest-quality tilted series of intact SARS-CoV-2 virions. Our technique shows the high impact in target sciences for visual data analysis of very noisy volumes that cannot be visualized with existing techniques.
  • Uniting Heterogeneity, Inductiveness, and Efficiency for Graph Representation Learning

    Chen, Tong; Yin, Hongzhi; Ren, Jie; Huang, Zi; Zhang, Xiangliang; Wang, Hao (arXiv, 2021-04-04) [Preprint]
    With the ubiquitous graph-structured data in various applications, models that can learn compact but expressive vector representations of nodes have become highly desirable. Recently, bearing the message passing paradigm, graph neural networks (GNNs) have greatly advanced the performance of node representation learning on graphs. However, a majority class of GNNs are only designed for homogeneous graphs, leading to inferior adaptivity to the more informative heterogeneous graphs with various types of nodes and edges. Also, despite the necessity of inductively producing representations for completely new nodes (e.g., in streaming scenarios), few heterogeneous GNNs can bypass the transductive learning scheme where all nodes must be known during training. Furthermore, the training efficiency of most heterogeneous GNNs has been hindered by their sophisticated designs for extracting the semantics associated with each meta path or relation. In this paper, we propose WIde and DEep message passing Network (WIDEN) to cope with the aforementioned problems about heterogeneity, inductiveness, and efficiency that are rarely investigated together in graph representation learning. In WIDEN, we propose a novel inductive, meta path-free message passing scheme that packs up heterogeneous node features with their associated edges from both low- and high-order neighbor nodes. To further improve the training efficiency, we innovatively present an active downsampling strategy that drops unimportant neighbor nodes to facilitate faster information propagation. Experiments on three real-world heterogeneous graphs have further validated the efficacy of WIDEN on both transductive and inductive node representation learning, as well as the superior training efficiency against state-of-the-art baselines.
  • Fast-adapting and Privacy-preserving Federated Recommender System

    Wang, Qinyong; Yin, Hongzhi; Chen, Tong; Yu, Junliang; Zhou, Alexander; Zhang, Xiangliang (arXiv, 2021-04-02) [Preprint]
    In the mobile Internet era, recommender systems have become an irreplaceable tool to help users discover useful items, thus alleviating the information overload problem. Recent research on deep neural network (DNN)-based recommender systems has made significant progress in improving prediction accuracy, largely attributed to the widely accessible large-scale user data. Such data is commonly collected from users' personal devices, and then centrally stored in the cloud server to facilitate model training. However, with the rising public concerns on user privacy leakage in online platforms, online users are becoming increasingly anxious over abuses of user privacy. Therefore, it is urgent and beneficial to develop a recommender system that can achieve both high prediction accuracy and strong privacy protection. To this end, we propose a DNN-based recommendation model called PrivRec running on the decentralized federated learning (FL) environment, which ensures that a user's data is fully retained on her/his personal device while contributing to training an accurate model. On the other hand, to better embrace the data heterogeneity (e.g., users' data vary in scale and quality significantly) in FL, we innovatively introduce a first-order meta-learning method that enables fast on-device personalization with only a few data points. Furthermore, to defend against potential malicious participants that pose serious security threats to other users, we further develop a user-level differentially private model, namely DP-PrivRec, so attackers are unable to identify any arbitrary user from the trained model. Finally, we conduct extensive experiments on two large-scale datasets in a simulated FL environment, and the results validate the superiority of both PrivRec and DP-PrivRec.
  • Identifying Novel Drug Targets by iDTPnd: A Case Study of Kinase Inhibitors.

    Naveed, Hammad; Reglin, Corinna; Schubert, Thomas; Gao, Xin; Arold, Stefan T.; Maitland, Michael L (Genomics, proteomics & bioinformatics, Elsevier BV, 2021-04-01) [Article]
    Current FDA-approved kinase inhibitors cause diverse adverse effects, some of which are due to the mechanism-independent effects of these drugs. Identifying these mechanism-independent interactions could improve drug safety and support drug repurposing. We have developed iDTPnd (integrated Drug Target Predictor with negative dataset), a computational approach for large-scale discovery of novel targets for known drugs. For a given drug, we construct a positive and a negative structural signature that captures the weakly conserved structural features of drug binding sites. To facilitate assessment of unintended targets, iDTPnd also provides a docking-based interaction score and its statistical significance. We were able to confirm the interaction of sorafenib, imatinib, dasatinib, sunitinib, and pazopanib with their known targets at a sensitivity and specificity of 52% and 55%, respectively. We have validated 10 predicted novel targets by using in vitro experiments. Our results suggest that proteins other than kinases, such as nuclear receptors, cytochrome P450, or MHC Class I molecules can also be physiologically relevant targets of kinase inhibitors. Our method is general and broadly applicable for the identification of protein-small molecule interactions, when sufficient drug-target 3D data are available. The code for constructing the structural signature is available at
  • Towards Similarity-based Differential Diagnostics For Common Diseases

    Slater, Luke T; Karwath, Andreas; Williams, John A.; Russell, Sophie; Makepeace, Silver; Carberry, Alexander; Hoehndorf, Robert; Gkoutos, Georgios (Computers in Biology and Medicine, Elsevier BV, 2021-04-01) [Article]
    Ontology-based phenotype profiles have been utilised for the purpose of differential diagnosis of rare genetic diseases, and for decision support in specific disease domains. Particularly, semantic similarity facilitates diagnostic hypothesis generation through comparison with disease phenotype profiles. However, the approach has not been applied for differential diagnosis of common diseases, or generalised clinical diagnostics from uncurated text-derived phenotypes. In this work, we describe the development of an approach for deriving patient phenotype profiles from clinical narrative text, and apply this to text associated with MIMIC-III patient visits. We then explore the use of semantic similarity with those text-derived phenotypes to classify primary patient diagnosis, comparing the use of patient-patient similarity and patient-disease similarity using phenotype-disease profiles previously mined from literature. We also consider a combined approach, in which literature-derived phenotypes are extended with the content of text-derived phenotypes we mined from 500 patients. The results reveal a powerful approach, showing that in one setting, uncurated text phenotypes can be used for differential diagnosis of common diseases, making use of information both inside and outside the setting. While the methods themselves should be explored for further optimisation, they could be applied to a variety of clinical tasks, such as differential diagnosis, cohort discovery, document and text classification, and outcome prediction.
  • arthurlirui/refsepECCV2020: Code for Reflection Separation via Multi-bounce Polarization State Tracing

    Li, Rui; Qiu, Simeng; Zang, Guangming; Heidrich, Wolfgang (Github, 2021-03-31) [Software]
    Code for Reflection Separation via Multi-bounce Polarization State Tracing
  • Mask-ToF: Learning Microlens Masks for Flying Pixel Correction in Time-of-Flight Imaging

    Chugunov, Ilya; Baek, Seung-Hwan; Fu, Qiang; Heidrich, Wolfgang; Heide, Felix (arXiv, 2021-03-30) [Preprint]
    We introduce Mask-ToF, a method to reduce flying pixels (FP) in time-of-flight (ToF) depth captures. FPs are pervasive artifacts which occur around depth edges, where light paths from both an object and its background are integrated over the aperture. This light mixes at a sensor pixel to produce erroneous depth estimates, which can adversely affect downstream 3D vision tasks. Mask-ToF starts at the source of these FPs, learning a microlens-level occlusion mask which effectively creates a custom-shaped sub-aperture for each sensor pixel. This modulates the selection of foreground and background light mixtures on a per-pixel basis and thereby encodes scene geometric information directly into the ToF measurements. We develop a differentiable ToF simulator to jointly train a convolutional neural network to decode this information and produce high-fidelity, low-FP depth reconstructions. We test the effectiveness of Mask-ToF on a simulated light field dataset and validate the method with an experimental prototype. To this end, we manufacture the learned amplitude mask and design an optical relay system to virtually place it on a high-resolution ToF sensor. We find that Mask-ToF generalizes well to real data without retraining, cutting FP counts in half.
  • Labels4Free: Unsupervised Segmentation using StyleGAN

    Abdal, Rameen; Zhu, Peihao; Mitra, Niloy; Wonka, Peter (arXiv, 2021-03-27) [Preprint]
    We propose an unsupervised segmentation framework for StyleGAN generated objects. We build on two main observations. First, the features generated by StyleGAN hold valuable information that can be utilized towards training segmentation networks. Second, the foreground and background can often be treated to be largely independent and be composited in different ways. For our solution, we propose to augment the StyleGAN2 generator architecture with a segmentation branch and to split the generator into a foreground and background network. This enables us to generate soft segmentation masks for the foreground object in an unsupervised fashion. On multiple object classes, we report comparable results against state-of-the-art supervised segmentation networks, while against the best unsupervised segmentation approach we demonstrate a clear improvement, both in qualitative and quantitative metrics.
  • Hierarchical Hyperedge Embedding-based Representation Learning for Group Recommendation

    Guo, Lei; Yin, Hongzhi; Chen, Tong; Zhang, Xiangliang; Zheng, Kai (arXiv, 2021-03-24) [Preprint]
    In this work, we study group recommendation in a particular scenario, namely Occasional Group Recommendation (OGR). Most existing works have addressed OGR by aggregating group members' personal preferences to learn the group representation. However, the representation learning for a group is more complex than the mere fusion of group member representations, as the personal preferences and group preferences may be in different spaces. In addition, the learned user representation is not accurate due to the sparsity of users' interaction data. Moreover, the group similarity in terms of common group members has been overlooked, which, however, has great potential to improve the group representation learning. In this work, we focus on addressing the above challenges in the group representation learning task, and devise a hierarchical hyperedge embedding-based group recommender, namely HyperGroup. Specifically, we propose to leverage the user-user interactions to alleviate the sparsity issue of user-item interactions, and design a GNN-based representation learning network to enhance the learning of individuals' preferences from their friends' preferences, which provides a solid foundation for learning groups' preferences. To exploit the group similarity to learn a more accurate group representation from highly limited group-item interactions, we connect all groups as a network of overlapping sets, and treat the task of group preference learning as embedding hyperedges in a hypergraph, where an inductive hyperedge embedding method is proposed. To further enhance the group-level preference modeling, we develop a joint training strategy to learn both user-item and group-item interactions in the same process. We conduct extensive experiments on two real-world datasets and the experimental results demonstrate the superiority of our proposed HyperGroup in comparison to the state-of-the-art baselines.
  • RDMA is Turing complete, we just did not know it yet!

    Reda, Waleed; Canini, Marco; Kostić, Dejan; Peter, Simon (arXiv, 2021-03-24) [Preprint]
    It is becoming increasingly popular for distributed systems to exploit network offload to alleviate load on the CPU. Remote Direct Memory Access (RDMA) NICs (RNICs) are one such device, allowing applications to offload remote memory accesses. However, RDMA still requires CPU intervention for complex offloads, beyond simple remote memory access. As such, the offload potential for RNICs is limited and RDMA-based systems usually have to work around such limitations. We present RedN, a principled, practical approach to implementing complex RNIC offloads, without requiring any hardware modifications. Using self-modifying RDMA chains, we lift the existing RDMA verbs interface to a Turing complete set of programming abstractions. We explore what is possible in terms of offload complexity and performance with just a commodity RNIC. Through a key-value store use case study, we show how to integrate complex RNIC offloads into existing applications. RedN can outperform one and two-sided RDMA implementations by up to 3x and 7.8x for key-value get operations and performance isolation, respectively, and provide applications with failure resiliency to OS and process crashes.
  • Analysis of the effects of related fingerprints on molecular similarity using an eigenvalue entropy approach

    Kuwahara, Hiroyuki; Gao, Xin (Journal of Cheminformatics, Springer Science and Business Media LLC, 2021-03-23) [Article]
    Two-dimensional (2D) chemical fingerprints are widely used as binary features for the quantification of structural similarity of chemical compounds, which is an important step in similarity-based virtual screening (VS). Here, using an eigenvalue-based entropy approach, we identified 2D fingerprints with little to no contribution to shaping the eigenvalue distribution of the feature matrix as related ones and examined the degree to which these related 2D fingerprints influenced molecular similarity scores calculated with the Tanimoto coefficient. Our analysis identified many related fingerprints in publicly available fingerprint schemes and showed that their presence in the feature set could have substantial effects on the similarity scores and bias the outcome of molecular similarity analysis. Our results have implications for the optimal selection of 2D fingerprints for compound similarity analysis and the identification of potential hits for compounds with target biological activity in VS.
  • Networking research for the Arab world

    Shihada, Basem; Elbatt, Tamer; Eltawil, Ahmed; Mansour, Mohammad; Sabir, Essaid; Rekhis, Slim; Sharafeddine, Sanaa (Communications of the ACM, Association for Computing Machinery (ACM), 2021-03-22) [Article]
    THE ARAB REGION, composed of 22 countries spanning Asia and Africa, opens ample room for communications and networking innovations and services and contributes to the critical mass of the global networking innovation. While the Arab world is considered an emerging market for communications and networking services, the rate of adoption is outpacing the global average. In fact, as of 2019, the mobile Internet penetration stands at 67.2% in the Arab world, as opposed to a global average of 56.5%.
  • Transfer Deep Learning for Reconfigurable Snapshot HDR Imaging Using Coded Masks

    Alghamdi, Masheal M.; Fu, Qiang; Thabet, Ali Kassem; Heidrich, Wolfgang (Computer Graphics Forum, Wiley, 2021-03-11) [Article]
    High dynamic range (HDR) image acquisition from a single image capture, also known as snapshot HDR imaging, is challenging because the bit depths of camera sensors are far from sufficient to cover the full dynamic range of the scene. Existing HDR techniques focus either on algorithmic reconstruction or hardware modification to extend the dynamic range. In this paper we propose a joint design for snapshot HDR imaging by devising a spatially varying modulation mask in the hardware and building a deep learning algorithm to reconstruct the HDR image. We leverage transfer learning to overcome the lack of sufficiently large HDR datasets available. We show how transferring from a different large-scale task (image classification on ImageNet) leads to considerable improvements in HDR reconstruction. We achieve a reconfigurable HDR camera design that does not require custom sensors, and instead can be reconfigured between HDR and conventional mode with very simple calibration steps. We demonstrate that the proposed hardware–software solution offers a flexible yet robust way to modulate per-pixel exposures, and the network requires little knowledge of the hardware to faithfully reconstruct the HDR image. Comparison results show that our method outperforms the state of the art in terms of visual perception quality.
  • 1'-Ribose cyano substitution allows Remdesivir to effectively inhibit nucleotide addition and proofreading during SARS-CoV-2 viral RNA replication.

    Zhang, Lu; Zhang, Dong; Wang, Xiaowei; Yuan, Congmin; Li, Yongfang; Jia, Xilin; Gao, Xin; Yen, Hui-Ling; Cheung, Peter Pak-Hang; Huang, Xuhui (Physical chemistry chemical physics : PCCP, Royal Society of Chemistry (RSC), 2021-03-10) [Article]
    COVID-19 has recently caused a global health crisis and an effective interventional therapy is urgently needed. Remdesivir is one effective inhibitor for SARS-CoV-2 viral RNA replication. It supersedes other NTP analogues because it not only terminates the polymerization activity of RNA-dependent RNA polymerase (RdRp), but also inhibits the proofreading activity of intrinsic exoribonuclease (ExoN). Even though the static structure of Remdesivir binding to RdRp has been solved and biochemical experiments have suggested it to be a "delayed chain terminator", the underlying molecular mechanism is not fully understood. Here, we performed all-atom molecular dynamics (MD) simulations with an accumulated simulation time of 24 microseconds to elucidate the inhibitory mechanism of Remdesivir on nucleotide addition and proofreading. We found that when Remdesivir locates at an upstream site in RdRp, the 1'-cyano group experiences electrostatic interactions with a salt bridge (Asp865-Lys593), which subsequently halts translocation. Our findings can supplement the current understanding of the delayed chain termination exerted by Remdesivir and provide an alternative molecular explanation about Remdesivir's inhibitory mechanism. Such inhibition also reduces the likelihood of Remdesivir being cleaved by ExoN acting on 3'-terminal nucleotides. Furthermore, our study also suggests that Remdesivir's 1'-cyano group can disrupt the cleavage site of ExoN via steric interactions, leading to a further reduction in the cleavage efficiency. Our work provides plausible and novel mechanisms at the molecular level of how Remdesivir inhibits viral RNA replication, and our findings may guide rational design for new treatments of COVID-19 targeting viral replication.
  • DeepViral: prediction of novel virus-host interactions from protein sequences and infectious disease phenotypes.

    Liu-Wei, Wang; Kafkas, Senay; Chen, Jun; Dimonaco, Nicholas J; Tegner, Jesper; Hoehndorf, Robert (Bioinformatics (Oxford, England), Oxford University Press (OUP), 2021-03-08) [Article]
    Motivation: Infectious diseases caused by novel viruses have become a major public health concern. Rapid identification of virus-host interactions can reveal mechanistic insights into infectious diseases and shed light on potential treatments. Current computational prediction methods for novel viruses are based mainly on protein sequences. However, it is not clear to what extent other important features, such as the symptoms caused by the viruses, could contribute to a predictor. Disease phenotypes (i.e., signs and symptoms) are readily accessible from clinical diagnosis and we hypothesize that they may act as a potential proxy and an additional source of information for the underlying molecular interactions between the pathogens and hosts. Results: We developed DeepViral, a deep learning based method that predicts protein-protein interactions (PPI) between humans and viruses. Motivated by the potential utility of infectious disease phenotypes, we first embedded human proteins and viruses in a shared space using their associated phenotypes and functions, supported by formalized background knowledge from biomedical ontologies. By jointly learning from protein sequences and phenotype features, DeepViral significantly improves over existing sequence-based methods for intra- and inter-species PPI prediction. Availability: Code and datasets for reproduction and customization are available at Prediction results for 14 virus families are available at
  • DeepMOCCA: A pan-cancer prognostic model identifies personalized prognostic markers through graph attention and multi-omics data integration

    Althubaiti, Sara; Kulmanov, Maxat; Liu, Yang; Gkoutos, Georgios; Schofield, Paul N.; Hoehndorf, Robert (Cold Spring Harbor Laboratory, 2021-03-03) [Preprint]
    Combining multiple types of genomic, transcriptional, proteomic, and epigenetic datasets has the potential to reveal biological mechanisms across multiple scales, and may lead to more accurate models for clinical decision support. Developing efficient models that can derive clinical outcomes from high-dimensional data remains problematical; challenges include the integration of multiple types of omics data, inclusion of biological background knowledge, and developing machine learning models that are able to deal with this high dimensionality while having only few samples from which to derive a model. We developed DeepMOCCA, a framework for multi-omics cancer analysis. We combine different types of omics data using biological relations between genes, transcripts, and proteins, combine the multi-omics data with background knowledge in the form of protein-protein interaction networks, and use graph convolution neural networks to exploit this combination of multi-omics data and background knowledge. DeepMOCCA predicts survival time for individual patient samples for 33 cancer types and outperforms most existing survival prediction methods. Moreover, DeepMOCCA includes a graph attention mechanism which prioritizes driver genes and prognostic markers in a patient-specific manner; the attention mechanism can be used to identify drivers and prognostic markers within cohorts and individual patients.
  • DeeReCT-APA: Prediction of Alternative Polyadenylation Site Usage Through Deep Learning

    Li, Zhongxiao; Li, Yisheng; Zhang, Bin; Li, Yu; Long, Yongkang; Zhou, Juexiao; Zou, Xudong; Zhang, Min; Hu, Yuhui; Chen, Wei; Gao, Xin (Genomics, Proteomics & Bioinformatics, Elsevier BV, 2021-03-02) [Article]
    Alternative polyadenylation (APA) is a crucial step in post-transcriptional regulation. Previous bioinformatic works have mainly focused on the recognition of polyadenylation sites (PASs) in a given genomic sequence, which is a binary classification problem. Recently, computational methods for predicting the usage level of alternative PASs in the same gene have been proposed. However, all of them cast the problem as a non-quantitative pairwise comparison task and do not take the competition among multiple PASs into account. To address this, here we propose a deep learning architecture, DeeReCT-APA, to quantitatively predict the usage of all alternative PASs of a given gene. To accommodate different genes with potentially different numbers of PASs, DeeReCT-APA treats the problem as a regression task with a variable-length target. Based on a CNN-LSTM architecture, DeeReCT-APA extracts sequence features with CNN layers, uses bidirectional LSTM to explicitly model the interactions among competing PASs, and outputs percentage scores representing the usage levels of all PASs of a gene. In addition to the fact that only our method can predict quantitatively the usage of all the PASs within a gene, we show that our method consistently outperforms other existing methods on three different tasks for which they are trained: pairwise comparison task, highest usage prediction task, and ranking task. Finally, we demonstrate that our method can be used to predict the effect of genetic variations on APA patterns and shed light on future mechanistic understanding in APA regulation. Our code and data are available at
  • Enabling a large-scale assessment of litter along Saudi Arabian red sea shores by combining drones and machine learning.

    Martin, Cecilia; Zhang, Qiannan; Zhai, Dongjun; Zhang, Xiangliang; Duarte, Carlos M. (Environmental pollution (Barking, Essex : 1987), Elsevier BV, 2021-03-02) [Article]
    Beach litter assessments rely on time-inefficient, high-human-cost protocols, undermining the attainment of global beach litter estimates. Here we show the application of an emerging technique, the use of drones for acquisition of high-resolution beach images coupled with machine learning for their automatic processing, aimed at achieving the first national-scale beach litter survey completed by only one operator. The aerial survey had a time efficiency of 570 ± 40 m² min⁻¹ and the machine learning reached a mean (±SE) detection sensitivity of 59 ± 3% with high-resolution images. The resulting mean (±SE) litter density on Saudi Arabian shores of the Red Sea is 0.12 ± 0.02 litter items m⁻², distributed independently of the population density in the area around the sampling station. Instead, accumulation of litter depended on the exposure of the beach to the prevailing wind and litter composition differed between islands and the main shore, where recreational activities are the major source of anthropogenic debris.
  • ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation

    Li, Zhize; Richtarik, Peter (arXiv, 2021-03-02) [Preprint]
    We propose ZeroSARAH -- a novel variant of the variance-reduced method SARAH (Nguyen et al., 2017) -- for minimizing the average of a large number of nonconvex functions $\frac{1}{n}\sum_{i=1}^{n}f_i(x)$. To the best of our knowledge, in this nonconvex finite-sum regime, all existing variance-reduced methods, including SARAH, SVRG, SAGA and their variants, need to compute the full gradient over all $n$ data samples at the initial point $x^0$, and then periodically compute the full gradient once every few iterations (for SVRG, SARAH and their variants). Moreover, SVRG, SAGA and their variants typically achieve weaker convergence results than variants of SARAH: $n^{2/3}/\epsilon^2$ vs. $n^{1/2}/\epsilon^2$. ZeroSARAH is the first variance-reduced method which does not require any full gradient computations, not even for the initial point. Moreover, ZeroSARAH obtains new state-of-the-art convergence results, which can improve the previous best-known result (given by e.g., SPIDER, SpiderBoost, SARAH, SSRGD and PAGE) in certain regimes. Avoiding any full gradient computations (which is a time-consuming step) is important in many applications as the number of data samples $n$ usually is very large. Especially in the distributed setting, periodic computation of full gradient over all data samples needs to periodically synchronize all machines/devices, which may be impossible or very hard to achieve. Thus, we expect that ZeroSARAH will have a practical impact in distributed and federated learning where full device participation is impractical.
