Recent Submissions

  • A Short Note of PAGE: Optimal Convergence Rates for Nonconvex Optimization

    Li, Zhize (arXiv, 2021-06-17) [Preprint]
    In this note, we first recall the nonconvex problem setting and introduce the optimal PAGE algorithm (Li et al., ICML'21). Then we provide a simple and clean convergence analysis showing that PAGE achieves optimal convergence rates. Moreover, PAGE and its analysis can easily be adopted and generalized to other works. We hope that this note provides insight and is helpful for future work.
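
    As a companion to this abstract, here is a minimal sketch of the PAGE estimator it refers to (Li et al., ICML'21): with a small probability the method recomputes a full (or large-batch) gradient, and otherwise it reuses the previous estimator plus a cheap minibatch correction. The interface of grad_i below (an averaged gradient over an index set) is an assumption made for illustration, not the paper's code.

    ```python
    import numpy as np

    def page(grad_i, n, x0, eta=0.1, p=0.1, batch=32, n_iters=1000, seed=0):
        """Sketch of PAGE for min (1/n) sum_i f_i(x); grad_i(x, idx) is assumed to
        return the average gradient of f_i over the index set idx."""
        rng = np.random.default_rng(seed)
        x = x0.copy()
        full = np.arange(n)
        g = grad_i(x, full)                      # start from the full gradient
        for _ in range(n_iters):
            x_new = x - eta * g                  # plain gradient-type step with the current estimator
            if rng.random() < p:                 # w.p. p: recompute the full (or large-batch) gradient
                g = grad_i(x_new, full)
            else:                                # w.p. 1-p: cheap correction, same minibatch at both points
                idx = rng.choice(n, size=batch, replace=False)
                g = g + grad_i(x_new, idx) - grad_i(x, idx)
            x = x_new
        return x
    ```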
  • Snapshot Space–Time Holographic 3D Particle Tracking Velocimetry

    Chen, Ni; Wang, Congli; Heidrich, Wolfgang (Laser & Photonics Reviews, Wiley, 2021-06-10) [Article]
    Digital inline holography is a remarkably simple and effective approach to 3D imaging, for which particle tracking velocimetry is of particular interest. Conventional digital holographic particle tracking velocimetry techniques separate particle reconstruction from flow reconstruction and are computationally expensive: the particle volumes are usually recovered first, and the fluid flows are then computed from them. Without iterative reconstructions, this sequential space–time process lacks accuracy. This paper presents a joint optimization framework for digital holographic particle tracking velocimetry: particle volumes and fluid flows are reconstructed jointly in a higher space–time dimension, enabling faster convergence and better reconstruction quality for both fluid flow and particle volumes within a few minutes on modern GPUs. Synthetic and experimental results are presented to show the efficiency of the proposed technique.
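
    A generic joint space–time objective of the kind described above would couple a hologram data term with a flow-warping consistency term; the formulation below is purely illustrative (all symbols and weights are assumptions, not the paper's exact model):

    ```latex
    \min_{\{v_t\},\,u}\;
      \sum_{t} \bigl\| \mathcal{H}\, v_t - b_t \bigr\|_2^2
      \;+\; \lambda \sum_{t} \bigl\| v_{t+1} - \mathcal{W}_{u}\, v_t \bigr\|_2^2
      \;+\; \mu\, \mathcal{R}\bigl(\{v_t\}, u\bigr)
    ```

    Here $v_t$ are the particle volumes, $b_t$ the recorded holograms, $\mathcal{H}$ the hologram formation (propagation) operator, $\mathcal{W}_{u}$ warping by the flow field $u$, and $\mathcal{R}$ collects regularizers such as volume sparsity and flow smoothness.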
  • Smoothness-Aware Quantization Techniques

    Wang, Bokun; Safaryan, Mher; Richtarik, Peter (arXiv, 2021-06-07) [Preprint]
    Distributed machine learning has become an indispensable tool for training large supervised machine learning models. To address the high communication costs of distributed training, which are further exacerbated by the fact that modern highly performing models are typically overparameterized, a large body of work has been devoted in recent years to the design of various compression strategies, such as sparsification and quantization, and of optimization algorithms capable of using them. Recently, Safaryan et al. (2021) pioneered a dramatically different compression design approach: they first use the local training data to form local smoothness matrices, and then propose to design a compressor capable of exploiting the smoothness information contained therein. While this novel approach leads to substantial savings in communication, it is limited to sparsification as it crucially depends on the linearity of the compression operator. In this work, we resolve this problem by extending their smoothness-aware compression strategy to arbitrary unbiased compression operators, a class that also includes sparsification. Specializing our results to quantization, we observe significant savings in communication complexity compared to standard quantization. In particular, we show theoretically that block quantization with $n$ blocks outperforms single-block quantization, leading to a reduction in communication complexity by an $\mathcal{O}(n)$ factor, where $n$ is the number of nodes in the distributed system. Finally, we provide extensive numerical evidence that our smoothness-aware quantization strategies outperform existing quantization schemes as well as the aforementioned smoothness-aware sparsification strategies with respect to all relevant success measures: the number of iterations, the total amount of bits communicated, and wall-clock time.
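
    For intuition on the block quantization discussed above, here is a standard unbiased random-dithering quantizer applied blockwise. This is the generic building block such abstracts compare against, not the paper's smoothness-aware scheme; function names and the level count are illustrative.

    ```python
    import numpy as np

    def dithered_quantize(x, levels=4, rng=None):
        """Unbiased random-dithering quantizer (QSGD-style): round each normalized
        magnitude up or down at random so the expectation equals the input."""
        rng = rng or np.random.default_rng()
        norm = np.linalg.norm(x)
        if norm == 0.0:
            return np.zeros_like(x)
        r = np.abs(x) / norm * levels                            # position on the quantization grid
        low = np.floor(r)
        q = (low + (rng.random(x.shape) < r - low)) / levels     # stochastic rounding -> unbiased
        return norm * np.sign(x) * q

    def block_quantize(x, n_blocks, levels=4, rng=None):
        """Quantize each block with its own norm; finer blocks track the local scale better."""
        return np.concatenate([dithered_quantize(b, levels, rng) for b in np.array_split(x, n_blocks)])
    ```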
  • Complexity Analysis of Stein Variational Gradient Descent Under Talagrand's Inequality T1

    Salim, Adil; Sun, Lukang; Richtarik, Peter (arXiv, 2021-06-06) [Preprint]
    We study the complexity of Stein Variational Gradient Descent (SVGD), an algorithm for sampling from $\pi(x) \propto \exp(-F(x))$ where $F$ is smooth and nonconvex. We provide a clean complexity bound for SVGD in the population limit in terms of the Stein Fisher Information (or squared Kernelized Stein Discrepancy), as a function of the dimension of the problem $d$ and the desired accuracy $\varepsilon$. Unlike existing work, we do not make any assumption on the trajectory of the algorithm. Instead, our key assumption is that the target distribution satisfies Talagrand's inequality T1.
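
    For reference, a minimal sketch of one SVGD update in the setting of this abstract, where the score is $\nabla \log \pi = -\nabla F$; the RBF kernel and median-heuristic bandwidth are common choices assumed here for illustration.

    ```python
    import numpy as np

    def svgd_step(x, grad_log_pi, step=1e-2, bandwidth=None):
        """One SVGD update on particles x of shape (n, d); grad_log_pi(x) is assumed
        to return the score -grad F evaluated at every particle, shape (n, d)."""
        n = x.shape[0]
        d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)   # pairwise squared distances
        if bandwidth is None:
            bandwidth = np.median(d2) / np.log(n + 1) + 1e-12        # median heuristic
        k = np.exp(-d2 / bandwidth)                                  # RBF kernel matrix
        grad = grad_log_pi(x)
        # kernel-weighted attraction toward high density + repulsion between particles
        phi = (k @ grad + (k.sum(axis=1, keepdims=True) * x - k @ x) * (2.0 / bandwidth)) / n
        return x + step * phi
    ```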
  • MURANA: A Generic Framework for Stochastic Variance-Reduced Optimization

    Condat, Laurent Pierre; Richtarik, Peter (arXiv, 2021-06-06) [Preprint]
    We propose a generic variance-reduced algorithm, which we call MUltiple RANdomized Algorithm (MURANA), for minimizing a sum of several smooth functions plus a regularizer, in a sequential or distributed manner. Our method is formulated with general stochastic operators, which allow us to model various strategies for reducing the computational complexity. For example, MURANA supports sparse activation of the gradients, and also reduction of the communication load via compression of the update vectors. This versatility allows MURANA to cover many existing randomization mechanisms within a unified framework. However, MURANA also encodes new methods as special cases. We highlight one of them, which we call ELVIRA, and show that it improves upon Loopless SVRG.
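
    For context on the kind of variance-reduced methods MURANA unifies, here is a sketch of Loopless SVRG, the baseline that the abstract says ELVIRA improves upon; it is not MURANA or ELVIRA itself, and the grad_i interface is an assumption for illustration.

    ```python
    import numpy as np

    def loopless_svrg(grad_i, n, x0, eta=0.1, p=None, n_iters=1000, seed=0):
        """Loopless SVRG sketch for min (1/n) sum_i f_i(x); grad_i(x, idx) is assumed
        to return the average gradient over the index set idx."""
        rng = np.random.default_rng(seed)
        p = 1.0 / n if p is None else p          # expected frequency of reference-point refreshes
        x, w = x0.copy(), x0.copy()              # current iterate and reference point
        full = np.arange(n)
        gw = grad_i(w, full)                     # full gradient at the reference point
        for _ in range(n_iters):
            i = rng.integers(n)
            g = grad_i(x, [i]) - grad_i(w, [i]) + gw   # unbiased, variance-reduced estimator
            x = x - eta * g
            if rng.random() < p:                 # occasionally move the reference point
                w, gw = x.copy(), grad_i(x, full)
        return x
    ```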
  • FedNL: Making Newton-Type Methods Applicable to Federated Learning

    Safaryan, Mher; Islamov, Rustem; Qian, Xun; Richtarik, Peter (arXiv, 2021-06-05) [Preprint]
    Inspired by recent work of Islamov et al. (2021), we propose a family of Federated Newton Learn (FedNL) methods, which we believe is a marked step in the direction of making second-order methods applicable to FL. In contrast to the aforementioned work, FedNL employs a different Hessian learning technique which i) enhances privacy as it does not rely on the training data being revealed to the coordinating server, ii) makes it applicable beyond generalized linear models, and iii) provably works with general contractive compression operators for compressing the local Hessians, such as Top-$K$ or Rank-$R$, which are vastly superior in practice. Notably, we do not need to rely on error feedback for our methods to work with contractive compressors. Moreover, we develop FedNL-PP, FedNL-CR and FedNL-LS, which are variants of FedNL that support partial participation, and globalization via cubic regularization and line search, respectively, and FedNL-BC, which is a variant that can further benefit from bidirectional compression of gradients and models, i.e., smart uplink gradient and smart downlink model compression. We prove local convergence rates that are independent of the condition number, the number of training data points, and compression variance. Our communication-efficient Hessian learning technique provably learns the Hessian at the optimum. Finally, we perform a variety of numerical experiments that show that our FedNL methods have state-of-the-art communication complexity when compared to key baselines.
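
    The compressed Hessian-learning idea described above can be sketched roughly as follows: each device keeps a local Hessian estimate and communicates only a contractively compressed correction toward its true local Hessian (Top-K over matrix entries is one of the compressors the abstract names). The step size alpha and the exact update schedule are illustrative assumptions, not the paper's precise method.

    ```python
    import numpy as np

    def topk_matrix(m, k):
        """Contractive Top-K compressor over matrix entries: keep the k largest magnitudes."""
        flat = m.ravel()
        out = np.zeros_like(flat)
        idx = np.argpartition(np.abs(flat), -k)[-k:]
        out[idx] = flat[idx]
        return out.reshape(m.shape)

    def hessian_learning_step(H_local, local_hessian, x, alpha=1.0, k=10):
        """One device's update: move the local estimate toward the true local Hessian at x,
        sending only the compressed correction to the server (illustrative sketch)."""
        correction = topk_matrix(local_hessian(x) - H_local, k)   # cheap-to-communicate message
        return H_local + alpha * correction, correction
    ```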
  • Self-Supervised Learning of Domain Invariant Features for Depth Estimation

    Akada, Hiroyasu; Bhat, Shariq Farooq; Alhashim, Ibraheem; Wonka, Peter (arXiv, 2021-06-04) [Preprint]
    We tackle the problem of unsupervised synthetic-to-realistic domain adaptation for single image depth estimation. An essential building block of single image depth estimation is an encoder-decoder task network that takes RGB images as input and produces depth maps as output. In this paper, we propose a novel training strategy to force the task network to learn domain invariant representations in a self-supervised manner. Specifically, we extend self-supervised learning from traditional representation learning, which works on images from a single domain, to domain invariant representation learning, which works on images from two different domains by utilizing an image-to-image translation network. Firstly, we use our bidirectional image-to-image translation network to transfer domain-specific styles between synthetic and real domains. This style transfer operation allows us to obtain similar images from the different domains. Secondly, we jointly train our task network and Siamese network with the same images from the different domains to obtain domain invariance for the task network. Finally, we fine-tune the task network using labeled synthetic and unlabeled real-world data. Our training strategy yields improved generalization capability in the real-world domain. We carry out an extensive evaluation on two popular datasets for depth estimation, KITTI and Make3D. The results demonstrate that our proposed method outperforms the state-of-the-art both qualitatively and quantitatively. The source code and model weights will be made available.
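
    A much-simplified sketch of the kind of Siamese-style self-supervision described above: the task encoder is pushed to produce matching features for a synthetic image and its translated-to-real counterpart. The specific loss (MSE on features) is an assumption for illustration; the paper's actual objective may differ.

    ```python
    import torch.nn.functional as F

    def domain_invariance_loss(encoder, img_synthetic, img_translated):
        """Encourage identical encoder features for the same content seen in two domains."""
        feat_syn = encoder(img_synthetic)      # features from the synthetic image
        feat_real = encoder(img_translated)    # features from its image-to-image translated version
        return F.mse_loss(feat_syn, feat_real)
    ```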
  • SketchGen: Generating Constrained CAD Sketches

    Para, Wamiq Reyaz; Bhat, Shariq Farooq; Guerrero, Paul; Kelly, Tom; Mitra, Niloy J.; Guibas, Leonidas; Wonka, Peter (arXiv, 2021-06-04) [Preprint]
    Computer-aided design (CAD) is the most widely used modeling approach for technical design. The typical starting point in these designs is 2D sketches, which can later be extruded and combined to obtain complex three-dimensional assemblies. Such sketches are typically composed of parametric primitives, such as points, lines, and circular arcs, augmented with geometric constraints linking the primitives, such as coincidence, parallelism, or orthogonality. Sketches can be represented as graphs, with the primitives as nodes and the constraints as edges. Training a model to automatically generate CAD sketches can enable several novel workflows, but is challenging due to the complexity of the graphs and the heterogeneity of the primitives and constraints. In particular, each type of primitive and constraint may require a record of different size and parameter types. We propose SketchGen, a generative model based on a transformer architecture, to address this heterogeneity problem: we carefully design a sequential language for the primitives and constraints that distinguishes between different primitive or constraint types and their parameters, while encouraging our model to re-use information across related parameters, encoding shared structure. A particular highlight of our work is the ability to produce primitives linked via constraints, which enables the final output to be further regularized via a constraint solver. We evaluate our model by demonstrating constraint prediction for given sets of primitives and full sketch generation from scratch, showing that our approach significantly outperforms the state of the art in CAD sketch generation.
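
    To make the "sequential language" idea concrete, here is a hypothetical token stream (not the paper's actual grammar) for two line primitives joined by a coincidence constraint; it illustrates how heterogeneous primitives and constraints, each with different parameter counts, can be flattened into one sequence for a transformer.

    ```python
    # Hypothetical tokenization of a tiny sketch: two lines plus a coincidence constraint.
    tokens = [
        "<sketch>",
        "<line>", "x0=0.0", "y0=0.0", "x1=1.0", "y1=0.0", "</line>",   # primitive 0
        "<line>", "x0=1.0", "y0=0.0", "x1=1.0", "y1=1.0", "</line>",   # primitive 1
        "<coincident>", "ref=0.end", "ref=1.start", "</coincident>",   # constraint referencing primitives
        "</sketch>",
    ]
    ```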
  • A practical and efficient model for intensity calibration of multi-light image collections

    Pintus, Ruggero; Jaspe Villanueva, Alberto; Zorcolo, Antonio; Hadwiger, Markus; Gobbetti, Enrico (Visual Computer, Springer Science and Business Media LLC, 2021-06-04) [Article]
    We present a novel, practical and efficient mathematical formulation for light intensity calibration of multi-light image collections (MLICs). Inspired by existing and orthogonal calibration methods, we design a hybrid solution that leverages their strengths while overcoming most of their weaknesses. We combine the rationale of approaches based on fixed analytical models with the interpolation scheme of image-domain methods. This allows us to minimize the final residual error in light intensity estimation without imposing an overly constraining illuminant type. Unlike previous approaches, the proposed calibration strategy proves to be simpler, more efficient, more versatile, and highly adaptable to different setup scenarios. We conduct an extensive analysis and validation of our new light model against several state-of-the-art techniques, and we show how the proposed solution provides more reliable outcomes in terms of accuracy and precision, and a more stable calibration across different light positions/orientations, while supporting a more general light form factor.
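
    A rough sketch of the hybrid idea described above, under assumed interfaces: fit_model and model_intensity are hypothetical helpers, and the RBF interpolation stage is one possible image-domain scheme, not necessarily the paper's. The point is the two-stage structure: fit a low-parameter analytical light model first, then interpolate its residual error over the image domain.

    ```python
    import numpy as np
    from scipy.interpolate import RBFInterpolator

    def hybrid_light_calibration(px, py, observed, fit_model, model_intensity):
        """Two-stage calibration sketch: analytical fit + image-domain residual interpolation.
        px, py, observed are intensities sampled at known calibration-target pixels."""
        params = fit_model(px, py, observed)                        # stage 1: analytical light model
        residual = observed - model_intensity(px, py, params)       # what the analytical model misses
        residual_field = RBFInterpolator(np.stack([px, py], axis=1), residual)  # stage 2: interpolate
        return params, residual_field   # evaluate residual_field(pixels) to correct any pixel
    ```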
  • Barbershop: GAN-based Image Compositing using Segmentation Masks

    Zhu, Peihao; Abdal, Rameen; Femiani, John; Wonka, Peter (arXiv, 2021-06-02) [Preprint]
    Seamlessly blending features from multiple images is extremely challenging because of complex relationships in lighting, geometry, and partial occlusion, which cause coupling between different parts of the image. Even though recent work on GANs enables synthesis of realistic hair or faces, it remains difficult to combine them into a single, coherent, and plausible image rather than a disjointed set of image patches. We present a novel solution to image blending, particularly for the problem of hairstyle transfer, based on GAN inversion. We propose a novel latent space for image blending that is better at preserving detail and encoding spatial information, and a new GAN-embedding algorithm that can slightly modify images to conform to a common segmentation mask. Our novel representation enables the transfer of visual properties from multiple reference images, including specific details such as moles and wrinkles, and because we blend in a latent space we are able to synthesize images that are coherent. Our approach avoids the blending artifacts present in other approaches and finds a globally consistent image. Our results demonstrate a significant improvement over the current state of the art in a user study, with users preferring our blending solution over 95 percent of the time.
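
    A deliberately simplified, hypothetical view of mask-guided blending in a spatial latent space, to convey the idea only: copy the reference's spatial features inside the target region (e.g. hair) and keep the source elsewhere. Barbershop's actual method additionally optimizes GAN embeddings so all images conform to a common segmentation mask before blending.

    ```python
    import numpy as np

    def blend_by_mask(feat_source, feat_reference, mask):
        """Blend two spatial feature maps of shape (C, H, W) using a binary region mask (H, W)."""
        return np.where(mask[None, :, :] > 0.5, feat_reference, feat_source)
    ```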
  • SCTN: Sparse Convolution-Transformer Network for Scene Flow Estimation

    Li, Bing; Zheng, Cheng; Giancola, Silvio; Ghanem, Bernard (arXiv, 2021-05-10) [Preprint]
    We propose a novel scene flow estimation approach to capture and infer 3D motions from point clouds. Estimating 3D motions for point clouds is challenging, since a point cloud is unordered and its density is significantly non-uniform. Such unstructured data poses difficulties in matching corresponding points between point clouds, leading to inaccurate flow estimation. We propose a novel architecture named Sparse Convolution-Transformer Network (SCTN) that equips sparse convolution with a transformer. Specifically, by leveraging sparse convolution, SCTN transfers the irregular point cloud into locally consistent flow features for estimating continuous and consistent motions within an object or local object part. We further propose to explicitly learn point relations using a point transformer module, in contrast to existing methods. We show that the learned relation-based contextual information is rich and helpful for matching corresponding points, benefiting scene flow estimation. In addition, a novel loss function is proposed to adaptively encourage flow consistency according to feature similarity. Extensive experiments demonstrate that our proposed approach achieves a new state of the art in scene flow estimation. Our approach achieves an EPE3D error of 0.038 and 0.037 on FlyingThings3D and KITTI Scene Flow respectively, outperforming previous methods by large margins.
  • StyleFlow: Attribute-conditioned Exploration of StyleGAN-Generated Images using Conditional Continuous Normalizing Flows

    Abdal, Rameen; Zhu, Peihao; Mitra, Niloy J.; Wonka, Peter (ACM Transactions on Graphics, Association for Computing Machinery (ACM), 2021-05-06) [Article]
    High-quality, diverse, and photorealistic images can now be generated by unconditional GANs (e.g., StyleGAN). However, limited options exist to control the generation process using (semantic) attributes while still preserving the quality of the output. Further, due to the entangled nature of the GAN latent space, performing edits along one attribute can easily result in unwanted changes along other attributes. In this article, in the context of conditional exploration of entangled latent spaces, we investigate the two sub-problems of attribute-conditioned sampling and attribute-controlled editing. We present StyleFlow as a simple, effective, and robust solution to both the sub-problems by formulating conditional exploration as an instance of conditional continuous normalizing flows in the GAN latent space conditioned by attribute features. We evaluate our method using the face and the car latent space of StyleGAN, and demonstrate fine-grained disentangled edits along various attributes on both real photographs and StyleGAN generated images. For example, for faces, we vary camera pose, illumination variation, expression, facial hair, gender, and age. Finally, via extensive qualitative and quantitative comparisons, we demonstrate the superiority of StyleFlow over prior and several concurrent works. Project Page and Video: https://rameenabdal.github.io/StyleFlow .
  • Synthetic 3D Data Generation Pipeline for Geometric Deep Learning in Architecture

    Fedorova, Stanislava; Tono, Alberto; Nigam, Meher Shashwat; Zhang, Jiayao; Ahmadnia, Amirhossein; Bolognesi, Cecilia; Michels, Dominik L. (arXiv, 2021-04-26) [Preprint]
    With the growing interest in deep learning algorithms and computational design in the architectural field, the need for large, accessible and diverse architectural datasets increases. We tackle this problem by constructing a field-specific synthetic data generation pipeline that generates an arbitrary amount of 3D data along with the associated 2D and 3D annotations. The variety of annotations and the flexibility to customize the generated building and dataset parameters make this framework suitable for multiple deep learning tasks, including geometric deep learning that requires direct 3D supervision. In creating our building data generation pipeline, we leveraged architectural knowledge from experts to construct a framework that is modular and extendable and provides a sufficient amount of class-balanced data samples. Moreover, we purposefully involve the researcher in dataset customization, allowing the introduction of additional building components, material textures, building classes, and the number and type of annotations, as well as the number of views per 3D model sample. In this way, the framework can satisfy different research requirements and adapt to a large variety of tasks. All code and data are made publicly available.
  • Camera Calibration and Player Localization in SoccerNet-v2 and Investigation of their Representations for Action Spotting

    Cioppa, Anthony; Deliège, Adrien; Magera, Floriane; Giancola, Silvio; Barnich, Olivier; Ghanem, Bernard; Droogenbroeck, Marc Van (arXiv, 2021-04-19) [Preprint]
    Soccer broadcast video understanding has been drawing increasing attention in recent years among data scientists and industrial companies, mainly due to the lucrative potential unlocked by effective deep learning techniques developed in the field of computer vision. In this work, we focus on the topic of camera calibration and its current limitations for the scientific community. More precisely, we tackle the absence of a large-scale calibration dataset and of a public calibration network trained on such a dataset. Specifically, we distill a powerful commercial calibration tool into a recent neural network architecture on the large-scale SoccerNet dataset, composed of untrimmed broadcast videos of 500 soccer games. We further release our distilled network and leverage it to provide three ways of representing the calibration results along with player localization. Finally, we exploit those representations within the current best architecture for the action spotting task of SoccerNet-v2 and achieve new state-of-the-art performance.
  • SeedQuant: A deep learning-based tool for assessing stimulant and inhibitor activity on root parasitic seeds.

    Braguy, Justine; Ramazanova, Merey; Giancola, Silvio; Jamil, Muhammad; Kountche, Boubacar Amadou; Zarban, Randa Alhassan Yahya; Felemban, Abrar; Wang, Jian You; Lin, Pei-Yu; Haider, Imran; Zurbriggen, Matias; Ghanem, Bernard; Al-Babili, Salim (Plant physiology, Oxford University Press (OUP), 2021-04-15) [Article]
    Witchweeds (Striga spp.) and broomrapes (Orobanchaceae and Phelipanche spp.) are root parasitic plants that infest many crops in warm and temperate zones, causing enormous yield losses and endangering global food security. Seeds of these obligate parasites require rhizospheric, host-released stimulants to germinate, which opens up possibilities for controlling them by applying specific germination inhibitors or synthetic stimulants that induce lethal germination in the host's absence. To determine their effect on germination, root exudates or synthetic stimulants/inhibitors are usually applied to parasitic seeds in in vitro bioassays, followed by assessment of germination ratios. Although these protocols are very sensitive, the germination recording process is laborious, representing a challenge for researchers and impeding high-throughput screens. Here, we developed an automatic seed census tool to count and discriminate germinated from non-germinated seeds. We combined deep learning, a powerful data-driven framework that can accelerate the procedure and increase its accuracy, with the latest developments in computer-vision object detection, based on the Faster R-CNN algorithm. Our method showed an accuracy of 94% in counting seeds of Striga hermonthica and reduced the required time from ~5 minutes to 5 seconds per image. Our proposed software, SeedQuant, will be of great help for seed germination bioassays and will enable high-throughput screening for germination stimulants/inhibitors. SeedQuant is open-source software that can be further trained to count different types of seeds for research purposes.
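
    As an illustration of detection-based seed counting of the kind described above, here is a sketch using torchvision's Faster R-CNN; the class count, label mapping, and score threshold are assumptions, and SeedQuant itself ships its own trained model rather than this off-the-shelf one.

    ```python
    import torch
    import torchvision

    def count_seeds(image, score_thresh=0.5):
        """Count germinated vs. non-germinated seeds in one image tensor (3, H, W), values in [0, 1]."""
        model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=3)  # background + 2 seed classes (assumed)
        model.eval()
        with torch.no_grad():
            pred = model([image])[0]                      # dict with "boxes", "labels", "scores"
        labels = pred["labels"][pred["scores"] > score_thresh]
        return int((labels == 1).sum()), int((labels == 2).sum())   # hypothetical label mapping
    ```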
  • Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts

    Giancola, Silvio; Ghanem, Bernard (arXiv, 2021-04-14) [Preprint]
    Toward the goal of automatic production for sports broadcasts, a paramount task consists in understanding the high-level semantic information of the game in play. For instance, recognizing and localizing the main actions of the game would allow producers to adapt and automate broadcast production, focusing on the important details of the game and maximizing spectator engagement. In this paper, we focus our analysis on action spotting in soccer broadcasts, which consists in temporally localizing the main actions of a soccer game. To that end, we propose a novel feature pooling method based on NetVLAD, dubbed NetVLAD++, that embeds temporally-aware knowledge. Different from previous pooling methods that consider the temporal context as a single set to pool from, we split the context before and after an action occurs. We argue that considering the contextual information around the action spot as a single entity leads to sub-optimal learning for the pooling module. With NetVLAD++, we disentangle the context from the past and future frames and learn specific vocabularies of semantics for each subset, avoiding blending and blurring these vocabularies in time. Injecting such prior knowledge creates more informative pooling modules and more discriminative pooled features, leading to a better understanding of the actions. We train and evaluate our methodology on the recent large-scale dataset SoccerNet-v2, reaching 53.4% Average-mAP for action spotting, a +12.7% improvement w.r.t. the current state of the art.
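
    The core temporal idea can be sketched in a few lines: split the context window at the action spot and pool the past and future halves with two separate modules before concatenating. The pooling callables below are placeholders; in NetVLAD++ they are learnable NetVLAD layers.

    ```python
    import torch

    def temporally_aware_pool(features, pool_before, pool_after):
        """features: (T, D) frame features centered on a candidate action spot."""
        t = features.shape[0] // 2
        past, future = features[:t], features[t:]                 # context before / after the spot
        return torch.cat([pool_before(past), pool_after(future)], dim=-1)
    ```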
  • Snapshot space-time holographic three-dimensional particle tracking velocimetry

    Chen, Ni; Wang, Congli; Heidrich, Wolfgang (Laser & Photonics Reviews, Wiley-VCH, 2021-04-12) [Article]
    Digital inline holography is a remarkably simple and effective approach to three-dimensional imaging, for which particle tracking velocimetry is of particular interest. Conventional digital holographic particle tracking velocimetry techniques separate particle reconstruction from flow reconstruction and are computationally expensive: the particle volumes are usually recovered first, and the fluid flows are then computed from them. Without iterative reconstructions, this sequential space-time process lacks accuracy. This paper presents a joint optimization framework for digital holographic particle tracking velocimetry: particle volumes and fluid flows are reconstructed jointly in a higher space-time dimension, enabling faster convergence and better reconstruction quality of both fluid flow and particle volumes within a few minutes on modern GPUs. Synthetic and experimental results are presented to show the efficiency of the proposed technique.
  • Uncertainty principle for communication compression in distributed and federated learning and the search for an optimal compressor

    Safaryan, Mher; Shulgin, Egor; Richtarik, Peter (Information and Inference: A Journal of the IMA, Oxford University Press (OUP), 2021-04-12) [Article]
    In order to mitigate the high communication cost in distributed and federated learning, various vector compression schemes, such as quantization, sparsification and dithering, have become very popular. In designing a compression method, one aims to communicate as few bits as possible, which minimizes the cost per communication round, while at the same time attempting to impart as little distortion (variance) to the communicated messages as possible, which minimizes the adverse effect of the compression on the overall number of communication rounds. However, intuitively, these two goals are fundamentally in conflict: the more compression we allow, the more distorted the messages become. We formalize this intuition and prove an uncertainty principle for randomized compression operators, thus quantifying this limitation mathematically and effectively providing asymptotically tight lower bounds on what might be achievable with communication compression. Motivated by these developments, we call for the search for the optimal compression operator. As a first step in this direction, we consider an unbiased compression method inspired by the Kashin representation of vectors, which we call Kashin compression (KC). In contrast to all previously proposed compression mechanisms, KC enjoys a dimension-independent variance bound, for which we derive an explicit formula even in the regime when only a few bits need to be communicated per vector entry.
  • Finding Nano-Ötzi: Semi-Supervised Volume Visualization for Cryo-Electron Tomography

    Nguyen, Ngan; Bohak, Ciril; Engel, Dominik; Mindek, Peter; Strnad, Ondrej; Wonka, Peter; Li, Sai; Ropinski, Timo; Viola, Ivan (arXiv, 2021-04-04) [Preprint]
    Cryo-Electron Tomography (cryo-ET) is a new 3D imaging technique with unprecedented potential for resolving submicron structural detail. Existing volume visualization methods, however, cannot cope with its very low signal-to-noise ratio. In order to design more powerful transfer functions, we propose to leverage soft segmentation as an explicit component of visualization for noisy volumes. Our technical realization is based on semi-supervised learning, where we combine the advantages of two segmentation algorithms. A first, weak segmentation algorithm provides good results for propagating sparse, user-provided labels to other voxels in the same volume and is used to generate dense pseudo labels. A second, powerful deep-learning-based segmentation algorithm can learn from these pseudo labels to generalize the segmentation to other, unseen volumes, a task at which the weak segmentation algorithm fails completely. The proposed volume visualization uses the deep-learning-based segmentation as a component of segmentation-aware transfer function design, and appropriate ramp parameters can be suggested automatically through histogram analysis. Finally, our visualization uses gradient-free ambient occlusion shading to further suppress the visual presence of noise and to give structural detail the desired prominence. The cryo-ET data studied throughout our technical experiments are based on the highest-quality tilt series of intact SARS-CoV-2 virions. Our technique shows high impact in the target sciences, enabling visual data analysis of very noisy volumes that cannot be visualized with existing techniques.
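
    A minimal sketch of a segmentation-aware transfer function in the spirit described above: a simple density ramp is modulated by the per-voxel foreground probability predicted by the segmentation network, so noise voxels stay transparent. The ramp endpoints here are placeholders; the paper suggests them automatically via histogram analysis.

    ```python
    import numpy as np

    def segmentation_aware_opacity(density, fg_prob, ramp_lo=0.2, ramp_hi=0.6):
        """Opacity = classic ramp over normalized density, gated by segmentation confidence."""
        ramp = np.clip((density - ramp_lo) / (ramp_hi - ramp_lo), 0.0, 1.0)
        return ramp * fg_prob     # voxels the network labels as background stay (nearly) invisible
    ```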
  • arthurlirui/refsepECCV2020: Code for Reflection Separation via Multi-bounce Polarization State Tracing

    Li, Rui; Qiu, Simeng; Zang, Guangming; Heidrich, Wolfgang (Github, 2021-03-31) [Software]
    Code for Reflection Separation via Multi-bounce Polarization State Tracing
