Recent Submissions

  • Ships, splashes, and waves on a vast ocean

    Huang, Libo; Qu, Ziyin; Tan, Xun; Zhang, Xinxin; Michels, Dominik L.; Jiang, Chenfanfu (ACM Transactions on Graphics, Association for Computing Machinery (ACM), 2021-12-10) [Article]
    The simulation of a large open water surface is challenging using a uniform volumetric discretization of the Navier-Stokes equations. Water splashes near moving objects, which height-field methods for water waves cannot capture, necessitate high resolutions; such simulations can be carried out using the Fluid-Implicit-Particle (FLIP) method. However, the FLIP method is not efficient for long-lasting water waves that propagate over long distances, as these require sufficient depth for a correct dispersion relationship. This paper presents a new method to tackle this dilemma through an efficient hybridization of volumetric and surface-based advection-projection discretizations. We design a hybrid time-stepping algorithm that combines a FLIP domain with an adaptively remeshed Boundary Element Method (BEM) domain for the incompressible Euler equations. The resulting framework captures detailed water splashes near moving objects with the FLIP method and produces convincing water waves with correct dispersion relationships at modest additional cost.
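    The depth requirement mentioned above comes from the linear dispersion relation for gravity waves, ω² = g·k·tanh(k·h). The following minimal sketch (illustrative physics only, not code from the paper) shows how truncating the water depth, as a shallow volumetric grid effectively does, changes a long wave's phase speed:

    ```python
    import math

    def wave_omega(k: float, depth: float, g: float = 9.81) -> float:
        """Angular frequency of a linear gravity wave with wavenumber k
        over water of the given depth: omega^2 = g * k * tanh(k * depth)."""
        return math.sqrt(g * k * math.tanh(k * depth))

    def phase_speed(k: float, depth: float) -> float:
        """Propagation speed c = omega / k of the wave."""
        return wave_omega(k, depth) / k

    # A 100 m wave over effectively infinite vs. depth-limited water:
    # a too-shallow simulation domain slows the wave down.
    k = 2 * math.pi / 100.0            # wavelength = 100 m
    deep = phase_speed(k, 1000.0)      # effectively infinite depth
    shallow = phase_speed(k, 5.0)      # depth-limited
    ```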
  • Intuitive and efficient roof modeling for reconstruction and synthesis

    Ren, Jing; Zhang, Biao; Wu, Bojian; Huang, Jianqiang; Fan, Lubin; Ovsjanikov, Maks; Wonka, Peter (ACM Transactions on Graphics, Association for Computing Machinery (ACM), 2021-12-10) [Article]
    We propose a novel and flexible roof modeling approach that can be used for constructing planar 3D polygon roof meshes. Our method uses a graph structure to encode roof topology and enforces the roof validity by optimizing a simple but effective planarity metric we propose. This approach is significantly more efficient than using general purpose 3D modeling tools such as 3ds Max or SketchUp, and more powerful and expressive than specialized tools such as the straight skeleton. Our optimization-based formulation is also flexible and can accommodate different styles and user preferences for roof modeling. We showcase two applications. The first application is an interactive roof editing framework that can be used for roof design or roof reconstruction from aerial images. We highlight the efficiency and generality of our approach by constructing a mesh-image paired dataset consisting of 2539 roofs. Our second application is a generative model to synthesize new roof meshes from scratch. We use our novel dataset to combine machine learning and our roof optimization techniques, by using transformers and graph convolutional networks to model roof topology, and our roof optimization methods to enforce the planarity constraint.
  • CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions

    Abdal, Rameen; Zhu, Peihao; Femiani, John; Mitra, Niloy J.; Wonka, Peter (arXiv, 2021-12-09) [Preprint]
    The success of StyleGAN has enabled unprecedented semantic editing capabilities on both synthesized and real images. However, such editing operations are either trained with semantic supervision or described using human guidance. In another development, the CLIP architecture has been trained with internet-scale image and text pairings and has been shown to be useful in several zero-shot learning settings. In this work, we investigate how to effectively link the pretrained latent spaces of StyleGAN and CLIP, which in turn allows us to automatically extract semantically labeled edit directions from StyleGAN, finding and naming meaningful edit operations without any additional human guidance. Technically, we propose two novel building blocks: one for finding interesting CLIP directions and one for labeling arbitrary directions in CLIP latent space. The setup does not assume any pre-determined labels, and hence we do not require any additional supervised text/attributes to build the editing framework. We evaluate the effectiveness of the proposed method and demonstrate that extraction of disentangled labeled StyleGAN edit directions is indeed possible, and reveals interesting and non-trivial edit directions.
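    The labeling step can be pictured as a nearest-neighbor search in CLIP's joint image-text embedding space. The toy sketch below uses hypothetical 2D embeddings and is an illustration of the idea, not the paper's actual pipeline:

    ```python
    import math

    def cosine(u, v):
        """Cosine similarity between two equal-length vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    def label_direction(direction_emb, candidate_embs):
        """Return the candidate word whose (hypothetical) CLIP-space
        embedding best aligns with the extracted edit direction."""
        return max(candidate_embs,
                   key=lambda w: cosine(direction_emb, candidate_embs[w]))

    # Toy 2D "CLIP" embeddings (made up for illustration):
    candidates = {"smile": [0.9, 0.1], "age": [0.0, 1.0]}
    label = label_direction([1.0, 0.0], candidates)
    ```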
  • Snapshot HDR Video Construction Using Coded Mask

    Alghamdi, Masheal; Fu, Qiang; Thabet, Ali Kassem; Heidrich, Wolfgang (arXiv, 2021-12-05) [Preprint]
    This paper studies the reconstruction of High Dynamic Range (HDR) video from snapshot-coded LDR video. Constructing an HDR video requires restoring the HDR values for each frame and maintaining consistency between successive frames. HDR image acquisition from a single image capture, also known as snapshot HDR imaging, can be achieved in several ways. For example, a reconfigurable snapshot HDR camera can be realized by introducing an optical element into the optical stack of the camera, such as a coded mask placed at a small standoff distance in front of the sensor. A high-quality HDR image can then be recovered from the captured coded image using deep learning methods. This study utilizes 3D-CNNs to perform joint demosaicking, denoising, and HDR video reconstruction from coded LDR video. We enforce more temporally consistent HDR video reconstruction by introducing a temporal loss function that considers both short-term and long-term consistency. The obtained results are promising and could lead to affordable HDR video capture using conventional cameras.
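    A temporal loss of the kind described might combine adjacent-frame (short-term) and anchor-frame (long-term) penalties. The sketch below is a simplified stand-in: real losses of this type typically compare motion-compensated frames, and the weights here are arbitrary:

    ```python
    def l1(a, b):
        """Mean absolute difference between two flattened frames."""
        return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

    def temporal_loss(frames, w_short=1.0, w_long=0.5):
        """Penalize frame-to-frame (short-term) and frame-to-anchor
        (long-term) differences between reconstructed HDR frames.
        Weights and the anchor-frame choice are illustrative guesses."""
        short = sum(l1(frames[t], frames[t - 1]) for t in range(1, len(frames)))
        long_ = sum(l1(frames[t], frames[0]) for t in range(1, len(frames)))
        return w_short * short + w_long * long_

    still = [[1.0, 2.0]] * 3            # a static sequence incurs no penalty
    ```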
  • Barbershop

    Zhu, Peihao; Abdal, Rameen; Femiani, John; Wonka, Peter (ACM Transactions on Graphics, Association for Computing Machinery (ACM), 2021-12) [Article]
    Seamlessly blending features from multiple images is extremely challenging because of complex relationships in lighting, geometry, and partial occlusion, which cause coupling between different parts of the image. Even though recent work on GANs enables synthesis of realistic hair or faces, it remains difficult to combine them into a single, coherent, and plausible image rather than a disjointed set of image patches. We present a novel solution to image blending, particularly for the problem of hairstyle transfer, based on GAN-inversion. We propose a novel latent space for image blending which is better at preserving detail and encoding spatial information, and propose a new GAN-embedding algorithm which is able to slightly modify images to conform to a common segmentation mask. Our novel representation enables the transfer of the visual properties from multiple reference images including specific details such as moles and wrinkles, and because we do image blending in a latent space we are able to synthesize images that are coherent. Our approach avoids blending artifacts present in other approaches and finds a globally consistent image. Our results demonstrate a significant improvement over the current state of the art in a user study, with users preferring our blending solution over 95 percent of the time. Source code for the new approach is available at
  • MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

    Soldan, Mattia; Pardo, Alejandro; Alcázar, Juan León; Heilbron, Fabian Caba; Zhao, Chen; Giancola, Silvio; Ghanem, Bernard (arXiv, 2021-12-01) [Preprint]
    The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In comparison, limited effort has been made in assessing the fitness of these datasets for the video-language grounding task. Recent works have begun to discover significant limitations in these datasets, suggesting that state-of-the-art techniques commonly overfit to hidden dataset biases. In this work, we present MAD (Movie Audio Descriptions), a novel benchmark that departs from the paradigm of augmenting existing video datasets with text annotations and focuses on crawling and aligning available audio descriptions of mainstream movies. MAD contains over 384,000 natural language sentences grounded in over 1,200 hours of video and exhibits a significant reduction in the currently diagnosed biases for video-language grounding datasets. MAD's collection strategy enables a novel and more challenging version of video-language grounding, where short temporal moments (typically seconds long) must be accurately grounded in diverse long-form videos that can last up to three hours.
  • Learning to reconstruct botanical trees from single images

    Li, Bosheng; Kałużny, Jacek; Klein, Jonathan; Michels, Dominik L.; Pałubicki, Wojtek; Benes, Bedrich; Pirk, Sören (ACM Transactions on Graphics, Association for Computing Machinery (ACM), 2021-12) [Article]
    We introduce a novel method for reconstructing the 3D geometry of botanical trees from single photographs. Faithfully reconstructing a tree from single-view sensor data is a challenging and open problem because many possible 3D trees exist that fit the tree's shape observed from a single view. We address this challenge by defining a reconstruction pipeline based on three neural networks. The networks simultaneously mask out trees in input photographs, identify a tree's species, and obtain its 3D radial bounding volume - our novel 3D representation for botanical trees. Radial bounding volumes (RBV) are used to orchestrate a procedural model primed on learned parameters to grow a tree that matches the main branching structure and the overall shape of the captured tree. While the RBV allows us to faithfully reconstruct the main branching structure, we use the procedural model's morphological constraints to generate realistic branching for the tree crown. This constrains the number of solutions of tree models for a given photograph of a tree. We show that our method reconstructs various tree species even when the trees are captured in front of complex backgrounds. Moreover, although our neural networks have been trained on synthetic data with data augmentation, we show that our pipeline performs well for real tree photographs. We evaluate the reconstructed geometries with several metrics, including leaf area index and maximum radial tree distances.
  • Weatherscapes: nowcasting heat transfer and water continuity

    Herrera, Jorge Alejandro Amador; Hadrich, Torsten; Pałubicki, Wojtek; Banuti, Daniel T.; Pirk, Sören; Michels, Dominik L. (ACM Transactions on Graphics, Association for Computing Machinery (ACM), 2021-12) [Article]
    Due to the complex interplay of various meteorological phenomena, simulating weather is a challenging and open research problem. In this contribution, we propose a novel physics-based model that enables simulating weather at interactive rates. By considering the atmosphere and pedosphere we can define the hydrologic cycle - and consequently weather - in unprecedented detail. Specifically, our model captures different warm and cold clouds, such as mammatus, hole-punch, multi-layer, and cumulonimbus clouds as well as their dynamic transitions. We also model different precipitation types, such as rain, snow, and graupel by introducing a comprehensive microphysics scheme. The Wegener-Bergeron-Findeisen process is incorporated into our Kessler-type microphysics formulation covering ice crystal growth occurring in mixed-phase clouds. Moreover, we model the water run-off from the ground surface, the infiltration into the soil, and its subsequent evaporation back to the atmosphere. We account for daily temperature changes, as well as heat transfer between pedosphere and atmosphere leading to a complex feedback loop. Our framework enables us to interactively explore various complex weather phenomena. Our results are assessed visually and validated by simulating weatherscapes for various setups covering different precipitation events and environments, by showcasing the hydrologic cycle, and by reproducing common effects such as Foehn winds. We also provide quantitative evaluations by creating high-precipitation cumulonimbus clouds by prescribing atmospheric conditions based on infrared satellite observations. With our model we can generate dynamic 3D scenes of weatherscapes with high visual fidelity and even nowcast real weather conditions as simulations by streaming weather data into our framework.
  • Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding

    Hamdi, Abdullah; Giancola, Silvio; Ghanem, Bernard (arXiv, 2021-11-30) [Preprint]
    Multi-view projection methods have demonstrated promising performance on 3D understanding tasks like 3D classification and segmentation. However, it remains unclear how to combine such multi-view methods with the widely available 3D point clouds. Previous methods use unlearned heuristics to combine features at the point level. To address this, we introduce the concept of the multi-view point cloud (Voint cloud), representing each 3D point as a set of features extracted from several viewpoints. This novel 3D Voint cloud representation combines the compactness of 3D point cloud representation with the natural view-awareness of multi-view representation. Naturally, we can equip this new representation with convolutional and pooling operations. We deploy a Voint neural network (VointNet) with a theoretically established functional form to learn representations in the Voint space. Our novel representation achieves state-of-the-art performance on 3D classification and retrieval on ScanObjectNN, ModelNet40, and ShapeNet Core55. Additionally, we achieve competitive performance for 3D semantic segmentation on ShapeNet Parts. Further analysis shows that VointNet improves the robustness to rotation and occlusion compared to other methods.
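    The view-aware pooling such a representation affords can be sketched as a channel-wise reduction over each point's per-view features. This toy version (not VointNet itself, which learns its aggregation) uses a max reduction:

    ```python
    def view_max_pool(voint):
        """Pool one Voint (a per-point list of per-view feature vectors)
        into a single view-invariant feature by channel-wise max."""
        return [max(channel) for channel in zip(*voint)]

    # One point seen from two views, each view contributing a 2-channel
    # feature vector; pooling keeps the strongest response per channel.
    pooled = view_max_pool([[1, 5], [3, 2]])
    ```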
  • Basis Matters: Better Communication-Efficient Second Order Methods for Federated Learning

    Qian, Xun; Islamov, Rustem; Safaryan, Mher; Richtarik, Peter (arXiv, 2021-11-02) [Preprint]
    Recent advances in distributed optimization have shown that Newton-type methods with proper communication compression mechanisms can guarantee fast local rates and low communication cost compared to first-order methods. We discover that the communication cost of these methods can be further reduced, sometimes dramatically so, with a surprisingly simple trick: Basis Learn (BL). The idea is to transform the usual representation of the local Hessians via a change of basis in the space of matrices and apply compression tools to the new representation. To demonstrate the potential of using custom bases, we design a new Newton-type method (BL1), which reduces communication cost via both the BL technique and a bidirectional compression mechanism. Furthermore, we present two alternative extensions (BL2 and BL3) to partial participation to accommodate federated learning applications. We prove local linear and superlinear rates independent of the condition number. Finally, we support our claims with numerical experiments by comparing several first- and second-order methods.
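    The compression side of the idea can be illustrated with a simple top-k sparsifier applied to a coefficient vector: if the local Hessians are (approximately) sparse in a well-chosen basis, most coefficients can be dropped before communication at little cost. This stand-in is illustrative, not the paper's compression operator:

    ```python
    def topk_compress(coeffs, k):
        """Keep the k largest-magnitude coefficients of a Hessian's
        representation in some basis; zero out the rest before sending."""
        idx = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))[:k]
        keep = set(idx)
        return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

    # In a basis where the matrix is nearly diagonal, one coefficient
    # may carry most of the information:
    compressed = topk_compress([3.0, -5.0, 1.0], 1)
    ```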
  • Fast Sinkhorn Filters: Using Matrix Scaling for Non-Rigid Shape Correspondence with Functional Maps

    Pai, Gautam; Ren, Jing; Melzi, Simone; Wonka, Peter; Ovsjanikov, Maks (IEEE, 2021-11-02) [Conference Paper]
    In this paper, we provide a theoretical foundation for pointwise map recovery from functional maps and highlight its relation to a range of shape correspondence methods based on spectral alignment. With this analysis in hand, we develop a novel spectral registration technique: Fast Sinkhorn Filters, which allows for the recovery of accurate and bijective pointwise correspondences with a superior time and memory complexity in comparison to existing approaches. Our method combines the simple and concise representation of correspondence using functional maps with the matrix scaling schemes from computational optimal transport. By exploiting the sparse structure of the kernel matrices involved in the transport map computation, we provide an efficient trade-off between acceptable accuracy and complexity for the problem of dense shape correspondence, while promoting bijectivity.
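    Matrix scaling here refers to the classical Sinkhorn-Knopp iteration, which alternately normalizes the rows and columns of a positive kernel matrix until it is (approximately) doubly stochastic, promoting bijective correspondences. A minimal dense sketch (the paper exploits the sparsity of the kernel matrices for efficiency, which this version does not):

    ```python
    def sinkhorn(K, iters=50):
        """Scale a square positive matrix K toward a doubly stochastic
        matrix by alternating row and column normalization."""
        n = len(K)
        M = [row[:] for row in K]
        for _ in range(iters):
            for i in range(n):                        # row normalization
                s = sum(M[i])
                M[i] = [v / s for v in M[i]]
            for j in range(n):                        # column normalization
                s = sum(M[i][j] for i in range(n))
                for i in range(n):
                    M[i][j] /= s
        return M

    M = sinkhorn([[1.0, 2.0], [3.0, 1.0]])   # rows and columns now ~sum to 1
    ```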
  • PU-GCN: Point Cloud Upsampling using Graph Convolutional Networks

    Qian, Guocheng; Abualshour, Abdulellah; Li, Guohao; Thabet, Ali Kassem; Ghanem, Bernard (IEEE, 2021-11-02) [Conference Paper]
    The effectiveness of learning-based point cloud upsampling pipelines heavily relies on the upsampling modules and feature extractors used therein. For the point upsampling module, we propose a novel model called NodeShuffle, which uses a Graph Convolutional Network (GCN) to better encode local point information from point neighborhoods. NodeShuffle is versatile and can be incorporated into any point cloud upsampling pipeline. Extensive experiments show how NodeShuffle consistently improves state-of-the-art upsampling methods. For feature extraction, we also propose a new multi-scale point feature extractor, called Inception DenseGCN. By aggregating features at multiple scales, this feature extractor enables further performance gain in the final upsampled point clouds. We combine Inception DenseGCN with NodeShuffle into a new point upsampling pipeline called PU-GCN. PU-GCN sets new state-of-the-art performance with far fewer parameters and more efficient inference. Our code is publicly available at
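    The shuffle in NodeShuffle follows the periodic-shuffle pattern familiar from PixelShuffle: the channels of each point's (GCN-expanded) feature vector are rearranged into r new points. A minimal sketch of just the rearrangement, with the learned GCN feature expansion omitted:

    ```python
    def node_shuffle(features, r):
        """Rearrange each point's C = r * d feature channels into r
        upsampled points with d channels each (periodic shuffle).
        A real pipeline would first expand features with a GCN."""
        out = []
        for feat in features:
            d = len(feat) // r
            for i in range(r):
                out.append(feat[i * d:(i + 1) * d])
        return out

    # One point with 4 channels, upsampling ratio r = 2 -> two points:
    upsampled = node_shuffle([[1, 2, 3, 4]], 2)
    ```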
  • Point Cloud Instance Segmentation using Probabilistic Embeddings

    Zhang, Biao; Wonka, Peter (IEEE, 2021-11-02) [Conference Paper]
    In this paper we propose a new framework for point cloud instance segmentation. Our framework has two steps: an embedding step and a clustering step. In the embedding step, our main contribution is to propose a probabilistic embedding space for point cloud embedding. Specifically, each point is represented as a tri-variate normal distribution. In the clustering step, we propose a novel loss function, which benefits both the semantic segmentation and the clustering. Our experimental results show significant improvements over the state of the art, i.e., a 3.1% increase in average per-category mAP on the PartNet dataset.
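    With each point embedded as a Gaussian, pairwise affinity can be measured by a closed-form divergence between distributions rather than a point-to-point distance. As one concrete illustrative choice (not necessarily the paper's measure), the Bhattacharyya distance for diagonal Gaussians:

    ```python
    import math

    def bhattacharyya(mu1, var1, mu2, var2):
        """Bhattacharyya distance between two diagonal Gaussians;
        small values suggest the two point embeddings belong to the
        same instance."""
        d = 0.0
        for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
            v = 0.5 * (v1 + v2)                       # averaged variance
            d += 0.125 * (m1 - m2) ** 2 / v           # mean separation term
            d += 0.5 * math.log(v / math.sqrt(v1 * v2))  # variance mismatch term
        return d
    ```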
  • AdaBins: Depth Estimation Using Adaptive Bins

    Bhat, Shariq Farooq; Alhashim, Ibraheem; Wonka, Peter (IEEE, 2021-11-02) [Conference Paper]
    We address the problem of estimating a high quality dense depth map from a single RGB input image. We start out with a baseline encoder-decoder convolutional neural network architecture and pose the question of how the global processing of information can help improve overall depth estimation. To this end, we propose a transformer-based architecture block that divides the depth range into bins whose center value is estimated adaptively per image. The final depth values are estimated as linear combinations of the bin centers. We call our new building block AdaBins. Our results show a decisive improvement over the state-of-the-art on several popular depth datasets across all metrics. We also validate the effectiveness of the proposed block with an ablation study and provide the code and corresponding pre-trained weights of the new state-of-the-art model.
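    The AdaBins readout, depth as a convex combination of image-adaptive bin centers, can be sketched per pixel as follows (the logits and bin centers here are placeholders for network outputs):

    ```python
    import math

    def softmax(xs):
        """Numerically stable softmax over a list of logits."""
        m = max(xs)
        es = [math.exp(x - m) for x in xs]
        s = sum(es)
        return [e / s for e in es]

    def adabins_depth(logits, bin_centers):
        """Per-pixel depth as a softmax-weighted linear combination of
        the adaptively estimated bin centers."""
        probs = softmax(logits)
        return sum(p * c for p, c in zip(probs, bin_centers))

    # Equal logits over bins centered at 1 m and 3 m -> 2 m depth.
    depth = adabins_depth([0.0, 0.0], [1.0, 3.0])
    ```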
  • ISP-Agnostic Image Reconstruction for Under-Display Cameras

    Qi, Miao; Li, Yuqi; Heidrich, Wolfgang (arXiv, 2021-11-02) [Preprint]
    Under-display cameras have been proposed in recent years as a way to reduce the form factor of mobile devices while maximizing the screen area. Unfortunately, placing the camera behind the screen results in significant image distortions, including loss of contrast, blur, noise, color shift, scattering artifacts, and reduced light sensitivity. In this paper, we propose an image-restoration pipeline that is ISP-agnostic, i.e. it can be combined with any legacy ISP to produce a final image that matches the appearance of regular cameras using the same ISP. This is achieved with a deep learning approach that performs a RAW-to-RAW image restoration. To obtain large quantities of real under-display camera training data with sufficient contrast and scene diversity, we furthermore develop a data capture method utilizing an HDR monitor, as well as a data augmentation method to generate suitable HDR content. The monitor data is supplemented with real-world data that has less scene diversity but allows us to achieve fine detail recovery without being limited by the monitor resolution. Together, this approach successfully restores color and contrast as well as image detail.
  • Seeing in Extra Darkness Using a Deep-Red Flash

    Xiong, Jinhui; Wang, Jian; Heidrich, Wolfgang; Nayar, Shree (IEEE, 2021-11-02) [Conference Paper]
    We propose a new flash technique for low-light imaging, using deep-red light as an illuminating source. Our main observation is that in a dim environment, the human eye mainly uses rods for the perception of light, which are not sensitive to wavelengths longer than 620nm, yet the camera sensor still has a spectral response. We propose a novel modulation strategy when training a modern CNN model for guided image filtering, fusing a noisy RGB frame and a flash frame. This fusion network is further extended for video reconstruction. We have built a prototype with minor hardware adjustments and tested the new flash technique on a variety of static and dynamic scenes. The experimental results demonstrate that our method produces compelling reconstructions, even in extra dim conditions.
  • Mechanisms of elastoplastic deformation and their effect on hardness of nanogranular Ni-Fe coatings

    Zubar, T.I.; Fedosyuk, V.M.; Tishkevich, D.I.; Panasyuk, M.I.; Kanafyev, O.D.; Kozlovskiy, A.; Zdorovets, M.; Michels, Dominik L.; Lyakhov, Dmitry; Trukhanov, A.V. (International Journal of Mechanical Sciences, Elsevier BV, 2021-11) [Article]
    This article studies the correlation between the microstructure, mechanical properties, and mechanisms of elastoplastic deformation of Ni-Fe coatings that were grown in five electrodeposition modes and had fundamentally different microstructures. A nonlinear change in hardness was detected using nanoindentation. The abnormal change in hardness is explained by the way elastoplastic energy relaxes under load. It is shown that the deformation of coatings with a grain size of 100 nm or more occurs due to dislocation slip. A decrease in grain size leads to the predominance of deformation due to rotations and sliding of grains, as well as surface and grain-boundary diffusion. The effect of deformation mechanisms on the nanoscale hardness of Ni-Fe coatings was established. Full hardening of the coatings (both in the bulk and on the surface) was achieved while maintaining a balance of three mechanisms of elastoplastic deformation in the sample. Unique coatings consisting of two fractions of grains (70% nano-grains and 30% agglomerates of nano-grains) demonstrate high crack resistance and full-depth hardening up to H = 7.4 GPa due to the release of deformation energy through amorphization and agglomeration of nano-grains.
  • Additive lithographic fabrication of a Tilt-Gaussian-Vortex mask for focal plane wavefront sensing

    Fu, Qiang; Amata, Hadi; Gerard, Benjamin; Marois, Christian; Heidrich, Wolfgang (SPIE, 2021-10-28) [Conference Paper]
    Spatially-varying features with uniform depths in large areas are challenging to achieve with etching-based lithography. Here we propose an additive lithographic fabrication process to realize the simultaneous presence of micrometer and millimeter features with low surface roughness. The etching step is replaced by sputter deposition and bi-layer lift-off to form the microstructures. Instead of removing materials, our method grows materials onto the substrate. We demonstrate its effectiveness with a reflective Tilt-Gaussian-Vortex mask with aluminum deposited on a fused silica substrate. The center has a diameter of 130 microns with minimum spacing of 2 microns, and the background pattern is 3 mm by 3 mm, with the largest flat region spanning 1.5 mm. A preliminary 4-level prototype has been tested in the Gemini Planet Imager calibration unit upgrade project, and an improved 16-level sample has been measured. The results show uniform depth and surface roughness control across the whole area.
  • Shape and Reflectance Reconstruction in Uncontrolled Environments by Differentiable Rendering

    Li, Rui; Zang, Guangmin; Qi, Miao; Heidrich, Wolfgang (arXiv, 2021-10-25) [Preprint]
    Simultaneous reconstruction of geometry and reflectance properties in uncontrolled environments remains a challenging problem. In this paper, we propose an efficient method to reconstruct the scene's 3D geometry and reflectance from multi-view photography using conventional hand-held cameras. Our method automatically builds a virtual scene in a differentiable rendering system that roughly matches the real world's scene parameters, optimized by alternately and stochastically minimizing photometric objectives. With the optimal scene parameters evaluated, photo-realistic novel views for various viewing angles and distances can then be generated by our approach. We present the results of captured scenes with complex geometry and various reflection types. Our method also shows superior performance compared to state-of-the-art alternatives in novel view synthesis, both visually and quantitatively.
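    The analysis-by-synthesis loop the abstract describes, adjusting scene parameters to minimize a photometric objective, can be caricatured in one dimension. Here a numerical gradient stands in for the differentiable renderer, and the toy `render` function and learning rate are purely illustrative:

    ```python
    def numerical_grad(f, x, eps=1e-5):
        """Central-difference approximation of df/dx."""
        return (f(x + eps) - f(x - eps)) / (2 * eps)

    def optimize_param(render, target, x0, lr=0.1, steps=200):
        """Toy analysis-by-synthesis: adjust one scene parameter until
        the rendered value matches the photographed target."""
        x = x0
        loss = lambda p: (render(p) - target) ** 2   # photometric objective
        for _ in range(steps):
            x -= lr * numerical_grad(loss, x)
        return x

    # A "renderer" that doubles its parameter; target observation is 4,
    # so the optimizer should recover a parameter near 2.
    recovered = optimize_param(lambda p: 2.0 * p, 4.0, 0.0)
    ```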
  • Etch-free additive lithographic fabrication methods for reflective and transmissive micro-optics

    Fu, Qiang; Amata, Hadi; Heidrich, Wolfgang (Optics Express, The Optical Society, 2021-10-22) [Article]
    With the widespread application of micro-optics in a large range of areas, versatile high-quality fabrication methods for diffractive optical elements (DOEs) have always been desired by both the research community and by industry. Traditionally, multi-level DOEs are fabricated by a repetitive combination of photolithography and reactive-ion etching (RIE). The optical phase accuracy and micro-surface quality are severely affected by various artifacts in the RIE steps, e.g., RIE lag and aspect-ratio-dependent etching rates. Here we propose an alternative way to fabricate DOEs by additively growing multi-level microstructures onto the substrate. Depth accuracy, surface roughness, uniformity and smoothness are easily controlled to high accuracy by a combination of deposition and lift-off, rather than etching. Uniform depths can be realized for both micrometer and millimeter scale features that are simultaneously present in the designs. The grown media can either be used directly as a reflective DOE, or as a master stamp for nanoimprinting refractive designs. We demonstrate the effectiveness of the fabrication methods with representative reflective and transmissive DOEs for imaging and display applications.
