Recent Submissions

  • Intuitive and Efficient Roof Modeling for Reconstruction and Synthesis

    Ren, Jing; Zhang, Biao; Wu, Bojian; Huang, Jianqiang; Fan, Lubin; Ovsjanikov, Maks; Wonka, Peter (arXiv, 2021-09-16) [Preprint]
    We propose a novel and flexible roof modeling approach that can be used for constructing planar 3D polygon roof meshes. Our method uses a graph structure to encode roof topology and enforces the roof validity by optimizing a simple but effective planarity metric we propose. This approach is significantly more efficient than using general purpose 3D modeling tools such as 3ds Max or SketchUp, and more powerful and expressive than specialized tools such as the straight skeleton. Our optimization-based formulation is also flexible and can accommodate different styles and user preferences for roof modeling. We showcase two applications. The first application is an interactive roof editing framework that can be used for roof design or roof reconstruction from aerial images. We highlight the efficiency and generality of our approach by constructing a mesh-image paired dataset consisting of 2539 roofs. Our second application is a generative model to synthesize new roof meshes from scratch. We use our novel dataset to combine machine learning and our roof optimization techniques, by using transformers and graph convolutional networks to model roof topology, and our roof optimization methods to enforce the planarity constraint.
  • Features of structure, magnetic state and electrodynamic performance of SrFe12−xInxO19

    Turchenko, V A; Trukhanov, S V; Kostishin, V G; Damay, F; Porcher, F; Klygach, D S; Vakhitov, M G; Lyakhov, Dmitry; Michels, Dominik L.; Bozzo, B; Fina, I; Almessiere, M A; Slimani, Y; Baykal, A; Zhou, D; Trukhanov, A V (Scientific reports, Springer Science and Business Media LLC, 2021-09-16) [Article]
    Indium-substituted strontium hexaferrites were prepared by the conventional solid-phase reaction method. Neutron diffraction patterns were obtained at room temperature and analyzed using the Rietveld method. A linear dependence of the unit cell parameters is found. In3+ cations are located mainly in the octahedral positions 4fVI and 12k. The average crystallite size varies within 0.84–0.65 μm. With increasing substitution, the Curie temperature TC decreases monotonically down to ~520 K. ZFC and FC measurements showed a frustrated state. Upon substitution, the average and maximum sizes of ferrimagnetic clusters change in opposite directions. The remanent magnetization Mr decreases down to ~20.2 emu/g at room temperature. The spontaneous magnetization Ms and the effective magnetocrystalline anisotropy constant keff are determined. With increasing substitution, the maximum of the real part of the permittivity ε′ decreases in magnitude from ~3.3 to ~1.9 and shifts towards low frequencies from ~45.5 GHz to ~37.4 GHz. The maximum of the dielectric loss tangent tg(α) decreases from ~1.0 to ~0.7 and shifts towards low frequencies from ~40.6 GHz to ~37.3 GHz. The low-frequency maximum of the real part of the permeability μ′ decreases from ~1.8 to ~0.9 and shifts slightly towards high frequencies up to ~34.7 GHz. The maximum of the magnetic loss tangent tg(δ) decreases from ~0.7 to ~0.5 and shifts slightly towards low frequencies from ~40.5 GHz to ~37.7 GHz. The discussion of microwave properties is based on the saturation magnetization, natural ferromagnetic resonance and dielectric polarization types.
  • Neural Étendue Expander for Ultra-Wide-Angle High-Fidelity Holographic Display

    Baek, Seung-Hwan; Tseng, Ethan; Maimone, Andrew; Matsuda, Nathan; Kuo, Grace; Fu, Qiang; Heidrich, Wolfgang; Lanman, Douglas; Heide, Felix (arXiv, 2021-09-16) [Preprint]
    Holographic displays can generate light fields by dynamically modulating the wavefront of a coherent beam of light using a spatial light modulator, promising rich virtual and augmented reality applications. However, the limited spatial resolution of existing dynamic spatial light modulators imposes a tight bound on the diffraction angle. As a result, today’s holographic displays possess low étendue, which is the product of the display area and the maximum solid angle of diffracted light. The low étendue forces a sacrifice of either the field of view (FOV) or the display size. In this work, we lift this limitation by presenting neural étendue expanders. This new breed of optical elements, which is learned from a natural image dataset, enables higher diffraction angles for ultra-wide FOV while maintaining both a compact form factor and the fidelity of displayed contents to human viewers. With neural étendue expanders, we achieve 64× étendue expansion of natural images with reconstruction quality (measured in PSNR) over 29 dB on simulated retinal-resolution images. As a result, the proposed approach with expansion factor 64× enables high-fidelity ultra-wide-angle holographic projection of natural images using an 8K-pixel SLM, resulting in an 18.5 mm eyebox size and 2.18 steradians FOV, covering 85% of the human stereo FOV.
  • MovieCuts: A New Dataset and Benchmark for Cut Type Recognition

    Pardo, Alejandro; Heilbron, Fabian Caba; Alcázar, Juan León; Thabet, Ali Kassem; Ghanem, Bernard (arXiv, 2021-09-12) [Preprint]
    Understanding movies and their structural patterns is a crucial task to decode the craft of video editing. While previous works have developed tools for general analysis such as detecting characters or recognizing cinematography properties at the shot level, less effort has been devoted to understanding the most basic video edit, the Cut. This paper introduces the cut type recognition task, which requires modeling of multi-modal information. To ignite research in the new task, we construct a large-scale dataset called MovieCuts, which contains more than 170K videoclips labeled among ten cut types. We benchmark a series of audio-visual approaches, including some that deal with the problem's multi-modal and multi-label nature. Our best model achieves 45.7% mAP, which suggests that the task is challenging and that attaining highly accurate cut type recognition is an open research problem.
  • IntraTomo: Self-supervised Learning-based Tomography via Sinogram Synthesis and Prediction

    Zang, Guangming; Idoughi, Ramzi; Li, Rui; Wonka, Peter; Heidrich, Wolfgang (IEEE, 2021-09-10) [Conference Paper]
    We propose IntraTomo, a powerful framework that combines the benefits of learning-based and model-based approaches for solving highly ill-posed inverse problems in the Computed Tomography (CT) context. IntraTomo is composed of two core modules: a novel sinogram prediction module and a geometry refinement module, which are applied iteratively. In the first module, the unknown density field is represented as a continuous and differentiable function, parameterized by a deep neural network. This network is learned, in a self-supervised fashion, from the incomplete and/or degraded input sinogram. After being estimated by the sinogram prediction module, the density field is consistently refined in the second module using local and non-local geometrical priors. With these two core modules, we show that IntraTomo significantly outperforms existing approaches on several ill-posed inverse problems, such as limited angle tomography with a range of 45 degrees, sparse view tomographic reconstruction with as few as eight views, or super-resolution tomography with eight times increased resolution. The experiments on simulated and real data show that our approach can achieve results of unprecedented quality.
  • A duality approach to a price formation MFG model

    Gomes, Diogo A.; Ashrafyan, Yuri; Bakaryan, Tigran; Gutierrez, Julian (arXiv, 2021-09-04) [Preprint]
    We study the connection between the Aubry-Mather theory and a mean-field game (MFG) price-formation model. We introduce a framework for Mather measures that is suited for constrained time-dependent problems in R. Then, we propose a variational problem on a space of measures, from which we obtain a duality relation involving the MFG problem examined in [36].
  • Towards self-calibrated lens metrology by differentiable refractive deflectometry

    Wang, Congli; Chen, Ni; Heidrich, Wolfgang (Optics Express, The Optical Society, 2021-09-02) [Article]
    Deflectometry, as a non-contact, fully optical metrology method, is difficult to apply to refractive elements due to multi-surface entanglement and precise pose alignment. Here, we present a computational self-calibration approach to measure parametric lenses using dual-camera refractive deflectometry, achieved by an accurate, differentiable, and efficient ray tracing framework for modeling the metrology setup, based on which damped least squares is utilized to estimate unknown lens shape and pose parameters. We successfully demonstrate both synthetic and experimental results on singlet lens surface curvature and asphere-freeform metrology in a transmissive setting.
  • Flow-Guided Video Inpainting with Scene Templates

    Alzahrani, Majed A.; Zhu, Peihao; Wonka, Peter; Sundaramoorthi, Ganesh (arXiv, 2021-08-29) [Preprint]
    We consider the problem of filling in missing spatio-temporal regions of a video. We provide a novel flow-based solution by introducing a generative model of images in relation to the scene (without missing regions) and mappings from the scene to images. We use the model to jointly infer the scene template, a 2D representation of the scene, and the mappings. This ensures consistency of the frame-to-frame flows generated to the underlying scene, reducing geometric distortions in flow-based inpainting. The template is mapped to the missing regions in the video by a new L2-L1 interpolation scheme, creating crisp inpaintings and reducing common blur and distortion artifacts. We show on two benchmark datasets that our approach outperforms the state of the art quantitatively and in user studies.
  • Discrete Optimization for Shape Matching

    Ren, Jing; Melzi, Simone; Wonka, Peter; Ovsjanikov, Maks (Computer Graphics Forum, Wiley, 2021-08-23) [Article]
    We propose a novel discrete solver for optimizing functional map-based energies, including descriptor preservation and promoting structural properties such as area-preservation, bijectivity and Laplacian commutativity among others. Unlike the commonly-used continuous optimization methods, our approach enforces the functional map to be associated with a pointwise correspondence as a hard constraint, which provides a stronger link between optimized properties of functional and point-to-point maps. Under this hard constraint, our solver obtains functional maps with lower energy values compared to the standard continuous strategies. Perhaps more importantly, the recovered pointwise maps from our discrete solver preserve the optimized-for functional properties and are thus of higher overall quality. We demonstrate the advantages of our discrete solver on a range of energies and shape categories, compared to existing techniques for promoting pointwise maps within the functional map framework. Finally, with this solver in hand, we introduce a novel Effective Functional Map Refinement (EFMR) method which achieves the state-of-the-art accuracy on the SHREC’19 benchmark.
  • Large field-of-view holographic display by gapless splicing of multisegment cylindrical holograms

    Ma, Yifan; Wang, Jun; Wu, Yang; Jin, Fengming; Zhang, Zekun; Zhou, Zhenxing; Chen, Ni (Applied Optics, The Optical Society, 2021-08-17) [Article]
    A holographic three-dimensional (3D) display is a recognized and ideal 3D display technology. In the field of holographic research, cylindrical holography with the merit of 360° field of view (FOV) has recently become a hot issue, as it naturally solves the problem of limited FOV in planar holography. The recently proposed approximate phase compensation (APC) method successfully obtains larger FOV and fast generation of segment cylindrical hologram (SCH) in the visible light band. However, the FOV of SCH remains limited due to its intrinsic limitations, and, to our best knowledge, the issue has not been effectively addressed. In this paper, the restricted conditions are first analyzed for the generation of SCH by the APC method. Then, an FOV expansion method is proposed for realizing a large FOV holographic display by gapless splicing of multi-SCH. The proposed method can successfully obtain larger FOV cylindrical holograms and effectively eliminate the splicing gaps; its effectiveness is verified by the results of numerical simulation and optical experiments. Therefore, the proposed method can effectively solve the FOV limitation problem of the APC method for the generation of SCH in the visible band, realize a large FOV 3D display, and provide a useful reference for holographic 3D display.
  • Ships, Splashes, and Waves on a Vast Ocean

    Huang, Libo; Qu, Ziyin; Tan, Xun; Zhang, Xinxin; Michels, Dominik L.; Jiang, Chenfanfu (arXiv, 2021-08-12) [Preprint]
    The simulation of large open water surfaces is challenging for a uniform volumetric discretization of the Navier-Stokes equations. The water splashes near moving objects, which height field methods for water waves cannot capture, necessitate high-resolution simulation such as the Fluid-Implicit-Particle (FLIP) method. On the other hand, FLIP is not efficient for long-lasting water waves that propagate over long distances, which require sufficient depth for a correct dispersion relationship. This paper presents a new method to tackle this dilemma through an efficient hybridization of volumetric and surface-based advection-projection discretizations. We design a hybrid time-stepping algorithm that combines a FLIP domain and an adaptively remeshed Boundary Element Method (BEM) domain for the incompressible Euler equations. The resulting framework captures the detailed water splashes near moving objects with FLIP, and produces convincing water waves with a correct dispersion relationship at modest additional cost.
  • FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning

    Zhao, Haoyu; Li, Zhize; Richtarik, Peter (arXiv, 2021-08-10) [Preprint]
    Federated Averaging (FedAvg, also known as Local-SGD) (McMahan et al., 2017) is a classical federated learning algorithm in which clients run multiple local SGD steps before communicating their update to an orchestrating server. We propose a new federated learning algorithm, FedPAGE, able to further reduce the communication complexity by utilizing the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg. We show that FedPAGE uses far fewer communication rounds than previous local methods for both federated convex and nonconvex optimization. Concretely, 1) in the convex setting, the number of communication rounds of FedPAGE is $O(\frac{N^{3/4}}{S\epsilon})$, improving the best-known result $O(\frac{N}{S\epsilon})$ of SCAFFOLD (Karimireddy et al., 2020) by a factor of $N^{1/4}$, where $N$ is the total number of clients (usually very large in federated learning), $S$ is the number of clients sampled in each communication round, and $\epsilon$ is the target error; 2) in the nonconvex setting, the number of communication rounds of FedPAGE is $O(\frac{\sqrt{N}+S}{S\epsilon^2})$, improving the best-known result $O(\frac{N^{2/3}}{S^{2/3}\epsilon^2})$ of SCAFFOLD (Karimireddy et al., 2020) by a factor of $N^{1/6}S^{1/3}$, if the number of sampled clients satisfies $S\leq \sqrt{N}$. Note that in both settings, the communication cost for each round is the same for both FedPAGE and SCAFFOLD. As a result, FedPAGE achieves new state-of-the-art results in terms of communication complexity for both federated convex and nonconvex optimization.
  • Learning to Cut by Watching Movies

    Pardo, Alejandro; Heilbron, Fabian Caba; Alcázar, Juan León; Thabet, Ali Kassem; Ghanem, Bernard (arXiv, 2021-08-09) [Preprint]
    Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise. Many video editing components are astonishingly hard to automate, primarily due to the lack of raw video materials. This paper focuses on a new task for computational video editing, namely the task of ranking cut plausibility. Our key idea is to leverage content that has already been edited to learn fine-grained audiovisual patterns that trigger cuts. To do this, we first collected a data source of more than 10K videos, from which we extract more than 255K cuts. We devise a model that learns to discriminate between real and artificial cuts via contrastive learning. We set up a new task and a set of baselines to benchmark video cut generation. We observe that our proposed model outperforms the baselines by large margins. To demonstrate our model in real-world applications, we conduct human studies on a collection of unedited videos. The results show that our model does a better job at cutting than random and alternative baselines.
  • Lost photon enhances superresolution

    Mikhalychev, A. B.; Novik, P. I.; Karuseichyk, I. L.; Lyakhov, D. A.; Michels, Dominik L.; Mogilevtsev, D. S. (npj Quantum Information, Springer Science and Business Media LLC, 2021-08-09) [Article]
    Quantum imaging can beat classical resolution limits imposed by the diffraction of light. In particular, it is known that one can reduce the image blurring and increase the achievable resolution by illuminating an object with entangled light and measuring coincidences of photons. If an n-photon entangled state is used and the nth-order correlation function is measured, the point-spread function (PSF) effectively becomes √n times narrower relative to classical coherent imaging. Quite surprisingly, measuring n-photon correlations is not the best choice if an n-photon entangled state is available. We show that when measuring (n − 1)-photon coincidences (thus ignoring one of the available photons), the PSF can be made even narrower. This observation paves the way for a strong conditional resolution enhancement by registering one of the photons outside the imaging area. We analyze the conditions necessary for the resolution increase and propose a practical scheme suitable for observation and exploitation of the effect.
  • Tikhonov Regularization of Circle-Valued Signals

    Condat, Laurent Pierre (arXiv, 2021-08-05) [Preprint]
    It is common to have to process signals or images whose values are cyclic and can be represented as points on the complex circle, like wrapped phases, angles, orientations, or color hues. We consider a Tikhonov-type regularization model to smoothen or interpolate circle-valued signals defined on arbitrary graphs. We propose a convex relaxation of this nonconvex problem as a semidefinite program, and an efficient algorithm to solve it.
  • Optimizing dyadic nets

    Ahmed, Abdalla G.M.; Wonka, Peter (ACM Transactions on Graphics, Association for Computing Machinery (ACM), 2021-08) [Article]
    We explore the space of (0, m, 2)-nets in base 2 commonly used for sampling. We present a novel constructive algorithm that can exhaustively generate all nets, up to m-bit resolution, and thereby compute the exact number of distinct nets. We observe that the construction algorithm holds the key to defining a transformation operation that lets us transform one valid net into another one. This enables the optimization of digital nets using arbitrary objective functions. For example, we define an analytic energy function for blue noise, and use it to generate nets with high-quality blue-noise frequency power spectra. We also show that the space of (0, 2)-sequences is significantly smaller than nets with the same number of points, which drastically limits the optimizability of sequences.
  • Enhancing Adversarial Robustness via Test-time Transformation Ensembling

    Pérez, Juan C.; Alfarra, Motasem; Jeanneret, Guillaume; Rueda, Laura; Thabet, Ali Kassem; Ghanem, Bernard; Arbeláez, Pablo (arXiv, 2021-07-29) [Preprint]
    Deep learning models are prone to being fooled by imperceptible perturbations known as adversarial attacks. In this work, we study how equipping models with Test-time Transformation Ensembling (TTE) can work as a reliable defense against such attacks. While transforming the input data, both at train and test times, is known to enhance model performance, its effects on adversarial robustness have not been studied. Here, we present a comprehensive empirical study of the impact of TTE, in the form of widely-used image transforms, on adversarial robustness. We show that TTE consistently improves model robustness against a variety of powerful attacks without any need for re-training, and that this improvement comes at virtually no trade-off with accuracy on clean samples. Finally, we show that the benefits of TTE transfer even to the certified robustness domain, in which TTE provides sizable and consistent improvements.
  • CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression

    Li, Zhize; Richtarik, Peter (arXiv, 2021-07-20) [Preprint]
    Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular. Besides, the best theoretically and practically performing gradient-type methods invariably rely on some form of acceleration/momentum to reduce the number of communications (faster convergence), e.g., Nesterov's accelerated gradient descent (Nesterov, 2004) and Adam (Kingma and Ba, 2014). In order to combine the benefits of communication compression and convergence acceleration, we propose a compressed and accelerated gradient method for distributed optimization, which we call CANITA. Our CANITA achieves the first accelerated rate $O\bigg(\sqrt{\Big(1+\sqrt{\frac{\omega^3}{n}}\Big)\frac{L}{\epsilon}} + \omega\big(\frac{1}{\epsilon}\big)^{\frac{1}{3}}\bigg)$, which improves upon the state-of-the-art non-accelerated rate $O\left((1+\frac{\omega}{n})\frac{L}{\epsilon} + \frac{\omega^2+n}{\omega+n}\frac{1}{\epsilon}\right)$ of DIANA (Khaled et al., 2020b) for distributed general convex problems, where $\epsilon$ is the target error, $L$ is the smoothness parameter of the objective, $n$ is the number of machines/devices, and $\omega$ is the compression parameter (larger $\omega$ means more compression can be applied, and no compression implies $\omega=0$). Our results show that as long as the number of devices $n$ is large (often true in distributed/federated learning), or the compression $\omega$ is not very high, CANITA achieves the faster convergence rate $O\Big(\sqrt{\frac{L}{\epsilon}}\Big)$, i.e., the number of communication rounds is $O\Big(\sqrt{\frac{L}{\epsilon}}\Big)$ (vs. $O\big(\frac{L}{\epsilon}\big)$ achieved by previous works). As a result, CANITA enjoys the advantages of both compression (compressed communication in each round) and acceleration (far fewer communication rounds).
  • Fire in paradise: mesoscale simulation of wildfires

    Hadrich, Torsten; Banuti, Daniel T.; Pałubicki, Wojtek; Pirk, Sören; Michels, Dominik L. (ACM Transactions on Graphics, Association for Computing Machinery (ACM), 2021-07-19) [Article]
    As a result of changing climatic conditions, wildfires have become an existential threat across various countries around the world. The complex dynamics, paired with their often rapid progression, render wildfires an often disastrous natural phenomenon that is difficult to predict and to counteract. In this paper we present a novel method for simulating wildfires with the goal of realistically capturing the combustion process of individual trees and the resulting propagation of fires at the scale of forests. We rely on a state-of-the-art modeling approach for large-scale ecosystems that enables us to represent each plant as a detailed 3D geometric model. We introduce a novel mathematical formulation for the combustion process of plants (also considering effects such as heat transfer, char insulation, and mass loss) as well as for the propagation of fire through the entire ecosystem. Compared to other wildfire simulations which employ geometric representations of plants such as cones or cylinders, our detailed 3D tree models enable us to simulate the interplay of geometric variations of branching structures and the dynamics of fire and wood combustion. Our simulation runs at interactive rates and thereby provides a convenient way to explore different conditions that affect wildfires, ranging from terrain elevation profiles and ecosystem compositions to various measures against wildfires, such as cutting down trees as firebreaks, the application of fire retardant, or the simulation of rain.
  • End-to-End Complex Lens Design with Differentiable Ray Tracing

    Sun, Qilin; Wang, Congli; Fu, Qiang; Dun, Xiong; Heidrich, Wolfgang (ACM Transactions on Graphics, Association for Computing Machinery (ACM), 2021-07-19) [Article]
    Imaging systems have long been designed in separated steps: experience-driven optical design followed by sophisticated image processing. Although recent advances in computational imaging aim to bridge the gap in an end-to-end fashion, the image formation models used in these approaches have been quite simplistic, built either on simple wave optics models such as Fourier transform, or on similar paraxial models. Such models only support the optimization of a single lens surface, which limits the achievable image quality. To overcome these challenges, we propose a general end-to-end complex lens design framework enabled by a differentiable ray tracing image formation model. Specifically, our model relies on the differentiable ray tracing rendering engine to render optical images in the full field by taking into account all on/off-axis aberrations governed by the theory of geometric optics. Our design pipeline can jointly optimize the lens module and the image reconstruction network for a specific imaging task. We demonstrate the effectiveness of the proposed method on two typical applications, including large field-of-view imaging and extended depth-of-field imaging. Both simulation and experimental results show superior image quality compared with conventional lens designs. Our framework offers a competitive alternative for the design of modern imaging systems.
