Visual Computing Center (VCC)
Recent Submissions

Intuitive and Efficient Roof Modeling for Reconstruction and Synthesis (arXiv, 2021-09-16) [Preprint] We propose a novel and flexible roof modeling approach that can be used for constructing planar 3D polygon roof meshes. Our method uses a graph structure to encode roof topology and enforces roof validity by optimizing a simple but effective planarity metric that we propose. This approach is significantly more efficient than general-purpose 3D modeling tools such as 3ds Max or SketchUp, and more powerful and expressive than specialized tools such as the straight skeleton. Our optimization-based formulation is also flexible and can accommodate different styles and user preferences for roof modeling. We showcase two applications. The first is an interactive roof editing framework that can be used for roof design or for roof reconstruction from aerial images. We highlight the efficiency and generality of our approach by constructing a mesh-image paired dataset consisting of 2539 roofs. The second application is a generative model that synthesizes new roof meshes from scratch. We use our novel dataset to combine machine learning with our roof optimization techniques, using transformers and graph convolutional networks to model roof topology and our roof optimization methods to enforce the planarity constraint.
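The abstract does not spell out the planarity metric itself. As a rough illustration only, here is a minimal Python sketch of one common stand-in, assuming each roof face is a polygon given by its 3D vertex coordinates: fit a plane per face (Newell normal through the centroid) and penalize how far the vertices stray from it. The names `planarity_residual` and `roof_planarity_energy` are hypothetical, and this is not the authors' formulation.

```python
import math

def newell_normal(verts):
    """Unit normal of a (possibly non-planar) polygon via Newell's method."""
    nx = ny = nz = 0.0
    for i, (x0, y0, z0) in enumerate(verts):
        x1, y1, z1 = verts[(i + 1) % len(verts)]
        nx += (y0 - y1) * (z0 + z1)
        ny += (z0 - z1) * (x0 + x1)
        nz += (x0 - x1) * (y0 + y1)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)

def planarity_residual(verts):
    """Max distance of any vertex from the plane through the centroid with the
    Newell normal: 0 for a perfectly planar face, growing as the face warps."""
    n = newell_normal(verts)
    k = len(verts)
    c = tuple(sum(v[a] for v in verts) / k for a in range(3))
    return max(abs(sum((v[a] - c[a]) * n[a] for a in range(3))) for v in verts)

def roof_planarity_energy(vertices, faces):
    """Hypothetical roof energy: sum of squared per-face residuals.
    Faces are lists of indices into a shared vertex list (the roof graph)."""
    return sum(planarity_residual([vertices[i] for i in f]) ** 2 for f in faces)

planar = [(0, 0, 0), (1, 0, 0.5), (1, 1, 0.5), (0, 1, 0)]   # lies in z = 0.5 x
warped = [(0, 0, 0), (1, 0, 0), (1, 1, 1), (0, 1, 0)]       # one corner lifted
print(planarity_residual(planar), planarity_residual(warped))
```

An optimizer that moves shared roof vertices to drive such an energy to zero keeps every face planar while the graph fixes which faces share edges.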

Features of structure, magnetic state and electrodynamic performance of SrFe12−xInxO19 (Scientific Reports, Springer Science and Business Media LLC, 2021-09-16) [Article] Indium-substituted strontium hexaferrites were prepared by the conventional solid-phase reaction method. Neutron diffraction patterns were obtained at room temperature and analyzed using the Rietveld method. A linear dependence of the unit cell parameters is found. In3+ cations are located mainly in the octahedral positions 4fVI and 12k. The average crystallite size varies within 0.84–0.65 μm. With increasing substitution, the Curie temperature TC decreases monotonically down to ~520 K. ZFC and FC measurements showed a frustrated state. Upon substitution, the average and maximum sizes of ferrimagnetic clusters change in opposite directions. The remanent magnetization Mr decreases down to ~20.2 emu/g at room temperature. The spontaneous magnetization Ms and the effective magnetocrystalline anisotropy constant keff are determined. With increasing substitution, the maximum of the real part of permittivity ε′ decreases in magnitude from ~3.3 to ~1.9 and shifts towards lower frequencies, from ~45.5 GHz to ~37.4 GHz. The maximum of the dielectric loss tangent tg(α) decreases from ~1.0 to ~0.7 and shifts towards lower frequencies, from ~40.6 GHz to ~37.3 GHz. The low-frequency maximum of the real part of permeability μ′ decreases from ~1.8 to ~0.9 and shifts slightly towards higher frequencies, up to ~34.7 GHz. The maximum of the magnetic loss tangent tg(δ) decreases from ~0.7 to ~0.5 and shifts slightly towards lower frequencies, from ~40.5 GHz to ~37.7 GHz. The discussion of microwave properties is based on the saturation magnetization, natural ferromagnetic resonance, and dielectric polarization types.

Neural Étendue Expander for Ultra-Wide-Angle High-Fidelity Holographic Display (arXiv, 2021-09-16) [Preprint] Holographic displays can generate light fields by dynamically modulating the wavefront of a coherent beam of light using a spatial light modulator (SLM), promising rich virtual and augmented reality applications. However, the limited spatial resolution of existing dynamic spatial light modulators imposes a tight bound on the diffraction angle. As a result, today’s holographic displays possess low étendue, which is the product of the display area and the maximum solid angle of diffracted light. The low étendue forces a sacrifice of either the field of view (FOV) or the display size. In this work, we lift this limitation by presenting neural étendue expanders. This new breed of optical elements, which is learned from a natural image dataset, enables higher diffraction angles for ultra-wide FOV while maintaining both a compact form factor and the fidelity of displayed contents to human viewers. With neural étendue expanders, we achieve 64× étendue expansion of natural images with reconstruction quality (measured in PSNR) over 29 dB on simulated retinal-resolution images. As a result, the proposed approach with expansion factor 64× enables high-fidelity ultra-wide-angle holographic projection of natural images using an 8K-pixel SLM, resulting in an 18.5 mm eyebox size and 2.18 steradians FOV, covering 85% of the human stereo FOV.
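To make the étendue bound concrete, here is a back-of-the-envelope Python sketch. It assumes the standard grating-equation limit sin θ = λ/(2p) for an SLM with pixel pitch p, and approximates the diffracted light as a circular cone so that the solid angle is 2π(1 − cos θ). The wavelength, pitch, and 8K resolution below are illustrative values, not parameters taken from the paper.

```python
import math

def max_diffraction_half_angle(wavelength, pixel_pitch):
    """Grating-equation bound for an SLM: sin(theta) = lambda / (2 * pitch)."""
    return math.asin(wavelength / (2.0 * pixel_pitch))

def cone_solid_angle(half_angle):
    """Solid angle (steradians) of a circular cone with the given half-angle."""
    return 2.0 * math.pi * (1.0 - math.cos(half_angle))

def etendue(area, half_angle):
    """Etendue as display area times the maximum solid angle of diffracted light."""
    return area * cone_solid_angle(half_angle)

# Illustrative (assumed) parameters: 532 nm light, 8 um pitch, 7680 x 4320 pixels.
wavelength = 532e-9
pitch = 8e-6
width, height = 7680 * pitch, 4320 * pitch
theta = max_diffraction_half_angle(wavelength, pitch)
G = etendue(width * height, theta)
print(f"half-angle = {math.degrees(theta):.2f} deg, etendue = {G:.3e} m^2 sr")
```

Because Ω ≈ πθ² for small angles, a 64× étendue expansion at fixed display area corresponds to roughly an 8× larger diffraction half-angle, as long as the small-angle approximation holds.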

MovieCuts: A New Dataset and Benchmark for Cut Type Recognition (arXiv, 2021-09-12) [Preprint] Understanding movies and their structural patterns is a crucial task for decoding the craft of video editing. While previous works have developed tools for general analysis, such as detecting characters or recognizing cinematography properties at the shot level, less effort has been devoted to understanding the most basic video edit, the cut. This paper introduces the cut type recognition task, which requires modeling multimodal information. To ignite research on the new task, we construct a large-scale dataset called MovieCuts, which contains more than 170K video clips labeled among ten cut types. We benchmark a series of audio-visual approaches, including some that deal with the problem's multimodal and multi-label nature. Our best model achieves 45.7% mAP, which suggests that the task is challenging and that attaining highly accurate cut type recognition is an open research problem.

IntraTomo: Self-supervised Learning-based Tomography via Sinogram Synthesis and Prediction (IEEE, 2021-09-10) [Conference Paper] We propose IntraTomo, a powerful framework that combines the benefits of learning-based and model-based approaches for solving highly ill-posed inverse problems in the Computed Tomography (CT) context. IntraTomo is composed of two core modules, a novel sinogram prediction module and a geometry refinement module, which are applied iteratively. In the first module, the unknown density field is represented as a continuous and differentiable function, parameterized by a deep neural network. This network is learned, in a self-supervised fashion, from the incomplete and/or degraded input sinogram. After being estimated by the sinogram prediction module, the density field is consistently refined in the second module using local and non-local geometrical priors. With these two core modules, we show that IntraTomo significantly outperforms existing approaches on several ill-posed inverse problems, such as limited-angle tomography with a range of 45 degrees, sparse-view tomographic reconstruction with as few as eight views, or super-resolution tomography with eight times increased resolution. The experiments on simulated and real data show that our approach can achieve results of unprecedented quality.

A duality approach to a price formation MFG model (arXiv, 2021-09-04) [Preprint] We study the connection between the Aubry–Mather theory and a mean-field game (MFG) price-formation model. We introduce a framework for Mather measures that is suited for constrained time-dependent problems in ℝ. Then, we propose a variational problem on a space of measures, from which we obtain a duality relation involving the MFG problem examined in [36].

Towards self-calibrated lens metrology by differentiable refractive deflectometry (Optics Express, The Optical Society, 2021-09-02) [Article] Deflectometry, as a non-contact, fully optical metrology method, is difficult to apply to refractive elements due to multi-surface entanglement and the need for precise pose alignment. Here, we present a computational self-calibration approach to measure parametric lenses using dual-camera refractive deflectometry. It is achieved by an accurate, differentiable, and efficient ray tracing framework for modeling the metrology setup, on top of which damped least squares is used to estimate unknown lens shape and pose parameters. We successfully demonstrate both synthetic and experimental results on singlet lens surface curvature and asphere/freeform metrology in a transmissive setting.
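Damped least squares (Levenberg–Marquardt) can be illustrated independently of the deflectometry setup. Below is a minimal Python sketch that assumes a toy forward model in place of the paper's differentiable ray tracer: a spherical sag with curvature `c` (shape) plus an axial offset `d` (pose) is fitted to synthetic, noise-free samples. The Jacobian here is taken by finite differences, whereas the paper obtains derivatives from its differentiable framework. All names are hypothetical.

```python
import math

def solve(A, b):
    """Solve a small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def damped_least_squares(residual_fn, params, lam=1e-3, iters=100, eps=1e-7):
    """Levenberg-style damped least squares with a forward-difference Jacobian."""
    p = list(params)
    r = residual_fn(p)
    cost = sum(v * v for v in r)
    n, m = len(p), len(r)
    for _ in range(iters):
        cols = []  # cols[j][i] = d r_i / d p_j
        for j in range(n):
            q = p[:]
            q[j] += eps
            rq = residual_fn(q)
            cols.append([(rq[i] - r[i]) / eps for i in range(m)])
        JTJ = [[sum(cols[a][i] * cols[b][i] for i in range(m)) for b in range(n)] for a in range(n)]
        JTr = [sum(cols[a][i] * r[i] for i in range(m)) for a in range(n)]
        # Damped normal equations: (J^T J + lam I) delta = -J^T r
        A = [[JTJ[a][b] + (lam if a == b else 0.0) for b in range(n)] for a in range(n)]
        delta = solve(A, [-g for g in JTr])
        trial = [p[j] + delta[j] for j in range(n)]
        r_trial = residual_fn(trial)
        cost_trial = sum(v * v for v in r_trial)
        if cost_trial < cost:
            p, r, cost = trial, r_trial, cost_trial
            lam *= 0.1   # accepted: lean toward Gauss-Newton
        else:
            lam *= 10.0  # rejected: lean toward gradient descent
        if max(abs(v) for v in delta) < 1e-12:
            break
    return p, cost

def sag(c, d, r):
    """Toy model: spherical sag with curvature c (1/mm) plus axial offset d (mm)."""
    return c * r * r / (1.0 + math.sqrt(1.0 - (c * r) ** 2)) + d

radii = [0.5 * k for k in range(11)]            # 0 .. 5 mm
measured = [sag(0.02, 0.3, r) for r in radii]   # synthetic "measurements"
fit, final_cost = damped_least_squares(
    lambda p: [sag(p[0], p[1], r) - z for r, z in zip(radii, measured)], [0.01, 0.0])
print(fit, final_cost)
```

The damping term λ interpolates between Gauss–Newton steps (small λ) and short gradient-descent steps (large λ), which keeps the fit stable when the initial shape and pose guesses are poor.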

Flow-Guided Video Inpainting with Scene Templates (arXiv, 2021-08-29) [Preprint] We consider the problem of filling in missing spatio-temporal regions of a video. We provide a novel flow-based solution by introducing a generative model of images in relation to the scene (without missing regions) and of mappings from the scene to images. We use the model to jointly infer the scene template, a 2D representation of the scene, and the mappings. This ensures that the generated frame-to-frame flows are consistent with the underlying scene, reducing geometric distortions in flow-based inpainting. The template is mapped to the missing regions in the video by a new L2-L1 interpolation scheme, creating crisp inpaintings and reducing common blur and distortion artifacts. We show on two benchmark datasets that our approach outperforms the state of the art quantitatively and in user studies.

Discrete Optimization for Shape Matching (Computer Graphics Forum, Wiley, 2021-08-23) [Article] We propose a novel discrete solver for optimizing functional map-based energies, including descriptor preservation and promoting structural properties such as area preservation, bijectivity, and Laplacian commutativity, among others. Unlike the commonly used continuous optimization methods, our approach enforces the functional map to be associated with a pointwise correspondence as a hard constraint, which provides a stronger link between optimized properties of functional and point-to-point maps. Under this hard constraint, our solver obtains functional maps with lower energy values compared to the standard continuous strategies. Perhaps more importantly, the recovered pointwise maps from our discrete solver preserve the optimized-for functional properties and are thus of higher overall quality. We demonstrate the advantages of our discrete solver on a range of energies and shape categories, compared to existing techniques for promoting pointwise maps within the functional map framework. Finally, with this solver in hand, we introduce a novel Effective Functional Map Refinement (EFMR) method, which achieves state-of-the-art accuracy on the SHREC’19 benchmark.
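The hard constraint ties every functional map to a pointwise map. Here is a minimal Python sketch of that standard conversion, assuming orthonormal spectral bases with identity area weights (so the pseudo-inverse reduces to a transpose): the map C = Φ_Xᵀ Π Φ_Y transfers basis coefficients of a function on Y to those of its pull-back on X. The toy shapes are path graphs, whose Laplacian eigenvectors are DCT-II vectors. This covers only the conversion step, not the authors' discrete solver, and the function names are illustrative.

```python
import math

def path_graph_basis(n, k):
    """First k orthonormal Laplacian eigenvectors of an n-vertex path graph
    (DCT-II vectors), returned as an n x k list of rows."""
    rows = []
    for i in range(n):
        row = []
        for j in range(k):
            scale = math.sqrt(1.0 / n) if j == 0 else math.sqrt(2.0 / n)
            row.append(scale * math.cos(math.pi * j * (i + 0.5) / n))
        rows.append(row)
    return rows

def functional_map_from_pointwise(T, phi_x, phi_y):
    """C = Phi_X^T Pi Phi_Y for a pointwise map T: X -> Y (T[i] is an index into Y).
    Assumes orthonormal bases with identity area weights."""
    kx, ky = len(phi_x[0]), len(phi_y[0])
    return [[sum(phi_x[i][a] * phi_y[T[i]][b] for i in range(len(T))) for b in range(ky)]
            for a in range(kx)]

n, k = 20, 5
phi = path_graph_basis(n, k)
C_id = functional_map_from_pointwise(list(range(n)), phi, phi)                # identity map
C_flip = functional_map_from_pointwise([n - 1 - i for i in range(n)], phi, phi)  # reflection
for row in C_flip:
    print(" ".join(f"{v:+.3f}" for v in row))
```

The reflection yields a diagonal C with alternating signs, because each cosine basis vector is either symmetric or antisymmetric about the midpoint of the path.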

Large field-of-view holographic display by gapless splicing of multi-segment cylindrical holograms (Applied Optics, The Optical Society, 2021-08-17) [Article] A holographic three-dimensional (3D) display is a recognized and ideal 3D display technology. In the field of holographic research, cylindrical holography, with the merit of a 360° field of view (FOV), has recently become a hot topic, as it naturally solves the problem of limited FOV in planar holography. The recently proposed approximate phase compensation (APC) method successfully obtains a larger FOV and fast generation of segment cylindrical holograms (SCHs) in the visible light band. However, the FOV of an SCH remains limited due to its intrinsic limitations, and, to the best of our knowledge, this issue has not been effectively addressed. In this paper, the restricted conditions for the generation of SCHs by the APC method are first analyzed. Then, an FOV expansion method is proposed for realizing a large-FOV holographic display by gapless splicing of multiple SCHs. The proposed method can successfully obtain larger-FOV cylindrical holograms and effectively eliminate the splicing gaps; its effectiveness is verified by the results of numerical simulation and optical experiments. Therefore, the proposed method can effectively solve the FOV limitation problem of the APC method for the generation of SCHs in the visible band, realize a large-FOV 3D display, and provide a useful reference for holographic 3D display.

Ships, Splashes, and Waves on a Vast Ocean (arXiv, 2021-08-12) [Preprint] The simulation of large open water surfaces is challenging for a uniform volumetric discretization of the Navier–Stokes equations. The water splashes near moving objects, which height field methods for water waves cannot capture, necessitate high-resolution simulation such as the Fluid-Implicit-Particle (FLIP) method. On the other hand, FLIP is not efficient for long-lasting water waves that propagate over long distances, which require sufficient depth for a correct dispersion relationship. This paper presents a new method to tackle this dilemma through an efficient hybridization of volumetric and surface-based advection-projection discretizations. We design a hybrid time-stepping algorithm that combines a FLIP domain and an adaptively remeshed Boundary Element Method (BEM) domain for the incompressible Euler equations. The resulting framework captures the detailed water splashes near moving objects with FLIP, and produces convincing water waves with correct dispersion relationships at modest additional cost.
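The depth requirement follows from the linear dispersion relation for gravity waves, ω² = g k tanh(k h), where k is the wavenumber and h the water depth. The short Python sketch below applies textbook linear wave theory (it is not code from the paper) to show how a shallow simulation domain slows long waves relative to deep water, where ω² ≈ g k.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def angular_frequency(k, depth):
    """Linear gravity-wave dispersion: omega^2 = g k tanh(k h)."""
    return math.sqrt(G * k * math.tanh(k * depth))

def phase_speed(wavelength, depth):
    """Phase speed omega / k for a wave of the given wavelength (m) and water depth (m)."""
    k = 2.0 * math.pi / wavelength
    return angular_frequency(k, depth) / k

for depth in (5.0, 50.0, 1000.0):
    print(f"depth {depth:6.0f} m: 100 m waves travel at {phase_speed(100.0, depth):.2f} m/s")
```

Once k h is large, tanh(k h) ≈ 1 and the speed no longer depends on depth. A volumetric domain that is too shallow for the wavelengths of interest therefore propagates them too slowly, which is the cost a surface-based BEM domain can avoid.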

FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning (arXiv, 2021-08-10) [Preprint] Federated Averaging (FedAvg, also known as Local SGD) (McMahan et al., 2017) is a classical federated learning algorithm in which clients run multiple local SGD steps before communicating their update to an orchestrating server. We propose a new federated learning algorithm, FedPAGE, which further reduces the communication complexity by using the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg. We show that FedPAGE uses far fewer communication rounds than previous local methods for both federated convex and nonconvex optimization. Concretely, 1) in the convex setting, the number of communication rounds of FedPAGE is $O(\frac{N^{3/4}}{S\epsilon})$, improving the best-known result $O(\frac{N}{S\epsilon})$ of SCAFFOLD (Karimireddy et al., 2020) by a factor of $N^{1/4}$, where $N$ is the total number of clients (usually very large in federated learning), $S$ is the number of clients sampled in each communication round, and $\epsilon$ is the target error; 2) in the nonconvex setting, the number of communication rounds of FedPAGE is $O(\frac{\sqrt{N}+S}{S\epsilon^2})$, improving the best-known result $O(\frac{N^{2/3}}{S^{2/3}\epsilon^2})$ of SCAFFOLD (Karimireddy et al., 2020) by a factor of $N^{1/6}S^{1/3}$ if the number of sampled clients satisfies $S\leq \sqrt{N}$. Note that in both settings, the communication cost for each round is the same for FedPAGE and SCAFFOLD. As a result, FedPAGE achieves new state-of-the-art results in terms of communication complexity for both federated convex and nonconvex optimization.
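The stated improvement factors follow directly from the two pairs of bounds. The small Python check below drops the constants hidden in the O(·) notation, so it compares scaling only, not actual round counts. The values of N, S, and ε are arbitrary illustrations.

```python
def rounds_convex(N, S, eps):
    """Convex setting, constants dropped: returns (FedPAGE, SCAFFOLD)."""
    return N ** 0.75 / (S * eps), N / (S * eps)

def rounds_nonconvex(N, S, eps):
    """Nonconvex setting, constants dropped: returns (FedPAGE, SCAFFOLD)."""
    return (N ** 0.5 + S) / (S * eps ** 2), N ** (2 / 3) / (S ** (2 / 3) * eps ** 2)

N, S, eps = 10_000, 10, 1e-2
fp, sc = rounds_convex(N, S, eps)
print("convex ratio", sc / fp, "vs N^(1/4) =", N ** 0.25)
fp, sc = rounds_nonconvex(N, S, eps)
print("nonconvex ratio", sc / fp, "vs N^(1/6) S^(1/3) =", N ** (1 / 6) * S ** (1 / 3))
```

In the nonconvex case the ratio falls slightly short of N^{1/6}S^{1/3} (about 9.1 instead of 10 here) because the S term in the FedPAGE numerator is not negligible. The ratio equals the stated factor times √N/(√N + S), so it stays within a factor of 2 of the stated gain whenever S ≤ √N.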

Learning to Cut by Watching Movies (arXiv, 2021-08-09) [Preprint] Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise. Many video editing components are astonishingly hard to automate, primarily due to the lack of raw video materials. This paper focuses on a new task for computational video editing, namely the task of ranking cut plausibility. Our key idea is to leverage content that has already been edited to learn fine-grained audio-visual patterns that trigger cuts. To do this, we first collected a data source of more than 10K videos, from which we extract more than 255K cuts. We devise a model that learns to discriminate between real and artificial cuts via contrastive learning. We set up a new task and a set of baselines to benchmark video cut generation. We observe that our proposed model outperforms the baselines by large margins. To demonstrate our model in real-world applications, we conduct human studies on a collection of unedited videos. The results show that our model does a better job at cutting than random and alternative baselines.

Lost photon enhances superresolution (npj Quantum Information, Springer Science and Business Media LLC, 2021-08-09) [Article] Quantum imaging can beat classical resolution limits imposed by the diffraction of light. In particular, it is known that one can reduce image blurring and increase the achievable resolution by illuminating an object with entangled light and measuring coincidences of photons. If an n-photon entangled state is used and the nth-order correlation function is measured, the point-spread function (PSF) effectively becomes √n times narrower relative to classical coherent imaging. Quite surprisingly, measuring n-photon correlations is not the best choice if an n-photon entangled state is available. We show that when measuring (n − 1)-photon coincidences (thus ignoring one of the available photons), the PSF can be made even narrower. This observation paves the way for a strong conditional resolution enhancement by registering one of the photons outside the imaging area. We analyze the conditions necessary for the resolution increase and propose a practical scheme suitable for observing and exploiting the effect.
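The √n narrowing can be checked numerically in a simplified model. Assume a Gaussian single-photon PSF and that the nth-order correlation image is its nth power, so the width shrinks by √n. The Python sketch below works under those assumptions only; it does not model the paper's (n − 1)-photon scheme.

```python
import math

def fwhm(f, lo=0.0, hi=10.0, tol=1e-12):
    """Full width at half maximum of a symmetric profile that decreases for x >= 0,
    found by bisection on the half-maximum crossing."""
    half = f(0.0) / 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > half:
            lo = mid
        else:
            hi = mid
    return 2.0 * lo

sigma = 1.0
psf = lambda x: math.exp(-x * x / (2 * sigma * sigma))  # assumed Gaussian PSF

for n in (1, 2, 3, 4):
    width = fwhm(lambda x, n=n: psf(x) ** n)
    print(n, round(width, 6), round(fwhm(psf) / width, 6))  # ratio tracks sqrt(n)
```

Raising exp(−x²/2σ²) to the nth power gives exp(−n x²/2σ²), a Gaussian of width σ/√n, which the bisection reproduces to within its tolerance.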

Tikhonov Regularization of Circle-Valued Signals (arXiv, 2021-08-05) [Preprint] It is common to have to process signals or images whose values are cyclic and can be represented as points on the complex circle, like wrapped phases, angles, orientations, or color hues. We consider a Tikhonov-type regularization model to smooth or interpolate circle-valued signals defined on arbitrary graphs. We propose a convex relaxation of this nonconvex problem as a semidefinite program, and an efficient algorithm to solve it.

Optimizing dyadic nets (ACM Transactions on Graphics, Association for Computing Machinery (ACM), 2021-08) [Article] We explore the space of (0, m, 2)-nets in base 2 commonly used for sampling. We present a novel constructive algorithm that can exhaustively generate all nets (up to m-bit resolution) and thereby compute the exact number of distinct nets. We observe that the construction algorithm holds the key to defining a transformation operation that lets us transform one valid net into another one. This enables the optimization of digital nets using arbitrary objective functions. For example, we define an analytic energy function for blue noise and use it to generate nets with high-quality blue-noise frequency power spectra. We also show that the space of (0, 2)-sequences is significantly smaller than the space of nets with the same number of points, which drastically limits the optimizability of sequences.
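The defining property of a (0, m, 2)-net in base 2 is easy to test: for every split m = a + b, each dyadic box of size 2^−a × 2^−b contains exactly one of the 2^m points. The minimal Python sketch below checks this on integer coordinates (points scaled by 2^m) and uses the Hammersley set, a classic (0, m, 2)-net, as a known-valid input. The function names are illustrative, and this is not the paper's generation or optimization algorithm.

```python
def bit_reverse(i, m):
    """Reverse the lowest m bits of i (base-2 radical inverse, scaled by 2^m)."""
    r = 0
    for _ in range(m):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def hammersley_net(m):
    """2^m points on a 2^m x 2^m integer grid: (i, bit_reverse(i))."""
    return [(i, bit_reverse(i, m)) for i in range(1 << m)]

def is_02_net(points, m):
    """True iff every elementary dyadic box of area 2^-m holds exactly one point."""
    if len(points) != 1 << m:
        return False
    for a in range(m + 1):  # 2^a columns by 2^(m - a) rows
        cells = {(x >> (m - a), y >> a) for x, y in points}
        if len(cells) != 1 << m:  # 2^m points in 2^m boxes: all distinct iff one each
            return False
    return True

m = 4
pts = hammersley_net(m)
print(is_02_net(pts, m), [(x / 2 ** m, y / 2 ** m) for x, y in pts[:4]])
```

A check like this is what any transformation between nets must preserve; an optimizer can then search only over point sets for which it returns True.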

Enhancing Adversarial Robustness via Test-time Transformation Ensembling (arXiv, 2021-07-29) [Preprint] Deep learning models are prone to being fooled by imperceptible perturbations known as adversarial attacks. In this work, we study how equipping models with Test-time Transformation Ensembling (TTE) can work as a reliable defense against such attacks. While transforming the input data, both at train and test times, is known to enhance model performance, its effects on adversarial robustness have not been studied. Here, we present a comprehensive empirical study of the impact of TTE, in the form of widely used image transforms, on adversarial robustness. We show that TTE consistently improves model robustness against a variety of powerful attacks without any need for retraining, and that this improvement comes at virtually no trade-off with accuracy on clean samples. Finally, we show that the benefits of TTE transfer even to the certified robustness domain, in which TTE provides sizable and consistent improvements.
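Test-time transformation ensembling amounts to averaging a model's outputs over transformed copies of the input. The minimal Python sketch below assumes images as nested lists, a toy stand-in `toy_model` (hypothetical), and only the identity and a horizontal flip as transforms. The abstract does not list the paper's exact transform set.

```python
def identity(img):
    return img

def hflip(img):
    """Mirror each row of a 2D image."""
    return [row[::-1] for row in img]

def tte_predict(model, img, transforms=(identity, hflip)):
    """Average the model's logits over test-time transformed copies of the input."""
    outputs = [model(t(img)) for t in transforms]
    return [sum(vals) / len(outputs) for vals in zip(*outputs)]

def toy_model(img):
    """Hypothetical 2-class model: logits are the sums of the left and right image halves."""
    w = len(img[0]) // 2
    return [float(sum(sum(row[:w]) for row in img)), float(sum(sum(row[w:]) for row in img))]

image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
print(toy_model(image), tte_predict(toy_model, image))
```

No weights change, so the ensemble needs no retraining; the cost is one extra forward pass per transform.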

CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression (arXiv, 2021-07-20) [Preprint] Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular. Besides, the best theoretically and practically performing gradient-type methods invariably rely on some form of acceleration/momentum to reduce the number of communications (faster convergence), e.g., Nesterov's accelerated gradient descent (Nesterov, 2004) and Adam (Kingma and Ba, 2014). In order to combine the benefits of communication compression and convergence acceleration, we propose a compressed and accelerated gradient method for distributed optimization, which we call CANITA. Our CANITA achieves the first accelerated rate $O\bigg(\sqrt{\Big(1+\sqrt{\frac{\omega^3}{n}}\Big)\frac{L}{\epsilon}} + \omega\big(\frac{1}{\epsilon}\big)^{\frac{1}{3}}\bigg)$, which improves upon the state-of-the-art non-accelerated rate $O\left((1+\frac{\omega}{n})\frac{L}{\epsilon} + \frac{\omega^2+n}{\omega+n}\frac{1}{\epsilon}\right)$ of DIANA (Khaled et al., 2020b) for distributed general convex problems, where $\epsilon$ is the target error, $L$ is the smoothness parameter of the objective, $n$ is the number of machines/devices, and $\omega$ is the compression parameter (larger $\omega$ means more compression can be applied, and no compression implies $\omega=0$). Our results show that as long as the number of devices $n$ is large (often true in distributed/federated learning) or the compression $\omega$ is not very high, CANITA achieves the faster convergence rate $O\Big(\sqrt{\frac{L}{\epsilon}}\Big)$, i.e., the number of communication rounds is $O\Big(\sqrt{\frac{L}{\epsilon}}\Big)$ (vs. $O\big(\frac{L}{\epsilon}\big)$ achieved by previous works). As a result, CANITA enjoys the advantages of both compression (compressed communication in each round) and acceleration (far fewer communication rounds).
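CANITA's rates are stated for a generic compressor with parameter ω. One standard example is random-k sparsification, which keeps k of d coordinates and scales them by d/k. Here it is an illustrative assumption, not necessarily the compressor used in the paper's experiments. It is unbiased and satisfies E‖C(x) − x‖² = (d/k − 1)‖x‖², so ω = d/k − 1, and k = d gives ω = 0 (no compression). The Python sketch below also verifies these moments exactly by enumerating all k-subsets.

```python
import itertools
import random

def rand_k(x, k, rng=random):
    """Unbiased random-k sparsification: keep k random coordinates, scaled by d/k."""
    d = len(x)
    out = [0.0] * d
    for i in rng.sample(range(d), k):
        out[i] = x[i] * d / k
    return out

def omega_rand_k(d, k):
    """Variance parameter of rand-k: E||C(x) - x||^2 = (d/k - 1) ||x||^2."""
    return d / k - 1

def exact_moments(x, k):
    """Exact E[C(x)] and E||C(x) - x||^2 over all k-subsets (small d only)."""
    d = len(x)
    subsets = list(itertools.combinations(range(d), k))
    mean = [0.0] * d
    err = 0.0
    for s in subsets:
        c = [x[i] * d / k if i in s else 0.0 for i in range(d)]
        mean = [m + ci / len(subsets) for m, ci in zip(mean, c)]
        err += sum((ci - xi) ** 2 for ci, xi in zip(c, x)) / len(subsets)
    return mean, err

x = [1.0, -2.0, 3.0, 0.5]
print(rand_k(x, 2, random.Random(0)), exact_moments(x, 2), omega_rand_k(len(x), 2))
```

Each round then sends only k values (plus their indices) per device, at the price of the extra variance that ω measures.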

Fire in paradise: mesoscale simulation of wildfires (ACM Transactions on Graphics, Association for Computing Machinery (ACM), 2021-07-19) [Article] As a result of changing climatic conditions, wildfires have become an existential threat in various countries around the world. Their complex dynamics, paired with their often rapid progression, render wildfires an often disastrous natural phenomenon that is difficult to predict and to counteract. In this paper, we present a novel method for simulating wildfires with the goal of realistically capturing the combustion process of individual trees and the resulting propagation of fires at the scale of forests. We rely on a state-of-the-art modeling approach for large-scale ecosystems that enables us to represent each plant as a detailed 3D geometric model. We introduce a novel mathematical formulation for the combustion process of plants (also considering effects such as heat transfer, char insulation, and mass loss), as well as for the propagation of fire through the entire ecosystem. Compared to other wildfire simulations, which employ geometric representations of plants such as cones or cylinders, our detailed 3D tree models enable us to simulate the interplay of geometric variations of branching structures and the dynamics of fire and wood combustion. Our simulation runs at interactive rates and thereby provides a convenient way to explore different conditions that affect wildfires, ranging from terrain elevation profiles and ecosystem compositions to various measures against wildfires, such as cutting down trees as firebreaks, the application of fire retardant, or the simulation of rain.

End-to-End Complex Lens Design with Differentiable Ray Tracing (ACM Transactions on Graphics, Association for Computing Machinery (ACM), 2021-07-19) [Article] Imaging systems have long been designed in separate steps: experience-driven optical design followed by sophisticated image processing. Although recent advances in computational imaging aim to bridge the gap in an end-to-end fashion, the image formation models used in these approaches have been quite simplistic, built either on simple wave optics models such as the Fourier transform or on similar paraxial models. Such models only support the optimization of a single lens surface, which limits the achievable image quality. To overcome these challenges, we propose a general end-to-end complex lens design framework enabled by a differentiable ray tracing image formation model. Specifically, our model relies on a differentiable ray tracing rendering engine to render optical images in the full field by taking into account all on-/off-axis aberrations governed by the theory of geometric optics. Our design pipeline can jointly optimize the lens module and the image reconstruction network for a specific imaging task. We demonstrate the effectiveness of the proposed method on two typical applications: large field-of-view imaging and extended depth-of-field imaging. Both simulation and experimental results show superior image quality compared with conventional lens designs. Our framework offers a competitive alternative for the design of modern imaging systems.
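At the core of any ray-traced lens model is refraction at each surface. The minimal Python sketch below implements the vector form of Snell's law, assuming unit direction and normal vectors with the normal facing the incoming ray and η = n₁/n₂. It is a textbook building block, not the paper's renderer, and it omits the automatic differentiation that makes the pipeline end-to-end trainable.

```python
import math

def refract(d, n, eta):
    """Refract unit direction d at a surface with unit normal n (n . d < 0), eta = n1 / n2.
    Returns the refracted unit direction, or None on total internal reflection."""
    cos_i = -sum(a * b for a, b in zip(n, d))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # total internal reflection
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * di + (eta * cos_i - cos_t) * ni for di, ni in zip(d, n))

# Example: a ray hitting a flat air-to-glass (n = 1.5) interface at 30 degrees.
theta_i = math.radians(30.0)
d = (math.sin(theta_i), 0.0, -math.cos(theta_i))
t = refract(d, (0.0, 0.0, 1.0), 1.0 / 1.5)
print(t, math.degrees(math.asin(t[0])))  # transmitted angle from the normal
```

Tracing a ray through a multi-element lens chains this step with ray-surface intersections; making both differentiable is what lets every surface be optimized jointly rather than one at a time.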