### Recent Submissions

• #### GLAM: Glycogen-derived Lactate Absorption Map for visual analysis of dense and sparse surface reconstructions of rodent brain structures on desktop systems and virtual environments

(Computers & Graphics, Elsevier BV, 2018-05-21) [Article]
The human brain contains about one hundred billion neurons, but these cannot work properly without ultrastructural and metabolic support. For this reason, mammalian brains host another type of cell, called “glial cells”, whose role is to maintain proper conditions for efficient neuronal function. One type of glial cell, the astrocyte, is involved in particular in the metabolic support of neurons: astrocytes feed neurons with lactate, a byproduct of the metabolism of glucose, which they take up from blood vessels and store in another form, glycogen granules. These energy-storage molecules, roughly spherical with diameters ranging from 10 to 80 nanometers, can be readily recognized using electron microscopy, the only technique whose resolution is high enough to resolve them. Understanding and quantifying their distribution is of particular relevance for neuroscientists who want to understand where and when neurons use energy in this form. To answer this question, we developed a visualization technique, dubbed GLAM (Glycogen-derived Lactate Absorption Map), customized for analyzing the interaction of astrocytic glycogen with surrounding neurites in order to formulate hypotheses on the energy absorption mechanisms. The method integrates high-resolution surface reconstructions of neurites, astrocytes, and the energy sources in the form of glycogen granules, obtained from automated serial electron microscopy methods such as focused ion beam scanning electron microscopy (FIB-SEM) or serial block-face electron microscopy (SBEM), with an absorption map computed as a radiance transfer mechanism. The resulting visual representation provides an immediate and comprehensible illustration of the areas in which the probability of lactate shuttling is higher. The computed dataset can then be explored and quantified in 3D space, either using 3D modeling software or virtual reality environments.
Domain scientists have evaluated the technique by using the computed maps either to formulate functional hypotheses or to plan sparse reconstructions that avoid excessive occlusion. Furthermore, we conducted a pioneering user study showing that immersive VR setups ease the investigation of the areas of interest and the analysis of absorption patterns in the cellular structures.
• #### Isotropic Surface Remeshing without Large and Small Angles

(IEEE Transactions on Visualization and Computer Graphics, Institute of Electrical and Electronics Engineers (IEEE), 2018-05-18) [Article]
We introduce a novel algorithm for isotropic surface remeshing that progressively eliminates obtuse triangles and improves small angles. The main novelty of the proposed approach is a simple vertex insertion scheme that facilitates the removal of large angles, and a vertex removal operation that improves the distribution of small angles. In combination with other standard local mesh operators, e.g., connectivity optimization and local tangential smoothing, our algorithm is able to efficiently remesh a low-quality surface mesh. Our approach can be applied directly or used as a post-processing step following other remeshing approaches. Our method has a computational efficiency similar to that of the fastest available approach, i.e., real-time adaptive remeshing [1]. In comparison with state-of-the-art approaches, our method consistently generates better results based on evaluations using different metrics.
• #### Discriminative Transfer Learning for General Image Restoration

(IEEE Transactions on Image Processing, Institute of Electrical and Electronics Engineers (IEEE), 2018-04-30) [Article]
Recently, several discriminative learning approaches have been proposed for effective image restoration, achieving a convincing trade-off between image quality and computational efficiency. However, these methods require separate training for each restoration task (e.g., denoising, deblurring, demosaicing) and problem condition (e.g., the noise level of input images). This makes it time-consuming and difficult to encompass all tasks and conditions during training. In this paper, we propose a discriminative transfer learning method that incorporates formal proximal optimization and discriminative learning for general image restoration. The method requires a single pass of discriminative training and allows for reuse across various problems and conditions while achieving an efficiency comparable to previous discriminative approaches. Furthermore, after being trained, our model can easily be transferred to new likelihood terms to solve untrained tasks, or be combined with existing priors to further improve image restoration quality.
• #### Weighted Low-Rank Approximation of Matrices and Background Modeling

(arXiv, 2018-04-15) [Preprint]
We primarily study a special weighted low-rank approximation of matrices and then apply it to solve the background modeling problem. We propose two algorithms for this purpose: one operates in batch mode on the entire data, while the other operates in a batch-incremental mode and naturally captures more background variations while being computationally more efficient. Moreover, we propose a robust technique that learns the background frame indices from the data and does not require any training frames. We demonstrate through extensive experiments that, by inserting a simple weight in the Frobenius norm, the approximation can be made robust to outliers, similarly to the $\ell_1$ norm. Our methods match or outperform several state-of-the-art online and batch background modeling methods in virtually all quantitative and qualitative measures.
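The core idea, that an entry-wise weight inside the Frobenius norm confers robustness, can be sketched with a simple alternating least squares solver (an illustrative toy, not the paper's batch or batch-incremental algorithms):

```python
import numpy as np

def weighted_low_rank(A, W, rank, iters=100, seed=0):
    """Alternating least squares for min ||W * (A - U V)||_F^2.

    Illustrative sketch only. W is an entry-wise weight matrix:
    down-weighting suspected outlier entries mimics l1-style robustness.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((rank, n))
    W2 = W ** 2
    for _ in range(iters):
        # Solve each row of U, then each column of V, as a small
        # weighted least-squares problem (normal equations).
        for i in range(m):
            D = V * W2[i]                      # V scaled by row-i weights
            U[i] = np.linalg.solve(D @ V.T + 1e-8 * np.eye(rank), D @ A[i])
        for j in range(n):
            D = U.T * W2[:, j]                 # U^T scaled by column-j weights
            V[:, j] = np.linalg.solve(D @ U + 1e-8 * np.eye(rank), D @ A[:, j])
    return U, V
```

With the weight of a corrupted entry set to zero, the outlier is simply ignored and the underlying low-rank structure is recovered.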
• #### Supervised Convolutional Sparse Coding

(arXiv, 2018-04-08) [Preprint]
Convolutional Sparse Coding (CSC) is a well-established image representation model especially suited for image restoration tasks. In this work, we extend the applicability of this model by proposing a supervised approach to convolutional sparse coding, which aims at learning discriminative dictionaries instead of purely reconstructive ones. We incorporate a supervised regularization term into the traditional unsupervised CSC objective to encourage the final dictionary elements to be discriminative. Experimental results show that supervised convolutional learning yields two key advantages: first, we learn more semantically relevant filters in the dictionary, and second, we achieve improved image reconstruction on unseen data.
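The combined objective can be sketched as an evaluation routine (an illustrative simplification: the hinge loss on sum-pooled codes and the circular convolution are assumptions for the sketch, not the paper's exact regularizer or training procedure):

```python
import numpy as np

def supervised_csc_loss(x, D, Z, y, w, lam=0.1, gamma=1.0):
    """Evaluate a supervised CSC objective (illustrative sketch).

    x: (H, W) image; D: (K, h, w) filters; Z: (K, H, W) sparse code maps;
    y in {-1, +1}: class label; w: (K,) linear classifier on sum-pooled
    codes. Loss = 0.5 * reconstruction error + lam * l1 sparsity
    + gamma * hinge loss; the last term is the supervised regularizer
    that pushes the learned dictionary to be discriminative.
    """
    H, W = x.shape
    recon = np.zeros((H, W))
    for d, z in zip(D, Z):
        # circular convolution of each filter with its code map via FFT
        pad = np.zeros((H, W))
        pad[: d.shape[0], : d.shape[1]] = d
        recon += np.real(np.fft.ifft2(np.fft.fft2(pad) * np.fft.fft2(z)))
    data = 0.5 * np.sum((x - recon) ** 2)
    sparsity = lam * np.abs(Z).sum()
    margin = y * (w @ Z.reshape(len(D), -1).sum(axis=1))
    supervised = gamma * max(0.0, 1.0 - margin)
    return data + sparsity + supervised
```

Minimizing the first two terms alone recovers standard unsupervised CSC; the third term is what couples the dictionary to the classification task.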
• #### Multi-label Learning with Missing Labels Using Mixed Dependency Graphs

(International Journal of Computer Vision, Springer Nature, 2018-04-06) [Article]
This work focuses on the problem of multi-label learning with missing labels (MLML), which aims to label each test instance with multiple class labels given training instances that have an incomplete/partial set of these labels (i.e., some of their labels are missing). The key to handling missing labels is to propagate label information from the provided labels to the missing ones through a dependency graph in which each label of each instance is treated as a node. We build this graph by utilizing different types of label dependencies. Specifically, instance-level similarity serves as undirected edges connecting the label nodes across different instances, and the semantic label hierarchy serves as directed edges connecting different classes. This base graph is referred to as the mixed dependency graph, as it includes both undirected and directed edges. Furthermore, we present two additional types of label dependencies that connect the label nodes across different classes. One is class co-occurrence, which is also encoded as undirected edges; combined with the base graph, it yields a new mixed graph, called the mixed graph with co-occurrence (MG-CO). The other is the sparse and low-rank decomposition of the whole label matrix, which embeds high-order dependencies over all labels; combined with the base graph, the resulting mixed graph is called MG-SL (mixed graph with sparse and low-rank decomposition). Based on MG-CO and MG-SL, we further propose two convex transductive formulations of the MLML problem, denoted MLMG-CO and MLMG-SL, respectively. In both formulations, the instance-level similarity is embedded through a quadratic smoothness term, while the semantic label hierarchy is used as a linear constraint.
In MLMG-CO, the class co-occurrence is also formulated as a quadratic smoothness term, while the sparse and low-rank decomposition is incorporated into MLMG-SL through two additional matrices (one assumed sparse, the other assumed low-rank) and an equivalence constraint between the sum of these two matrices and the original label matrix. Interestingly, two important applications, image annotation and tag-based image retrieval, can be jointly handled using our proposed methods. Experimental results on several benchmark datasets show that our methods lead to significant improvements in performance and in robustness to missing labels over state-of-the-art methods.
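The quadratic smoothness term shared by both formulations can be illustrated with a minimal transductive propagation sketch (a hypothetical simplification: a single instance-level smoothness term, without the hierarchy constraint, co-occurrence term, or sparse/low-rank decomposition):

```python
import numpy as np

def propagate_labels(Y, M, S, lam=1.0):
    """Quadratic-smoothness label propagation (illustrative sketch).

    Y: (n, c) observed label matrix (0 where missing); M: (n, c) mask
    (1 = provided, 0 = missing); S: (n, n) instance similarity.
    Per class k, solves min_f sum_i M[i,k] (f_i - Y[i,k])^2 + lam * f' L f,
    where L is the graph Laplacian of S -- the same kind of smoothness
    term used for instance-level similarity in MLMG-CO / MLMG-SL.
    """
    n, c = Y.shape
    L = np.diag(S.sum(axis=1)) - S       # unnormalized graph Laplacian
    F = np.zeros_like(Y, dtype=float)
    for k in range(c):
        D = np.diag(M[:, k].astype(float))
        # closed-form solution of the quadratic objective
        F[:, k] = np.linalg.solve(D + lam * L, D @ Y[:, k])
    return F
```

On a three-node chain with labels provided only at the endpoints, the missing middle label settles between its neighbors, which is the propagation behavior the mixed graph is designed to produce.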
• #### Guess Where? Actor-Supervision for Spatiotemporal Action Localization

(arXiv, 2018-04-05) [Preprint]
This paper addresses the problem of spatiotemporal localization of actions in videos. Compared to leading approaches, which all learn to localize based on carefully annotated boxes on training video frames, we adhere to a weakly-supervised solution that only requires a video class label. We introduce an actor-supervised architecture that exploits the inherent compositionality of actions in terms of actor transformations to localize actions. We make two contributions. First, we propose actor proposals derived from an image-based detector of human and non-human actors, linked over time by Siamese similarity matching to account for actor deformations. Second, we propose an actor-based attention mechanism that enables the localization of actions from action class labels and actor proposals and is end-to-end trainable. Experiments on three human and non-human action datasets show that actor supervision is state-of-the-art for weakly-supervised action localization and is even competitive with some fully-supervised alternatives.
• #### Accelerated Optimization in the PDE Framework: Formulations for the Manifold of Diffeomorphisms

(arXiv, 2018-04-04) [Preprint]
We consider the problem of optimizing cost functionals on the infinite-dimensional manifold of diffeomorphisms. We present a new class of optimization methods, valid for any optimization problem set up on the space of diffeomorphisms, obtained by generalizing Nesterov accelerated optimization to the manifold of diffeomorphisms. While our framework applies to general infinite-dimensional manifolds, we specifically treat the case of diffeomorphisms, motivated by optical flow problems in computer vision. This is accomplished by building on a recent variational approach of Wibisono, Wilson, and Jordan to a general class of accelerated optimization methods, which applies in finite dimensions; we generalize that approach to infinite-dimensional manifolds. We derive the surprisingly simple continuum evolution equations, which are partial differential equations, for accelerated gradient descent, and relate them to simple mechanical principles from fluid mechanics. Our approach has natural connections to the optimal mass transport problem, since one can think of it as the evolution of an infinite number of particles endowed with mass (represented by a mass density) moving in an energy landscape. The mass evolves with the optimization variable and endows the particles with dynamics. This differs from the finite-dimensional case, where only a single particle moves and the dynamics therefore do not depend on the mass. We derive the theory, compute the PDEs for accelerated optimization, and illustrate the behavior of these new accelerated optimization schemes.
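For context, in finite dimensions the variational framework of Wibisono, Wilson, and Jordan recovers, for the standard polynomial parameter choice, the well-known continuum limit of Nesterov's accelerated gradient method,

$$\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\big(X(t)\big) = 0,$$

a second-order evolution for a single particle in the energy landscape of $f$. The paper's contribution is to lift this type of dynamics to partial differential equations on the manifold of diffeomorphisms, where a transported mass density replaces the single particle.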
• #### Tagging like Humans: Diverse and Distinct Image Annotation

(arXiv, 2018-03-31) [Preprint]
In this work, we propose a new automatic image annotation model, dubbed **diverse and distinct image annotation** (D2IA). The generative model D2IA is inspired by ensembles of human annotations, which provide semantically relevant, yet distinct and diverse tags. In D2IA, we generate a relevant and distinct tag subset, in which the tags are relevant to the image contents and semantically distinct from each other, using sequential sampling from a determinantal point process (DPP) model. Multiple such tag subsets, covering diverse semantic aspects or diverse semantic levels of the image contents, are generated by randomly perturbing the DPP sampling process. We leverage a generative adversarial network (GAN) model to train D2IA. Extensive experiments, including quantitative and qualitative comparisons as well as human subject studies on two benchmark datasets, demonstrate that the proposed model can produce more diverse and distinct tags than state-of-the-art methods.
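The way a DPP trades off relevance against redundancy can be sketched with a greedy MAP selection over a quality/similarity kernel (an illustrative stand-in; the paper uses sequential *sampling* with random perturbations, not greedy MAP):

```python
import numpy as np

def greedy_dpp_tags(quality, similarity, k):
    """Greedy MAP selection under a DPP-style kernel (illustrative sketch).

    quality[i] scores tag relevance; similarity is a PSD matrix of
    pairwise semantic similarity. The kernel L = diag(q) S diag(q)
    couples the two, and maximizing the log-determinant of the selected
    submatrix favors relevant yet mutually distinct tags.
    """
    q = np.asarray(quality, dtype=float)
    L = np.outer(q, q) * np.asarray(similarity, dtype=float)
    chosen = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(len(q)):
            if i in chosen:
                continue
            idx = chosen + [i]
            # log-determinant of the submatrix if tag i were added
            gain = np.linalg.slogdet(L[np.ix_(idx, idx)])[1]
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
    return chosen
```

Because the determinant of the selected submatrix shrinks when two chosen tags are similar, near-duplicate tags are suppressed even when both are individually relevant.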
• #### Sim4CV: A Photo-Realistic Simulator for Computer Vision Applications

(International Journal of Computer Vision, Springer Nature, 2018-03-24) [Article]
We present a photo-realistic training and evaluation simulator (Sim4CV) (http://www.sim4cv.org) with extensive applications across various fields of computer vision. Built on top of the Unreal Engine, the simulator integrates full-featured physics-based cars, unmanned aerial vehicles (UAVs), and animated human actors in diverse urban and suburban 3D environments. We demonstrate the versatility of the simulator with two case studies: autonomous UAV-based tracking of moving objects and autonomous driving using supervised learning. The simulator fully integrates several state-of-the-art tracking algorithms with a benchmark evaluation tool, as well as a deep neural network architecture for training vehicles to drive autonomously. It generates synthetic photo-realistic datasets with automatic ground truth annotations to easily extend existing real-world datasets and provides extensive synthetic data variety through its ability to reconfigure synthetic worlds on the fly using an automatic world generation tool.
• #### SurfCut: Surfaces of Minimal Paths From Topological Structures

(IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers (IEEE), 2018-03-05) [Article]
We present SurfCut, an algorithm for extracting a smooth, simple surface with an unknown 3D curve boundary from a noisy image and a seed point. Our method is built on the novel observation that certain ridge curves of a function defined on a front propagated using the Fast Marching algorithm lie on the surface. Our method extracts and cuts these ridges to form the surface boundary. Our surface extraction algorithm is built on the novel observation that the surface lies in a valley of the distance function from Fast Marching. We show that the resulting surface is a collection of minimal paths. Using the framework of cubical complexes and Morse theory, we design algorithms to extract these critical structures robustly. Experiments on three 3D datasets show the robustness of our method and that it achieves higher accuracy at lower computational cost than the state-of-the-art.
• #### Data-Driven Analysis of Virtual 3D Exploration of a Large Sculpture Collection in Real-World Museum Exhibitions

(Journal on Computing and Cultural Heritage, Association for Computing Machinery (ACM), 2018-01-29) [Article]
We analyze the use of an interactive system for exploring highly detailed three-dimensional (3D) models of a collection of protohistoric Mediterranean sculptures. In this system, when an object of interest is selected, its detailed 3D model and associated information are presented at high resolution on a large display controlled by a touch-enabled horizontal surface at a suitable distance. The user interface combines an object-aware interactive camera controller with an interactive point-of-interest selector, realized in a scalable implementation based on multiresolution structures shared between the rendering and user-interaction subsystems. The system was installed in several temporary and permanent exhibitions and was extensively used by tens of thousands of visitors. We provide a data-driven analysis of the usage experience based on logs gathered over a 27-month period at four exhibitions in archeological museums, totaling more than 75K exploration sessions. We focus on discerning the main visitor behaviors during 3D exploration by employing tools for deriving interest measures on surfaces and tools for clustering and knowledge discovery from high-dimensional data. The results highlight the main trends in visitor behavior during the interactive sessions and provide useful insights for the design of 3D exploration user interfaces in future digital installations.
• #### Contextual Multi-Scale Region Convolutional 3D Network for Activity Detection

(arXiv, 2018-01-28) [Preprint]
Activity detection is a fundamental problem in computer vision. Detecting activities of different temporal scales is particularly challenging. In this paper, we propose the contextual multi-scale region convolutional 3D network (CMS-RC3D) for activity detection. To deal with the inherent temporal scale variability of activity instances, the temporal feature pyramid is used to represent activities of different temporal scales. On each level of the temporal feature pyramid, an activity proposal detector and an activity classifier are learned to detect activities of specific temporal scales. Temporal contextual information is fused into activity classifiers for better recognition. More importantly, the entire model at all levels can be trained end-to-end. Our CMS-RC3D detector can deal with activities at all temporal scale ranges with only a single pass through the backbone network. We test our detector on two public activity detection benchmarks, THUMOS14 and ActivityNet. Extensive experiments show that the proposed CMS-RC3D detector outperforms state-of-the-art methods on THUMOS14 by a substantial margin and achieves comparable results on ActivityNet despite using a shallow feature extractor.
• #### Structure-aware Local Sparse Coding for Visual Tracking

(IEEE Transactions on Image Processing, Institute of Electrical and Electronics Engineers (IEEE), 2018-01-24) [Article]
Sparse coding has been applied to visual tracking and related vision problems with demonstrated success in recent years. Existing tracking methods based on local sparse coding sample patches from a target candidate and sparsely encode them using a dictionary consisting of patches sampled from target template images. The discriminative strength of existing methods based on local sparse coding is limited because the spatial structure constraints among the template patches are not exploited. To address this problem, we propose a structure-aware local sparse coding algorithm that encodes a target candidate using templates under both global and local sparsity constraints. For robust tracking, we show that the local regions of a candidate should be encoded only with the corresponding local regions of the target templates that are most similar from the global view. Thus, a more precise and discriminative sparse representation is obtained to account for appearance changes. To alleviate tracking drift, we design an effective template update scheme. Extensive experiments on challenging image sequences demonstrate the effectiveness of the proposed algorithm against numerous state-of-the-art methods.
• #### Metamorphers

(Proceedings of the 33rd Spring Conference on Computer Graphics - SCCG '17, ACM Press, 2018-01-18) [Conference Paper]
In molecular biology, illustrative animations are used to convey complex biological phenomena to broad audiences. However, such animations have to be manually authored in 3D modeling software, a time-consuming task that has to be repeated from scratch for every new data set and requires a high level of expertise in illustration, animation, and biology. We therefore propose metamorphers: a set of operations for defining animation states as well as the transitions to them in the form of re-usable storytelling templates. The re-usability is two-fold. Firstly, due to their modular nature, metamorphers can be re-used in different combinations to create a wide range of animations. Secondly, due to their abstract nature, metamorphers can be re-used to re-create an intended animation for a wide range of compatible data sets. Metamorphers thereby mask the low-level complexity of explicit animation specifications by exploiting the inherent properties of the molecular data, such as the position, size, and hierarchy level of a semantic data subset. We demonstrate the re-usability of our technique through the authoring and application of two animation use cases to three molecular data sets.
• #### 2D-Driven 3D Object Detection in RGB-D Images

(2017 IEEE International Conference on Computer Vision (ICCV), IEEE, 2017-12-25) [Conference Paper]
In this paper, we present a technique that places 3D bounding boxes around objects in an RGB-D scene. Our approach makes the best use of the 2D information to quickly reduce the search space in 3D, benefiting from state-of-the-art 2D object detection techniques. We then use the 3D information to orient, place, and score bounding boxes around objects. We independently estimate the orientation for every object, using previous techniques that utilize normal information. Object locations and sizes in 3D are learned using a multilayer perceptron (MLP). In the final step, we refine our detections based on object-class relations within a scene. Extensive experiments on the well-known SUN RGB-D dataset [29] show that, compared to state-of-the-art detection methods that operate almost entirely in the sparse 3D domain, our proposed method is much faster (4.1 s per image) at detecting 3D objects in RGB-D images, performs better (3 mAP higher) than the state-of-the-art method that is 4.7 times slower, and is comparable to a method that is two orders of magnitude slower. This work hints that 2D-driven object detection in 3D should be further explored, especially in cases where the 3D input is sparse.
• #### Revisiting Cross-Channel Information Transfer for Chromatic Aberration Correction

(2017 IEEE International Conference on Computer Vision (ICCV), IEEE, 2017-12-25) [Conference Paper]
Image aberrations can cause severe degradation in image quality for consumer-level cameras, especially under the current tendency to reduce the complexity of lens designs in order to shrink the overall size of modules. In simplified optical designs, chromatic aberration can be one of the most significant causes of degraded image quality, and it can be quite difficult to remove in post-processing, since it results in strong blurs in at least some of the color channels. In this work, we revisit the pixel-wise similarity between different color channels of an image and propose a novel algorithm for correcting chromatic aberration based on this cross-channel correlation. In contrast to recent weak prior-based models, ours uses strong pixel-wise fitting and transfer, which lead to significant quality improvements for large chromatic aberrations. Experimental results on both synthetic and real-world images captured by different optical systems demonstrate that chromatic aberration can be significantly reduced using our approach.
• #### PolyFit: Polygonal Surface Reconstruction from Point Clouds

(2017 IEEE International Conference on Computer Vision (ICCV), IEEE, 2017-12-25) [Conference Paper]
We propose a novel framework for reconstructing lightweight polygonal surfaces from point clouds. Unlike traditional methods that focus on either extracting good geometric primitives or obtaining proper arrangements of primitives, the emphasis of this work lies in intersecting the primitives (planes only) and seeking an appropriate combination of them to obtain a manifold polygonal surface model without boundary. We show that reconstruction from point clouds can be cast as a binary labeling problem. Our method is based on a hypothesize-and-select strategy: we first generate a reasonably large set of face candidates by intersecting the extracted planar primitives, and then select an optimal subset of the candidate faces through optimization. Our optimization is based on a binary linear programming formulation with hard constraints that enforce the final polygonal surface model to be manifold and watertight. Experiments on point clouds from various sources demonstrate that our method can generate lightweight polygonal surface models of arbitrary piecewise-planar objects. Moreover, our method is capable of recovering sharp features and is robust to noise, outliers, and missing data.
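The hypothesize-and-select step can be mimicked at toy scale: binary variables over candidate faces, a per-face score, and the hard manifold constraint that every edge is bounded by either zero or two selected faces (brute-force enumeration here stands in for the paper's binary linear programming solver, and the scores are hypothetical):

```python
from itertools import product

def select_faces(score, edges):
    """Brute-force sketch of PolyFit-style candidate-face selection.

    score[i] is the net benefit (data fitting minus penalties) of keeping
    candidate face i; edges[e] lists the faces incident to edge e. The
    manifold/watertight hard constraint requires every edge to have
    exactly 0 or 2 selected incident faces. Enumeration is only feasible
    for toy candidate sets; a real solver handles thousands of candidates.
    """
    n = len(score)
    best, best_val = None, float("-inf")
    for x in product([0, 1], repeat=n):
        # discard labelings that violate the manifold constraint
        if any(sum(x[i] for i in inc) not in (0, 2) for inc in edges):
            continue
        val = sum(s * xi for s, xi in zip(score, x))
        if val > best_val:
            best, best_val = x, val
    return best
```

On a tetrahedron-like candidate set, the only feasible labelings are "keep the whole closed surface" or "keep nothing", so the per-face scores decide between them.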
• #### High Order Tensor Formulation for Convolutional Sparse Coding

(2017 IEEE International Conference on Computer Vision (ICCV), IEEE, 2017-12-25) [Conference Paper]
Convolutional sparse coding (CSC) has gained attention for its successful role as a reconstruction and classification tool in the computer vision and machine learning communities. Current CSC methods can only reconstruct single-feature 2D images independently. However, learning multi-dimensional dictionaries and sparse codes for the reconstruction of multi-dimensional data is very important, as it jointly examines correlations among all of the data. This gives the learned dictionaries more capacity to better reconstruct data. In this paper, we propose a generic and novel formulation of the CSC problem that can handle an arbitrary-order tensor of data. Backed by experimental results, our proposed formulation can not only tackle applications that are not possible with standard CSC solvers, including colored video reconstruction (5D tensors), but also performs favorably in reconstruction with far fewer parameters compared to naive extensions of standard CSC to multiple features/channels.