Recent Submissions

  • GLAM: Glycogen-derived Lactate Absorption Map for visual analysis of dense and sparse surface reconstructions of rodent brain structures on desktop systems and virtual environments

    Agus, Marco; Boges, Daniya; Gagnon, Nicolas; Magistretti, Pierre J.; Hadwiger, Markus; Cali, Corrado (Elsevier BV, 2018-05-21)
    The human brain contains about one hundred billion neurons, but they cannot work properly without ultrastructural and metabolic support. For this reason, mammalian brains host another type of cell, the “glial cells”, whose role is to maintain proper conditions for efficient neuronal function. One type of glial cell, the astrocyte, is particularly involved in the metabolic support of neurons: astrocytes feed them with lactate, a byproduct of the metabolism of glucose, which astrocytes take up from blood vessels and store in another form, as glycogen granules. These energy-storage molecules are roughly spherical, with diameters of about 10–80 nanometers, and can easily be recognized using electron microscopy, the only technique with a resolution high enough to resolve them. Understanding and quantifying their distribution is particularly relevant for neuroscientists who want to understand where and when neurons use energy in this form. To answer this question, we developed a visualization technique, dubbed GLAM (Glycogen-derived Lactate Absorption Map), customized for analyzing the interaction of astrocytic glycogen with surrounding neurites in order to formulate hypotheses on the energy absorption mechanisms. The method integrates high-resolution surface reconstructions of neurites, astrocytes, and the energy sources in the form of glycogen granules, obtained with different automated serial electron microscopy methods such as focused ion beam scanning electron microscopy (FIB-SEM) or serial block face electron microscopy (SBEM), together with an absorption map computed as a radiance transfer mechanism. The resulting visual representation provides an immediate and comprehensible illustration of the areas in which the probability of lactate shuttling is higher. The computed dataset can then be explored and quantified in 3D space, either with 3D modeling software or in virtual reality environments.
Domain scientists have evaluated the technique either by using the computed maps to formulate functional hypotheses or by using them to plan sparse reconstructions that avoid excessive occlusion. Furthermore, we conducted a pioneering user study showing that immersive VR setups can ease the investigation of the areas of interest and the analysis of the absorption patterns in the cellular structures.
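As a rough illustration of what a per-vertex absorption map could look like, the Python sketch below scores each neurite surface vertex by summing contributions from glycogen granules treated as point sources with an inverse-square falloff. The falloff law, the use of granule volume as source strength, the `eps` regularizer, and the function name `absorption_map` are all assumptions for illustration; the abstract only states that GLAM computes its map as a radiance transfer mechanism, and this is not that implementation.

```python
import numpy as np

def absorption_map(vertices, granules, radii, eps=1e-6):
    """Hypothetical per-vertex absorption score (not the GLAM algorithm).

    vertices: (V, 3) surface vertex positions
    granules: (G, 3) glycogen granule centers
    radii:    (G,) granule radii; volume (r^3) is used as source strength (assumption)
    """
    # Vectors from every vertex to every granule: shape (V, G, 3)
    diff = granules[None, :, :] - vertices[:, None, :]
    dist2 = np.sum(diff ** 2, axis=-1)          # squared distances, shape (V, G)
    strength = radii ** 3                        # assumed source strength
    contrib = strength[None, :] / (dist2 + eps)  # inverse-square attenuation
    return contrib.sum(axis=1)                   # total score per vertex, shape (V,)
```

Vertices close to many large granules receive high scores, which could then be mapped to a color scale on the reconstructed surface.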
  • Isotropic Surface Remeshing without Large and Small Angles

    Wang, Yiqun; Yan, Dong-Ming; Liu, Xiaohan; Tang, Chengcheng; Guo, Jianwei; Zhang, Xiaopeng; Wonka, Peter (Institute of Electrical and Electronics Engineers (IEEE), 2018-05-18)
    We introduce a novel algorithm for isotropic surface remeshing that progressively eliminates obtuse triangles and improves small angles. The main novelty of the proposed approach is a simple vertex insertion scheme that facilitates the removal of large angles, together with a vertex removal operation that improves the distribution of small angles. In combination with other standard local mesh operators, e.g., connectivity optimization and local tangential smoothing, our algorithm can efficiently remesh a low-quality mesh surface. Our approach can be applied directly or used as a post-processing step following other remeshing approaches. Its computational efficiency is similar to that of the fastest approach available, i.e., real-time adaptive remeshing [1]. In comparison with state-of-the-art approaches, our method consistently generates better results according to evaluations using different metrics.
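The quality criteria the remesher targets (no obtuse angles, no very small angles) are easy to measure. The sketch below computes interior angles per triangle and counts obtuse triangles and triangles below a small-angle threshold; the 30° threshold and the function names are illustrative assumptions, and the code only evaluates a mesh, it does not implement the paper's vertex insertion or removal operators.

```python
import numpy as np

def triangle_angles(p0, p1, p2):
    """Return the three interior angles (degrees) of a triangle."""
    pts = [np.asarray(p, dtype=float) for p in (p0, p1, p2)]
    angles = []
    for i in range(3):
        a, b, c = pts[i], pts[(i + 1) % 3], pts[(i + 2) % 3]
        u, v = b - a, c - a
        cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        # Clip to guard against round-off slightly outside [-1, 1]
        angles.append(float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))))
    return angles

def mesh_angle_report(vertices, faces, small=30.0):
    """Count obtuse triangles and triangles with an angle below `small` degrees."""
    obtuse = tiny = 0
    min_angle, max_angle = 180.0, 0.0
    for f in faces:
        angs = triangle_angles(*(vertices[i] for i in f))
        min_angle = min(min_angle, min(angs))
        max_angle = max(max_angle, max(angs))
        obtuse += int(max(angs) > 90.0)
        tiny += int(min(angs) < small)
    return {"obtuse": obtuse, "small": tiny,
            "min_angle": min_angle, "max_angle": max_angle}
```

A remesher of the kind described would aim to drive `obtuse` to zero while raising `min_angle`.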
  • Discriminative Transfer Learning for General Image Restoration

    Xiao, Lei; Heide, Felix; Heidrich, Wolfgang; Schölkopf, Bernhard; Hirsch, Michael (Institute of Electrical and Electronics Engineers (IEEE), 2018-04-30)
    Recently, several discriminative learning approaches have been proposed for effective image restoration, achieving a convincing trade-off between image quality and computational efficiency. However, these methods require separate training for each restoration task (e.g., denoising, deblurring, demosaicing) and each problem condition (e.g., the noise level of input images). This makes it time-consuming and difficult to encompass all tasks and conditions during training. In this paper, we propose a discriminative transfer learning method that combines formal proximal optimization with discriminative learning for general image restoration. The method requires only a single pass of discriminative training and can be reused across various problems and conditions while achieving efficiency comparable to previous discriminative approaches. Furthermore, once trained, our model can easily be transferred to new likelihood terms to solve untrained tasks, or be combined with existing priors to further improve image restoration quality.
  • Weighted Low-Rank Approximation of Matrices and Background Modeling

    Dutta, Aritra; Li, Xin; Richtarik, Peter (arXiv, 2018-04-15)
    We primarily study a special weighted low-rank approximation of matrices and then apply it to the background modeling problem. We propose two algorithms for this purpose: one operates in batch mode on the entire data, while the other operates in batch-incremental mode, naturally captures more background variation, and is computationally more efficient. Moreover, we propose a robust technique that learns the background frame indices from the data and does not require any training frames. We demonstrate through extensive experiments that inserting a simple weight into the Frobenius norm makes the approximation robust to outliers, similarly to the $\ell_1$ norm. Our methods match or outperform several state-of-the-art online and batch background modeling methods in virtually all quantitative and qualitative measures.
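The effect of a weight inside the Frobenius norm can be illustrated with a generic alternating weighted least-squares solver for min over U, V of ||W ⊙ (A − U Vᵀ)||_F². This is a textbook scheme, not the batch or batch-incremental algorithms of the paper; the rank `r`, the iteration count, the small ridge term `lam`, and the function name are assumptions. Setting an entry's weight to zero removes its influence entirely, which is the mechanism that makes the fit robust to outliers.

```python
import numpy as np

def weighted_low_rank(A, W, r, iters=50, lam=1e-8, seed=0):
    """Approximate A by U @ V.T, minimizing ||W * (A - U V^T)||_F^2.

    W holds nonnegative per-entry weights; small weights downweight outliers.
    Uses alternating weighted least squares with a tiny ridge term for stability.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, r))
    V = rng.standard_normal((n, r))
    W2 = W ** 2
    ridge = lam * np.eye(r)
    for _ in range(iters):
        # Row i of U: solve (sum_j w_ij^2 v_j v_j^T) u_i = sum_j w_ij^2 a_ij v_j
        for i in range(m):
            Vw = V * W2[i][:, None]
            U[i] = np.linalg.solve(V.T @ Vw + ridge, Vw.T @ A[i])
        # Row j of V: same normal equations with the roles of U and V swapped
        for j in range(n):
            Uw = U * W2[:, j][:, None]
            V[j] = np.linalg.solve(U.T @ Uw + ridge, Uw.T @ A[:, j])
    return U, V
```

In a background-modeling setting, the columns of `A` would be vectorized frames and `U @ V.T` the low-rank background estimate.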
  • SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos

    Giancola, Silvio; Amine, Mohieddine; Dghaily, Tarek; Ghanem, Bernard (arXiv, 2018-04-12)
    In this paper, we introduce SoccerNet, a benchmark for action spotting in soccer videos. The dataset is composed of 500 complete soccer games from six main European leagues, covering three seasons from 2014 to 2017 and a total duration of 764 hours. A total of 6,637 temporal annotations are automatically parsed from online match reports at one-minute resolution for three main classes of events (Goal, Yellow/Red Card, and Substitution), which makes the dataset easily scalable. These annotations are manually refined to one-second resolution by anchoring each of them at a single timestamp following well-defined soccer rules. With an average of one event every 6.9 minutes, this dataset focuses on the problem of localizing very sparse events within long videos. We define the task of spotting as finding the anchors of soccer events in a video. Making use of recent developments in generic action recognition and detection in video, we provide strong baselines for detecting soccer events. We show that our best model for classifying one-minute temporal segments reaches a mean Average Precision (mAP) of 67.8%. For the spotting task, our baseline reaches an Average-mAP of 49.7% for tolerances $\delta$ ranging from 5 to 60 seconds.
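Tolerance-based spotting evaluation can be sketched as follows: a predicted timestamp counts as a hit if it lies within δ seconds of a still-unmatched ground-truth anchor, predictions are processed in order of confidence, and AP is then averaged over several tolerances. The greedy matching rule, the "within δ seconds" window definition, and the single-class simplification are assumptions; the official SoccerNet evaluation code may define the window and averaging differently.

```python
def spotting_ap(preds, anchors, delta):
    """Average precision for one event class at tolerance delta (seconds).

    preds:   list of (timestamp, confidence) predictions
    anchors: list of ground-truth anchor timestamps
    """
    preds = sorted(preds, key=lambda p: -p[1])  # highest confidence first
    matched = [False] * len(anchors)
    tp = []
    for t, _ in preds:
        # Closest unmatched anchor within the tolerance window, if any
        best, best_d = None, None
        for k, a in enumerate(anchors):
            d = abs(t - a)
            if not matched[k] and d <= delta and (best_d is None or d < best_d):
                best, best_d = k, d
        if best is not None:
            matched[best] = True
        tp.append(best is not None)
    # Non-interpolated AP: mean of precision values at each true-positive rank
    hits, ap = 0, 0.0
    for rank, is_tp in enumerate(tp, start=1):
        if is_tp:
            hits += 1
            ap += hits / rank
    return ap / len(anchors) if anchors else 0.0

def average_map(preds, anchors, deltas):
    """Average the AP over several tolerances (single class, for brevity)."""
    return sum(spotting_ap(preds, anchors, d) for d in deltas) / len(deltas)
```

For example, with anchors at 100 s and 500 s and predictions at 103 s, 300 s, and 498 s (in decreasing confidence), a 5-second tolerance yields an AP of 5/6, while a 1-second tolerance yields 0.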
  • Supervised Convolutional Sparse Coding

    Affara, Lama Ahmed; Ghanem, Bernard; Wonka, Peter (arXiv, 2018-04-08)
    Convolutional Sparse Coding (CSC) is a well-established image representation model that is especially suited for image restoration tasks. In this work, we extend the applicability of this model by proposing a supervised approach to convolutional sparse coding, which aims at learning discriminative dictionaries instead of purely reconstructive ones. We incorporate a supervised regularization term into the traditional unsupervised CSC objective to encourage the final dictionary elements to be discriminative. Experimental results show that supervised convolutional learning yields two key advantages: first, the filters learned in the dictionary are more semantically relevant; second, image reconstruction on unseen data improves.
  • Multi-label Learning with Missing Labels Using Mixed Dependency Graphs

    Wu, Baoyuan; Jia, Fan; Liu, Wei; Ghanem, Bernard; Lyu, Siwei (Springer Nature, 2018-04-06)
    This work focuses on the problem of multi-label learning with missing labels (MLML), which aims to label each test instance with multiple class labels given training instances that have an incomplete/partial set of these labels (i.e., some of their labels are missing). The key to handling missing labels is propagating label information from the provided labels to the missing ones through a dependency graph in which each label of each instance is treated as a node. We build this graph by utilizing different types of label dependencies. Specifically, instance-level similarity serves as undirected edges connecting the label nodes across different instances, and the semantic label hierarchy is used as directed edges connecting different classes. This base graph is referred to as the mixed dependency graph, as it includes both undirected and directed edges. Furthermore, we present two additional types of label dependencies to connect the label nodes across different classes. One is class co-occurrence, which is also encoded as undirected edges; combined with the base graph above, it yields a new mixed graph, called mixed graph with co-occurrence (MG-CO). The other is the sparse and low rank decomposition of the whole label matrix, which embeds high-order dependencies over all labels; combined with the base graph, it yields a new mixed graph called MG-SL (mixed graph with sparse and low rank decomposition). Based on MG-CO and MG-SL, we further propose two convex transductive formulations of the MLML problem, denoted MLMG-CO and MLMG-SL, respectively. In both formulations, the instance-level similarity is embedded through a quadratic smoothness term, while the semantic label hierarchy is used as a linear constraint.
In MLMG-CO, the class co-occurrence is also formulated as a quadratic smoothness term, while in MLMG-SL the sparse and low rank decomposition is incorporated through two additional matrices (one assumed to be sparse and the other low rank) and an equivalence constraint between the sum of these two matrices and the original label matrix. Interestingly, two important applications, image annotation and tag-based image retrieval, can be jointly handled using our proposed methods. Experimental results on several benchmark datasets show that our methods lead to significant improvements in performance and robustness to missing labels over state-of-the-art methods.
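The role of a quadratic smoothness term built from instance similarity can be illustrated with plain graph-Laplacian label propagation: for each class c, minimize ||M_c ⊙ (y_c − y0_c)||² + β y_cᵀ L y_c, where L = D − S is the Laplacian of an instance-similarity matrix S. This is a simplified stand-in that omits the label-hierarchy constraints, co-occurrence edges, and sparse/low-rank terms of MLMG-CO and MLMG-SL; β, S, and the function name are assumptions. The linear system is solvable when every connected component of the graph contains at least one provided label for that class.

```python
import numpy as np

def propagate_labels(Y0, M, S, beta=1.0):
    """Fill missing labels by Laplacian smoothing over instance similarity.

    Y0: (n, c) labels in {-1, +1}; entries where M == 0 are ignored
    M:  (n, c) mask, 1 = label provided, 0 = missing
    S:  (n, n) symmetric nonnegative instance-similarity matrix
    """
    L = np.diag(S.sum(axis=1)) - S  # unnormalized graph Laplacian
    Y = np.zeros_like(Y0, dtype=float)
    for c in range(Y0.shape[1]):
        Mc = np.diag(M[:, c].astype(float))
        # Zero gradient of ||Mc (y - y0)||^2 + beta y^T L y gives (Mc + beta L) y = Mc y0
        Y[:, c] = np.linalg.solve(Mc + beta * L, Mc @ Y0[:, c])
    return Y
```

The sign of each filled-in entry can be read as the predicted label; an instance that is more similar to positively labeled neighbors receives a positive score.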
  • Guess Where? Actor-Supervision for Spatiotemporal Action Localization

    Escorcia, Victor; Dao, Cuong D.; Jain, Mihir; Ghanem, Bernard; Snoek, Cees (arXiv, 2018-04-05)
    This paper addresses the problem of spatiotemporal localization of actions in videos. Whereas leading approaches all learn to localize based on carefully annotated boxes on training video frames, we adopt a weakly-supervised solution that only requires a video class label. We introduce an actor-supervised architecture that exploits the inherent compositionality of actions in terms of actor transformations to localize actions. We make two contributions. First, we propose actor proposals derived from a detector for human and non-human actors intended for images, which are linked over time by Siamese similarity matching to account for actor deformations. Second, we propose an actor-based attention mechanism that is end-to-end trainable and enables the localization of actions from action class labels and actor proposals. Experiments on three human and non-human action datasets show that actor supervision is state-of-the-art for weakly-supervised action localization and is even competitive with some fully-supervised alternatives.
  • Accelerated Optimization in the PDE Framework: Formulations for the Manifold of Diffeomorphisms

    Sundaramoorthi, Ganesh; Yezzi, Anthony (arXiv, 2018-04-04)
    We consider the problem of optimizing cost functionals on the infinite-dimensional manifold of diffeomorphisms. We present a new class of optimization methods, valid for any optimization problem set up on the space of diffeomorphisms, by generalizing Nesterov accelerated optimization to the manifold of diffeomorphisms. While our framework is general for infinite-dimensional manifolds, we specifically treat the case of diffeomorphisms, motivated by optical flow problems in computer vision. This is accomplished by building on a recent variational approach to a general class of accelerated optimization methods by Wibisono, Wilson and Jordan, which applies in finite dimensions; we generalize that approach to infinite-dimensional manifolds. We derive the surprisingly simple continuum evolution equations, which are partial differential equations, for accelerated gradient descent, and relate them to simple mechanical principles from fluid mechanics. Our approach has natural connections to the optimal mass transport problem, because it can be viewed as the evolution of an infinite number of particles endowed with mass (represented by a mass density) that move in an energy landscape. The mass evolves with the optimization variable and endows the particles with dynamics. This differs from the finite-dimensional case, where only a single particle moves and hence the dynamics do not depend on the mass. We derive the theory, compute the PDEs for accelerated optimization, and illustrate the behavior of these new accelerated optimization schemes.
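For intuition about the finite-dimensional starting point, the sketch below shows the classical Nesterov accelerated gradient method (the scheme whose continuous-time limit the Wibisono, Wilson and Jordan line of work studies) next to plain gradient descent on an ill-conditioned quadratic. It is not the paper's PDE scheme on the manifold of diffeomorphisms; the step size and the standard (k − 1)/(k + 2) momentum schedule are the usual choices for smooth convex problems, assumed here for illustration.

```python
import numpy as np

def nesterov(grad, x0, step, iters=500):
    """Nesterov accelerated gradient descent for a smooth convex objective."""
    x_prev = np.asarray(x0, dtype=float)
    y = x_prev.copy()
    for k in range(1, iters + 1):
        x = y - step * grad(y)                    # gradient step at the look-ahead point
        y = x + (k - 1) / (k + 2) * (x - x_prev)  # momentum extrapolation
        x_prev = x
    return x_prev

def gradient_descent(grad, x0, step, iters=500):
    """Plain gradient descent, for comparison."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad(x)
    return x
```

On f(x) = ½ xᵀAx − bᵀx with A = diag(1, 100) and step 1/100, the accelerated iterate after 200 steps is much closer to the minimizer than the gradient-descent iterate, because momentum speeds up progress along the slow, low-curvature direction.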
  • Tagging like Humans: Diverse and Distinct Image Annotation

    Wu, Baoyuan; Chen, Weidong; Sun, Peng; Liu, Wei; Ghanem, Bernard; Lyu, Siwei (arXiv, 2018-03-31)
    In this work we propose a new automatic image annotation model, dubbed diverse and distinct image annotation (D2IA). The generative model D2IA is inspired by the ensemble of human annotations, which create semantically relevant, yet distinct and diverse tags. In D2IA, we generate a relevant and distinct tag subset, in which the tags are relevant to the image contents and semantically distinct from each other, using sequential sampling from a determinantal point process (DPP) model. Multiple such tag subsets that cover diverse semantic aspects or diverse semantic levels of the image contents are generated by randomly perturbing the DPP sampling process. We leverage a generative adversarial network (GAN) model to train D2IA. Extensive experiments, including quantitative and qualitative comparisons as well as human subject studies, on two benchmark datasets demonstrate that the proposed model can produce more diverse and distinct tags than state-of-the-art methods.
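How a DPP favors tags that are both relevant and mutually distinct can be shown with a deterministic stand-in: greedy MAP selection under a kernel L = diag(q) S diag(q), where q are per-tag relevance scores and S is a tag-similarity matrix. The kernel construction, the greedy rule, and the function name are assumptions; D2IA itself uses sequential sampling with random perturbations, not this greedy procedure.

```python
import numpy as np

def greedy_dpp(quality, similarity, k):
    """Greedily pick up to k items maximizing log det of the DPP kernel submatrix."""
    L = quality[:, None] * similarity * quality[None, :]
    selected = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for i in range(len(quality)):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            val = logdet if sign > 0 else -np.inf
            if val > best_val:
                best, best_val = i, val
        if best is None:  # no item keeps the submatrix positive definite
            break
        selected.append(best)
    return selected
```

With two near-duplicate tags and one distinct tag, the second pick skips the near-duplicate even though its relevance score is higher, because adding it would almost collapse the determinant.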
  • TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

    Müller, Matthias; Bibi, Adel Aamer; Giancola, Silvio; Al-Subaihi, Salman; Ghanem, Bernard (arXiv, 2018-03-28)
    Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. In fact, data-hungry trackers based on deep learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse contexts. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.
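Tracker benchmarks of this kind commonly report an OTB-style success measure: the fraction of frames whose predicted box overlaps the ground truth with an intersection over union (IoU) above a threshold, averaged over a sweep of thresholds (the area under the success plot). The sketch assumes boxes in (x, y, w, h) format and 21 evenly spaced thresholds; it is an illustration of the metric, not the TrackingNet evaluation server's code.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def success_auc(pred_boxes, gt_boxes, thresholds=np.linspace(0, 1, 21)):
    """Mean success rate over IoU thresholds (area under the success plot)."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    rates = [(overlaps > t).mean() for t in thresholds]
    return float(np.mean(rates))
```

Because success is defined with a strict inequality, even a perfect tracker fails the threshold of exactly 1.0, so its score over these 21 thresholds is 20/21.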
  • Sim4CV: A Photo-Realistic Simulator for Computer Vision Applications

    Müller, Matthias; Casser, Vincent; Lahoud, Jean; Smith, Neil; Ghanem, Bernard (Springer Nature, 2018-03-24)
    We present a photo-realistic training and evaluation simulator (Sim4CV) with extensive applications across various fields of computer vision. Built on top of the Unreal Engine, the simulator integrates full-featured, physics-based cars, unmanned aerial vehicles (UAVs), and animated human actors in diverse urban and suburban 3D environments. We demonstrate the versatility of the simulator with two case studies: autonomous UAV-based tracking of moving objects and autonomous driving using supervised learning. The simulator fully integrates both several state-of-the-art tracking algorithms, together with a benchmark evaluation tool, and a deep neural network architecture for training vehicles to drive autonomously. It generates synthetic photo-realistic datasets with automatic ground truth annotations to easily extend existing real-world datasets, and it provides extensive synthetic data variety through its ability to reconfigure synthetic worlds on the fly using an automatic world generation tool.
  • SurfCut: Surfaces of Minimal Paths From Topological Structures

    Algarni, Marei Saeed Mohammed; Sundaramoorthi, Ganesh (Institute of Electrical and Electronics Engineers (IEEE), 2018-03-05)
    We present SurfCut, an algorithm for extracting a smooth, simple surface with an unknown 3D curve boundary from a noisy image and a seed point. Our method is built on the novel observation that certain ridge curves of a function defined on a front propagated using the Fast Marching algorithm lie on the surface. Our method extracts and cuts these ridges to form the surface boundary. The surface extraction itself relies on a second novel observation: the surface lies in a valley of the distance computed by Fast Marching. We show that the resulting surface is a collection of minimal paths. Using the framework of cubical complexes and Morse theory, we design algorithms to extract these critical structures robustly. Experiments on three 3D datasets show the robustness of our method and that it achieves higher accuracy at lower computational cost than state-of-the-art methods.
  • Teaching UAVs to Race With Observational Imitation Learning

    Li, Guohao; Mueller, Matthias; Casser, Vincent; Smith, Neil; Michels, Dominik L.; Ghanem, Bernard (arXiv, 2018-03-03)
    Recent work has tackled the problem of autonomous navigation by imitating a teacher and learning an end-to-end policy that directly predicts controls from raw images. However, these approaches tend to be sensitive to mistakes by the teacher and do not scale well to other environments or vehicles. To this end, we propose a modular network architecture that decouples perception from control and is trained using Observational Imitation Learning (OIL), a novel imitation learning variant that supports online training and automatic selection of optimal behavior from observing multiple teachers. We apply our proposed methodology to the challenging problem of unmanned aerial vehicle (UAV) racing. We develop a simulator that enables the generation of large amounts of synthetic training data (both UAV-captured images and the corresponding controls) and also allows for online learning and evaluation. We train a perception network to predict waypoints from raw image data and a control network to predict UAV controls from these waypoints using OIL. Our modular network is able to autonomously fly a UAV through challenging race tracks at high speeds. Extensive experiments demonstrate that our trained network outperforms its teachers, end-to-end baselines, and even human pilots in simulation. The supplementary video can be viewed at
  • Integration of Absolute Orientation Measurements in the KinectFusion Reconstruction pipeline

    Giancola, Silvio; Schneider, Jens; Wonka, Peter; Ghanem, Bernard (arXiv, 2018-02-12)
    In this paper, we show how absolute orientation measurements provided by low-cost but high-fidelity IMU sensors can be integrated into the KinectFusion pipeline. We show that this integration improves the runtime, the robustness, and the quality of the 3D reconstruction. In particular, we use the orientation data to seed and regularize the ICP registration technique. We also present a technique to filter the pairs of matched 3D points based on the distribution of their distances. This filter is implemented efficiently on the GPU. Estimating the distribution of the distances helps control the number of iterations necessary for the ICP algorithm to converge. Finally, we show experimental results that highlight improvements in robustness, a speed-up of almost 12%, and a 53% gain in tracking quality for the ATE metric on the Freiburg benchmark.
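Filtering matched pairs by the distribution of their distances can be sketched as a simple statistical outlier test: estimate the mean and standard deviation of the pair distances and keep only pairs whose distance does not exceed the mean by more than k standard deviations. The mean-plus-k-sigma rule, the factor `k`, and the function name are assumptions; the paper's GPU filter may model the distance distribution differently.

```python
import numpy as np

def filter_pairs(src, dst, k=2.0):
    """Keep matched point pairs whose distance is not an outlier.

    src, dst: (N, 3) arrays of corresponding 3D points.
    Returns a boolean mask selecting the pairs to use in the ICP update.
    """
    d = np.linalg.norm(src - dst, axis=1)  # per-pair Euclidean distance
    mu, sigma = d.mean(), d.std()
    return d <= mu + k * sigma
```

Only the pairs selected by the mask would then enter the rigid-transform estimate of the current ICP iteration.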
  • Data-Driven Analysis of Virtual 3D Exploration of a Large Sculpture Collection in Real-World Museum Exhibitions

    Agus, Marco; Marton, Fabio; Bettio, Fabio; Hadwiger, Markus; Gobbetti, Enrico (Association for Computing Machinery (ACM), 2018-01-29)
    We analyze the use of an interactive system for the exploration of highly detailed three-dimensional (3D) models of a collection of protohistoric Mediterranean sculptures. In this system, when the object of interest is selected, its detailed 3D model and associated information are presented at high resolution on a large display controlled by a touch-enabled horizontal surface placed at a suitable distance. The user interface combines an object-aware interactive camera controller with an interactive point-of-interest selector and is built on a scalable implementation based on multiresolution structures shared between the rendering and user interaction subsystems. The system was installed in several temporary and permanent exhibitions and was extensively used by tens of thousands of visitors. We provide a data-driven analysis of usage experience based on logs gathered during a 27-month period at four exhibitions in archeological museums, for a total of more than 75K exploration sessions. We focus on discerning the main visitor behaviors during 3D exploration by employing tools for deriving interest measures on surfaces and tools for clustering and knowledge discovery from high-dimensional data. The results highlight the main trends in visitor behavior during the interactive sessions and provide useful insights for the design of 3D exploration user interfaces in future digital installations. © 2017 ACM 1556-4673/2017/12-ART2 $15.00.
  • Contextual Multi-Scale Region Convolutional 3D Network for Activity Detection

    Bai, Yancheng; Xu, Huijuan; Saenko, Kate; Ghanem, Bernard (arXiv, 2018-01-28)
    Activity detection is a fundamental problem in computer vision. Detecting activities of different temporal scales is particularly challenging. In this paper, we propose the contextual multi-scale region convolutional 3D network (CMS-RC3D) for activity detection. To deal with the inherent temporal scale variability of activity instances, we use a temporal feature pyramid to represent activities of different temporal scales. On each level of the temporal feature pyramid, an activity proposal detector and an activity classifier are learned to detect activities of specific temporal scales. Temporal contextual information is fused into the activity classifiers for better recognition. More importantly, the entire model at all levels can be trained end-to-end. Our CMS-RC3D detector can handle activities across the full range of temporal scales with only a single pass through the backbone network. We test our detector on two public activity detection benchmarks, THUMOS14 and ActivityNet. Extensive experiments show that the proposed CMS-RC3D detector outperforms state-of-the-art methods on THUMOS14 by a substantial margin and achieves comparable results on ActivityNet despite using a shallow feature extractor.
  • Structure-aware Local Sparse Coding for Visual Tracking

    Qi, Yuankai; Qin, Lei; Zhang, Jian; Zhang, Shengping; Huang, Qingming; Yang, Ming-Hsuan (Institute of Electrical and Electronics Engineers (IEEE), 2018-01-24)
    Sparse coding has been applied to visual tracking and related vision problems with demonstrated success in recent years. Existing tracking methods based on local sparse coding sample patches from a target candidate and sparsely encode them using a dictionary consisting of patches sampled from target template images. The discriminative strength of these methods is limited because spatial structure constraints among the template patches are not exploited. To address this problem, we propose a structure-aware local sparse coding algorithm that encodes a target candidate using templates with both global and local sparsity constraints. For robust tracking, we show that local regions of a candidate region should be encoded only with the corresponding local regions of those target templates that are the most similar from the global view. Thus, a more precise and discriminative sparse representation is obtained to account for appearance changes. To alleviate tracking drift, we design an effective template update scheme. Extensive experiments on challenging image sequences demonstrate the effectiveness of the proposed algorithm against numerous state-of-the-art methods.
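At its core, sparsely encoding a candidate patch against a dictionary of template patches is an ℓ1-regularized least-squares (lasso) problem. A minimal solver is ISTA (iterative shrinkage-thresholding), sketched below; it shows only this generic building block, without the paper's structure-aware global and local constraints, and the regularization weight `lam` and iteration count are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, x, lam=0.1, iters=500):
    """Solve min_a 0.5 * ||x - D a||^2 + lam * ||a||_1 with ISTA.

    D: (d, K) dictionary whose columns are vectorized template patches
    x: (d,) vectorized candidate patch
    """
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ a - x)
        a = soft_threshold(a - step * grad, step * lam)
    return a
```

With an orthonormal dictionary the solution is simply the soft-thresholded input; for instance, with `D = np.eye(3)`, `x = [1.0, 0.05, -2.0]`, and `lam = 0.1`, the code is `[0.9, 0.0, -1.9]`.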
  • Metamorphers

    Sorger, Johannes; Mindek, Peter; Rautek, Peter; Gröller, Eduard; Johnson, Graham; Viola, Ivan (ACM Press, 2018-01-18)
    In molecular biology, illustrative animations are used to convey complex biological phenomena to broad audiences. However, such animations have to be authored manually in 3D modeling software, a time-consuming task that must be repeated from scratch for every new data set and requires a high level of expertise in illustration, animation, and biology. We therefore propose metamorphers: a set of operations for defining animation states, as well as the transitions to them, in the form of reusable storytelling templates. The reusability is twofold. First, due to their modular nature, metamorphers can be reused in different combinations to create a wide range of animations. Second, due to their abstract nature, metamorphers can be reused to re-create an intended animation for a wide range of compatible data sets. Metamorphers thereby mask the low-level complexity of explicit animation specifications by exploiting inherent properties of the molecular data, such as the position, size, and hierarchy level of a semantic data subset. We demonstrate the reusability of our technique by authoring two animation use cases and applying them to three molecular data sets.
