Peng, Yifan; Dun, Xiong; Sun, Qilin; Heidrich, Wolfgang(ACM Transactions on Graphics, Association for Computing Machinery (ACM), 2017-11-22)[Article]
Computational caustics and light steering displays offer a wide range of interesting applications, ranging from art works and architectural installations to energy efficient HDR projection. In this work we expand on this concept by encoding several target images into pairs of front and rear phase-distorting surfaces. Different target holograms can be decoded by mixing and matching different front and rear surfaces under specific geometric alignments. Our approach, which we call mix-and-match holography, is made possible by moving from a refractive caustic image formation process to a diffractive, holographic one. This provides the extra bandwidth that is required to multiplex several images into pairing surfaces.
Affara, Lama Ahmed; Ghanem, Bernard; Wonka, Peter(arXiv, 2017-09-27)[Preprint]
Convolutional sparse coding (CSC) is an important building block of many computer vision applications ranging from image and video compression to deep learning. We present two contributions to the state of the art in CSC. First, we significantly speed up the computation by proposing a new optimization framework that tackles the problem in the dual domain. Second, we extend the original formulation to higher dimensions in order to process a wider range of inputs, such as color inputs, or HOG features. Our results show a significant speedup compared to the current state of the art in CSC.
Visual Question Answering (VQA) models should have both high robustness and accuracy. Unfortunately, most of the current VQA research only focuses on accuracy because there is a lack of proper methods to measure the robustness of VQA models. There are two main modules in our algorithm. Given a natural language question about an image, the first module takes the question as input and then outputs the ranked basic questions, with similarity scores, of the main given question. The second module takes the main question, image and these basic questions as input and then outputs the text-based answer of the main question about the given image. We claim that a robust VQA model is one, whose performance is not changed much when related basic questions as also made available to it as input. We formulate the basic questions generation problem as a LASSO optimization, and also propose a large scale Basic Question Dataset (BQD) and Rscore (novel robustness measure), for analyzing the robustness of VQA models. We hope our BQD will be used as a benchmark for to evaluate the robustness of VQA models, so as to help the community build more robust and accurate VQA models.
Traditional approaches for action detection use trimmed data to learn sophisticated action detector models. Although these methods have achieved great success at detecting human actions, we argue that huge information is discarded when ignoring the process, through which this trimmed data is obtained. In this paper, we propose Action Search, a novel approach that mimics the way people annotate activities in video sequences. Using a Recurrent Neural Network, Action Search can efficiently explore a video and determine the time boundaries during which an action occurs. Experiments on the THUMOS14 dataset reveal that our model is not only able to explore the video efficiently but also accurately find human activities, outperforming state-of-the-art methods.
Traditional methods for image compressive sensing (CS) reconstruction solve a well-defined inverse problem that is based on a predefined CS model, which defines the underlying structure of the problem and is generally solved by employing convergent iterative solvers. These optimization-based CS methods face the challenge of choosing optimal transforms and tuning parameters in their solvers, while also suffering from high computational complexity in most cases. Recently, some deep network based CS algorithms have been proposed to improve CS reconstruction performance, while dramatically reducing time complexity as compared to optimization-based methods. Despite their impressive results, the proposed networks (either with fully-connected or repetitive convolutional layers) lack any structural diversity and they are trained as a black box, void of any insights from the CS domain. In this paper, we combine the merits of both types of CS methods: the structure insights of optimization-based method and the performance/speed of network-based ones. We propose a novel structured deep network, dubbed ISTA-Net, which is inspired by the Iterative Shrinkage-Thresholding Algorithm (ISTA) for optimizing a general $l_1$ norm CS reconstruction model. ISTA-Net essentially implements a truncated form of ISTA, where all ISTA-Net parameters are learned end-to-end to minimize a reconstruction error in training. Borrowing more insights from the optimization realm, we propose an accelerated version of ISTA-Net, dubbed FISTA-Net, which is inspired by the fast iterative shrinkage-thresholding algorithm (FISTA). Interestingly, this acceleration naturally leads to skip connections in the underlying network design. Extensive CS experiments demonstrate that the proposed ISTA-Net and FISTA-Net outperform existing optimization-based and network-based CS methods by large margins, while maintaining a fast runtime.
Kernel Correlation Filters have shown a very promising scheme for visual tracking in terms of speed and accuracy on several benchmarks. However it suffers from problems that affect its performance like occlusion, rotation and scale change. This paper tries to tackle the problem of rotation by reformulating the optimization problem for learning the correlation filter. This modification (RKCF) includes learning rotation filter that utilizes circulant structure of HOG feature to guesstimate rotation from one frame to another and enhance the detection of KCF. Hence it gains boost in overall accuracy in many of OBT50 detest videos with minimal additional computation.
Face detection is a fundamental problem in computer vision. It is still a challenging task in unconstrained conditions due to significant variations in scale, pose, expressions, and occlusion. In this paper, we propose a multi-branch fully convolutional network (MB-FCN) for face detection, which considers both efficiency and effectiveness in the design process. Our MB-FCN detector can deal with faces at all scale ranges with only a single pass through the backbone network. As such, our MB-FCN model saves computation and thus is more efficient, compared to previous methods that make multiple passes. For each branch, the specific skip connections of the convolutional feature maps at different layers are exploited to represent faces in specific scale ranges. Specifically, small faces can be represented with both shallow fine-grained and deep powerful coarse features. With this representation, superior improvement in performance is registered for the task of detecting small faces. We test our MB-FCN detector on two public face detection benchmarks, including FDDB and WIDER FACE. Extensive experiments show that our detector outperforms state-of-the-art methods on all these datasets in general and by a substantial margin on the most challenging among them (e.g. WIDER FACE Hard subset). Also, MB-FCN runs at 15 FPS on a GPU for images of size 640 x 480 with no assumption on the minimum detectable face size.
Taking an image and question as the input of our method, it can output the text-based answer of the query question about the given image, so called Visual Question Answering (VQA). There are two main modules in our algorithm. Given a natural language question about an image, the first module takes the question as input and then outputs the basic questions of the main given question. The second module takes the main question, image and these basic questions as input and then outputs the text-based answer of the main question. We formulate the basic questions generation problem as a LASSO optimization problem, and also propose a criterion about how to exploit these basic questions to help answer main question. Our method is evaluated on the challenging VQA dataset and yields state-of-the-art accuracy, 60.34% in open-ended task.
Khan, Naeemullah; Hong, Byung-Woo; Yezzi, Anthony; Sundaramoorthi, Ganesh(2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2017-11-09)[Conference Paper]
We formulate an energy for segmentation that is designed to have preference for segmenting the coarse over fine structure of the image, without smoothing across boundaries of regions. The energy is formulated by integrating a continuum of scales from a scale space computed from the heat equation within regions. We show that the energy can be optimized without computing a continuum of scales, but instead from a single scale. This makes the method computationally efficient in comparison to energies using a discrete set of scales. We apply our method to texture and motion segmentation. Experiments on benchmark datasets show that a continuum of scales leads to better segmentation accuracy over discrete scales and other competing methods.
The export option will allow you to export the current search results of the entered query to a file. Different
formats are available for download. To export the items, click on the button corresponding with the preferred download format.
By default, clicking on the export buttons will result in a download of the allowed maximum amount of items.
For anonymous users the allowed maximum amount is 50 search results.
To select a subset of the search results, click "Selective Export" button and make a selection of the items you want to export.
The amount of items that can be exported at once is similarly restricted as the full export.
After making a selection, click one of the export format buttons. The amount of items that will be exported is indicated in the bubble next to export format.