• Second order online multitask learning

      Yang, P.; Zhao, P.; Zhou, J.; Gao, Xin (2019-01)
    • Linear kernel tests via empirical likelihood for high dimensional data

      Ding, L.; Liu, Z.; Li, Y.; Liao, S.; Liu, Y.; Yang, P.; Yu, G.; Shao, L.; Gao, Xin (2019-01)
    • Approximate kernel selection with strong approximate consistency

      Ding, L.; Liao, S.; Liu, Y.; Yang, P.; Li, Y.; Pan, Y.; Huang, C.; Shao, L.; Gao, Xin (2019-01)
    • Randomized kernel selection with spectra of multilevel circulant matrices

      Ding, Lizhong; Liao, Shizhong; Liu, Yong; Yang, Peng; Gao, Xin (2018-02)
      Kernel selection aims at choosing an appropriate kernel function for kernel-based learning algorithms to avoid either underfitting or overfitting of the resulting hypothesis. One of the main problems faced by kernel selection is the evaluation of the goodness of a kernel, which is typically difficult and computationally expensive. In this paper, we propose a randomized kernel selection approach to evaluate and select the kernel with the spectra of the specifically designed multilevel circulant matrices (MCMs), which is statistically sound and computationally efficient. Instead of constructing the kernel matrix, we construct the randomized MCM to encode the kernel function and all data points together with labels. We build a one-to-one correspondence between all candidate kernel functions and the spectra of the randomized MCMs by Fourier transform. We prove the statistical properties of the randomized MCMs and the randomized kernel selection criteria, which theoretically qualify the utility of the randomized criteria in kernel selection. With the spectra of the randomized MCMs, we derive a series of randomized criteria to conduct kernel selection, which can be computed in log-linear time and linear space complexity by fast Fourier transform (FFT). Experimental results demonstrate that our randomized kernel selection criteria are significantly more efficient than the existing classic and widely-used criteria while preserving similar predictive performance.
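      The paper's multilevel construction is not reproduced here, but its computational core can be illustrated by the level-1 case: the spectrum of a circulant matrix is simply the FFT of its first column, so any spectrum-based criterion can be evaluated in log-linear time and linear space without ever forming the matrix. A minimal sketch (the encoding of the kernel and labeled data into the first column is omitted):

```python
import numpy as np

def circulant_spectrum(first_col):
    """Eigenvalues of the circulant matrix whose first column is given:
    the FFT of that column, computed in O(n log n) time and O(n) space."""
    return np.fft.fft(first_col)

# Each Fourier vector f_k[j] = exp(2*pi*1j*j*k/n) is the eigenvector paired
# with the k-th FFT coefficient, so a criterion that depends only on the
# spectrum never needs the dense n x n matrix.
```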
    • A Fast Stochastic Riemannian Eigensolver

      Xu, Zhiqiang; Ke, Yiping; Gao, Xin (2017-08)
      We propose a fast stochastic Riemannian gradient eigensolver for a real and symmetric matrix, and prove its local, eigengap-dependent and linear convergence. The fast convergence is brought by deploying the variance reduction technique, originally developed for strongly convex problems in Euclidean space. In this paper, this technique is generalized to Riemannian manifolds for solving the geodesically non-convex problem of finding a group of top eigenvectors of such a matrix. We first propose the general variance reduction form of the stochastic Riemannian gradient, giving rise to the stochastic variance reduced Riemannian gradient method (SVRRG). It turns out that the operation of vector transport is necessary in addition to using Riemannian gradients and retraction operations. We then specialize it to the problem in question, resulting in our SVRRG-EIGS algorithm. We are among the first to propose and analyze the generalization of the stochastic variance reduced gradient (SVRG) to Riemannian manifolds. As an extension of the linearly convergent VR-PCA, it is significant and nontrivial for the proposed algorithm to theoretically achieve a further speedup and empirically make a difference, because it respects the inherent geometry of the problem.
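      A toy, single-vector analogue of the variance-reduction idea (this is not the authors' SVRRG-EIGS; the step size, epoch schedule, and normalization-based retraction are illustrative choices):

```python
import numpy as np

def svrrg_top_eigvec(X, eta=0.02, epochs=30, inner=50, seed=0):
    """Variance-reduced Riemannian gradient ascent on the unit sphere for
    the top eigenvector of A = X^T X / n (a sketch, not SVRRG-EIGS)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = X.T @ X / n                      # used only for epoch snapshots
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        w_snap = w.copy()
        g_full = A @ w_snap              # full gradient at the snapshot
        for _ in range(inner):
            x = X[rng.integers(n)]
            # variance-reduced stochastic gradient of w^T A w / 2
            g = x * (x @ w) - x * (x @ w_snap) + g_full
            g_tan = g - (w @ g) * w      # project to the sphere's tangent space
            w = w + eta * g_tan
            w /= np.linalg.norm(w)       # retraction back to the sphere
    return w
```

In the deterministic limit each step is the power-like update (I + eta*A)w followed by normalization, which is why local linear convergence depends on the eigengap.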
    • FeaBoost: Joint feature and label refinement for semantic segmentation

      Niu, Yulei; Lu, Zhiwu; Huang, Songfang; Gao, Xin; Wen, Ji-Rong (2017-02)
      We propose a novel approach, called FeaBoost, to image semantic segmentation with only image-level labels taken as weakly-supervised constraints. Our approach is motivated by two observations: 1) each superpixel can be represented as a linear combination of basic components (e.g., predefined classes); 2) visually similar superpixels are likely to share the same set of labels, i.e., they tend to have a common combination of predefined classes. By taking these two observations into consideration, semantic segmentation is formulated as joint feature and label refinement over superpixels. Furthermore, we develop an efficient FeaBoost algorithm to solve this optimization problem. Extensive experiments on the MSRC and LabelMe datasets demonstrate the superior performance of our FeaBoost approach in comparison with the state-of-the-art methods, especially when noisy labels are provided for semantic segmentation.
    • On Truly Block Eigensolvers via Riemannian Optimization

      Xu, Zhiqiang; Gao, Xin (2018-04)
      We study theoretical properties of block solvers for the eigenvalue problem. Despite a recent surge of interest in such eigensolver analysis, truly block solvers have received relatively less attention, in contrast to the majority of studies concentrating on vector versions and non-truly block versions that rely on the deflation strategy. In fact, truly block solvers are more widely deployed in practice by virtue of their simplicity, without compromising accuracy. However, the corresponding theoretical analysis remains inadequate for first-order solvers, as only local and k-th gap-dependent rates of convergence have been established thus far. This paper is devoted to revealing significantly better or as-yet-unknown theoretical properties of such solvers. We present a novel convergence analysis in a unified framework for three types of first-order Riemannian solvers, i.e., deterministic, vanilla stochastic, and stochastic with variance reduction, for finding the top-k eigenvectors of a real symmetric matrix, in full generality. In particular, the issue of zero gaps between eigenvalues, to the best of our knowledge for the first time, is explicitly considered for these solvers, which brings new understandings, e.g., the dependence of convergence on gaps other than the k-th one. We thus propose the concept of the generalized k-th gap. All three types of solvers are proved to converge to a globally optimal solution at a global, generalized k-th gap-dependent, linear or sub-linear rate.
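      A minimal deterministic sketch of a truly block first-order Riemannian solver for the top-k eigenvectors: gradient ascent of the block Rayleigh quotient trace(X^T A X) over orthonormal k-frames with a QR-based retraction (step size and iteration count are illustrative, and the stochastic and variance-reduced variants analyzed in the paper are omitted):

```python
import numpy as np

def block_riemannian_eigs(A, k, eta=0.05, iters=500, seed=0):
    """Riemannian gradient ascent of trace(X^T A X) with QR retraction.
    A truly block solver: no deflation, the k columns evolve jointly."""
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    for _ in range(iters):
        G = A @ X                            # Euclidean gradient (up to a factor of 2)
        G_tan = G - X @ (X.T @ G)            # project onto the horizontal space
        X, _ = np.linalg.qr(X + eta * G_tan) # retract back to orthonormal frames
    return X
```

The returned frame spans the dominant invariant subspace; individual columns are only determined up to rotation within that subspace, which is exactly why gap issues among the top k eigenvalues do not hinder convergence of the subspace itself.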
    • Optimizing multivariate performance measures from multi-view data

      Wang, Jim Jing-Yan; Tsang, Ivor Wai-Hung; Gao, Xin (2016-03)
      To date, many machine learning applications have multiple views of features, and different applications require specific multivariate performance measures, such as the F-score for retrieval. However, existing multivariate performance measure optimization methods are limited to single-view data, while traditional multi-view learning methods cannot optimize multivariate performance measures directly. To fill this gap, in this paper, we propose the problem of optimizing multivariate performance measures from multi-view data, and an effective method to solve it. We propose to learn linear discriminant functions for different views, and combine them to construct an overall multivariate mapping function for multi-view data. To learn the parameters of the linear discriminant functions of different views to optimize a given multivariate performance measure, we formulate an optimization problem. In this problem, we propose to minimize the complexity of the linear discriminant function of each view, promote the consistency of the responses of different views over the same data points, and minimize an upper bound of the corresponding loss of a given multivariate performance measure. To optimize this problem, we develop an iterative cutting-plane algorithm. Experiments on four benchmark data sets show that our method not only outperforms traditional single-view based multivariate performance optimization methods, but also achieves better results than ordinary multi-view learning methods.
    • Social image parsing by cross-modal data refinement

      Lu, Zhiwu; Gao, Xin; Huang, Songfang; Wang, Liwei; Wen, Ji-Rong (2015-06)
      This paper presents a cross-modal data refinement algorithm for social image parsing, i.e., segmenting all the objects within a social image and then identifying their categories. Different from traditional fully supervised image parsing that takes pixel-level labels as strong supervisory information, our social image parsing is initially provided with the noisy tags of images (i.e. image-level labels), which are shared by social users. By over-segmenting each image into multiple regions, we formulate social image parsing as a cross-modal data refinement problem over a large set of regions, where the initial labels of each region are inferred from image-level labels. Furthermore, we develop an efficient algorithm to solve this cross-modal data refinement problem. The experimental results on several benchmark datasets show the effectiveness of our algorithm. More notably, our algorithm can be considered to provide an alternative and natural way to address the challenging problem of image parsing, since image-level labels are much easier to access than pixel-level labels.
    • Efficient active learning of halfspaces via query synthesis

      Alabdulmohsin, Ibrahim; Gao, Xin; Zhang, Xiangliang (2015-02)
      Active learning is a subfield of machine learning that has been successfully used in many applications including text classification and bioinformatics. One of the fundamental branches of active learning is query synthesis, where the learning agent constructs artificial queries from scratch in order to reveal sensitive information about the true decision boundary. Nevertheless, the existing literature on membership query synthesis has focused on finite concept classes with a limited extension to real-world applications. In this paper, we present an efficient spectral algorithm for membership query synthesis for halfspaces, whose sample complexity is experimentally shown to be near-optimal. At each iteration, the algorithm consists of two steps. First, a convex optimization problem is solved that provides an approximate characterization of the version space. Second, a principal component is extracted, which yields a synthetic query that shrinks the version space exponentially fast. Unlike traditional methods in active learning, the proposed method can be readily extended into the batch setting by solving for the top κ eigenvectors in the second step. Experimentally, it exhibits a significant improvement over traditional approaches such as uncertainty sampling and representative sampling. For example, to learn a halfspace in a 25-dimensional Euclidean space with an estimation error of 1E-4, the proposed algorithm uses less than 3% of the number of queries required by uncertainty sampling.
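      A rough illustration of the two-step loop, substituting rejection sampling for the paper's convex characterization of the version space (the function name, sample counts, and loop length are all hypothetical):

```python
import numpy as np

def synthesize_query(X_queried, y, d, n_samples=4000, seed=0):
    """Approximate the version space by unit vectors consistent with the
    labels so far, then return its top principal component as the next
    synthetic query -- the direction of greatest remaining disagreement."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_samples, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    if X_queried:
        ok = np.all(np.sign(W @ np.array(X_queried).T) == np.array(y), axis=1)
        W = W[ok]                      # keep only consistent hypotheses
    _, _, Vt = np.linalg.svd(W - W.mean(axis=0), full_matrices=False)
    return Vt[0]                       # unit-norm principal component

# Toy active-learning loop against a hidden halfspace.
rng = np.random.default_rng(1)
d = 5
w_true = rng.standard_normal(d)
w_true /= np.linalg.norm(w_true)
queries, labels = [], []
for t in range(6):
    q = synthesize_query(queries, labels, d, seed=t)
    queries.append(q)
    labels.append(np.sign(w_true @ q) or 1.0)   # oracle label
```

Each labeled query cuts the surviving region roughly in half, which is the mechanism behind the exponential shrinkage claimed in the abstract.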
    • Adaptive graph regularized nonnegative matrix factorization via feature selection

      Wang, Jing-Yan; Almasri, Islam; Gao, Xin (2012-11)
      Nonnegative Matrix Factorization (NMF), a popular compact data representation method, fails to discover the intrinsic geometrical structure of the data space. Graph regularized NMF (GrNMF) is proposed to avoid this limitation by regularizing NMF with a nearest neighbor graph constructed from the input data feature space. However, using the original feature space directly is not appropriate because of the noisy and irrelevant features. In this paper, we propose a novel data representation algorithm by integrating feature selection and graph regularization for NMF. Instead of using a fixed graph as GrNMF does, we regularize NMF with an adaptive graph constructed according to the feature selection results. A unified objective is built to consider feature selection, NMF and adaptive graph regularization jointly, and a novel algorithm is developed to update the graph, feature weights and factorization parameters iteratively. Data clustering experiments show the efficacy of the proposed method on the Yale database.
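      For context, a sketch of the fixed-graph GrNMF multiplicative updates that the adaptive method builds on (the graph re-estimation and feature weighting steps of the paper are omitted; `lam` and the iteration count are illustrative):

```python
import numpy as np

def grnmf(X, W, k, lam=0.01, iters=300, seed=0):
    """Multiplicative updates for graph-regularized NMF:
    min ||X - U V^T||_F^2 + lam * tr(V^T L V), with L = D - W."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    D = np.diag(W.sum(axis=1))             # degree matrix of the graph
    U = rng.random((m, k)) + 0.1
    V = rng.random((n, k)) + 0.1
    eps = 1e-9
    for _ in range(iters):
        # standard nonnegativity-preserving multiplicative updates
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V
```

The adaptive algorithm of the paper would rebuild W from the current feature weights between such update sweeps, rather than keeping it fixed.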
    • Support vector machines with indefinite kernels

      Alabdulmohsin, Ibrahim; Gao, Xin; Zhang, Xiangliang (2014-11)
      Training support vector machines (SVM) with indefinite kernels has recently attracted attention in the machine learning community. This is partly due to the fact that many similarity functions that arise in practice are not symmetric positive semidefinite, i.e. the Mercer condition is not satisfied, or the Mercer condition is difficult to verify. Previous work on training SVM with indefinite kernels has generally fallen into three categories: (1) positive semidefinite kernel approximation, (2) non-convex optimization, and (3) learning in Krein spaces. None of these approaches is fully satisfactory. They have either introduced sources of inconsistency in handling training and test examples using kernel approximation, settled for approximate local minimum solutions using non-convex optimization, or produced non-sparse solutions. In this paper, we establish both theoretically and experimentally that the 1-norm SVM, proposed more than 10 years ago for embedded feature selection, is a better solution for extending SVM to indefinite kernels. More specifically, the 1-norm SVM can be interpreted as a structural risk minimization method that seeks a decision boundary with a large similarity margin in the original space. It uses a linear programming formulation that remains convex even if the kernel matrix is indefinite, and hence can always be solved quite efficiently. Also, it uses the indefinite similarity function (or distance) directly without any transformation, and, hence, it always treats both training and test examples consistently. Finally, it achieves the highest accuracy among all methods that train SVM with indefinite kernels, with statistically significant evidence, while also retaining sparsity of the support vector set.
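      A minimal sketch of the 1-norm SVM linear program described above, using `scipy.optimize.linprog` (the variable splitting and solver choice are implementation details not specified in the abstract):

```python
import numpy as np
from scipy.optimize import linprog

def one_norm_svm(K, y, C=1.0):
    """1-norm SVM as a linear program:
        min ||alpha||_1 + C * sum(xi)
        s.t. y_i * (K[i] @ alpha + b) >= 1 - xi_i,  xi >= 0.
    K may be any (possibly indefinite) similarity matrix; the LP is convex
    regardless, which is the point made in the abstract."""
    n = K.shape[0]
    # variables: [p (n), q (n), b_pos, b_neg, xi (n)], with alpha = p - q
    c = np.concatenate([np.ones(2 * n), [0.0, 0.0], C * np.ones(n)])
    Y = np.diag(y)
    A_ub = np.hstack([-Y @ K, Y @ K, -y[:, None], y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (3 * n + 2))
    z = res.x
    return z[:n] - z[n:2 * n], z[2 * n] - z[2 * n + 1]   # alpha, b
```

The 1-norm objective drives most entries of alpha to zero, which is how the sparsity of the support vector set mentioned above arises.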
    • Noise-Robust Semi-Supervised Learning by Large-Scale Sparse Coding

      Lu, Zhiwu; Gao, Xin; Wang, Liwei; Wen, Ji-Rong; Huang, Songfang (2015-02)
      This paper presents a large-scale sparse coding algorithm to deal with the challenging problem of noise-robust semi-supervised learning over very large data with only a few noisy initial labels. By giving an L1-norm formulation of Laplacian regularization directly based upon the manifold structure of the data, we transform noise-robust semi-supervised learning into a generalized sparse coding problem so that noise reduction can be imposed upon the noisy initial labels. Furthermore, to keep noise-robust semi-supervised learning scalable over very large data, we make use of both nonlinear approximation and dimension reduction techniques to solve this generalized sparse coding problem in linear time and space complexity. Finally, we evaluate the proposed algorithm on the challenging task of large-scale semi-supervised image classification with only a few noisy initial labels. The experimental results on several benchmark image datasets show the promising performance of the proposed algorithm.
    • Partially Labeled Data Tuple Can Optimize Multivariate Performance Measures

      Wang, Jim Jing-Yan; Gao, Xin (ACM Press, 2016-12-01)
    • Robust Cost-Sensitive Learning for Recommendation with Implicit Feedback

      Yang, Peng; Zhao, Peilin; Liu, Yong; Gao, Xin (Society for Industrial and Applied Mathematics, 2018-05-07)
      This paper aims at improving the effectiveness of matrix decomposition (MD) methods for implicit feedback. We highlight two critical limitations of existing works. First, due to the large number of unlabeled feedback, most existing works assign a uniform weight to the missing data to reduce computational complexity. However, such a uniform assumption rarely holds in real-world scenarios. Second, the commonly-used bilateral loss function may be unbounded if a data point is misclassified; outliers can trigger this and misguide the learning process. We address these two issues by learning a robust asymmetric learning model: our robust MD objective integrates cost-sensitive learning and a capped unilateral loss function into a joint formulation, where the low-rank basis for user/item profiles can be modeled in an effective and robust way. In particular, a novel log-determinant function is employed to refine the nuclear norm with respect to the low-rank approximation. We derive an iteratively re-weighted algorithm to efficiently minimize this MD objective, and also rigorously prove that the proposed algorithm achieves a lower error bound than the 1-bit matrix completion method. Finally, we show the promising experimental results of our algorithm on benchmark recommendation datasets.
    • Receiver-based Bayesian PAPR reduction in OFDM

      Al-Rabah, Abdullatif R.; Masood, Mudassir; Ali, Anum; Al-Naffouri, Tareq Y. (IEEE, 2013-09)
      One of the main drawbacks of OFDM systems is the high peak-to-average-power ratio (PAPR). Most PAPR reduction techniques require transmitter-based processing. We instead propose a receiver-based, low-complexity clipping signal recovery method. This method is able to i) reduce PAPR via a simple clipping scheme, ii) reconstruct the distortion signal with high accuracy using a Bayesian recovery algorithm, and iii) remain energy efficient thanks to its low complexity. The proposed method is robust against variation in noise and signal statistics. The method is further enhanced by making use of all available prior information, such as the locations and the phase of the non-zero elements of the clipping signal. Simulation results demonstrate the superiority of the proposed algorithm over other recovery algorithms.
    • Non-Gaussian prior Fast Bayesian Matching Pursuit

      Masood, Mudassir; Al-Naffouri, Tareq Y. (2012-12)
      A fast matching pursuit method (nGpFBMP) is introduced which performs Bayesian estimates of sparse signals even when the signal prior is non-Gaussian/unknown. It is agnostic on signal statistics and utilizes a greedy approach and order-recursive updates to determine the approximate MMSE estimate of the sparse signal. Simulation results demonstrate the power and robustness of the method.