Browsing Research by Submit Date
Now showing items 1-20 of 15367

Second order online multitask learning (2019-01)

Randomized kernel selection with spectra of multilevel circulant matrices (2018-02)
Kernel selection aims at choosing an appropriate kernel function for kernel-based learning algorithms to avoid either underfitting or overfitting of the resulting hypothesis. One of the main problems faced by kernel selection is the evaluation of the goodness of a kernel, which is typically difficult and computationally expensive. In this paper, we propose a randomized kernel selection approach to evaluate and select the kernel with the spectra of the specifically designed multilevel circulant matrices (MCMs), which is statistically sound and computationally efficient. Instead of constructing the kernel matrix, we construct the randomized MCM to encode the kernel function and all data points together with labels. We build a one-to-one correspondence between all candidate kernel functions and the spectra of the randomized MCMs by Fourier transform. We prove the statistical properties of the randomized MCMs and the randomized kernel selection criteria, which theoretically qualify the utility of the randomized criteria in kernel selection. With the spectra of the randomized MCMs, we derive a series of randomized criteria to conduct kernel selection, which can be computed in log-linear time and linear space complexity by fast Fourier transform (FFT). Experimental results demonstrate that our randomized kernel selection criteria are significantly more efficient than the existing classic and widely-used criteria while preserving similar predictive performance.
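The computational primitive behind such spectrum-based criteria can be sketched in the one-level case (the paper uses multilevel circulant matrices, which this toy example does not cover): the eigenvalues of a circulant matrix are exactly the DFT of its first column, so a spectrum can be obtained in O(n log n) without ever forming the n x n matrix.

```python
import numpy as np

# One-level illustration: the circulant matrix C with C[i, j] = c[(i - j) % n]
# has eigenvalues fft(c) and eigenvectors F[:, j] = exp(2*pi*1j*i*j/n).
# (Multilevel circulant matrices, as in the paper, are not covered here.)

def circulant_spectrum(first_col):
    """Spectrum of the circulant matrix generated by first_col, via FFT."""
    return np.fft.fft(first_col)

rng = np.random.default_rng(0)
c = rng.standard_normal(8)
lam = circulant_spectrum(c)          # O(n log n), no matrix formed

# Dense reference construction, only for checking the claim above.
n = len(c)
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
F = np.exp(2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n)
```

Each column of `F` is an eigenvector of `C` with the corresponding entry of `lam` as its eigenvalue, which is what makes FFT-based criteria so cheap.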

On Truly Block Eigensolvers via Riemannian Optimization (2018-04)
We study theoretical properties of block solvers for the eigenvalue problem. Despite a recent surge of interest in such eigensolver analysis, truly block solvers have received relatively less attention, in contrast to the majority of studies concentrating on vector versions and non-truly block versions that rely on the deflation strategy. In fact, truly block solvers are more widely deployed in practice by virtue of their simplicity without compromise on accuracy. However, the corresponding theoretical analysis remains inadequate for first-order solvers, as only local and kth gap-dependent rates of convergence have been established thus far. This paper is devoted to revealing significantly better or as-yet-unknown theoretical properties of such solvers. We present a novel convergence analysis in a unified framework for three types of first-order Riemannian solvers, i.e., deterministic, vanilla stochastic, and stochastic with variance reduction, that find the top-k eigenvectors of a real symmetric matrix, in full generality. In particular, the issue of zero gaps between eigenvalues is, to the best of our knowledge for the first time, explicitly considered for these solvers, which brings new understanding, e.g., of the dependence of convergence on gaps other than the kth one. We thus propose the concept of the generalized kth gap. The three types of solvers are proved to converge to a globally optimal solution at a global, generalized kth gap-dependent, and linear or sublinear rate.
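A minimal sketch of the kind of solver studied (this is an illustration, not the paper's algorithms or analysis): first-order Riemannian gradient ascent on tr(X^T A X) over the Stiefel manifold, updating all k directions jointly (hence "truly block") with a QR-based retraction.

```python
import numpy as np

# Truly block first-order Riemannian eigensolver, illustrative only:
# ascend f(X) = tr(X^T A X) over {X : X^T X = I} via the projected
# (Riemannian) gradient and retract back onto the manifold with QR.

def riemannian_block_eigs(A, k, steps=3000, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    for _ in range(steps):
        AX = A @ X
        G = AX - X @ (X.T @ AX)           # Riemannian gradient (tangent projection)
        X, _ = np.linalg.qr(X + eta * G)  # retraction back onto the Stiefel manifold
    return X

# Toy check: the iterate spans the top-2 eigenspace of a symmetric matrix.
rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2
X = riemannian_block_eigs(A, k=2)
w, V = np.linalg.eigh(A)
top = V[:, np.argsort(w)[-2:]]            # eigenvectors of the 2 largest eigenvalues
subspace_err = np.linalg.norm(X @ X.T - top @ top.T)
```

Because all k columns are updated together, no deflation step is needed; the iterate converges to the top-k invariant subspace rather than to individual eigenvectors one at a time.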

FeaBoost: Joint feature and label refinement for semantic segmentation (2017-02)
We propose a novel approach, called FeaBoost, to image semantic segmentation with only image-level labels taken as weakly-supervised constraints. Our approach is motivated by two observations: 1) each superpixel can be represented as a linear combination of basic components (e.g., predefined classes); 2) visually similar superpixels have a high probability of sharing the same set of labels, i.e., they tend to have a common combination of predefined classes. By taking these two observations into consideration, semantic segmentation is formulated as joint feature and label refinement over superpixels. Furthermore, we develop an efficient FeaBoost algorithm to solve such an optimization problem. Extensive experiments on the MSRC and LabelMe datasets demonstrate the superior performance of our FeaBoost approach in comparison with state-of-the-art methods, especially when noisy labels are provided for semantic segmentation.

A Fast Stochastic Riemannian Eigensolver (2017-08)
We propose a fast stochastic Riemannian gradient eigensolver for a real and symmetric matrix, and prove its local, eigengap-dependent and linear convergence. The fast convergence is brought by deploying the variance reduction technique which was originally developed for Euclidean strongly convex problems. In this paper, this technique is generalized to Riemannian manifolds for solving the geodesically non-convex problem of finding a group of top eigenvectors of such a matrix. We first propose the general variance reduction form of the stochastic Riemannian gradient, giving rise to the stochastic variance reduced Riemannian gradient method (SVRRG). It turns out that the operation of vector transport is necessary in addition to using Riemannian gradients and retraction operations. We then specialize it to the problem in question, resulting in our SVRRG-EIGS algorithm. We are among the first to propose and analyze the generalization of the stochastic variance reduced gradient (SVRG) to Riemannian manifolds. As an extension of the linearly convergent VR-PCA, it is significant and nontrivial for the proposed algorithm to theoretically achieve a further speedup and empirically make a difference, due to our respect for the inherent geometry of the problem.
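The Euclidean variance-reduction scheme (SVRG) that the abstract says is generalized to Riemannian manifolds can be sketched on a least-squares toy problem (the problem, step size, and epoch counts here are illustrative assumptions, not the paper's): the estimate g = grad f_i(x) - grad f_i(x_snap) + full_grad(x_snap) is unbiased, and its variance vanishes as x approaches the snapshot, which is what permits a constant step size and linear convergence.

```python
import numpy as np

# SVRG on F(x) = (1/2n) ||Ax - b||^2, a stand-in for the Euclidean setting
# that the paper transports to Riemannian manifolds.

rng = np.random.default_rng(0)
n, d = 100, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimizer of the full objective

def grad_i(x, i):
    # gradient of the component f_i(x) = 0.5 * (a_i . x - b_i)^2
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
eta = 0.01
for epoch in range(40):
    snap = x.copy()
    full = A.T @ (A @ snap - b) / n              # full gradient at the snapshot
    for _ in range(2 * n):                       # inner stochastic steps
        i = rng.integers(n)
        x = x - eta * (grad_i(x, i) - grad_i(snap, i) + full)
err = np.linalg.norm(x - x_star)
```

On a manifold, as the abstract notes, the three gradients live in different tangent spaces, which is why vector transport becomes necessary on top of retraction.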

Efficient active learning of halfspaces via query synthesis (2015-02)
Active learning is a subfield of machine learning that has been successfully used in many applications, including text classification and bioinformatics. One of the fundamental branches of active learning is query synthesis, where the learning agent constructs artificial queries from scratch in order to reveal sensitive information about the true decision boundary. Nevertheless, the existing literature on membership query synthesis has focused on finite concept classes with limited extension to real-world applications. In this paper, we present an efficient spectral algorithm for membership query synthesis for halfspaces, whose sample complexity is experimentally shown to be near-optimal. At each iteration, the algorithm consists of two steps. First, a convex optimization problem is solved that provides an approximate characterization of the version space. Second, a principal component is extracted, which yields a synthetic query that shrinks the version space exponentially fast. Unlike traditional methods in active learning, the proposed method can be readily extended to the batch setting by solving for the top κ eigenvectors in the second step. Experimentally, it exhibits a significant improvement over traditional approaches such as uncertainty sampling and representative sampling. For example, to learn a halfspace in a 25-dimensional Euclidean space with an estimation error of 1e-4, the proposed algorithm uses less than 3% of the number of queries required by uncertainty sampling.
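A toy 2-D illustration (far simpler than the paper's spectral method) of why synthesized queries shrink the version space exponentially: for a halfspace through the origin, the version space is an arc of possible normal-vector angles, and each constructed query can be placed so that the oracle's answer halves that arc.

```python
import numpy as np

# Toy query synthesis for a 2-D halfspace through the origin.
# Not the paper's algorithm: here the version space is an interval of
# angles and each synthetic query bisects it exactly.

def oracle(w_true, x):
    """Membership oracle: label of the synthesized point x."""
    return 1 if w_true @ x >= 0 else -1

def learn_halfspace_2d(w_true, n_queries=40):
    # One initial query fixes a half-circle (a, b) containing the true angle phi.
    if oracle(w_true, np.array([1.0, 0.0])) == 1:
        a, b = -np.pi / 2, np.pi / 2
    else:
        a, b = np.pi / 2, 3 * np.pi / 2
    for _ in range(n_queries):
        # Querying at alpha = mid - pi/2 answers "is phi in the lower half?"
        alpha = (a + b) / 2 - np.pi / 2
        x = np.array([np.cos(alpha), np.sin(alpha)])
        if oracle(w_true, x) == 1:
            b = (a + b) / 2
        else:
            a = (a + b) / 2
    phi = (a + b) / 2
    return np.array([np.cos(phi), np.sin(phi)])

phi_true = 1.234
w_true = np.array([np.cos(phi_true), np.sin(phi_true)])
w_hat = learn_halfspace_2d(w_true)
angle_err = np.arccos(np.clip(w_hat @ w_true, -1.0, 1.0))
```

After 40 queries the angular uncertainty is about pi / 2^41, i.e. the error decays exponentially in the number of queries, which is the effect the paper's principal-component queries achieve in high dimensions.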

Optimizing multivariate performance measures from multi-view data (2016-03)
To date, many machine learning applications have multiple views of features, and different applications require specific multivariate performance measures, such as the F-score for retrieval. However, existing multivariate performance measure optimization methods are limited to single-view data, while traditional multi-view learning methods cannot optimize multivariate performance measures directly. To fill this gap, in this paper, we propose the problem of optimizing multivariate performance measures from multi-view data, and an effective method to solve it. We propose to learn linear discriminant functions for different views, and combine them to construct an overall multivariate mapping function for multi-view data. To learn the parameters of the linear discriminant functions of different views to optimize a given multivariate performance measure, we formulate an optimization problem. In this problem, we propose to minimize the complexity of the linear discriminant function of each view, promote the consistency of the responses of different views over the same data points, and minimize the upper bound of the corresponding loss of a given multivariate performance measure. To optimize this problem, we develop an iterative cutting-plane algorithm. Experiments on four benchmark data sets show that it not only outperforms traditional single-view based multivariate performance optimization methods, but also achieves better results than ordinary multi-view learning methods.

Social image parsing by cross-modal data refinement (2015-06)
This paper presents a cross-modal data refinement algorithm for social image parsing, i.e., segmenting all the objects within a social image and then identifying their categories. Different from traditional fully supervised image parsing, which takes pixel-level labels as strong supervisory information, our social image parsing is initially provided only with the noisy tags of images (i.e., image-level labels), which are shared by social users. By over-segmenting each image into multiple regions, we formulate social image parsing as a cross-modal data refinement problem over a large set of regions, where the initial labels of each region are inferred from image-level labels. Furthermore, we develop an efficient algorithm to solve such a cross-modal data refinement problem. The experimental results on several benchmark datasets show the effectiveness of our algorithm. More notably, our algorithm can be considered to provide an alternative and natural way to address the challenging problem of image parsing, since image-level labels are much easier to access than pixel-level labels.

Support vector machines with indefinite kernels (2014-11)
Training support vector machines (SVM) with indefinite kernels has recently attracted attention in the machine learning community. This is partly due to the fact that many similarity functions that arise in practice are not symmetric positive semidefinite, i.e. the Mercer condition is not satisfied, or the Mercer condition is difficult to verify. Previous work on training SVM with indefinite kernels has generally fallen into three categories: (1) positive semidefinite kernel approximation, (2) non-convex optimization, and (3) learning in Krein spaces. None of these approaches is fully satisfactory: they have either introduced sources of inconsistency in handling training and test examples using kernel approximation, settled for approximate local minimum solutions using non-convex optimization, or produced non-sparse solutions. In this paper, we establish both theoretically and experimentally that the 1-norm SVM, proposed more than 10 years ago for embedded feature selection, is a better solution for extending SVM to indefinite kernels. More specifically, the 1-norm SVM can be interpreted as a structural risk minimization method that seeks a decision boundary with a large similarity margin in the original space. It uses a linear programming formulation that remains convex even if the kernel matrix is indefinite, and hence can always be solved quite efficiently. Also, it uses the indefinite similarity function (or distance) directly without any transformation and, hence, always treats both training and test examples consistently. Finally, it achieves the highest accuracy among all methods that train SVM with indefinite kernels, with statistically significant evidence, while also retaining sparsity of the support vector set.
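The key convexity point can be made concrete with a small sketch (an illustrative formulation, with a standard 1-norm SVM layout and toy data of my choosing, not the paper's experiments): the 1-norm SVM is a linear program in the coefficients, so it stays solvable even when the similarity matrix K is indefinite. The L1 objective is linearized by splitting each variable into nonnegative parts.

```python
import numpy as np
from scipy.optimize import linprog

# 1-norm SVM as an LP:  min  sum|alpha| + C * sum(xi)
# s.t.  y_i (sum_j alpha_j K_ij + b) >= 1 - xi_i,  xi >= 0.
# Split alpha = ap - am and b = bp - bm so all variables are nonnegative.

def one_norm_svm(K, y, C=10.0):
    n = len(y)
    # z = [ap (n), am (n), bp, bm, xi (n)], all >= 0 (linprog's default bounds)
    c = np.concatenate([np.ones(2 * n), [0.0, 0.0], C * np.ones(n)])
    yK = y[:, None] * K
    # margin constraints rewritten in <= form, one row per training point
    A_ub = np.hstack([-yK, yK, -y[:, None], y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub)
    z = res.x
    return z[:n] - z[n:2 * n], z[2 * n] - z[2 * n + 1]   # alpha, b

# A tanh similarity matrix, which need not be positive semidefinite.
X = np.array([[1, 1], [2, 1], [1, 2], [-1, -1], [-2, -1], [-1, -2]], float)
y = np.array([1, 1, 1, -1, -1, -1], float)
K = np.tanh(X @ X.T)
alpha, b = one_norm_svm(K, y)
pred = np.sign(K @ alpha + b)            # decision values on the training set
```

No eigenvalue clipping or spectrum shifting of K is needed: the same similarity function scores both training and test points, which is the consistency argument the abstract makes.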

Adaptive graph regularized nonnegative matrix factorization via feature selection (2012-11)
Nonnegative Matrix Factorization (NMF), a popular compact data representation method, fails to discover the intrinsic geometrical structure of the data space. Graph regularized NMF (GrNMF) is proposed to avoid this limitation by regularizing NMF with a nearest neighbor graph constructed from the input data feature space. However, using the original feature space directly is not appropriate because of noisy and irrelevant features. In this paper, we propose a novel data representation algorithm that integrates feature selection and graph regularization for NMF. Instead of using a fixed graph as in GrNMF, we regularize NMF with an adaptive graph constructed according to the feature selection results. A unified objective is built to consider feature selection, NMF and adaptive graph regularization jointly, and a novel algorithm is developed to update the graph, feature weights and factorization parameters iteratively. Data clustering experiments show the efficacy of the proposed method on the Yale database.
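The fixed-graph GrNMF baseline that this work extends can be sketched as follows (a standard formulation on random toy data; the paper's contribution, adapting the graph via feature selection, is not shown): multiplicative updates for min over nonnegative U, V of ||X - U V^T||_F^2 + lam * tr(V^T L V), with graph Laplacian L = D - W over the data points.

```python
import numpy as np

# Fixed-graph GrNMF baseline with a symmetric 3-NN graph on the columns of X.
rng = np.random.default_rng(0)
m, n, k, lam, eps = 12, 20, 3, 0.1, 1e-9
X = rng.random((m, n))                       # columns are data points

# pairwise squared distances between columns, then a binary 3-NN graph
d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
W = np.zeros((n, n))
for j in range(n):
    for nb in np.argsort(d2[j])[1:4]:        # skip self (distance 0)
        W[j, nb] = W[nb, j] = 1.0
D = np.diag(W.sum(axis=1))

def objective(U, V):
    return np.linalg.norm(X - U @ V.T) ** 2 + lam * np.trace(V.T @ (D - W) @ V)

U = rng.random((m, k))
V = rng.random((n, k))
losses = []
for _ in range(100):
    # multiplicative updates keep U, V nonnegative and do not increase the objective
    U *= (X @ V) / (U @ (V.T @ V) + eps)
    V *= (X.T @ U + lam * W @ V) / (V @ (U.T @ U) + lam * D @ V + eps)
    losses.append(objective(U, V))
```

The adaptive variant proposed in the paper would rebuild `W` from the currently selected features inside this loop rather than fixing it up front.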

Noise-Robust Semi-Supervised Learning by Large-Scale Sparse Coding (2015-02)
This paper presents a large-scale sparse coding algorithm to deal with the challenging problem of noise-robust semi-supervised learning over very large data with only a few noisy initial labels. By giving an L1-norm formulation of Laplacian regularization directly based upon the manifold structure of the data, we transform noise-robust semi-supervised learning into a generalized sparse coding problem so that noise reduction can be imposed upon the noisy initial labels. Furthermore, to keep the scalability of noise-robust semi-supervised learning over very large data, we make use of both nonlinear approximation and dimension reduction techniques to solve this generalized sparse coding problem in linear time and space complexity. Finally, we evaluate the proposed algorithm in the challenging task of large-scale semi-supervised image classification with only a few noisy initial labels. The experimental results on several benchmark image datasets show the promising performance of the proposed algorithm.

Chapter 14: Automated Mining of Disease-Specific Protein Interaction Networks Based on Biomedical Literature (World Scientific, 2013-12-17)

Partially Labeled Data Tuple Can Optimize Multivariate Performance Measures (ACM Press, 2016-12-01)

Robust Cost-Sensitive Learning for Recommendation with Implicit Feedback (Society for Industrial and Applied Mathematics, 2018-05-07)
This paper aims at improving the effectiveness of matrix decomposition (MD) methods for implicit feedback. We highlight two critical limitations of existing works. First, due to the large number of unlabeled feedback, most existing works assign a uniform weight to the missing data to reduce computational complexity. However, such a uniform assumption may rarely hold in real-world scenarios. Second, the commonly-used bilateral loss function might be infinite if a data point is misclassified. Outliers may have such issues and misguide the learning process. We address the above two issues by learning a robust asymmetric learning model. By leveraging cost-sensitive learning and a capped unilateral loss function, our robust MD objective function integrates them into a joint formulation, where the low-rank basis for user/item profiles can be modeled in an effective and robust way. In particular, a novel log-determinant function is employed to refine the nuclear norm with respect to the low-rank approximation. We derive an iteratively reweighted algorithm to efficiently minimize this MD objective, and also rigorously prove a lower error bound of the proposed algorithm compared to the 1-bit matrix completion method. Finally, we show the promising experimental results of our algorithm on benchmark recommendation datasets.
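The two ingredients the abstract combines can be illustrated in isolation (the helper name and the particular capped hinge form below are my assumptions, not the paper's exact loss): a capped loss stays bounded on badly misclassified outliers, and cost-sensitive weights treat observed and missing feedback asymmetrically.

```python
import numpy as np

# Capped unilateral hinge loss: min(max(0, 1 - margin), cap).
# Unlike the plain hinge, it is bounded even as the margin -> -infinity,
# so an extreme outlier cannot dominate the objective.
def capped_hinge(margin, cap=1.0):
    return np.minimum(np.maximum(0.0, 1.0 - margin), cap)

margins = np.array([2.0, 0.5, -10.0])    # correct, marginal, extreme outlier
losses = capped_hinge(margins)           # the outlier contributes cap, not 11

# Cost-sensitive weighting: observed interactions get a higher cost than
# missing (unlabeled) entries, instead of one uniform weight for all.
observed = np.array([1, 1, 0])
weights = np.where(observed == 1, 1.0, 0.1)
weighted = weights * losses
```
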

A scalable approach to parameter estimation for statistical thermodynamic-based models of gene regulation using structural information (The First Annual Winter q-bio Meeting, 2013-02)

Receiver-based Bayesian PAPR reduction in OFDM (IEEE, 2013-09)
One of the main drawbacks of OFDM systems is the high peak-to-average power ratio (PAPR). Most PAPR reduction techniques require transmitter-based processing. In contrast, we propose a receiver-based, low-complexity clipping signal recovery method. This method is able to i) reduce PAPR via a simple clipping scheme, ii) reconstruct the distortion signal with high accuracy using a Bayesian recovery algorithm, and iii) remain energy efficient due to its low complexity. The proposed method is robust against variation in noise and signal statistics. The method is enhanced by making use of all prior information, such as the locations and the phases of the nonzero elements of the clipping signal. Simulation results demonstrate the superiority of the proposed algorithm over other recovery algorithms.
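The transmitter-side clipping step that the paper assumes can be sketched as follows (the subcarrier count, modulation, and clipping threshold are illustrative choices; the Bayesian recovery of the sparse clipping distortion at the receiver is not shown): an OFDM symbol is generated by an IFFT, its amplitude is clipped at a threshold, and the PAPR is measured before and after.

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a discrete-time signal, in dB."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

rng = np.random.default_rng(0)
N = 256
# random QPSK symbols on the N subcarriers
bits = rng.integers(0, 4, N)
const = np.exp(1j * (np.pi / 4 + np.pi / 2 * bits))
x = np.fft.ifft(const) * np.sqrt(N)      # time-domain OFDM symbol (unit power)

# amplitude-clip at a threshold relative to the RMS level, preserving phase
rms = np.sqrt(np.mean(np.abs(x) ** 2))
thresh = 1.4 * rms
clipped = np.where(np.abs(x) > thresh, thresh * x / np.abs(x), x)
```

The difference `x - clipped` is sparse (nonzero only where the amplitude exceeded the threshold), which is exactly the structure the receiver-side sparse Bayesian recovery exploits.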