## Search

Now showing items 1-6 of 6




### Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates

Salim, Adil; Kovalev, Dmitry; Richtarik, Peter (arXiv, 2019-05-28) [Preprint]

We propose a new algorithm, the Stochastic Proximal Langevin Algorithm (SPLA), for sampling from a log-concave distribution. Our method is a generalization of the Langevin algorithm to potentials expressed as the sum of one stochastic smooth term and multiple stochastic nonsmooth terms. In each iteration, our splitting technique only requires access to a stochastic gradient of the smooth term and a stochastic proximal operator for each of the nonsmooth terms. We establish nonasymptotic sublinear and linear convergence rates under convexity and strong convexity of the smooth term, respectively, expressed in terms of the KL divergence and Wasserstein distance. We illustrate the efficiency of our sampling technique through numerical simulations on a Bayesian learning task.
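The splitting iteration described in the abstract can be sketched as follows. This is an illustrative reading, not the paper's reference implementation: the target potential, step size, and all function names below are assumptions chosen for the demo.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1, applied coordinate-wise."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def spla(grad_f, proxes, x0, gamma, n_iters, rng):
    """One reading of the SPLA iteration: a (stochastic) gradient step on
    the smooth term plus Langevin noise, followed by one proximal step per
    nonsmooth term. Names and structure here are illustrative."""
    x = x0.copy()
    samples = []
    for _ in range(n_iters):
        noise = rng.standard_normal(x.shape)
        x = x - gamma * grad_f(x) + np.sqrt(2.0 * gamma) * noise
        for prox in proxes:              # splitting: one prox per nonsmooth term
            x = prox(x, gamma)
        samples.append(x.copy())
    return np.asarray(samples)

# Toy target potential ||x||^2 / 2 + 0.5 * ||x||_1 (smooth + one nonsmooth term).
rng = np.random.default_rng(0)
samples = spla(lambda x: x,
               [lambda v, g: soft_threshold(v, 0.5 * g)],
               np.zeros(2), gamma=0.05, n_iters=5000, rng=rng)
```

Because the toy potential is symmetric about the origin, the chain's long-run sample mean should be near zero.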

### Revisiting Stochastic Extragradient

Mishchenko, Konstantin; Kovalev, Dmitry; Shulgin, Egor; Richtarik, Peter; Malitsky, Yura (arXiv, 2019-05-27) [Preprint]

We consider a new extension of the extragradient method that is motivated by approximating implicit updates. Since a recent work (Chavdarova et al., 2019) showed that the existing stochastic extragradient algorithm (called mirror-prox) of Juditsky et al. (2011) diverges on a simple bilinear problem, we prove guarantees for solving variational inequalities that are more general than those of Juditsky et al. (2011). Furthermore, we illustrate numerically that the proposed variant converges faster than many other methods on the example of Chavdarova et al. (2019). We also discuss how extragradient can be applied to training Generative Adversarial Networks (GANs). Our experiments on GANs demonstrate that the introduced approach may make the training faster in terms of data passes, while its higher iteration complexity makes the advantage smaller. To further accelerate the method's convergence on problems such as bilinear minimax, we combine the extragradient step with negative momentum (Gidel et al., 2018) and discuss the optimal momentum value.
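The basic (deterministic) extragradient step the paper revisits can be illustrated on exactly the kind of bilinear problem mentioned in the abstract. The step size and iteration count below are illustrative assumptions:

```python
import numpy as np

def extragradient(F, z0, step, n_iters):
    """Deterministic extragradient sketch: evaluate the operator at an
    extrapolated ("look-ahead") point, then update from the original point."""
    z = z0.copy()
    for _ in range(n_iters):
        z_half = z - step * F(z)   # extrapolation step
        z = z - step * F(z_half)   # update using the look-ahead operator
    return z

# Bilinear saddle problem min_x max_y x*y, i.e. operator F(x, y) = (y, -x).
# Simultaneous gradient descent-ascent spirals away from the saddle (0, 0),
# while extragradient contracts toward it.
F = lambda z: np.array([z[1], -z[0]])
z_final = extragradient(F, np.array([1.0, 1.0]), step=0.1, n_iters=2000)
```

On this operator one extragradient step multiplies the iterate by $(1-\gamma^2)I - \gamma J$, whose norm is $\sqrt{1-\gamma^2+\gamma^4} < 1$, so the iterates shrink geometrically toward the saddle point.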

### Don't Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop

Kovalev, Dmitry; Horvath, Samuel; Richtarik, Peter (arXiv, 2019-01-24) [Preprint]

The stochastic variance-reduced gradient method (SVRG) and its accelerated variant (Katyusha) have attracted enormous attention in the machine learning community in the last few years due to their superior theoretical properties and empirical behaviour on training supervised machine learning models via the empirical risk minimization paradigm. A key structural element in both of these methods is the inclusion of an outer loop at the beginning of which a full pass over the training data is made in order to compute the exact gradient, which is then used to construct a variance-reduced estimator of the gradient. In this work we design *loopless variants* of both of these methods. In particular, we remove the outer loop and replace its function by a coin flip performed in each iteration designed to trigger, with a small probability, the computation of the gradient. We prove that the new methods enjoy the same superior theoretical convergence properties as the original methods. However, we demonstrate through numerical experiments that our methods have substantially superior practical behavior.
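The coin-flip construction is simple enough to sketch. The loop below is an illustrative loopless-SVRG iteration; the parameter values and the toy quadratic objective are assumptions for the demo, not taken from the paper:

```python
import numpy as np

def loopless_svrg(grads, x0, eta, p, n_iters, rng):
    """Loopless SVRG sketch: the outer loop is replaced by a coin flip that
    refreshes the reference point w (and the full gradient at w) with
    probability p in each iteration."""
    n = len(grads)
    x, w = x0.copy(), x0.copy()
    full_grad = np.mean([gr(w) for gr in grads], axis=0)
    for _ in range(n_iters):
        i = rng.integers(n)
        # variance-reduced gradient estimator built from the reference point
        g = grads[i](x) - grads[i](w) + full_grad
        x = x - eta * g
        if rng.random() < p:             # coin flip replaces the outer loop
            w = x.copy()
            full_grad = np.mean([gr(w) for gr in grads], axis=0)
    return x

# Minimize the average of n quadratics f_i(x) = ||x - a_i||^2 / 2,
# whose minimizer is the mean of the a_i.
rng = np.random.default_rng(0)
a = rng.standard_normal((10, 3))
grads = [lambda x, ai=ai: x - ai for ai in a]
x_star = a.mean(axis=0)
x = loopless_svrg(grads, np.zeros(3), eta=0.1, p=0.1, n_iters=500, rng=rng)
```

On this toy problem the estimator is exact (the $f_i$ differ only by linear terms), so the iterates contract geometrically to the mean of the $a_i$.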

### Stochastic Distributed Learning with Gradient Quantization and Variance Reduction

Horvath, Samuel; Kovalev, Dmitry; Mishchenko, Konstantin; Stich, Sebastian; Richtarik, Peter (arXiv, 2019-04-10) [Preprint]

We consider distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server. To alleviate the communication bottleneck, recent work proposed various schemes to compress (e.g. quantize or sparsify) the gradients, thereby introducing additional variance $\omega \geq 1$ that might slow down convergence. For strongly convex functions with condition number $\kappa$ distributed among $n$ machines, we (i) give a scheme that converges in $\mathcal{O}((\kappa + \kappa \frac{\omega}{n} + \omega) \log(1/\epsilon))$ steps to a neighborhood of the optimal solution. For objective functions with a finite-sum structure, each worker having less than $m$ components, we (ii) present novel variance reduced schemes that converge in $\mathcal{O}((\kappa + \kappa \frac{\omega}{n} + \omega + m)\log(1/\epsilon))$ steps to arbitrary accuracy $\epsilon > 0$. These are the first methods that achieve linear convergence for arbitrary quantized updates. We also (iii) give analysis for the weakly convex and non-convex cases and (iv) verify in experiments that our novel variance reduced schemes are more efficient than the baselines.
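The variance parameter $\omega$ attached to a compressor can be illustrated with random-$k$ sparsification, one standard example of an unbiased compression operator of the kind such analyses cover (the function name and the empirical check below are illustrative):

```python
import numpy as np

def rand_k(x, k, rng):
    """Random-k sparsification: keep k coordinates chosen uniformly at
    random, scaled by d/k so the compressor is unbiased, E[C(x)] = x.
    Its relative variance grows as omega = d/k - 1."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
# Empirical mean over many independent draws should approach x (unbiasedness).
est = np.mean([rand_k(x, 2, rng) for _ in range(40000)], axis=0)
```

Each worker would send such a compressed vector instead of the full gradient, trading bandwidth (only $k$ of $d$ coordinates) for the extra variance the complexity bounds account for.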

### RSN: Randomized Subspace Newton

Gower, Robert M.; Kovalev, Dmitry; Lieder, Felix; Richtarik, Peter (arXiv, 2019-05-26) [Preprint]

We develop a randomized Newton method capable of solving learning problems with huge dimensional feature spaces, which is a common setting in applications such as medical imaging, genomics and seismology. Our method leverages randomized sketching in a new way, by finding the Newton direction constrained to the space spanned by a random sketch. We develop a simple global linear convergence theory that holds for practically all sketching techniques, which gives the practitioners the freedom to design custom sketching approaches suitable for particular applications. We perform numerical experiments which demonstrate the efficiency of our method as compared to accelerated gradient descent and the full Newton method. Our method can be seen as a refinement and randomized extension of the results of Karimireddy, Stich, and Jaggi (2019).

### Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates

Kovalev, Dmitry; Mishchenko, Konstantin; Richtarik, Peter (arXiv, 2019-12-03) [Preprint]

We present two new remarkably simple stochastic second-order methods for minimizing the average of a very large number of sufficiently smooth and strongly convex functions. The first is a stochastic variant of Newton's method (SN), and the second is a stochastic variant of cubically regularized Newton's method (SCN). We establish local linear-quadratic convergence results. Unlike existing stochastic variants of second-order methods, which require the evaluation of a large number of gradients and/or Hessians in each iteration to guarantee convergence, our methods do not have this shortcoming. For instance, the simplest variants of our methods in each iteration need to compute the gradient and Hessian of a *single* randomly selected function only. In contrast to most existing stochastic Newton and quasi-Newton methods, our approach guarantees local convergence faster than with a first-order oracle and adapts to the problem's curvature. Interestingly, our method is not unbiased, so our theory provides new intuition for designing new stochastic methods.
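The single-function update pattern can be sketched as follows. The aggregation form and all names are an illustrative reading of the idea, shown on quadratics, where the step assembled from stored per-function data recovers the exact minimizer of the average:

```python
import numpy as np

def stochastic_newton(grads, hessians, x0, n_iters, rng):
    """Illustrative stochastic-Newton-style loop: keep one anchor point w_i
    per function, and per iteration refresh the stored gradient and Hessian
    of a single randomly chosen function only."""
    n = len(grads)
    w = [x0.copy() for _ in range(n)]
    H = [hessians[i](w[i]) for i in range(n)]
    g = [grads[i](w[i]) for i in range(n)]
    x = x0.copy()
    for _ in range(n_iters):
        # Newton-type step assembled from the stored per-function data.
        lhs = sum(H)
        rhs = sum(Hi @ wi - gi for Hi, wi, gi in zip(H, w, g))
        x = np.linalg.solve(lhs, rhs)
        j = int(rng.integers(n))          # refresh only one function's data
        w[j], H[j], g[j] = x.copy(), hessians[j](x), grads[j](x)
    return x

# n quadratics f_i(x) = 0.5 (x - a_i)^T B_i (x - a_i) with random SPD B_i.
rng = np.random.default_rng(0)
n, d = 4, 3
Bs, anchors = [], []
for _ in range(n):
    M = rng.standard_normal((d, d))
    Bs.append(M @ M.T + np.eye(d))
    anchors.append(rng.standard_normal(d))
grads = [lambda x, B=B, a=a: B @ (x - a) for B, a in zip(Bs, anchors)]
hessians = [lambda x, B=B: B for B in Bs]
x_sn = stochastic_newton(grads, hessians, np.zeros(d), n_iters=5, rng=rng)
x_star = np.linalg.solve(sum(Bs), sum(B @ a for B, a in zip(Bs, anchors)))
```

For quadratics, $H_i w_i - \nabla f_i(w_i) = B_i a_i$ regardless of $w_i$, so the very first aggregated step lands on the minimizer; on general functions this only holds locally, matching the local rates the abstract describes.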
