## Search

Now showing items 1-4 of 4


Author

- Hanzely, Filip (4)
- Richtarik, Peter (4)
- Mishchenko, Konstantin (1)

Department

- Applied Mathematics and Computational Science Program (4)
- Computer Science Program (4)
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division (4)
- Applied Mathematics and Computational Science (3)
- Computer Science (3)

Publisher

- arXiv (4)

Type

- Preprint (4)

Year (Issue Date)

- 2018 (4)

Item Availability

- Open Access (4)


SEGA: Variance Reduction via Gradient Sketching

Hanzely, Filip; Mishchenko, Konstantin; Richtarik, Peter (arXiv, 2018-09-09) [Preprint]

We propose a randomized first-order optimization method, SEGA (SkEtched GrAdient method), which progressively builds, throughout its iterations, a variance-reduced estimate of the gradient from random linear measurements (sketches) of the gradient obtained from an oracle. In each iteration, SEGA updates the current estimate of the gradient through a sketch-and-project operation using the information provided by the latest sketch, and this is subsequently used to compute an unbiased estimate of the true gradient through a random relaxation procedure. This unbiased estimate is then used to perform a gradient step. Unlike standard subspace descent methods, such as coordinate descent, SEGA can be used for optimization problems with a non-separable proximal term. We provide a general convergence analysis and prove linear convergence for strongly convex objectives. In the special case of coordinate sketches, SEGA can be enhanced with various techniques such as importance sampling, minibatching and acceleration, and its rate is, up to a small constant factor, identical to the best-known rate of coordinate descent.
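A minimal sketch of the iteration described above follows. It assumes coordinate sketches drawn uniformly ($p_i = 1/n$), a quadratic objective, and a Euclidean-ball constraint standing in for the non-separable proximal term. The test problem, the step size `1/(8nL)`, and the names `sega_coordinate` and `project_ball` are hypothetical illustrative choices, not taken from the paper. Each step queries a single partial derivative and forms the unbiased estimate $g = h + n\,e_i(\partial_i f(x) - h_i)$ from the old $h$. It then overwrites $h_i$, which is the sketch-and-project update, and finally takes a projected step.

```python
import numpy as np


def project_ball(x, radius):
    """Euclidean projection onto {x : ||x|| <= radius}; a non-separable prox."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def sega_coordinate(A, b, radius, alpha, iters, seed=0):
    """Illustrative SEGA-style loop with uniform coordinate sketches.

    Assumed problem: min_x 0.5 * x^T A x - b^T x  subject to ||x|| <= radius.
    The oracle reveals only one partial derivative of f per iteration.
    """
    rng = np.random.default_rng(seed)
    n = len(b)
    x = np.zeros(n)
    h = np.zeros(n)  # running gradient estimate built from sketches
    for i in rng.integers(n, size=iters):  # coordinate sketch, p_i = 1/n
        d_i = A[i] @ x - b[i]              # sketched gradient: e_i^T grad f(x)
        g = h.copy()
        g[i] += n * (d_i - h[i])           # unbiased estimate (theta = 1/p_i = n)
        h[i] = d_i                         # sketch-and-project update of h
        x = project_ball(x - alpha * g, radius)  # proximal (projected) step
    return x


n = 5
A = 4.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)  # SPD tridiagonal matrix
b = np.ones(n)
L = np.linalg.eigvalsh(A).max()  # smoothness constant of f
x = sega_coordinate(A, b, radius=0.1, alpha=1.0 / (8 * n * L), iters=40000)
```

As $h$ approaches the gradient at the solution, the correction term $n(\partial_i f(x) - h_i)$ shrinks. For that reason this sketch converges with a constant step size, which is the variance-reduction effect the abstract describes.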

Accelerated Bregman Proximal Gradient Methods for Relatively Smooth Convex Optimization

Hanzely, Filip; Richtarik, Peter; Xiao, Lin (arXiv, 2018-08-09) [Preprint]

We consider the problem of minimizing the sum of two convex functions: one is differentiable and relatively smooth with respect to a reference convex function, and the other can be nondifferentiable but simple to optimize. The relatively smooth condition is much weaker than the standard assumption of uniform Lipschitz continuity of the gradients, and thus significantly increases the scope of potential applications. We present accelerated Bregman proximal gradient (ABPG) methods that employ the Bregman distance of the reference function as the proximity measure. These methods attain an $O(k^{-\gamma})$ convergence rate in the relatively smooth setting, where $\gamma\in [1, 2]$ is determined by a triangle scaling property of the Bregman distance. We develop adaptive variants of the ABPG method that automatically ensure the best possible rate of convergence and argue that the $O(k^{-2})$ rate is attainable in most cases. We present numerical experiments with three applications: D-optimal experiment design, the Poisson linear inverse problem, and relative-entropy nonnegative regression. In all experiments, we obtain numerical certificates showing that these methods do converge with the $O(k^{-2})$ rate.
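For some reference functions, the Bregman step used by these methods has a closed form. The sketch below shows the basic non-accelerated Bregman proximal gradient step, not ABPG itself, for a Poisson linear inverse problem. It assumes the Burg entropy $h(x) = -\sum_j \log x_j$ as the reference function, an entrywise nonnegative $A$, positive data $b$, and no nonsmooth term. Under these assumptions the loss is smooth relative to $h$ with $L = \|b\|_1$. The update $1/x^{+} = 1/x + \nabla f(x)/L$ also keeps the iterates positive. The function names and the random test data are hypothetical.

```python
import numpy as np


def poisson_loss(A, b, x):
    """KL(b, Ax) up to an additive constant: sum((Ax)_i - b_i * log((Ax)_i))."""
    Ax = A @ x
    return float(np.sum(Ax - b * np.log(Ax)))


def bregman_gradient_poisson(A, b, x0, iters):
    """Non-accelerated Bregman gradient steps with Burg entropy h(x) = -sum(log x).

    Assumes A >= 0 entrywise, b > 0 and x0 > 0; then the loss is
    L-smooth relative to h with L = sum(b).
    """
    L = b.sum()
    x = x0.astype(float).copy()
    history = [poisson_loss(A, b, x)]
    for _ in range(iters):
        grad = A.T @ (1.0 - b / (A @ x))
        # argmin_z <grad, z> + L * D_h(z, x) solves 1/z = 1/x + grad/L
        x = x / (1.0 + x * grad / L)
        history.append(poisson_loss(A, b, x))
    return x, history


rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(20, 5))
x_true = rng.uniform(0.5, 2.0, size=5)
b = A @ x_true  # noiseless data, so x_true minimizes the loss
x, history = bregman_gradient_poisson(A, b, np.ones(5), iters=500)
```

Relative smoothness makes each step a descent step, so the recorded loss never increases. The accelerated methods in the abstract build extrapolation on top of this kind of Bregman step.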

Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches

Hanzely, Filip; Richtarik, Peter (arXiv, 2018-09-25) [Preprint]

Accelerated coordinate descent is a widely popular optimization algorithm due to its efficiency on large-dimensional problems. It achieves state-of-the-art complexity on an important class of empirical risk minimization problems. In this paper we design and analyze an accelerated coordinate descent (ACD) method which in each iteration updates a random subset of coordinates according to an arbitrary but fixed probability law, which is a parameter of the method. If all coordinates are updated in each iteration, our method reduces to the classical accelerated gradient descent method AGD of Nesterov. If a single coordinate is updated in each iteration, and we pick probabilities proportional to the square roots of the coordinate-wise Lipschitz constants, our method reduces to the currently fastest coordinate descent method NUACDM of Allen-Zhu, Qu, Richtárik and Yuan. While minibatch variants of ACD are more popular and relevant in practice, there is no importance sampling for ACD that outperforms the standard uniform minibatch sampling. Through insights enabled by our general analysis, we design a new importance sampling for minibatch ACD which significantly outperforms the previous state-of-the-art minibatch ACD in practice. We prove a rate that is at most ${\cal O}(\sqrt{\tau})$ times worse than the rate of minibatch ACD with uniform sampling, but can be ${\cal O}(n/\tau)$ times better, where $\tau$ is the minibatch size. Since in modern supervised learning training systems it is standard practice to choose $\tau \ll n$, and often $\tau={\cal O}(1)$, our method can lead to dramatic speedups. Lastly, we obtain similar results for minibatch non-accelerated CD as well, achieving improvements on previous best rates.
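The single-coordinate sampling law mentioned above can be illustrated with a small sketch. It computes probabilities proportional to the square roots of the coordinate-wise Lipschitz constants. The sketch assumes a quadratic $f(x) = \tfrac12 x^\top A x - b^\top x$, whose coordinate-wise constants are the diagonal entries $A_{ii}$. It covers only the sampling rule, not the accelerated update, and the function name is hypothetical.

```python
import numpy as np


def sqrt_lipschitz_probabilities(A):
    """p_i proportional to sqrt(L_i), with L_i = A_ii the coordinate-wise
    Lipschitz constant of grad f for an (assumed) quadratic objective."""
    lipschitz = np.diag(A).astype(float)
    weights = np.sqrt(lipschitz)
    return weights / weights.sum()


A = np.diag([1.0, 4.0, 9.0, 16.0])
p = sqrt_lipschitz_probabilities(A)  # approximately [0.1, 0.2, 0.3, 0.4]
rng = np.random.default_rng(0)
i = rng.choice(len(p), p=p)  # coordinate to update in a single-coordinate step
```

Under this rule, coordinates with larger curvature are sampled more often, but only in proportion to $\sqrt{L_i}$ rather than $L_i$.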

Fastest Rates for Stochastic Mirror Descent Methods

Hanzely, Filip; Richtarik, Peter (arXiv, 2018-03-20) [Preprint]

Relative smoothness, a notion introduced by Birnbaum et al. (2011) and rediscovered by Bauschke et al. (2016) and Lu et al. (2016), generalizes the standard notion of smoothness typically used in the analysis of gradient-type methods. In this work we take ideas from the well-studied field of stochastic convex optimization and use them to obtain faster algorithms for minimizing relatively smooth functions. We propose and analyze two new algorithms: Relative Randomized Coordinate Descent (relRCD) and Relative Stochastic Gradient Descent (relSGD), both generalizing famous algorithms in the standard smooth setting. The methods we propose can in fact be seen as particular instances of stochastic mirror descent algorithms. One of them, relRCD, corresponds to the first stochastic variant of the mirror descent algorithm with a linear convergence rate.
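The relative smoothness condition mentioned above is commonly written as follows. The symbols $h$ (a differentiable convex reference function), $D_h$ (its Bregman distance) and $L$ are notation chosen for this illustration.

```latex
D_h(x, y) = h(x) - h(y) - \langle \nabla h(y), x - y \rangle,
\qquad
f(x) \le f(y) + \langle \nabla f(y), x - y \rangle + L\, D_h(x, y) \quad \forall x, y.
```

With $h(x) = \tfrac12 \|x\|^2$, we get $D_h(x, y) = \tfrac12 \|x - y\|^2$, and the inequality reduces to the usual upper bound implied by an $L$-Lipschitz gradient. This is the sense in which relative smoothness generalizes standard smoothness.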
