## Search

Now showing items 1-5 of 5




99% of Distributed Optimization is a Waste of Time: The Issue and How to Fix it

Mishchenko, Konstantin; Hanzely, Filip; Richtarik, Peter (arXiv, 2019-06-04) [Preprint]

It is well known that many optimization methods, including SGD, SAGA, and Accelerated SGD for over-parameterized models, do not scale linearly in the parallel setting. In this paper, we present a new version of block coordinate descent that solves this issue for a number of methods. The core idea is to make the sampling of coordinate blocks on each parallel unit independent of the others. Surprisingly, we prove that the optimal number of blocks to be updated by each of $n$ units in every iteration is equal to $m/n$, where $m$ is the total number of blocks. As an illustration, this means that when $n=100$ parallel units are used, $99\%$ of work is a waste of time. We demonstrate that with $m/n$ blocks used by each unit the iteration complexity often remains the same. Among other applications which we mention, this fact can be exploited in the setting of distributed optimization to break the communication bottleneck. Our claims are justified by numerical experiments which demonstrate almost a perfect match with our theory on a number of datasets.
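
The arithmetic behind the title can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `blocks_per_unit`, `sample_blocks` and `wasted_fraction` are hypothetical, and the sketch assumes that $n$ divides $m$ and that each unit samples its blocks uniformly without replacement.

```python
import random

def blocks_per_unit(m, n):
    """Blocks each of n units updates per iteration: m/n (assumes n divides m)."""
    if m % n != 0:
        raise ValueError("this sketch assumes n divides m")
    return m // n

def sample_blocks(m, n, rng):
    """Each unit draws its own m/n block indices, independently of the others."""
    k = blocks_per_unit(m, n)
    return [sorted(rng.sample(range(m), k)) for _ in range(n)]

def wasted_fraction(n):
    """Share of per-unit work that is unnecessary if every unit updates all m blocks."""
    return 1.0 - 1.0 / n

rng = random.Random(0)
choices = sample_blocks(m=1000, n=100, rng=rng)
print(len(choices), len(choices[0]))   # 100 10
print(f"{wasted_fraction(100):.0%}")   # 99%
```

With $n=100$ units, each unit touches only $1/100$ of the blocks per iteration, which is where the $99\%$ figure in the abstract comes from.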

SEGA: Variance Reduction via Gradient Sketching

Hanzely, Filip; Mishchenko, Konstantin; Richtarik, Peter (arXiv, 2018-09-09) [Preprint]

We propose a randomized first order optimization method, SEGA (SkEtched GrAdient method), which progressively throughout its iterations builds a variance-reduced estimate of the gradient from random linear measurements (sketches) of the gradient obtained from an oracle. In each iteration, SEGA updates the current estimate of the gradient through a sketch-and-project operation using the information provided by the latest sketch, and this is subsequently used to compute an unbiased estimate of the true gradient through a random relaxation procedure. This unbiased estimate is then used to perform a gradient step. Unlike standard subspace descent methods, such as coordinate descent, SEGA can be used for optimization problems with a non-separable proximal term. We provide a general convergence analysis and prove linear convergence for strongly convex objectives. In the special case of coordinate sketches, SEGA can be enhanced with various techniques such as importance sampling, minibatching and acceleration, and its rate is up to a small constant factor identical to the best-known rate of coordinate descent.
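
For coordinate sketches with uniform sampling, the two estimate updates described in the abstract can be sketched as below. This is a hedged illustration rather than the authors' implementation: the function name `sega_estimates` is hypothetical, the gradient values are made up for the demonstration, and the proximal step is omitted.

```python
def sega_estimates(h, i, partial_i):
    """Update the gradient estimate from one observed partial derivative.

    h:         current estimate of the gradient (list of floats)
    i:         sampled coordinate, assumed uniform over range(len(h))
    partial_i: i-th partial derivative of f at the current point

    Returns (g, h_new): g is the unbiased estimate used for the gradient
    step, h_new is the sketch-and-project update of h.
    """
    n = len(h)
    delta = partial_i - h[i]
    h_new = list(h)
    h_new[i] += delta      # h_new agrees with the observed sketch in coordinate i
    g = list(h)
    g[i] += n * delta      # scaling by n = 1/p_i makes E[g] equal the gradient
    return g, h_new

# Check unbiasedness by averaging g over all n equally likely coordinates.
h = [1.0, 2.0, 3.0]
grad = [4.0, 0.0, -2.0]    # made-up gradient values for the demonstration
n = len(h)
sum_g = [0.0] * n
for i in range(n):
    g, _ = sega_estimates(h, i, grad[i])
    sum_g = [s + gj for s, gj in zip(sum_g, g)]
mean_g = [s / n for s in sum_g]
print(mean_g)              # [4.0, 0.0, -2.0]
```

A gradient step would then move the iterate along `-g` with some step size; with a proximal term, that step would be followed by the corresponding proximal operator.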

Accelerated Bregman Proximal Gradient Methods for Relatively Smooth Convex Optimization

Hanzely, Filip; Richtarik, Peter; Xiao, Lin (arXiv, 2018-08-09) [Preprint]

We consider the problem of minimizing the sum of two convex functions: one is differentiable and relatively smooth with respect to a reference convex function, and the other can be nondifferentiable but simple to optimize. The relatively smooth condition is much weaker than the standard assumption of uniform Lipschitz continuity of the gradients, thus significantly increases the scope of potential applications. We present accelerated Bregman proximal gradient (ABPG) methods that employ the Bregman distance of the reference function as the proximity measure. These methods attain an $O(k^{-\gamma})$ convergence rate in the relatively smooth setting, where $\gamma\in [1, 2]$ is determined by a triangle scaling property of the Bregman distance. We develop adaptive variants of the ABPG method that automatically ensure the best possible rate of convergence and argue that the $O(k^{-2})$ rate is attainable in most cases. We present numerical experiments with three applications: D-optimal experiment design, Poisson linear inverse problem, and relative-entropy nonnegative regression. In all experiments, we obtain numerical certificates showing that these methods do converge with the $O(k^{-2})$ rate.

Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches

Hanzely, Filip; Richtarik, Peter (arXiv, 2018-09-25) [Preprint]

Accelerated coordinate descent is a widely popular optimization algorithm due to its efficiency on large-dimensional problems. It achieves state-of-the-art complexity on an important class of empirical risk minimization problems. In this paper we design and analyze an accelerated coordinate descent (ACD) method which in each iteration updates a random subset of coordinates according to an arbitrary but fixed probability law, which is a parameter of the method. If all coordinates are updated in each iteration, our method reduces to the classical accelerated gradient descent method AGD of Nesterov. If a single coordinate is updated in each iteration, and we pick probabilities proportional to the square roots of the coordinate-wise Lipschitz constants, our method reduces to the currently fastest coordinate descent method NUACDM of Allen-Zhu, Qu, Richtárik and Yuan. While mini-batch variants of ACD are more popular and relevant in practice, there is no importance sampling for ACD that outperforms the standard uniform mini-batch sampling. Through insights enabled by our general analysis, we design new importance sampling for mini-batch ACD which significantly outperforms previous state-of-the-art minibatch ACD in practice. We prove a rate that is at most ${\cal O}(\sqrt{\tau})$ times worse than the rate of minibatch ACD with uniform sampling, but can be ${\cal O}(n/\tau)$ times better, where $\tau$ is the minibatch size. Since in modern supervised learning training systems it is standard practice to choose $\tau \ll n$, and often $\tau={\cal O}(1)$, our method can lead to dramatic speedups. Lastly, we obtain similar results for minibatch nonaccelerated CD as well, achieving improvements on previous best rates.
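
The single-coordinate sampling rule mentioned in the abstract (probabilities proportional to the square roots of the coordinate-wise Lipschitz constants) can be sketched as follows. The function name `sqrt_lipschitz_probabilities` and the constants are hypothetical. The paper's mini-batch importance sampling is not reproduced here.

```python
import math
import random

def sqrt_lipschitz_probabilities(lipschitz):
    """Return p with p_i proportional to sqrt(L_i); all L_i must be positive."""
    if any(L <= 0 for L in lipschitz):
        raise ValueError("Lipschitz constants must be positive")
    roots = [math.sqrt(L) for L in lipschitz]
    total = sum(roots)
    return [r / total for r in roots]

# Made-up coordinate-wise Lipschitz constants for the demonstration.
probs = sqrt_lipschitz_probabilities([1.0, 4.0, 9.0])

# Draw one coordinate index according to these probabilities.
rng = random.Random(0)
i = rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

For the constants above the square roots are 1, 2 and 3, so the probabilities are 1/6, 1/3 and 1/2.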

A Privacy Preserving Randomized Gossip Algorithm via Controlled Noise Insertion

Hanzely, Filip; Konečný, Jakub; Loizou, Nicolas; Richtarik, Peter; Grishchenko, Dmitry (arXiv, 2019-01-27) [Preprint]

In this work we present a randomized gossip algorithm for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes. We give iteration complexity bounds for the method and perform extensive numerical experiments.
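
As background for the average consensus problem, the sketch below shows standard randomized pairwise gossip, in which a random edge is picked and both endpoints replace their values with the pair's average. It is a baseline only: it does not include the controlled noise insertion that the preprint proposes for privacy, and the function name `pairwise_gossip`, the ring graph and the initial values are assumptions made for illustration.

```python
import random

def pairwise_gossip(values, edges, steps, rng):
    """Run `steps` rounds of pairwise averaging over randomly chosen edges."""
    x = list(values)
    for _ in range(steps):
        i, j = rng.choice(edges)      # pick one edge uniformly at random
        avg = (x[i] + x[j]) / 2.0
        x[i] = x[j] = avg             # both endpoints adopt the pair's average
    return x

# A 4-node ring with made-up private values whose average is 3.0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
initial = [1.0, 2.0, 3.0, 6.0]
final = pairwise_gossip(initial, edges, steps=500, rng=random.Random(0))
```

Each averaging step preserves the sum of the node values, so on a connected graph the values approach the average of the initial values.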
