Show simple item record

dc.contributor.author: Khan, Mohammad Emtiyaz
dc.contributor.author: Rue, Haavard
dc.date.accessioned: 2021-07-14T06:22:12Z
dc.date.available: 2021-07-14T06:22:12Z
dc.date.issued: 2021-07-09
dc.identifier.uri: http://hdl.handle.net/10754/670192
dc.description.abstract: We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and the Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving these algorithms is to approximate the posterior using candidate distributions estimated with natural gradients. Different candidate distributions result in different algorithms, and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
dc.publisher: arXiv
dc.relation.url: https://arxiv.org/pdf/2107.04562.pdf
dc.rights: Archived with thanks to arXiv
dc.title: The Bayesian Learning Rule
dc.type: Preprint
dc.contributor.department: Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
dc.contributor.department: Statistics Program
dc.eprint.version: Pre-print
dc.contributor.institution: RIKEN Center for AI Project, Tokyo, Japan
dc.identifier.arxivid: 2107.04562
kaust.person: Rue, Haavard
refterms.dateFOA: 2021-07-14T06:22:39Z
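
The abstract describes a single underlying update: fit a candidate distribution to the posterior by following natural gradients, with different candidates and further approximations recovering different algorithms. Below is a minimal illustrative sketch of that idea, not the authors' code: it assumes a diagonal-Gaussian candidate, evaluates gradients at the candidate mean rather than taking Monte Carlo expectations, and uses a toy ridge-regression loss; all names and constants are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

# Toy data for a ridge-regression-style quadratic loss (names are hypothetical).
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)
delta = 1.0                       # prior precision (ridge penalty)

# Candidate distribution q(theta) = N(m, diag(1/s)): mean m, diagonal precision s.
m = np.zeros(d)
s = np.full(d, delta)
rho = 0.1                         # learning rate

for _ in range(200):
    # Gradient and diagonal Hessian of the regularized loss, evaluated at the mean
    # (a crude stand-in for the expectation under q).
    g = X.T @ (X @ m - y) + delta * m
    h = np.einsum("ni,ni->i", X, X) + delta

    # Bayesian-learning-rule-style natural-gradient step: the precision moves
    # toward the Hessian, the mean takes a precision-preconditioned step.
    s = (1 - rho) * s + rho * h
    m = m - rho * g / s

# The fixed point of the mean update is the exact ridge solution.
w_ridge = np.linalg.solve(X.T @ X + delta * np.eye(d), X.T @ y)
print("BLR mean estimate:", np.round(m, 3))
print("ridge solution   :", np.round(w_ridge, 3))

In this sketch the precision estimate acts as the preconditioner for the mean update, which is broadly how the Newton-like and RMSprop-like updates mentioned in the abstract arise as special cases of the same rule.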


Files in this item

Name: Preprintfile1.pdf
Size: 798.8 KB
Format: PDF
Description: Pre-print
