
dc.contributor.author: Bibi, Adel
dc.contributor.author: Ghanem, Bernard
dc.contributor.author: Koltun, Vladlen
dc.contributor.author: Ranftl, Rene
dc.date.accessioned: 2020-03-23T12:51:10Z
dc.date.available: 2020-03-23T12:51:10Z
dc.date.issued: 2019-02-23
dc.date.submitted: 2018-09-18
dc.identifier.uri: http://hdl.handle.net/10754/662270
dc.description.abstract: We provide a novel perspective on the forward pass through a block of layers in a deep network. In particular, we show that a forward pass through a standard dropout layer followed by a linear layer and a non-linear activation is equivalent to optimizing a convex objective with a single iteration of a τ-nice Proximal Stochastic Gradient method. We further show that replacing standard Bernoulli dropout with additive dropout is equivalent to optimizing the same convex objective with a variance-reduced proximal method. By expressing both fully-connected and convolutional layers as special cases of a high-order tensor product, we unify the underlying convex optimization problem in the tensor setting and derive a formula for the Lipschitz constant L used to determine the optimal step size of the above proximal methods. We conduct experiments with standard convolutional networks applied to the CIFAR-10 and CIFAR-100 datasets and show that replacing a block of layers with multiple iterations of the corresponding solver, with step size set via L, consistently improves classification accuracy.
dc.description.sponsorship: This work was partially supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research.
dc.publisher: OpenReview.net
dc.relation.url: https://openreview.net/forum?id=ryxxCiRqYX
dc.rights: Archived with thanks to OpenReview.net. This version is a redistributed copy from the original at https://openreview.net/forum?id=ryxxCiRqYX
dc.title: Deep Layers as Stochastic Solvers
dc.type: Conference Paper
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Electrical Engineering
dc.contributor.department: Electrical Engineering Program
dc.contributor.department: VCC Analytics Research Group
dc.conference.date: May 6 - May 9, 2019
dc.conference.name: International Conference on Learning Representations
dc.conference.location: New Orleans, Louisiana, United States
dc.eprint.version: Publisher's Version/PDF
dc.contributor.institution: Intel Labs
pubs.publication-status: Published
kaust.person: Bibi, Adel
kaust.person: Ghanem, Bernard
refterms.dateFOA: 2020-03-23T12:51:11Z
kaust.acknowledged.supportUnit: Office of Sponsored Research
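
The dropout/Prox-SGD correspondence stated in the abstract can be illustrated with a small sketch. It assumes a simplified objective chosen here for illustration, F(y) = (1/n) Σ_j f_j(y) + ι(y ≥ 0) with f_j(y) = ½‖y‖² − yᵀ(n·A[:, j]·x[j] + b), which is not necessarily the exact formulation used in the paper; the function names (relu, partial_matvec, prox_sgd_step, dropout_linear_relu) are hypothetical. Because ReLU is the projection onto the non-negative orthant (the proximal operator of ι), one proximal stochastic gradient step with step size 1/L = 1 on a uniformly sampled subset of τ input coordinates (a τ-nice sample) reproduces inverted dropout with keep rate p = τ/n, followed by a linear layer and ReLU, from any starting point y0. The sketch uses fixed-size masks, whereas standard Bernoulli dropout keeps a random number of units.

```python
import random


def relu(v):
    """ReLU: Euclidean projection onto the non-negative orthant, i.e. the
    proximal operator of the indicator function of {y : y >= 0}."""
    return [max(0.0, t) for t in v]


def partial_matvec(A, x, cols):
    """Return sum over j in cols of A[:, j] * x[j]; A is a list of rows."""
    return [sum(row[j] * x[j] for j in cols) for row in A]


def prox_sgd_step(y, A, x, b, cols, step=1.0):
    """One proximal stochastic gradient step on the illustrative objective
        F(y) = (1/n) * sum_j f_j(y) + indicator(y >= 0),
        f_j(y) = 0.5 * ||y||^2 - y^T (n * A[:, j] * x[j] + b),
    using the mini-batch of coordinates `cols` (a tau-nice sample).
    Each grad f_j(y) = y - c_j is 1-Lipschitz, so step = 1/L = 1."""
    n, tau = len(x), len(cols)
    partial = partial_matvec(A, x, cols)
    # Mini-batch gradient: y - (n / tau) * sum_{j in cols} A[:, j] x[j] - b
    grad = [yi - (n / tau) * pi - bi for yi, pi, bi in zip(y, partial, b)]
    # Gradient step followed by the proximal step (projection = ReLU).
    return relu([yi - step * gi for yi, gi in zip(y, grad)])


def dropout_linear_relu(A, x, b, cols):
    """Inverted dropout on the input (keep units in `cols`, rescale by 1/p
    with p = tau / n), then the linear layer A x + b, then ReLU."""
    n, p = len(x), len(cols) / len(x)
    dropped = [x[j] / p if j in cols else 0.0 for j in range(n)]
    return relu([sum(a * d for a, d in zip(row, dropped)) + bi
                 for row, bi in zip(A, b)])


# Usage: random layer, random input, random dropout mask of size tau.
random.seed(0)
n, m, tau = 6, 3, 4
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
x = [random.uniform(-1, 1) for _ in range(n)]
b = [random.uniform(-0.5, 0.5) for _ in range(m)]
y0 = [random.uniform(-1, 1) for _ in range(m)]  # arbitrary starting point
cols = set(random.sample(range(n), tau))        # tau-nice sample = mask

layer_out = dropout_linear_relu(A, x, b, cols)
solver_out = prox_sgd_step(y0, A, x, b, cols)

# Keeping all coordinates (tau = n, p = 1) gives a deterministic proximal
# gradient step, which lands on ReLU(A x + b), the layer without dropout.
full_out = prox_sgd_step(y0, A, x, b, set(range(n)))
plain_out = relu([sum(a * xj for a, xj in zip(row, x)) + bi
                  for row, bi in zip(A, b)])
```

The two outputs agree up to floating-point rounding regardless of y0, since with step size 1 the gradient step cancels the current iterate. Running several such steps with fresh samples, as the abstract describes, would require the step size set via the paper's Lipschitz constant for its actual objective, which this toy example does not reproduce.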


Files in this item

Name: deep_layers_as_stochastic_solvers.pdf
Size: 655.5 KB
Format: PDF
Description: Conference Paper
