Sharp Minima Can Generalize For Deep Nets on ShortScience.org


Sharp Minima Can Generalize For Deep Nets
Dinh, Laurent and Pascanu, Razvan and Bengio, Samy and Bengio, Yoshua
International Conference on Machine Learning - 2017 via Local Bibsonomy


Summary by David Stutz 4 years ago

Dinh et al. show that it is unclear whether flat minima necessarily generalize better than sharp ones. They study several notions of flatness, some based on local curvature and some based on the notion of “low change in error”. The authors show that the parameterization of the network has a significant impact on flatness: parameter settings that yield the same prediction function (and are therefore indistinguishable in terms of test performance) can differ widely in the flatness around the corresponding minima, as illustrated in Figure 1. In conclusion, while networks that generalize well usually correspond to flat minima, it is not necessarily true that flat minima generalize better than sharp ones.

https://i.imgur.com/gHfolEV.jpg
Figure 1: Illustration of the influence of parameterization on the flatness of the obtained minima.
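
The effect of parameterization on flatness can be reproduced in a toy setting. The following is a minimal sketch, not the authors' experiment. It assumes a one-hidden-unit ReLU network with scalar weights, and it assumes the reparameterization is a layer-wise rescaling (multiply the first weight by `alpha`, divide the second by `alpha`), which leaves a ReLU network's predictions unchanged. Sharpness is measured here as the largest eigenvalue of the loss Hessian, one possible curvature-based notion. All names (`predict`, `loss`, `hessian`, `sharpness`, `alpha`) and the data are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def predict(w, x):
    # Toy network (assumption): f(x) = w2 * relu(w1 * x) with scalar weights.
    w1, w2 = w
    return w2 * relu(w1 * x)

def loss(w, x, y):
    # Mean squared error of the toy network.
    return np.mean((predict(w, x) - y) ** 2)

def hessian(f, w, eps=1e-4):
    # Central finite-difference estimate of the Hessian of f at w.
    w = np.asarray(w, dtype=float)
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n)
            e_i[i] = eps
            e_j = np.zeros(n)
            e_j[j] = eps
            H[i, j] = (f(w + e_i + e_j) - f(w + e_i - e_j)
                       - f(w - e_i + e_j) + f(w - e_i - e_j)) / (4 * eps ** 2)
    return H

def sharpness(w, x, y):
    # Sharpness proxy (assumption): largest eigenvalue of the loss Hessian.
    H = hessian(lambda v: loss(v, x, y), w)
    return np.linalg.eigvalsh(H).max()

# Synthetic data with positive inputs, so the ReLU stays active.
x = np.linspace(0.1, 1.0, 50)
y = 2.0 * x

# A minimum of the loss: w1 * w2 = 2 fits the data exactly.
w = np.array([1.0, 2.0])

# Rescaled parameters: same prediction function, different point in weight space.
alpha = 10.0
w_scaled = np.array([alpha * w[0], w[1] / alpha])

sharp_orig = sharpness(w, x, y)
sharp_scaled = sharpness(w_scaled, x, y)

print("same predictions:", np.allclose(predict(w, x), predict(w_scaled, x)))
print("sharpness (original):", sharp_orig)
print("sharpness (rescaled):", sharp_scaled)
```

Both parameter settings are minima with zero loss and identical predictions, yet in this toy problem the rescaled one has a largest Hessian eigenvalue about 20 times bigger. Increasing `alpha` makes the same function look arbitrarily sharp under this measure.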

Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
