Limitations of the Empirical Fisher Approximation for Natural Gradient Descent
Frederik Kunstner
and
Lukas Balles
and
Philipp Hennig
arXiv e-Print archive - 2019 via Local arXiv
Keywords:
cs.LG, stat.ML
First published: 2019/05/29 (5 years ago) Abstract: Natural gradient descent, which preconditions a gradient descent update with
the Fisher information matrix of the underlying statistical model, is a way to
capture partial second-order information. Several highly visible works have
advocated an approximation known as the empirical Fisher, drawing connections
between approximate second-order methods and heuristics like Adam. We dispute
this argument by showing that the empirical Fisher---unlike the Fisher---does
not generally capture second-order information. We further argue that the
conditions under which the empirical Fisher approaches the Fisher (and the
Hessian) are unlikely to be met in practice, and that, even on simple
optimization problems, the pathologies of the empirical Fisher can have
undesirable effects.