### Brief Summary:
Does what it says on the tin. The authors examine the loss surface of the Network-In-Network and VGG neural networks near the critical points found by 5 different stochastic optimisation algorithms. These algorithms are Adam, Adadelta, SGD, SGD with momentum and a new Runge-Kutta based optimiser. They examine the loss function by interpolating along lines between the initial and final parameter values.
They find that the different algorithms find qualitatively genuinely different types of minima. They find that the basin around the minima of ADAM is bigger than those for other optimisation algorithms. They find also that all of the minima are similarly good.