Adding Gradient Noise Improves Learning for Very Deep Networks
Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, and James Martens
arXiv e-Print archive - 2015 via Local arXiv
Keywords: stat.ML, cs.LG

