Self-Normalizing Neural Networks
Günter Klambauer, Thomas Unterthiner, Andreas Mayr and Sepp Hochreiter
arXiv e-Print archive - 2017 via Local arXiv
Keywords:
cs.LG, stat.ML
First published: 2017/06/08
Abstract: Deep Learning has revolutionized vision via convolutional neural networks
(CNNs) and natural language processing via recurrent neural networks (RNNs).
However, success stories of Deep Learning with standard feed-forward neural
networks (FNNs) are rare. FNNs that perform well are typically shallow and,
therefore cannot exploit many levels of abstract representations. We introduce
self-normalizing neural networks (SNNs) to enable high-level abstract
representations. While batch normalization requires explicit normalization,
neuron activations of SNNs automatically converge towards zero mean and unit
variance. The activation function of SNNs are "scaled exponential linear units"
(SELUs), which induce self-normalizing properties. Using the Banach fixed-point
theorem, we prove that activations close to zero mean and unit variance that
are propagated through many network layers will converge towards zero mean and
unit variance -- even under the presence of noise and perturbations. This
convergence property of SNNs allows to (1) train deep networks with many
layers, (2) employ strong regularization, and (3) to make learning highly
robust. Furthermore, for activations not close to unit variance, we prove an
upper and lower bound on the variance, thus, vanishing and exploding gradients
are impossible. We compared SNNs on (a) 121 tasks from the UCI machine learning
repository, on (b) drug discovery benchmarks, and on (c) astronomy tasks with
standard FNNs and other machine learning methods such as random forests and
support vector machines. SNNs significantly outperformed all competing FNN
methods at 121 UCI tasks, outperformed all competing methods at the Tox21
dataset, and set a new record at an astronomy data set. The winning SNN
architectures are often very deep. Implementations are available at:
github.com/bioinf-jku/SNNs.
* _Objective:_ Design a fully-connected feed-forward neural network (FNN) that can be trained even with very deep architectures.
* _Dataset:_ [MNIST](yann.lecun.com/exdb/mnist/), [CIFAR10](https://www.cs.toronto.edu/%7Ekriz/cifar.html), [Tox21](https://tripod.nih.gov/tox21/challenge/) and [UCI tasks](https://archive.ics.uci.edu/ml/datasets/optical+recognition+of+handwritten+digits).
* _Code:_ [here](https://github.com/bioinf-jku/SNNs)
## Inner-workings:
They introduce a new activation function, the Scaled Exponential Linear Unit (SELU), which has the nice property of driving neuron activations towards a fixed point with zero mean and unit variance.
They also prove upper and lower bounds on the variance of the activations under very mild conditions, which means that gradients can neither vanish nor explode.
The activation function is:
[![screen shot 2017-06-14 at 11 38 27 am](https://user-images.githubusercontent.com/17261080/27125901-1a4f7276-50f6-11e7-857d-ebad1ac94789.png)](https://user-images.githubusercontent.com/17261080/27125901-1a4f7276-50f6-11e7-857d-ebad1ac94789.png)
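For reference, the SELU as defined in the paper (the screenshot above shows the same definition) is:

$$
\operatorname{selu}(x) = \lambda
\begin{cases}
x & \text{if } x > 0 \\
\alpha e^{x} - \alpha & \text{if } x \leq 0
\end{cases}
$$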
With specific values of alpha and lambda chosen to guarantee these properties. A NumPy implementation is:

    import numpy as np

    def selu(x):
        # alpha and lambda ("scale") from the paper make (0, 1) an attracting fixed point
        alpha = 1.6732632423543772848170429916717
        scale = 1.0507009873554804934193349852946
        return scale * np.where(x >= 0.0, x, alpha * np.exp(x) - alpha)
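As a quick, unofficial sanity check of the fixed-point claim above (my own sketch, not from the paper), one can push activations with the wrong statistics through many random layers whose weights satisfy the paper's condition (each unit's incoming weights sum to 0 and have unit sum of squares) and watch the statistics converge; this reuses the `selu` defined above:

    rng = np.random.default_rng(0)
    n = 1000
    x = rng.normal(0.5, 2.0, size=(512, n))     # start far from zero mean / unit variance
    for _ in range(32):
        W = rng.normal(size=(n, n))
        W -= W.mean(axis=0)                     # each unit's weights sum to 0 (omega = 0)
        W /= np.linalg.norm(W, axis=0)          # ... and have unit sum of squares (tau = 1)
        x = selu(x @ W)
    print(x.mean(), x.std())                    # typically ends up close to 0 and 1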
They also introduce a new dropout variant, alpha-dropout, to compensate for the fact that [![screen shot 2017-06-14 at 11 44 42 am](https://user-images.githubusercontent.com/17261080/27126174-e67d212c-50f6-11e7-8952-acad98b850be.png)](https://user-images.githubusercontent.com/17261080/27126174-e67d212c-50f6-11e7-8952-acad98b850be.png) Standard dropout sets activations to 0, which would disturb the zero mean and unit variance; dropped units are instead set to the SELU's negative saturation value -λα, and an affine transformation afterwards restores the mean and variance.
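A rough sketch of the alpha-dropout idea (my own NumPy illustration, not the authors' code; PyTorch ships a ready-made `torch.nn.AlphaDropout`): dropped units are set to α' = -λα instead of 0, then the affine correction with the paper's constants a and b is applied:

    import numpy as np

    def alpha_dropout(x, rate=0.1, rng=None):
        """Illustrative alpha-dropout (training-time behaviour only)."""
        rng = np.random.default_rng() if rng is None else rng
        alpha_p = -1.0507009873554805 * 1.6732632423543773  # alpha' = -lambda * alpha
        q = 1.0 - rate                                       # keep probability
        x = np.where(rng.random(x.shape) < q, x, alpha_p)    # dropped units -> alpha', not 0
        # Affine correction from the paper so that mean and variance stay at (0, 1)
        a = (q + alpha_p ** 2 * q * (1.0 - q)) ** -0.5
        b = -a * (1.0 - q) * alpha_p
        return a * x + b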
## Results:
With SELUs, batch normalization becomes unnecessary and much deeper fully-connected architectures can be trained. SNNs are a good replacement in settings where shallow models such as random forests or SVMs used to give the best results, and they outperform most other techniques on small datasets.
[![screen shot 2017-06-14 at 11 36 30 am](https://user-images.githubusercontent.com/17261080/27125798-bd04c256-50f5-11e7-8a74-b3b6a3fe82ee.png)](https://user-images.githubusercontent.com/17261080/27125798-bd04c256-50f5-11e7-8a74-b3b6a3fe82ee.png)
Might become the new standard activation for fully-connected networks in the future.