Deep Information Propagation
Samuel S. Schoenholz, Justin Gilmer, Surya Ganguli, Jascha Sohl-Dickstein
arXiv e-Print archive - 2016 via Local Bibsonomy
_Objective:_ Fundamental analysis of random (randomly initialized, untrained) networks using mean-field theory. Introduces two depth scales that control how far signals and correlations propagate, and hence the network's behavior.
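The heart of the analysis is a pair of mean-field recursions: a variance map for the pre-activation variance q^l and a correlation map for the correlation c^l between two inputs, each with an associated depth scale (xi_q and xi_c). Below is a minimal NumPy sketch, not the authors' code, of the variance fixed point q* and of chi_1, the slope of the correlation map at c = 1, for tanh units: chi_1 < 1 is the ordered phase, chi_1 > 1 the chaotic phase, chi_1 = 1 the critical line. The helper names (`gauss_mean`, `variance_fixed_point`, `chi_1`) are mine, not the paper's.

```python
import numpy as np

# Gauss-Hermite nodes/weights for expectations over z ~ N(0, 1)
_z, _w = np.polynomial.hermite_e.hermegauss(61)
_w = _w / np.sqrt(2.0 * np.pi)

def gauss_mean(f, q):
    """E_z[f(sqrt(q) * z)] for z ~ N(0, 1)."""
    return np.sum(_w * f(np.sqrt(q) * _z))

def variance_fixed_point(sigma_w2, sigma_b2, n_iter=200):
    """Iterate q_l = sigma_w^2 * E[tanh(sqrt(q_{l-1}) z)^2] + sigma_b^2 to its fixed point q*."""
    q = 1.0
    for _ in range(n_iter):
        q = sigma_w2 * gauss_mean(lambda u: np.tanh(u) ** 2, q) + sigma_b2
    return q

def chi_1(sigma_w2, sigma_b2):
    """Slope of the correlation map at c = 1: < 1 ordered, > 1 chaotic, = 1 critical."""
    q_star = variance_fixed_point(sigma_w2, sigma_b2)
    return sigma_w2 * gauss_mean(lambda u: (1.0 / np.cosh(u)) ** 4, q_star)

# e.g. chi_1(1.0, 0.05) < 1 (ordered) while chi_1(4.0, 0.05) > 1 (chaotic)
```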
## Results:
A guide for choosing hyper-parameters so that a random network is nearly critical, i.e. on the boundary between order and chaos. Near criticality, information can propagate both forward and backward through many layers, so gradients neither vanish nor explode and the network is trainable.
In practice, given a depth and the initialization variances for the weights and biases, the theory predicts whether the network will be trainable at all, making it a kind of architecture validation tool.
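As a rough illustration of that check (my own sketch, reusing `gauss_mean` and `variance_fixed_point` from the snippet above, not the paper's code): iterate the correlation map to its fixed point c*, estimate the correlation depth scale xi_c = -1 / log(slope at c*), and compare it to the intended depth. The paper's experiments suggest networks stay trainable up to a depth of roughly 6 * xi_c, which diverges on the critical line.

```python
def corr_map(c, q_star, sigma_w2, sigma_b2):
    """One step of the correlation recursion c_l -> c_{l+1} for tanh units at variance q*."""
    z1, z2 = _z[:, None], _z[None, :]
    w2d = _w[:, None] * _w[None, :]
    u1 = np.sqrt(q_star) * z1
    u2 = np.sqrt(q_star) * (c * z1 + np.sqrt(max(1.0 - c ** 2, 0.0)) * z2)
    return (sigma_w2 * np.sum(w2d * np.tanh(u1) * np.tanh(u2)) + sigma_b2) / q_star

def xi_c(sigma_w2, sigma_b2, eps=1e-4):
    """Correlation depth scale: -1 / log of the map's slope at its fixed point c*."""
    q_star = variance_fixed_point(sigma_w2, sigma_b2)
    c = 0.5
    for _ in range(500):                       # iterate to the fixed point c*
        c = corr_map(c, q_star, sigma_w2, sigma_b2)
    hi = min(c + eps, 1.0)                     # stay inside the valid range c <= 1
    slope = (corr_map(hi, q_star, sigma_w2, sigma_b2)
             - corr_map(c - eps, q_star, sigma_w2, sigma_b2)) / (hi - (c - eps))
    return -1.0 / np.log(slope)

# e.g. a 200-layer net is a risky bet if 6 * xi_c(sigma_w2, sigma_b2) < 200
```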
**To be noted:** any amount of dropout destroys the critical point and therefore puts an upper bound on the trainable network depth.
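A quick way to see why, sketched below under the paper's setting where the two inputs receive independent dropout masks (helper names are again mine): with keep rate rho, the variance map picks up a factor 1/rho while the covariance map does not, so the correlation map evaluated at c = 1 returns a value strictly below 1 for any rho < 1 and perfect correlation is no longer preserved.

```python
def variance_fixed_point_dropout(sigma_w2, sigma_b2, rho, n_iter=200):
    """Variance fixed point with dropout keep rate rho: the map gains a 1/rho factor."""
    q = 1.0
    for _ in range(n_iter):
        q = (sigma_w2 / rho) * gauss_mean(lambda u: np.tanh(u) ** 2, q) + sigma_b2
    return q

def corr_map_at_one(sigma_w2, sigma_b2, rho):
    """Correlation map evaluated at c = 1; equals 1 only when rho = 1 (no dropout)."""
    q_star = variance_fixed_point_dropout(sigma_w2, sigma_b2, rho)
    # the covariance map has no 1/rho factor because the two masks are independent
    return (sigma_w2 * gauss_mean(lambda u: np.tanh(u) ** 2, q_star) + sigma_b2) / q_star

# e.g. corr_map_at_one(1.5, 0.05, 0.9) < 1: identical inputs decorrelate layer by layer
```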
## Caveats:
* Considers only bounded activations (e.g. tanh): no ReLU, etc.
* Applies directly only to fully-connected feed-forward networks: no convnets, etc.