Deep Information Propagation
Samuel S. Schoenholz, Justin Gilmer, Surya Ganguli, Jascha Sohl-Dickstein
arXiv e-Print archive - 2016 via Local Bibsonomy
_Objective:_ Fundamental analysis of random (randomly initialized, untrained) networks using mean-field theory. Introduces two depth scales that control how far signals and correlations propagate, and hence the network's behavior.
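The heart of the analysis is a pair of mean-field recursions: a variance map for the pre-activation variance q^l and a correlation map for the correlation c^l between two inputs, each with an associated depth scale (xi_q and xi_c). Below is a minimal NumPy sketch, not the authors' code, of the variance fixed point q* and of chi_1, the slope of the correlation map at c = 1, for tanh units: chi_1 < 1 is the ordered phase, chi_1 > 1 the chaotic phase, chi_1 = 1 the critical line. The helper names (`gauss_mean`, `variance_fixed_point`, `chi_1`) are mine, not the paper's.

```python
import numpy as np

# Gauss-Hermite nodes/weights for expectations over z ~ N(0, 1)
_z, _w = np.polynomial.hermite_e.hermegauss(61)
_w = _w / np.sqrt(2.0 * np.pi)

def gauss_mean(f, q):
    """E_z[f(sqrt(q) * z)] for z ~ N(0, 1)."""
    return np.sum(_w * f(np.sqrt(q) * _z))

def variance_fixed_point(sigma_w2, sigma_b2, n_iter=200):
    """Iterate q_l = sigma_w^2 * E[tanh(sqrt(q_{l-1}) z)^2] + sigma_b^2 to its fixed point q*."""
    q = 1.0
    for _ in range(n_iter):
        q = sigma_w2 * gauss_mean(lambda u: np.tanh(u) ** 2, q) + sigma_b2
    return q

def chi_1(sigma_w2, sigma_b2):
    """Slope of the correlation map at c = 1: < 1 ordered, > 1 chaotic, = 1 critical."""
    q_star = variance_fixed_point(sigma_w2, sigma_b2)
    return sigma_w2 * gauss_mean(lambda u: (1.0 / np.cosh(u)) ** 4, q_star)

# e.g. chi_1(1.0, 0.05) < 1 (ordered) while chi_1(4.0, 0.05) > 1 (chaotic)
```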
## Results:
A guide for choosing hyper-parameters so that a random network is nearly critical, i.e. on the boundary between order and chaos. Near criticality, information can propagate both forward and backward through many layers, so gradients neither vanish nor explode and the network is trainable.
In practice, given a depth and the initialization variances for the weights and biases, the theory predicts whether the network will be trainable at all, making it a kind of architecture validation tool.
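As a rough illustration of that check (my own sketch, reusing `gauss_mean` and `variance_fixed_point` from the snippet above, not the paper's code): iterate the correlation map to its fixed point c*, estimate the correlation depth scale xi_c = -1 / log(slope at c*), and compare it to the intended depth. The paper's experiments suggest networks stay trainable up to a depth of roughly 6 * xi_c, which diverges on the critical line.

```python
def corr_map(c, q_star, sigma_w2, sigma_b2):
    """One step of the correlation recursion c_l -> c_{l+1} for tanh units at variance q*."""
    z1, z2 = _z[:, None], _z[None, :]
    w2d = _w[:, None] * _w[None, :]
    u1 = np.sqrt(q_star) * z1
    u2 = np.sqrt(q_star) * (c * z1 + np.sqrt(max(1.0 - c ** 2, 0.0)) * z2)
    return (sigma_w2 * np.sum(w2d * np.tanh(u1) * np.tanh(u2)) + sigma_b2) / q_star

def xi_c(sigma_w2, sigma_b2, eps=1e-4):
    """Correlation depth scale: -1 / log of the map's slope at its fixed point c*."""
    q_star = variance_fixed_point(sigma_w2, sigma_b2)
    c = 0.5
    for _ in range(500):                       # iterate to the fixed point c*
        c = corr_map(c, q_star, sigma_w2, sigma_b2)
    hi = min(c + eps, 1.0)                     # stay inside the valid range c <= 1
    slope = (corr_map(hi, q_star, sigma_w2, sigma_b2)
             - corr_map(c - eps, q_star, sigma_w2, sigma_b2)) / (hi - (c - eps))
    return -1.0 / np.log(slope)

# e.g. a 200-layer net is a risky bet if 6 * xi_c(sigma_w2, sigma_b2) < 200
```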
**To be noted:** any amount of dropout destroys the critical point and therefore puts an upper bound on the trainable network depth.
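A quick way to see why, sketched below under the paper's setting where the two inputs receive independent dropout masks (helper names are again mine): with keep rate rho, the variance map picks up a factor 1/rho while the covariance map does not, so the correlation map evaluated at c = 1 returns a value strictly below 1 for any rho < 1 and perfect correlation is no longer preserved.

```python
def variance_fixed_point_dropout(sigma_w2, sigma_b2, rho, n_iter=200):
    """Variance fixed point with dropout keep rate rho: the map gains a 1/rho factor."""
    q = 1.0
    for _ in range(n_iter):
        q = (sigma_w2 / rho) * gauss_mean(lambda u: np.tanh(u) ** 2, q) + sigma_b2
    return q

def corr_map_at_one(sigma_w2, sigma_b2, rho):
    """Correlation map evaluated at c = 1; equals 1 only when rho = 1 (no dropout)."""
    q_star = variance_fixed_point_dropout(sigma_w2, sigma_b2, rho)
    # the covariance map has no 1/rho factor because the two masks are independent
    return (sigma_w2 * gauss_mean(lambda u: np.tanh(u) ** 2, q_star) + sigma_b2) / q_star

# e.g. corr_map_at_one(1.5, 0.05, 0.9) < 1: identical inputs decorrelate layer by layer
```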
## Caveats:
* Considers only bounded activations (e.g. tanh): no ReLU, etc.
* Applies directly only to fully-connected feed-forward networks: no convnets, etc.