What's Hidden in a Randomly Weighted Neural Network?
Ramanujan, Vivek and Wortsman, Mitchell and Kembhavi, Aniruddha and Farhadi, Ali and Rastegari, Mohammad
- 2019 via Local Bibsonomy
Keywords:
deep-learning, readings, generalization, theory
The paper "Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask" by Zhou et al., 2019 found that by learning binary masks alone, one can find random subnetworks that do much better than chance on a task. This new paper builds on that method by proposing a stronger algorithm than Zhou et al.'s for finding these high-performing subnetworks.
https://i.imgur.com/vxDqCKP.png
The intuition is as follows: "If a neural network with random weights (center) is sufficiently overparameterized, it will contain a subnetwork (right) that performs as well as a trained neural network (left) with the same number of parameters."
While Zhou et al. learned a probability for each weight, this paper learns a score for each weight and keeps the top k% of weights by score at evaluation time. The scores are learned through their primary contribution, which they call the edge-popup algorithm:
https://i.imgur.com/9KcIbxd.png
"In the edge-popup Algorithm, we associate a score with each edge. On the forward pass we choose the top edges by score. On the backward pass we update the scores of all the edges with the straight-through estimator, allowing helpful edges that are “dead” to re-enter the subnetwork. *We never update the value of any weight in the network, only the score associated with each weight.*"
They're able to find higher-performing random subnetworks than Zhou et al.
https://i.imgur.com/T3D7OsZ.png