devin132's profile - ShortScience.org

What's Hidden in a Randomly Weighted Neural Network?
Ramanujan, Vivek and Wortsman, Mitchell and Kembhavi, Aniruddha and Farhadi, Ali and Rastegari, Mohammad
- 2019 via Local Bibsonomy
Keywords: deep-learning, readings, generalization, theory

[link] Summary by devin132 6 years ago

"Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask" by Zhou et al. (2019) found that learning only a binary mask over a randomly weighted network is enough to find random subnetworks that do much better than chance on a task. This paper builds on that method by proposing a stronger algorithm than Zhou et al.'s for finding these high-performing subnetworks.

https://i.imgur.com/vxDqCKP.png

The intuition follows: "If a neural network with random weights (center) is sufficiently overparameterized, it will contain a subnetwork (right) that performs as well as a trained neural network (left) with the same number of parameters."

While Zhou et al. learned a probability for each weight, this paper learns a score for each weight and keeps only the top k percent of weights by score on the forward pass. The scores are learned with the paper's primary contribution, which the authors call the edge-popup algorithm:

https://i.imgur.com/9KcIbxd.png

"In the edge-popup Algorithm, we associate a score with each edge. On the forward pass we choose the top edges by score. On the backward pass we update the scores of all the edges with the straight-through estimator, allowing helpful edges that are “dead” to re-enter the subnetwork. *We never update the value of any weight in the network, only the score associated with each weight.*"

They're able to find higher-performing random subnetworks than Zhou et al.

https://i.imgur.com/T3D7OsZ.png


Progress & Compress: A scalable framework for continual learning
Jonathan Schwarz and Jelena Luketina and Wojciech M. Czarnecki and Agnieszka Grabska-Barwinska and Yee Whye Teh and Razvan Pascanu and Raia Hadsell
arXiv e-Print archive - 2018 via Local arXiv
Keywords: stat.ML, cs.LG

[link] Summary by devin132 6 years ago

Proposes a two-stage approach to continual learning: an active learning (progress) phase and a consolidation (compress) phase. The active learning phase optimizes for a specific task, which is then consolidated into the knowledge base network via Elastic Weight Consolidation (EWC; Kirkpatrick et al., 2016). The active learning phase uses a separate network from the knowledge base, but that network is not always trained from scratch: the authors suggest a heuristic based on task similarity. The paper also improves EWC by deriving a new online method, so that the number of parameters does not increase linearly with the number of tasks.
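To make the point about online EWC concrete, here is a small sketch of how a single running importance estimate can replace one stored penalty per task. It rests on assumptions: the decayed-sum update with a hyperparameter `gamma`, the diagonal Fisher approximation, and the names `update_fisher` and `ewc_penalty` are a reading of the idea chosen for illustration here, not code from the paper.

```python
def update_fisher(running_fisher, task_fisher, gamma):
    """Fold the newest task's diagonal Fisher estimate into one running estimate.

    Only one vector is kept no matter how many tasks have been seen, so
    storage stays constant instead of growing with the number of tasks.
    gamma (assumed to lie in [0, 1]) down-weights older tasks.
    """
    return [gamma * old + new for old, new in zip(running_fisher, task_fisher)]


def ewc_penalty(params, anchor_params, running_fisher):
    """Quadratic penalty for moving away from the last consolidated parameters.

    Each parameter's squared displacement is weighted by its accumulated
    importance, so parameters that mattered for earlier tasks are held in place.
    """
    return 0.5 * sum(
        f * (p - a) ** 2
        for p, a, f in zip(params, anchor_params, running_fisher)
    )
```

During consolidation, this penalty would be added (scaled by a strength coefficient) to the loss for the new task, and `update_fisher` would be called once after each task finishes.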

Desiderata for a continual learning solution:

- A continual learning method should not suffer from catastrophic forgetting. That is, it should be able to perform reasonably well on previously learned tasks.

- It should be able to learn new tasks while taking advantage of knowledge extracted from previous tasks, thus exhibiting positive forward transfer to achieve faster learning and/or better final performance.

- It should be scalable, that is, the method should be trainable on a large number of tasks.

- It should enable positive backward transfer as well, which means gaining improved performance on previous tasks after learning a new task which is similar or relevant.

- Finally, it should be able to learn without requiring task labels, and ideally, it should even be applicable in the absence of clear task boundaries.

Experiments:

- Sequential learning of handwritten characters of 50 alphabets taken from the Omniglot dataset.
- Sequential learning of 6 games in the Atari suite (Bellemare et al., 2012) (“Space Invaders”, “Krull”, “Beamrider”, “Hero”, “Stargunner” and “Ms. Pac-man”).
- 8 navigation tasks in 3D environments inspired by experiments with Distral (Teh et al., 2017).
