Summary by Marek Rei 8 years ago
They describe a version of reinforcement learning where the system also learns to solve some auxiliary tasks, which helps with the main objective.
https://i.imgur.com/fmTVxvr.png
In addition to normal Q-learning, which predicts the downstream reward, the system also learns 1) a separate policy for maximally changing the pixels on the screen, 2) a policy for maximally activating units in a hidden layer, and 3) a predictor of the reward at the next step, trained with biased (reward-skewed) sampling. They show that these auxiliary tasks improve both learning speed and final performance on Atari games and Labyrinth (a Quake-like 3D environment).
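As a rough illustration of the first auxiliary task (pixel control), the auxiliary reward for each spatial cell can be taken as the average absolute change in pixel intensity between consecutive frames; this is a minimal sketch, and the grid size and grayscale input are assumptions for illustration, not the paper's exact setup:

```python
import numpy as np

def pixel_change_reward(frame_prev, frame_next, cells=4):
    """Auxiliary reward per spatial cell: mean absolute pixel change.

    frames: (H, W) grayscale arrays with H and W divisible by `cells`.
    Returns a (cells, cells) grid of per-cell change magnitudes.
    """
    diff = np.abs(frame_next.astype(np.float32) - frame_prev.astype(np.float32))
    h, w = diff.shape
    ch, cw = h // cells, w // cells
    # Average the change within each cell of a cells x cells grid
    return diff.reshape(cells, ch, cells, cw).mean(axis=(1, 3))

# Toy example: the change is confined to the top-left 2x2 region,
# so only the top-left cell of the 4x4 grid receives a reward
prev = np.zeros((8, 8))
nxt = np.zeros((8, 8))
nxt[:2, :2] = 1.0
r = pixel_change_reward(prev, nxt, cells=4)  # r[0, 0] == 1.0, all other cells 0
```

An auxiliary policy trained to maximize this per-cell signal is pushed to take actions that visibly affect the environment, which is the intuition behind the pixel-control task.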