The main idea in this paper is to use how well the agent can predict its observation at the next step as a measure of how much exploration of the action just taken should be encouraged. The prediction is made by a deep architecture, specifically one built on a deep autoencoder representation of observations, and prediction accuracy is measured at the level of that learned, deep representation. Exploration is encouraged by increasing the reward whenever the model's prediction of the representation at the next time step is bad.
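To make the mechanism concrete, here is a minimal sketch of such a reward bonus. It is only an illustration under my own assumptions, not the paper's implementation: the names `encode`, `prediction_error`, and `augmented_reward` are hypothetical, the encoder is a single tanh layer standing in for a trained autoencoder, the error is a plain squared Euclidean distance, and `beta` is an assumed scaling constant. Any normalization or decay of the bonus that the paper may use is not modeled.

```python
import math


def encode(obs, weights):
    """Hypothetical stand-in for the encoder half of a trained autoencoder:
    one linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, obs))) for row in weights]


def prediction_error(predicted_code, actual_code):
    """Squared Euclidean distance between the predicted and the actual
    encoding of the next observation (assumed error measure)."""
    return sum((p - a) ** 2 for p, a in zip(predicted_code, actual_code))


def augmented_reward(reward, predicted_code, actual_code, beta=0.1):
    """Environment reward plus a bonus that grows with the prediction error.
    `beta` is an assumed scale for the bonus."""
    return reward + beta * prediction_error(predicted_code, actual_code)
```

In use, the agent would encode the next observation, compare it with what its dynamics model predicted from the current encoding and action, and train on `augmented_reward(...)` instead of the raw reward. A perfectly predicted transition gets no bonus, while a surprising one gets a larger reward.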
#### My two cents
I'm not sure how novel this idea is in RL, but at the very least it's interesting that it was explored the way it was here, with deep learning. As a non-expert in RL, I certainly enjoyed reading the paper. It also nicely implements an idea that just seems like common sense as an exploration strategy for an agent: the actions that merit exploration are those whose results are unexpected to you.
It will be interesting to see if this general approach will be able to exploit upcoming progress in the development of better generative deep learning models, an area that is currently very active.