Curiosity-driven reinforcement learning with homeostatic regulation
Ildefons Magrans de Abril
and
Ryota Kanai
arXiv e-Print archive, 2018
Keywords:
cs.AI
First published: 2018/01/23
Abstract: We propose a curiosity reward based on information theory principles and consistent with the animal instinct to maintain certain critical parameters within a bounded range. Our experimental validation shows the added value of the additional homeostatic drive to enhance the overall information gain of a reinforcement learning agent interacting with a complex environment using continuous actions. Our method builds upon two ideas: i) To take advantage of a new Bellman-like equation of information gain and ii) to simplify the computation of the local rewards by avoiding the approximation of complex distributions over continuous states and actions.
Exploration of an environment with non-linearities and a continuous action space can be improved by regulating the agent's curiosity with a homeostatic drive. This means that a heterostatic drive to move away from habitual states is blended with a homeostatic motivation that encourages actions leading to states where the agent is already familiar with the corresponding state-action pair.
This approach improves upon forward-model curiosity and the ICM of Pathak et al. (2017) with an enhanced information-gain reward: while the reward in \cite{Pathak17} is formulated as the forward model prediction error, the reward here is obtained by subtracting from that prediction error the error of an extended forward model that knows not only $s_t$ and $a_t$, but also $a_{t+1}$.
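Under that reading (the notation below is assumed for illustration, not taken verbatim from the paper), the resulting intrinsic reward would look like:

$$
r^{i}_t \;=\; \underbrace{\big\|\hat f(s_t, a_t) - s_{t+1}\big\|^2}_{\text{error knowing } s_t, a_t} \;-\; \underbrace{\big\|\hat g(s_t, a_t, a_{t+1}) - s_{t+1}\big\|^2}_{\text{error also knowing } a_{t+1}}
$$

so the reward is high when $s_{t+1}$ is surprising given $(s_t, a_t)$ alone but well explained once $a_{t+1}$ is also known.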
The paper shows that this additional homeostatic drive enhances the information gain of a classical curious/heterostatic agent.
Implementation: They take advantage of a new Bellman-like equation of information gain and simplify the computation of the local rewards. This could help by prioritizing exploration of the state-action space according to how hard each region is to learn.
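The summary does not spell the equation out; assuming the standard discounted-return form, a Bellman-like decomposition of the information-gain objective would read something like:

$$
IG(s_t, a_t) \;=\; r^{i}_t \;+\; \gamma\, \mathbb{E}_{s_{t+1}, a_{t+1}}\big[\, IG(s_{t+1}, a_{t+1}) \,\big]
$$

with the local reward $r^{i}_t$ computable from one-step prediction errors, avoiding explicit approximation of distributions over continuous states and actions.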
Background: The concept of homeostatic regulation in social robots was first proposed by Breazeal et al. (2004). The authors extend existing approaches by compensating the heterostatic drive encouraged by the curiosity reward with an additional homeostatic drive:
1) Heterostatic drive: the first component is the same one referred to in Pathak et al. (2017), i.e. the tendency to push the agent away from its habitual states;
2) Homeostatic motivation: the second component is the paper's novel contribution. It encourages taking actions $a_t$ that lead to future states $s_{t+1}$ where the corresponding future action $a_{t+1}$ gives additional information about $s_{t+1}$. This happens when the agent is "familiar" with the state-action pair $\{s_{t+1}, a_{t+1}\}$; a toy sketch of how such a blended reward could be computed is given below.
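As a concrete illustration (not the authors' implementation), here is a minimal PyTorch sketch of computing a reward that blends the two components, assuming two learned forward models $f(s_t, a_t)$ and $g(s_t, a_t, a_{t+1})$; the network sizes and names are made up for the example, and training of the two models on observed transitions is omitted.

```python
# Hedged sketch: blended curiosity reward from two forward models.
#   forward_model  f(s_t, a_t)          -> predicted s_{t+1}  (heterostatic term)
#   extended_model g(s_t, a_t, a_{t+1}) -> predicted s_{t+1}  (homeostatic correction)
import torch
import torch.nn as nn

state_dim, action_dim, hidden = 4, 2, 64

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

forward_model = mlp(state_dim + action_dim, state_dim)        # f(s_t, a_t)
extended_model = mlp(state_dim + 2 * action_dim, state_dim)   # g(s_t, a_t, a_{t+1})

def intrinsic_reward(s_t, a_t, a_next, s_next):
    """Prediction error knowing (s_t, a_t) minus error also knowing a_{t+1}."""
    pred_f = forward_model(torch.cat([s_t, a_t], dim=-1))
    pred_g = extended_model(torch.cat([s_t, a_t, a_next], dim=-1))
    err_f = ((pred_f - s_next) ** 2).mean(dim=-1)   # heterostatic term
    err_g = ((pred_g - s_next) ** 2).mean(dim=-1)   # homeostatic correction
    return err_f - err_g

# Toy usage with random tensors (batch of 8 transitions).
s_t, s_next = torch.randn(8, state_dim), torch.randn(8, state_dim)
a_t, a_next = torch.randn(8, action_dim), torch.randn(8, action_dim)
print(intrinsic_reward(s_t, a_t, a_next, s_next).detach())
```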
The article lacks a direct comparison with Pathak et al. (2017) on a common task. In this paper, the task consists of a three-room navigation map used to measure exploration.