Curiosity-driven reinforcement learning with homeostatic regulation
Ildefons Magrans de Abril
and
Ryota Kanai
arXiv e-Print archive, 2018
Keywords:
cs.AI
First published: 2018/01/23
Abstract: We propose a curiosity reward based on information theory principles and consistent with the animal instinct to maintain certain critical parameters within a bounded range. Our experimental validation shows the added value of the additional homeostatic drive to enhance the overall information gain of a reinforcement learning agent interacting with a complex environment using continuous actions. Our method builds upon two ideas: i) To take advantage of a new Bellman-like equation of information gain and ii) to simplify the computation of the local rewards by avoiding the approximation of complex distributions over continuous states and actions.
Exploration of an environment with non-linearities and a continuous action space can be improved by regulating the agent's curiosity with a homeostatic drive. This means that a heterostatic drive to move away from habitual states is blended with a homeostatic motivation that encourages actions leading to states where the agent is already familiar with the corresponding state-action pair.
This approach improves upon forward-model curiosity and the ICM of Pathak et al. (2017) with an enhanced information-gain reward: while the reward in \cite{Pathak17} is formulated as the forward model prediction error, the reward here is obtained by subtracting from that prediction error the error of an extended forward model that knows not only $s_t$ and $a_t$, but also $a_{t+1}$.
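Under that reading (the notation below is assumed for illustration, not taken verbatim from the paper), the resulting intrinsic reward would look like:

$$
r^{i}_t \;=\; \underbrace{\big\|\hat f(s_t, a_t) - s_{t+1}\big\|^2}_{\text{error knowing } s_t, a_t} \;-\; \underbrace{\big\|\hat g(s_t, a_t, a_{t+1}) - s_{t+1}\big\|^2}_{\text{error also knowing } a_{t+1}}
$$

so the reward is high when $s_{t+1}$ is surprising given $(s_t, a_t)$ alone but well explained once $a_{t+1}$ is also known.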
The paper shows that this additional homeostatic drive enhances the information gain of a classical curious/heterostatic agent.
Implementation: They take advantage of a new Bellman-like equation of information gain and simplify the computation of the local rewards. This could help by prioritizing exploration of the state-action space according to how hard each region is to learn.
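The summary does not spell the equation out; assuming the standard discounted-return form, a Bellman-like decomposition of the information-gain objective would read something like:

$$
IG(s_t, a_t) \;=\; r^{i}_t \;+\; \gamma\, \mathbb{E}_{s_{t+1}, a_{t+1}}\big[\, IG(s_{t+1}, a_{t+1}) \,\big]
$$

with the local reward $r^{i}_t$ computable from one-step prediction errors, avoiding explicit approximation of distributions over continuous states and actions.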
Background: The concept of homeostatic regulation in social robots was first proposed by Breazeal et al. (2004). The authors extend existing approaches by compensating the heterostatic drive encouraged by the curiosity reward with an additional homeostatic drive:
1) Heterostatic drive: the first component is the same one referred to in Pathak et al. (2017), i.e. the tendency to push the agent away from its habitual states;
2) Homeostatic motivation: the second component is the paper's novel contribution. It encourages taking actions $a_t$ that lead to future states $s_{t+1}$ where the corresponding future action $a_{t+1}$ gives additional information about $s_{t+1}$. This happens when the agent is "familiar" with the state-action pair $\{s_{t+1}, a_{t+1}\}$; a toy sketch of how such a blended reward could be computed is given below.
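As a concrete illustration (not the authors' implementation), here is a minimal PyTorch sketch of computing a reward that blends the two components, assuming two learned forward models $f(s_t, a_t)$ and $g(s_t, a_t, a_{t+1})$; the network sizes and names are made up for the example, and training of the two models on observed transitions is omitted.

```python
# Hedged sketch: blended curiosity reward from two forward models.
#   forward_model  f(s_t, a_t)          -> predicted s_{t+1}  (heterostatic term)
#   extended_model g(s_t, a_t, a_{t+1}) -> predicted s_{t+1}  (homeostatic correction)
import torch
import torch.nn as nn

state_dim, action_dim, hidden = 4, 2, 64

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

forward_model = mlp(state_dim + action_dim, state_dim)        # f(s_t, a_t)
extended_model = mlp(state_dim + 2 * action_dim, state_dim)   # g(s_t, a_t, a_{t+1})

def intrinsic_reward(s_t, a_t, a_next, s_next):
    """Prediction error knowing (s_t, a_t) minus error also knowing a_{t+1}."""
    pred_f = forward_model(torch.cat([s_t, a_t], dim=-1))
    pred_g = extended_model(torch.cat([s_t, a_t, a_next], dim=-1))
    err_f = ((pred_f - s_next) ** 2).mean(dim=-1)   # heterostatic term
    err_g = ((pred_g - s_next) ** 2).mean(dim=-1)   # homeostatic correction
    return err_f - err_g

# Toy usage with random tensors (batch of 8 transitions).
s_t, s_next = torch.randn(8, state_dim), torch.randn(8, state_dim)
a_t, a_next = torch.randn(8, action_dim), torch.randn(8, action_dim)
print(intrinsic_reward(s_t, a_t, a_next, s_next).detach())
```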
The article lacks a direct comparison with Pathak et al. (2017) on a common task. In this paper, the task consists of a three-room navigation map used to measure exploration.