Joint summary from https://mitpress.mit.edu/books/developmental-robotics

Developmental robotics is the interdisciplinary approach to the autonomous design of behavioural and cognitive capabilities in artificial agents (robots) that takes direct inspiration from the developmental principles and mechanisms observed in natural cognitive systems. It relies on a highly interdisciplinary effort combining empirical developmental sciences, such as developmental psychology, neuroscience, and comparative psychology, with computational and engineering disciplines, such as robotics and artificial intelligence. Implementing these principles and mechanisms in a robot's control architecture, and testing them through experiments in which the robot interacts with its physical and social environment, permits both the validation of such principles and the actual design of complex behavioural and mental capabilities in robots. Developmental psychology and developmental robotics mutually benefit from such a combined effort.

Synonyms for developmental robotics include cognitive developmental robotics, autonomous mental development, and epigenetic robotics \cite{Cangelosi18}. The latter term borrows ‘epigenetic’ from Piaget’s Epigenetic Theory of human development, in which the child’s cognitive system develops as a result of the interplay between genetic predispositions and the organism’s interaction with the environment. The name ‘epigenetic robotics’ was therefore justified by Piaget’s stress on the role of interaction with the environment (later complemented by the emphasis of Zlatev and Balkenius on social interaction in 2001) and, in particular, on the sensorimotor bases of higher-order cognitive capabilities.
Because robotics requires acquiring skills not only incrementally but also in open-ended environments, the state of the art is less advanced than in more constrained, well-defined single-task settings. Because embodiment is important for open-ended learning in robotics \cite{Cangelosi18}, and because continual learning strategies are scarce in robotics and mostly assessed on perception tasks, we believe the methodologies of both fields should borrow from each other. The blend could result not only in more effective or robust lifelong learning but also in less forgetful agents (i.e., the main rationale behind CL), a goal that has not yet been empirically shown to be impossible to reach. We hope this survey helps reach this goal.

Dynamical systems development: In mathematics, a dynamical system is characterized by complex changes over time in the phase state. The complex interaction of nonlinear phenomena produces unpredictable states of the system, often referred to as emergent states. This concept was borrowed by developmental psychologists \cite{Thelen94} to explain child development as the emergent product of the intricate and dynamic interplay of many decentralized and local interactions involving the child's growing body and brain and her environment.

Some definitions:

Novelty, curiosity and surprise: these notions focus on the autonomy of learning, i.e., on the agent's freedom to choose what, when and how it will learn. Intrinsic motivation (IM) is a mechanism to drive autonomous learning, not only in developmental robotics but also more broadly within the field of machine learning \cite{Mirolli13}\cite{Oudeyer07}.
First, IM is task-independent: the agent can be placed in a completely new environment with no prior knowledge or experience, and through self-directed exploration the robot will potentially learn not only important features of the environment but also the behavioural skills necessary for dealing with it. Second, IM promotes hierarchical learning rather than solving a specific, predefined task.

The Theory of Mind (ToM): A robot can use its own theory of mind to improve its interaction with human users, for example. ToM aims at reading the intentions of others and reacting appropriately to the emotional, attentional and cognitive states of other agents, anticipating their reactions, and modifying the robot's own behaviour to satisfy these expectations and needs \cite{Scasellati02}.

Cascades: developmental theories describe the far-reaching effect of early developments on later ones as the “developmental cascade”.

Saliency in robotics: In robot vision, a set of feature extraction methods can typically be applied to derive a saliency map \cite{Itti01}, i.e., to identify the parts of the image that are important for the robot's behaviour. The map can be created, e.g., by combining feature extraction methods for color, motion, orientation and brightness. The combined features form the overall saliency map that the model uses to fixate objects.
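The feature combination described for saliency maps can be illustrated with a minimal sketch. It assumes each feature (color, motion, orientation, brightness) is already available as a small 2-D map, rescales each map to $[0, 1]$, and averages them with equal weights; the fixation point is taken as the maximum of the result. The function names, the equal weighting and the toy values are assumptions made for illustration and are much simpler than the model in \cite{Itti01}.

```python
def normalize(feature_map):
    """Rescale a 2-D feature map (list of rows) to the range [0, 1]."""
    flat = [v for row in feature_map for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in feature_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in feature_map]


def saliency_map(feature_maps, weights=None):
    """Combine per-feature maps into one map by weighted averaging."""
    names = list(feature_maps)
    if weights is None:
        weights = {name: 1.0 for name in names}  # assumption: equal weights
    total = sum(weights[name] for name in names)
    normed = {name: normalize(feature_maps[name]) for name in names}
    rows = len(normed[names[0]])
    cols = len(normed[names[0]][0])
    return [
        [sum(weights[n] * normed[n][r][c] for n in names) / total
         for c in range(cols)]
        for r in range(rows)
    ]


def fixation_point(saliency):
    """Return the (row, col) index of the most salient location."""
    cells = ((r, c) for r in range(len(saliency))
             for c in range(len(saliency[0])))
    return max(cells, key=lambda rc: saliency[rc[0]][rc[1]])


# Toy 3x3 feature maps (hypothetical values, not taken from any dataset).
features = {
    "color":       [[0, 0, 0], [0, 5, 0], [0, 0, 0]],
    "motion":      [[0, 0, 0], [0, 2, 1], [0, 0, 0]],
    "orientation": [[1, 0, 0], [0, 3, 0], [0, 0, 0]],
    "brightness":  [[0, 0, 0], [0, 4, 0], [0, 0, 2]],
}
saliency = saliency_map(features)
target = fixation_point(saliency)  # the centre cell, where all features peak
```

Passing a `weights` dictionary would let one feature (for instance motion) dominate the fixation choice, which is one simple way to bias attention toward task-relevant cues.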
Through a likelihood-focused derivation of a variational inference (VI) loss, Variational Generative Experience Replay (VGER) presents the closest appropriate likelihood-focused alternative to Variational Continual Learning (VCL), the state-of-the-art prior-focused approach to continual learning.

In the non-continual setting, the aim is to learn parameters $\omega$ from labelled training data $\mathcal{D}$ in order to infer $p(y|\omega, x)$. In the continual learning setting, by contrast, the data are not independently and identically distributed (i.i.d.) overall, but may be split into separate tasks $\mathcal{D}_t = (X_t, Y_t)$ whose examples $x_t^{n_t}$ and $y_t^{n_t}$ are assumed to be i.i.d.

In \cite{Farquhar18}, the loss at time $t$ cannot be estimated for previously discarded datasets. To approximate the distribution of past datasets $p_t(x,y)$, VGER therefore trains a GAN $q_t(x, y)$ to produce ($\hat{x}, \hat{y}$) pairs for each class in each dataset as it arrives (the generator is kept, while the data is discarded after each dataset is used). The variational free energy $\mathcal{F}_T$ is then used to train on dataset $\mathcal{D}_T$ augmented with samples generated by the GAN. In this way the prior is set as the posterior approximation from the previous task.
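The replay data flow described above (keep the generator, discard the data, augment each new dataset with generated pairs) can be sketched as follows. This is only an illustration of the bookkeeping, under strong assumptions: the GAN $q_t(x, y)$ is replaced by a hypothetical stand-in that fits one 1-D Gaussian per class, and the variational training step on $\mathcal{F}_T$ is not implemented. The class and function names are invented for this sketch and do not come from \cite{Farquhar18}.

```python
import random
import statistics


class ClassGaussianGenerator:
    """Stand-in for the GAN q_t(x, y): one 1-D Gaussian per class label."""

    def __init__(self, xs, ys):
        self.params = {}
        for label in sorted(set(ys)):
            values = [x for x, y in zip(xs, ys) if y == label]
            self.params[label] = (statistics.mean(values),
                                  statistics.pstdev(values))

    def sample(self, n_per_class, rng):
        """Draw (x_hat, y_hat) pairs for every class seen in this dataset."""
        pairs = []
        for label, (mu, sigma) in self.params.items():
            pairs.extend((rng.gauss(mu, sigma), label)
                         for _ in range(n_per_class))
        return pairs


def continual_training_sets(tasks, n_per_class, seed=0):
    """Yield, per task, the current data augmented with replayed samples."""
    rng = random.Random(seed)
    generators = []  # generators are kept; raw data is not stored
    for xs, ys in tasks:
        replay = [pair for g in generators
                  for pair in g.sample(n_per_class, rng)]
        current = list(zip(xs, ys))
        # A model would be trained on this augmented set (not shown here).
        yield current + replay
        # Fit a generator for this dataset before its data is discarded.
        generators.append(ClassGaussianGenerator(xs, ys))


# Two toy tasks with disjoint classes (hypothetical data).
tasks = [
    ([0.0, 0.2, 1.0, 1.2], [0, 0, 1, 1]),
    ([5.0, 5.2, 6.0, 6.2], [2, 2, 3, 3]),
]
training_sets = list(continual_training_sets(tasks, n_per_class=3))
```

In this toy run the second training set contains the four real examples of the second task plus three generated pairs for each of the two classes from the first task, so all four labels are present even though the first dataset is no longer stored.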
Exploring an environment with non-linearities in a continuous action space can be improved by regulating the agent's curiosity with a homeostatic drive. This means that a heterostatic drive to move away from habitual states is blended with a homeostatic motivation that encourages actions leading to states where the agent is familiar with a state-action pair.

This approach improves upon forward models and ICM \cite{Pathak17} with an enhanced information gain. While the reward in \cite{Pathak17} is formulated as the forward model prediction error, the reward in this paper subtracts from that prediction error the error of a prediction made knowing not only $s_t$ and $a_t$, but also $a_{t+1}$. The paper shows that, in curiosity-driven reinforcement learning, an additional homeostatic drive enhances the information gain of a classical curious (heterostatic) agent.

Implementation: The authors take advantage of a new Bellman-like equation of information gain and simplify the computation of the local rewards. The approach could help by prioritizing the exploration of the state-action space according to how hard each region is to learn.

Background: The concept of homeostatic regulation in social robots was first proposed in Breazeal et al. (2004). The authors extend existing approaches by compensating the heterostatic drive encouraged by the curiosity reward with an additional homeostatic drive. 1) The first component implements the heterostatic drive (the same as in \cite{Pathak17}), i.e., the tendency to push the agent away from its habitual state. 2) The second component, the homeostatic motivation, is the paper's novel contribution. It encourages taking actions $a_t$ that lead to future states $s_{t+1}$ where the corresponding future action $a_{t+1}$ gives additional information about $s_{t+1}$. This situation happens when the agent is "familiar" with the state-action pair $\{s_{t+1}, a_{t+1}\}$.
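The difference between the two rewards can be illustrated with a small sketch that follows the summary's description: the heterostatic reward is the forward model's prediction error, and the extended reward subtracts the error of a second model that additionally sees $a_{t+1}$. The squared-error measure, the two toy models and all names below are assumptions made for illustration; the paper's actual models, its training procedure and its Bellman-like equation are not reproduced here.

```python
def squared_error(prediction, target):
    """Sum of squared differences between two equally sized vectors."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target))


def curiosity_reward(forward_model, s_t, a_t, s_next):
    """Heterostatic reward: prediction error of the forward model."""
    return squared_error(forward_model(s_t, a_t), s_next)


def extended_reward(forward_model, extended_model, s_t, a_t, a_next, s_next):
    """Forward-model error minus the error of a model that also sees a_{t+1}."""
    error_plain = squared_error(forward_model(s_t, a_t), s_next)
    error_extended = squared_error(extended_model(s_t, a_t, a_next), s_next)
    return error_plain - error_extended


# Hypothetical toy models standing in for learned networks.
def toy_forward_model(s, a):
    return [si + ai for si, ai in zip(s, a)]


def toy_extended_model(s, a, a_next):
    return [si + ai + 0.25 * ni for si, ai, ni in zip(s, a, a_next)]


s_t, a_t = [0.0, 0.0], [1.0, 0.0]
a_next, s_next = [0.0, 1.0], [1.0, 0.5]

r_curious = curiosity_reward(toy_forward_model, s_t, a_t, s_next)
r_extended = extended_reward(toy_forward_model, toy_extended_model,
                             s_t, a_t, a_next, s_next)
```

In this toy case the extended reward is positive because knowing $a_{t+1}$ reduces the prediction error for $s_{t+1}$, which corresponds to the "familiar state-action pair" situation described above.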
The article lacks a direct comparison with \cite{Pathak17} on a shared task. In this paper, the task consists of a three-room navigation map used to measure exploration.