**TL;DR:** There are 'place cells' in the hippocampus that fire when an animal passes through a particular location. You can record how a rat's place cells are activated as it runs through a maze, then monitor the same neurons during planning, rest, or sleep. The recordings show patterns suggesting that the rat is 'thinking' of locations in sequence and focusing on interesting locations. This paper compares how RL agents do 'prioritized experience replay' with place cell activity in animals: the authors run an RL simulation and *qualitatively* compare the results to the activity observed in place cells.
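To make the RL side concrete, here is a minimal sketch of tabular prioritized replay in its generic form: stored transitions are replayed greedily in order of the magnitude of their temporal-difference (TD) error, in the spirit of prioritized sweeping. This is not the utility measure the paper proposes. The function names (`td_error`, `prioritized_replay`) and the toy track are hypothetical, chosen only for illustration.

```python
import numpy as np

def td_error(Q, s, a, r, s_next, gamma):
    """One-step Q-learning TD error for a stored transition (s, a, r, s_next)."""
    return r + gamma * Q[s_next].max() - Q[s, a]

def prioritized_replay(Q, memory, n_steps, alpha=1.0, gamma=0.9, tol=1e-9):
    """Greedily replay the stored transition with the largest |TD error|.

    Generic |TD error| prioritization (an illustrative assumption, not the
    paper's utility). Updates Q in place and returns the replayed start states.
    """
    order = []
    for _ in range(n_steps):
        errors = [abs(td_error(Q, *m, gamma)) for m in memory]
        k = int(np.argmax(errors))
        if errors[k] < tol:
            break  # nothing left to learn from these memories
        s, a, r, s_next = memory[k]
        Q[s, a] += alpha * td_error(Q, s, a, r, s_next, gamma)
        order.append(s)
    return order

# Hypothetical linear track: states 0..5, one action ("move right"),
# reward 1 on entering state 5, which has no outgoing transitions.
Q = np.zeros((6, 1))
memory = [(s, 0, 1.0 if s + 1 == 5 else 0.0, s + 1) for s in range(5)]
order = prioritized_replay(Q, memory, n_steps=20)
print(order)  # [4, 3, 2, 1, 0]
```

Because reward is only encountered at the end of the track, only the transition at the frontier of value propagation has a nonzero TD error at each step, so replay walks backward from the goal.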
> Neural activity recorded from hippocampal place cells during spatial navigation typically represents the animal’s spatial position, though it can sometimes represent locations ahead of the animal. For instance, during “sharp wave ripple” events, activity might progress sequentially from the animal’s current location towards a goal location. These “forward replay” sequences predict subsequent behavior and have been suggested to support a planning mechanism that links actions to their deferred consequences along a spatial trajectory. However, analogously to the human evidence, remote activity in the hippocampus can also represent locations behind the animal, and even altogether disjoint, remote locations (especially during rest or sleep) (Fig. 1a).
> we develop a normative theory to predict not just whether but which memories should be accessed at each time to enable the most rewarding future decisions.
> To test the implications of our theory, we simulate a spatial navigation task where an agent generates and stores experiences which can be later retrieved. We show that an agent that accesses memories sequentially and in order of utility produces patterns of sequential state consideration that resemble place cell replay, and reproduces qualitatively and with no parameter fitting a wealth of empirical findings including (i) the existence and balance between forward and reverse replay; (ii) the content of replay; and (iii) effects of experience.
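A minimal sketch of utility-ordered replay. It assumes, following my reading of the paper, that a memory's utility is the product of a *gain* term (how much backing up that experience would improve the policy at the replayed state) and a *need* term (how often the agent expects to visit that state in the future, computed here from a successor representation). Treat that factorization, the toy linear track, and every name and parameter below (`GAMMA`, `BETA`, `step`, `need`, `gain`, `replay`) as illustrative assumptions rather than the authors' code or simulation settings.

```python
import numpy as np

# Illustrative settings (assumptions, not the paper's parameters)
GAMMA, BETA = 0.9, 5.0   # discount factor and softmax inverse temperature
N_STATES, GOAL = 5, 4    # linear track 0..4; state 4 is a terminal goal
LEFT, RIGHT = 0, 1

def step(s, a):
    """Deterministic track dynamics: returns (reward, next_state)."""
    s_next = max(s - 1, 0) if a == LEFT else s + 1
    return (1.0 if s_next == GOAL else 0.0), s_next

def softmax(q):
    z = np.exp(BETA * (q - q.max()))
    return z / z.sum()

def need(Q, s_agent):
    """Expected discounted future visits to each state from s_agent (successor representation)."""
    T = np.zeros((N_STATES, N_STATES))
    for s in range(GOAL):  # the goal is terminal, so its row stays zero
        pi = softmax(Q[s])
        for a in (LEFT, RIGHT):
            T[s, step(s, a)[1]] += pi[a]
    M = np.linalg.inv(np.eye(N_STATES) - GAMMA * T)
    return M[s_agent]

def backup(Q, s, a, r, s_next):
    """Action values at s after a one-step Bellman backup of (s, a) with learning rate 1."""
    q_new = Q[s].copy()
    q_new[a] = r + GAMMA * Q[s_next].max()
    return q_new

def gain(Q, s, q_new):
    """Gain in expected value at s from switching to the updated softmax policy."""
    return float(softmax(q_new) @ q_new - softmax(Q[s]) @ q_new)

def replay(Q, memory, s_agent, max_steps=20, tol=1e-12):
    """Assumed utility = gain * need; replay the best memory until none is positive."""
    replayed = []
    for _ in range(max_steps):
        m = need(Q, s_agent)
        utility = [gain(Q, s, backup(Q, s, a, r, s_next)) * m[s]
                   for s, a, r, s_next in memory]
        k = int(np.argmax(utility))
        if utility[k] <= tol:
            break  # no remaining memory is worth replaying
        s, a, r, s_next = memory[k]
        Q[s] = backup(Q, s, a, r, s_next)
        replayed.append((s, a))
    return replayed

# One stored experience per (state, action) on the track; the agent rests at state 0.
memory = [(s, a, *step(s, a)) for s in range(GOAL) for a in (LEFT, RIGHT)]
Q = np.zeros((N_STATES, 2))
print(replay(Q, memory, s_agent=0))  # [(3, 1), (2, 1), (1, 1), (0, 1)]
```

With the agent at the start of the track and reward at the end, only the transition into the goal has nonzero gain at first, and each backup then makes the preceding state's backup worthwhile, so replay runs backward from the goal toward the agent. Backups that would shift the softmax policy toward a worse action get negative gain and are not selected here.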
> we propose the unifying view that all patterns of replay during behavior, rest, and sleep reflect different instances of a more general state retrieval operation that integrates experiences across space and time to propagate value and guide decisions.
**My 2 cents**: I like this paper because prioritized experience replay reminds me of how we often dream or daydream about novel events, good or bad, that have happened or that we anticipate. This paper drills much deeper into that connection.