First published: 2018/10/29
Abstract: Efficient exploration is an unsolved problem in Reinforcement Learning. We
introduce Model-Based Active eXploration (MAX), an algorithm that actively
explores the environment. It minimizes data required to comprehensively model
the environment by planning to observe novel events, instead of merely reacting
to novelty encountered by chance. Non-stationarity induced by traditional
exploration bonus techniques is avoided by constructing fresh exploration
policies only at the time of action. In semi-random toy environments where directed
exploration is critical to make progress, our algorithm is at least an order of
magnitude more efficient than strong baselines.
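As a rough illustration of the idea (not the paper's implementation), the sketch below assumes novelty is measured as disagreement within an ensemble of learned dynamics models, and that a fresh exploration plan is constructed at each decision point by random shooting through the ensemble. All names, dimensions, and the linear stand-in models are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: an ensemble of learned dynamics models predicts the
# next state; disagreement among ensemble members serves as the novelty signal.
rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, ENSEMBLE_SIZE = 4, 2, 5
HORIZON, N_CANDIDATES = 10, 256

# Stand-in for a trained ensemble: each "model" is a random linear dynamics map.
ensemble = [
    (np.eye(STATE_DIM) + 0.1 * rng.normal(size=(STATE_DIM, STATE_DIM)),
     0.1 * rng.normal(size=(STATE_DIM, ACTION_DIM)))
    for _ in range(ENSEMBLE_SIZE)
]

def predict(model, state, action):
    A, B = model
    return A @ state + B @ action

def novelty(state, action):
    """Disagreement (variance) among ensemble predictions for one transition."""
    preds = np.stack([predict(m, state, action) for m in ensemble])
    return preds.var(axis=0).sum()

def plan_exploration_action(state):
    """Construct a fresh exploration plan at decision time: sample random
    action sequences, roll them out in the ensemble, keep the sequence whose
    imagined transitions accumulate the most novelty, and return its first action."""
    best_score, best_first_action = -np.inf, None
    for _ in range(N_CANDIDATES):
        actions = rng.uniform(-1.0, 1.0, size=(HORIZON, ACTION_DIM))
        s, score = state.copy(), 0.0
        for a in actions:
            score += novelty(s, a)
            # Roll forward with the ensemble-mean prediction.
            s = np.mean([predict(m, s, a) for m in ensemble], axis=0)
        if score > best_score:
            best_score, best_first_action = score, actions[0]
    return best_first_action

state = np.zeros(STATE_DIM)
print("chosen exploration action:", plan_exploration_action(state))
```

Because the plan is rebuilt from the current ensemble at every step, the behavior policy never needs to track a shifting exploration bonus, which is the non-stationarity the abstract refers to avoiding.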