Model-Based Active Exploration
Pranav Shyam and Wojciech Jaśkowski and Faustino Gomez
arXiv e-Print archive - 2018 via Local arXiv
Keywords: cs.LG, cs.AI, cs.IT, cs.NE, math.IT, stat.ML


Summary by CodyWild 2 years ago
Hi, I am one of the authors of the paper. This is a very nice summary! To clarify one point: ICM is a special case of the Exploration Bonus DQN baseline we used. Unlike ICM, we measure the state visitation frequency directly with an oracle instead of relying on prediction errors as a proxy (details in the appendix). Hence, the numbers for ICM will be very similar to those of the Exploration Bonus DQN baseline, and the performance we report can be considered an upper bound on ICM’s potential performance. We agree that the value of the proposed method is dependent on its scalability and we are currently working on it :)

Great, thanks for the clarification! To be clear, I wasn't making any particular judgment that ICM is a better or more correct baseline; I just wasn't familiar with Exploration Bonus DQN, and so couldn't comment on it.

