Critic Regularized Regression
Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas
arXiv e-Print archive - 2020 via Local arXiv
Keywords:
cs.LG, cs.AI, stat.ML
Abstract: Offline reinforcement learning (RL), also known as batch RL, offers the
prospect of policy optimization from large pre-recorded datasets without online
environment interaction. It addresses challenges with regard to the cost of
data collection and safety, both of which are particularly pertinent to
real-world applications of RL. Unfortunately, most off-policy algorithms
perform poorly when learning from a fixed dataset. In this paper, we propose a
novel offline RL algorithm to learn policies from data using a form of
critic-regularized regression (CRR). We find that CRR performs surprisingly
well and scales to tasks with high-dimensional state and action spaces --
outperforming several state-of-the-art offline RL algorithms by a significant
margin on a wide range of benchmark tasks.
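
The core idea named in the abstract is policy learning by critic-weighted behavioural cloning: the policy maximizes log-likelihood of dataset actions, with each action weighted by a function of its advantage under a learned critic. Below is a minimal sketch of that policy-improvement step, assuming a pre-trained critic and a stochastic policy; the function name, hyperparameters, and tensor conventions (critic returning a 1-D tensor of Q-values) are illustrative assumptions, not the authors' reference implementation.

```python
# Sketch of a CRR-style policy loss (hypothetical helper, PyTorch).
# Assumes: policy(states) returns a torch.distributions.Distribution,
# critic(states, actions) returns a 1-D tensor of Q-values.
import torch


def crr_policy_loss(policy, critic, states, actions,
                    n_action_samples=4, beta=1.0, variant="exp"):
    """Weighted behavioural cloning: -E[ f(A(s,a)) * log pi(a|s) ]."""
    dist = policy(states)
    log_prob = dist.log_prob(actions)              # log pi(a|s) for dataset actions

    with torch.no_grad():
        q_data = critic(states, actions)           # Q(s, a) for dataset actions

        # Monte-Carlo baseline V(s) ~= mean Q over actions sampled from the policy.
        sampled = dist.sample((n_action_samples,))                 # [m, B, act_dim]
        rep_states = states.unsqueeze(0).expand(
            n_action_samples, *states.shape)                       # [m, B, obs_dim]
        q_pi = critic(rep_states.reshape(-1, states.shape[-1]),
                      sampled.reshape(-1, sampled.shape[-1]))
        v = q_pi.reshape(n_action_samples, -1).mean(dim=0)         # [B]

        adv = q_data - v
        if variant == "binary":
            # Keep only actions the critic prefers to the policy's own samples.
            weight = (adv > 0).float()
        else:
            # Exponentiated-advantage weighting, clipped for stability (assumed cap).
            weight = torch.clamp(torch.exp(adv / beta), max=20.0)

    return -(weight * log_prob).mean()
```

The critic itself would be trained separately with a standard TD-style objective on the fixed dataset; only the weighting of the regression loss above distinguishes this from plain behavioural cloning.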