We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions. We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework, provide asymptotic convergence theory, and examine the empirical performance of the algorithms on a suite of tabular tasks. Further, we show that this approach can be straightforwardly combined with deep reinforcement learning, and obtain a new deep RL agent that improves over baseline distributional approaches on the Arcade Learning Environment.
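As a rough illustration of the central object (the notation here is ours and is not fixed by the abstract): writing $G^\pi(x,a)$ for the random return from state-action pair $(x,a)$ under policy $\pi$, and $\phi = (\phi_1, \ldots, \phi_d)$ for a fixed finite collection of feature functions, a finite-dimensional mean embedding of the return distribution takes the form
\[
  % Illustrative sketch only: the expectation of the features of the return,
  % a d-dimensional summary of the return distribution at (x, a).
  m^\pi(x, a) \;=\; \mathbb{E}\!\left[\phi\!\left(G^\pi(x,a)\right)\right] \;\in\; \mathbb{R}^d .
\]
Under this reading, the dynamic-programming and temporal-difference algorithms referred to above would update these $d$-dimensional vectors rather than full return distributions.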