Deep RL Papers

Spring 2026

Imitation Learning / Learning from Demonstrations

Deep RL Algorithms

  • Allen Z. Ren, Justin Lidard, Lars L. Ankile, Anthony Simeonov, Pulkit Agrawal, Anirudha Majumdar, Benjamin Burchfiel, Hongkai Dai, and Max Simchowitz, "Diffusion Policy Policy Optimization", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
  • Matteo Gallici, Mattie Fellows, Benjamin Ellis, Bartomeu Pou, Ivan Masmitja, Jakob Nicolaus Foerster, and Mario Martin, "Simplifying Deep Temporal Difference Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2025.

Offline RL

  • Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Bechtle, Steven Kapturowski, Roland Hafner, Nicolas Heess, and Martin Riedmiller, "Offline Actor-Critic Reinforcement Learning Scales to Large Models", in Proc. of the International Conference on Machine Learning (ICML), 2024.

Safe RL

Distributional RL

Unsupervised RL

Multi-Agent RL

Reinforcement Learning from Human Feedback (RLHF)

Model-Based RL