Deep RL Papers
Spring 2026
Imitation Learning / Learning from Demonstrations
- Benjamin Eysenbach, Sergey Levine, and Ruslan Salakhutdinov, "Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification", Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Matteo Pirotta, Andrea Tirinzoni, Ahmed Touati, Alessandro Lazaric, Yann Ollivier, "Fast Imitation via Behavior Foundation Models", in Proc. of the International Conference on Learning Representations (ICLR), 2024.
- Xingchen Cao, Fan-Ming Luo, Junyin Ye, Tian Xu, Zhilong Zhang, and Yang Yu, "Limited Preference Aided Imitation Learning from Imperfect Demonstrations", in Proc. of the International Conference on Machine Learning (ICML), 2024.
- Yunke Wang, Minjing Dong, Yukun Zhao, Bo Du, and Chang Xu, "Imitation Learning from Purified Demonstrations", in Proc. of the International Conference on Machine Learning (ICML), 2024.
- Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, and Shao-Hua Sun, "Diffusion-Reward Adversarial Imitation Learning", Advances in Neural Information Processing Systems (NeurIPS), 2024.
- Thomas Rupf, Marco Bagatella, Nico Gürtler, Jonas Frey, and Georg Martius, "Zero-Shot Offline Imitation Learning via Optimal Transport", in Proc. of the International Conference on Machine Learning (ICML), 2025.
Deep RL Algorithms
- Jongmin Lee, Wonseok Jeon, Byung-Jun Lee, Joelle Pineau, and Kee-Eung Kim, "OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation", in Proc. of the International Conference on Machine Learning (ICML), 2021.
- Divyansh Garg, Joey Hejna, Matthieu Geist, and Stefano Ermon, "Extreme Q-Learning: MaxEnt RL without Entropy", in Proc. of the International Conference on Learning Representations (ICLR), 2023.
- Aditya Bhatt, Daniel Palenicek, Boris Belousov, Max Argus, Artemij Amiranashvili, Thomas Brox, and Jan Peters, "CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity", in Proc. of the International Conference on Learning Representations (ICLR), 2024.
- Yiming Wang, Kaiyan Zhao, Furui Liu, and Leong Hou U, "Rethinking Exploration in Reinforcement Learning with Effective Metric-Based Exploration Bonus", Advances in Neural Information Processing Systems (NeurIPS), 2024.
- Allen Z. Ren, Justin Lidard, Lars L. Ankile, Anthony Simeonov, Pulkit Agrawal, Anirudha Majumdar, Benjamin Burchfiel, Hongkai Dai, and Max Simchowitz, "Diffusion Policy Policy Optimization", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
- Matteo Gallici, Mattie Fellows, Benjamin Ellis, Bartomeu Pou, Ivan Masmitja, Jakob Nicolaus Foerster, and Mario Martin, "Simplifying Deep Temporal Difference Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
- Physical Intelligence et al., "$π^{*}_{0.6}$: a VLA That Learns From Experience", arXiv preprint, 2025.
Offline RL
- Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine, "Conservative Q-Learning for Offline Reinforcement Learning", Advances in Neural Information Processing Systems (NeurIPS), 2020.
- Michael Janner, Qiyang Li, and Sergey Levine, "Offline Reinforcement Learning as One Big Sequence Modeling Problem", Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Michael Janner, Yilun Du, Joshua B. Tenenbaum, and Sergey Levine, "Planning with Diffusion for Flexible Behavior Synthesis", in Proc. of the International Conference on Machine Learning (ICML), 2022.
- Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, and Jun Zhu, "Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), 2023.
- Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Bechtle, Steven Kapturowski, Roland Hafner, Nicolas Heess, and Martin Riedmiller, "Offline Actor-Critic Reinforcement Learning Scales to Large Models", in Proc. of the International Conference on Machine Learning (ICML), 2024.
- Huayu Chen, Kaiwen Zheng, Hang Su, and Jun Zhu, "Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control", Advances in Neural Information Processing Systems (NeurIPS), 2024.
- Junghyuk Yeom, Yonghyeon Jo, Jungmo Kim, Sanghyeon Lee, and Seungyul Han, "Exclusively Penalized Q-learning for Offline Reinforcement Learning", Advances in Neural Information Processing Systems (NeurIPS), 2024.
- Shiyuan Zhang, Weitong Zhang, and Quanquan Gu, "Energy-Weighted Flow Matching for Offline Reinforcement Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
Safe RL
- Adam Stooke, Joshua Achiam, and Pieter Abbeel, "Responsive Safety in Reinforcement Learning by PID Lagrangian Methods", in Proc. of the International Conference on Machine Learning (ICML), 2020.
- Weidong Huang, Jiaming Ji, Chunhe Xia, Borong Zhang, and Yaodong Yang, "SafeDreamer: Safe Reinforcement Learning with World Models", in Proc. of the International Conference on Learning Representations (ICLR), 2024.
- Yarden As, Bhavya Sukhija, Lenart Treven, Carmelo Sferrazza, Stelian Coros, and Andreas Krause, "ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), 2025.
- Dohyeong Kim, Mineui Hong, Jeongho Park, and Songhwai Oh, "Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
Distributional RL
- Will Dabney, Mark Rowland, Marc G. Bellemare, and Rémi Munos, "Distributional Reinforcement Learning with Quantile Regression", in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), 2018.
- Daniel Wontae Nam, Younghoon Kim, and Chan Y. Park, "GMAC: A Distributional Perspective on Actor-Critic Framework", in Proc. of the International Conference on Machine Learning (ICML), 2021.
- Li Kevin Wenliang, Grégoire Delétang, Matthew Aitchison, Marcus Hutter, Anian Ruoss, Arthur Gretton, and Mark Rowland, "Distributional Bellman Operators over Mean Embeddings", in Proc. of the International Conference on Machine Learning (ICML), 2024.
- Leif Döring, Benedikt Wille, Maximilian Birr, Mihail Bîrsan, and Martin Slowik, "ADDQ: Adaptive Distributional Double Q-Learning", in Proc. of the International Conference on Machine Learning (ICML), 2025.
- Ke Sun, Yingnan Zhao, Enze Shi, Yafei Wang, Xiaodong Yan, Bei Jiang, and Linglong Kong, "Intrinsic Benefits of Categorical Distributional Loss: Uncertainty-aware Regularized Exploration in Reinforcement Learning", Advances in Neural Information Processing Systems (NeurIPS), 2025.
Unsupervised RL
- Ahmed Touati and Yann Ollivier, "Learning One Representation to Optimize All Rewards", Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Tongzhou Wang, Antonio Torralba, Phillip Isola, and Amy Zhang, "Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning", in Proc. of the International Conference on Machine Learning (ICML), 2023.
- Ahmed Touati, Jérémy Rapin, and Yann Ollivier, "Does Zero-Shot Reinforcement Learning Exist?", in Proc. of the International Conference on Learning Representations (ICLR), 2023.
- Seohong Park, Kimin Lee, Youngwoon Lee, and Pieter Abbeel, "Controllability-Aware Unsupervised Skill Discovery", in Proc. of the International Conference on Machine Learning (ICML), 2023.
- Seohong Park, Tobias Kreiman, and Sergey Levine, "Foundation Policies with Hilbert Representations", in Proc. of the International Conference on Machine Learning (ICML), 2024.
- Andrea Tirinzoni, Ahmed Touati, Jesse Farebrother, Mateusz Guzek, Anssi Kanervisto, Yingchen Xu, Alessandro Lazaric, and Matteo Pirotta, "Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
- Marco Bagatella, Matteo Pirotta, Ahmed Touati, Alessandro Lazaric, and Andrea Tirinzoni, "TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2026.
Multi-Agent RL
- Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson, "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), 2018.
- Qihan Liu, Jianing Ye, Xiaoteng Ma, Jun Yang, Bin Liang, and Chongjie Zhang, "Efficient Multi-agent Reinforcement Learning by Planning", in Proc. of the International Conference on Learning Representations (ICLR), 2024.
- Hao-Lun Hsu, Weixin Wang, Miroslav Pajic, and Pan Xu, "Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning", Advances in Neural Information Processing Systems (NeurIPS), 2024.
- Ian Gemp, Andreas Alexander Haupt, Luke Marris, Siqi Liu, and Georgios Piliouras, "Convex Markov Games: A New Frontier for Multi-Agent Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), 2025.
- Yueheng Li, Guangming Xie, and Zongqing Lu, "Revisiting Cooperative Off-Policy Multi-Agent Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), 2025.
Reinforcement Learning from Human Feedback (RLHF)
- Changyeon Kim, Jongjin Park, Jinwoo Shin, Honglak Lee, Pieter Abbeel, and Kimin Lee, "Preference Transformer: Modeling Human Preferences using Transformers for RL", in Proc. of the International Conference on Learning Representations (ICLR), 2023.
- Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn, "Direct Preference Optimization: Your Language Model is Secretly a Reward Model", Advances in Neural Information Processing Systems (NeurIPS), 2023.
- Zibin Dong, Yifu Yuan, Jianye Hao, Fei Ni, Yao Mu, Yan Zheng, Yujing Hu, Tangjie Lv, Changjie Fan, and Zhipeng Hu, "AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model", in Proc. of the International Conference on Learning Representations (ICLR), 2024.
- Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, and Dorsa Sadigh, "Contrastive Preference Learning: Learning from Human Feedback without Reinforcement Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2024.
- Teng Xiao, Yige Yuan, Mingxiao Li, Zhengyu Chen, and Vasant G. Honavar, "On a Connection Between Imitation Learning and RLHF", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
Model-Based RL
- Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, and Jimmy Ba, "Mastering Atari with Discrete World Models", in Proc. of the International Conference on Learning Representations (ICLR), 2021.
- Nicklas Hansen, Hao Su, and Xiaolong Wang, "TD-MPC2: Scalable, Robust World Models for Continuous Control", in Proc. of the International Conference on Learning Representations (ICLR), 2024.
- Kwanyoung Park and Youngwoon Lee, "Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
- Yuhang Wang, Hanwei Guo, Sizhe Wang, Long Qian, and Xuguang Lan, "Bootstrapped Model Predictive Control", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
