Deep RL Papers
Spring 2025
Recent papers
Imitation Learning / Learning from Demonstrations
- Divyansh Garg, Shuvam Chakraborty, Chris Cundy, Jiaming Song, Matthieu Geist, and Stefano Ermon, "IQ-Learn: Inverse soft-Q Learning for Imitation", Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Benjamin Eysenbach, Sergey Levine, and Ruslan Salakhutdinov, "Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification", Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Geon-Hyeong Kim, Seokin Seo, Jongmin Lee, Wonseok Jeon, HyeongJoo Hwang, Hongseok Yang, and Kee-Eung Kim, "DemoDICE: Offline Imitation Learning with Supplementary Imperfect Demonstrations", in Proc. of the International Conference on Learning Representations (ICLR), 2022.
- Joe Watson, Sandy Huang, and Nicolas Heess, "Coherent Soft Imitation Learning", Advances in Neural Information Processing Systems (NeurIPS), 2023.
- Kevin Black et al., "π0: A Vision-Language-Action Flow Model for General Robot Control", arXiv preprint, 2024.
Deep RL Algorithms
- Benjamin Eysenbach, Tianjun Zhang, Ruslan Salakhutdinov, and Sergey Levine, "Contrastive Learning as Goal-Conditioned Reinforcement Learning", Advances in Neural Information Processing Systems (NeurIPS), 2022.
- Divyansh Garg, Joey Hejna, Matthieu Geist, and Stefano Ermon, "Extreme Q-Learning: MaxEnt RL without Entropy", in Proc. of the International Conference on Learning Representations (ICLR), 2023.
- Harshit Sikchi, Qinqing Zheng, Amy Zhang, and Scott Niekum, "Dual RL: Unification and New Methods for Reinforcement and Imitation Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2024.
- Tianying Ji, Yongyuan Liang, Yan Zeng, Yu Luo, Guowei Xu, Jiawei Guo, Ruijie Zheng, Furong Huang, Fuchun Sun, and Huazhe Xu, "ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization", in Proc. of the International Conference on Machine Learning (ICML), 2024.
- Yiming Wang, Kaiyan Zhao, Furui Liu, and Leong Hou U, "Rethinking Exploration in Reinforcement Learning with Effective Metric-Based Exploration Bonus", Advances in Neural Information Processing Systems (NeurIPS), 2024.
- Zhihong Shao et al., "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", arXiv preprint, 2024.
- Matteo Gallici, Mattie Fellows, Benjamin Ellis, Bartomeu Pou, Ivan Masmitja, Jakob Nicolaus Foerster, and Mario Martin, "Simplifying Deep Temporal Difference Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
Offline RL
- Michael Janner, Qiyang Li, and Sergey Levine, "Offline Reinforcement Learning as One Big Sequence Modeling Problem", Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Ilya Kostrikov, Ashvin Nair, and Sergey Levine, "Offline Reinforcement Learning with Implicit Q-Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2022.
- Michael Janner, Yilun Du, Joshua B. Tenenbaum, and Sergey Levine, "Planning with Diffusion for Flexible Behavior Synthesis", in Proc. of the International Conference on Machine Learning (ICML), 2022.
- Zhendong Wang, Jonathan J Hunt, and Mingyuan Zhou, "Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2023.
- Junghyuk Yeom, Yonghyeon Jo, Jungmo Kim, Sanghyeon Lee, and Seungyul Han, "Exclusively Penalized Q-learning for Offline Reinforcement Learning", Advances in Neural Information Processing Systems (NeurIPS), 2024.
Safe RL
- Homanga Bharadhwaj, Aviral Kumar, Nicholas Rhinehart, Sergey Levine, Florian Shkurti, and Animesh Garg, "Conservative Safety Critics for Exploration", in Proc. of the International Conference on Learning Representations (ICLR), 2021.
- Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo Li, and Ding Zhao, "Constrained Variational Policy Optimization for Safe Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), 2022.
- Dohyeong Kim, Kyungjae Lee, and Songhwai Oh, "Trust Region-Based Safe Distributional Reinforcement Learning for Multiple Constraints", Advances in Neural Information Processing Systems (NeurIPS), 2023.
- Dohyeong Kim, Taehyun Cho, Seungyub Han, Hojun Chung, Kyungjae Lee, and Songhwai Oh, "Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees", Advances in Neural Information Processing Systems (NeurIPS), 2024.
Distributional RL
- Marc G. Bellemare, Will Dabney, and Rémi Munos, "A Distributional Perspective on Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), 2017.
- Will Dabney, Mark Rowland, Marc G. Bellemare, and Rémi Munos, "Distributional Reinforcement Learning with Quantile Regression", in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), 2018.
- Mark Rowland, Robert Dadashi, Saurabh Kumar, Rémi Munos, Marc G. Bellemare, and Will Dabney, "Statistics and Samples in Distributional Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), 2019.
- Daniel Wontae Nam, Younghoon Kim, and Chan Y. Park, "GMAC: A Distributional Perspective on Actor-Critic Framework", in Proc. of the International Conference on Machine Learning (ICML), 2021.
- Yecheng Jason Ma, Dinesh Jayaraman, and Osbert Bastani, "Conservative Offline Distributional Reinforcement Learning", Advances in Neural Information Processing Systems (NeurIPS), 2021.
Unsupervised RL
- Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine, "Diversity is All You Need: Learning Skills without a Reward Function", in Proc. of the International Conference on Learning Representations (ICLR), 2019.
- Hao Liu and Pieter Abbeel, "APS: Active Pretraining with Successor Features", in Proc. of the International Conference on Machine Learning (ICML), 2021.
- Pierre-Alexandre Kamienny, Jean Tarbouriech, Sylvain Lamprier, Alessandro Lazaric, and Ludovic Denoyer, "Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching", in Proc. of the International Conference on Learning Representations (ICLR), 2022.
- Ahmed Touati, Jérémy Rapin, and Yann Ollivier, "Does Zero-Shot Reinforcement Learning Exist?", in Proc. of the International Conference on Learning Representations (ICLR), 2023.
- Seohong Park, Oleh Rybkin, and Sergey Levine, "METRA: Scalable Unsupervised RL with Metric-Aware Abstraction", in Proc. of the International Conference on Learning Representations (ICLR), 2024.
Multi-Agent RL
- Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch, "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments", Advances in Neural Information Processing Systems (NeurIPS), 2017.
- Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson, "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), 2018.
- Kyunghwan Son, Daewoo Kim, Wan Ju Kang, David Earl Hostallero, and Yung Yi, "QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), 2019.
- Bei Peng, Tabish Rashid, Christian Schroeder de Witt, Pierre-Alexandre Kamienny, Philip Torr, Wendelin Boehmer, and Shimon Whiteson, "FACMAC: Factored Multi-Agent Centralised Policy Gradients", Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Xiangsen Wang, Haoran Xu, Yinan Zheng, and Xianyuan Zhan, "Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization", Advances in Neural Information Processing Systems (NeurIPS), 2023.
Reinforcement Learning from Human Feedback (RLHF)
- Changyeon Kim, Jongjin Park, Jinwoo Shin, Honglak Lee, Pieter Abbeel, and Kimin Lee, "Preference Transformer: Modeling Human Preferences using Transformers for RL", in Proc. of the International Conference on Learning Representations (ICLR), 2023.
- Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn, "Direct Preference Optimization: Your Language Model is Secretly a Reward Model", Advances in Neural Information Processing Systems (NeurIPS), 2023.
- Joey Hejna and Dorsa Sadigh, "Inverse Preference Learning: Preference-based RL without a Reward Function", Advances in Neural Information Processing Systems (NeurIPS), 2023.
- Rémi Munos et al., "Nash Learning from Human Feedback", in Proc. of the International Conference on Machine Learning (ICML), 2024.
- Xinran Wang, Qi Le, Ammar Ahmed, Enmao Diao, Yi Zhou, Nathalie Baracaldo, Jie Ding, and Ali Anwar, "MAP: Multi-Human-Value Alignment Palette", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
Curriculum RL / Environment Design
- Pascal Klink, Haoyi Yang, Carlo D’Eramo, Jan Peters, and Joni Pajarinen, "Curriculum Reinforcement Learning via Constrained Optimal Transport", in Proc. of the International Conference on Machine Learning (ICML), 2022.
- Seungjae Lee, Daesol Cho, Jonghae Park, and H. Jin Kim, "CQM: Curriculum Reinforcement Learning with a Quantized World Model", Advances in Neural Information Processing Systems (NeurIPS), 2023.
- Hojun Chung, Junseo Lee, Minsoo Kim, Dohyeong Kim, and Songhwai Oh, "Adversarial Environment Design via Regret-Guided Diffusion Models", Advances in Neural Information Processing Systems (NeurIPS), 2024.
Model-Based RL
- Michael Janner, Justin Fu, Marvin Zhang, and Sergey Levine, "When to Trust Your Model: Model-Based Policy Optimization", Advances in Neural Information Processing Systems (NeurIPS), 2019.
- Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, and Yang Gao, "Mastering Atari Games with Limited Data", Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, and Jimmy Ba, "Mastering Atari with Discrete World Models", in Proc. of the International Conference on Learning Representations (ICLR), 2021.
- Nicklas Hansen, Xiaolong Wang, and Hao Su, "Temporal Difference Learning for Model Predictive Control", in Proc. of the International Conference on Machine Learning (ICML), 2022.
- Yihao Sun, Jiaji Zhang, Chengxing Jia, Haoxin Lin, Junyin Ye, and Yang Yu, "Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), 2023.
Classic papers
MDP
- David Blackwell. "Discounted dynamic programming," The Annals of Mathematical Statistics, 1965.
- Emanuel Todorov. "Linearly-solvable Markov decision problems," Advances in neural information processing systems (NIPS), Dec, 2007.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1466-1473, Jul. 2018. [Supplementary Material | Video | arXiv preprint]
RL
- Christopher JCH Watkins, and Peter Dayan, "Q-learning," Machine learning, 1992.
- R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, 1992.
- R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems (NIPS), Nov. 2000.
- Sham M. Kakade, "A natural policy gradient," Advances in Neural Information Processing Systems (NIPS), Dec. 2002.
- Sham M. Kakade, and John Langford, "Approximately optimal approximate reinforcement learning," in Proc. of the International Conference on Machine Learning (ICML), 2002.
- Carl Edward Rasmussen and Malte Kuss, "Gaussian Processes in Reinforcement Learning," Advances in Neural Information Processing Systems (NIPS), Dec. 2003.
- Jens Kober, and Jan R. Peters, "Policy search for motor primitives in robotics," Advances in neural information processing systems (NIPS), Dec, 2008.
- Hado V Hasselt, "Double Q-learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2010.
- Jan Peters, Katharina Mülling, and Yasemin Altun, "Relative Entropy Policy Search," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Jul, 2010.
- N. Heess, D. Silver, and Y. W. Teh, "Actor-critic reinforcement learning with energy-based policies," in Proc. of the European Workshop on Reinforcement Learning, Jun, 2012.
- David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin A. Riedmiller, "Deterministic Policy Gradient Algorithms," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2014.
Deep RL Algorithms
- Sergey Levine, Vladlen Koltun, "Guided Policy Search," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2013.
- V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, S. Petersen, et al., "Human-level control through deep reinforcement learning," Nature, 2015.
- T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized Experience Replay," arXiv, 2015.
- J. Schulman, S. Levine, P. Abbeel, M. I. Jordan, and P. Moritz, "Trust Region Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Jul, 2015. [arXiv]
- H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double q-learning," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb, 2016.
- Z. Wang, N. de Freitas, and M. Lanctot, "Dueling Network Architectures for Deep Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- Junhyuk Oh, Valliappa Chockalingam, Satinder P. Singh, Honglak Lee, "Control of Memory, Active Perception, and Action in Minecraft," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, "Asynchronous Methods for Deep Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- Shixiang Gu, Timothy P. Lillicrap, Ilya Sutskever, Sergey Levine, "Continuous Deep Q-Learning with Model-based Acceleration," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy, "Deep Exploration via Bootstrapped DQN," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Aviv Tamar, Sergey Levine, Pieter Abbeel, Yi Wu, Garrett Thomas, "Value Iteration Networks," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Rémi Munos, "Unifying Count-Based Exploration and Intrinsic Motivation," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Rein Houthooft, Xi Chen, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel, "VIME: Variational Information Maximizing Exploration," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Tejas D. Kulkarni*, Karthik R. Narasimhan*, Ardavan Saeedi, Joshua B. Tenenbaum, "Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation." Advances in neural information processing systems (NIPS), Dec, 2016.
- John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, and Pieter Abbeel, "High-Dimensional Continuous Control Using Generalized Advantage Estimation," in Proc. of the International Conference of Learning Representations (ICLR), May, 2016.
- Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, "End-to-End Training of Deep Visuomotor Policies," Journal of Machine Learning Research (JMLR), 2016.
- David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Vedavyas Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis, "Mastering the game of Go with deep neural networks and tree search," Nature, 2016.
- David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis, "Mastering the game of Go without human knowledge," Nature, 2017.
- T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. M. O. Heess, T. Erez, Y. Tassa, and D. P. Wierstra, "Continuous control with deep reinforcement learning," in Proc. of the International Conference on Learning Representations (ICLR), May, 2016. [arXiv]
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
- J. Schulman, P. Abbeel, and X. Chen, "Equivalence between policy gradients and soft Q-Learning," arXiv preprint arXiv:1704.06440, 2017.
- P. H. Richemond, and Brendan Maginnis, "A short variational proof of equivalence between policy gradients and soft Q learning," arXiv preprint arXiv:1712.08650, 2017.
- J. Achiam, D. Held, A. Tamar, P. Abbeel, "Constrained Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017. [arXiv]
- Y. Chebotar, K. Hausman, M. Zhang, G. Sukhatme, S. Schaal, and S. Levine, "Combining model-based and model-free updates for trajectory-centric reinforcement learning," In Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine, "Reinforcement Learning with Deep Energy-Based Policies," In Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu, "FeUdal Networks for Hierarchical Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Z. Wang, V. Bapst, N. Heess, V. Mnih, R. Munos, K. Kavukcuoglu, and N. de Freitas, "Sample efficient actor-critic with experience replay," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- B. O’Donoghue, R. Munos, K. Kavukcuoglu, and V. Mnih, "PGQ: Combining policy gradient and Q-learning," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- S. Gu, T. Lillicrap, Z. Ghahramani, R. E. Turner, and S. Levine, "Q-Prop: Sample efficient policy gradient with an off-policy critic," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- Y. Wu, E. Mansimov, R. B. Grosse, S. Liao, and J. Ba, "Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation," Advances in neural information processing systems (NIPS), Dec, 2017.
- O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans, "Bridging the gap between value and policy based reinforcement learning," Advances in neural information processing systems (NIPS), Dec, 2017.
- Junhyuk Oh, Satinder Singh, Honglak Lee, "Value Prediction Network," Advances in neural information processing systems (NIPS), Dec, 2017.
- Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, Dario Amodei, "Deep Reinforcement Learning from Human Preferences," Advances in neural information processing systems (NIPS), Dec, 2017.
- Marcin Andrychowicz, Dwight Crow, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba, "Hindsight Experience Replay," Advances in neural information processing systems (NIPS), Dec, 2017.
- Justin Fu, John Co-Reyes, and Sergey Levine, "Ex2: Exploration with exemplar models for deep reinforcement learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Yevgen Chebotar, Mrinal Kalakrishnan, Ali Yahya, Adrian Li, Stefan Schaal, Sergey Levine, "Path integral guided policy search," in Proc. of the International Conference on Robotics and Automation (ICRA), May, 2017.
- O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans, "Trust-PCL: An Off-Policy Trust Region Method for Continuous Control," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2018.
- T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, A. Sendonaris, G. Dulac-Arnold, I. Osband, J. Agapiou, and J. Z. Leibo, "Deep Q-learning from Demonstrations," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), 2018.
- Scott Fujimoto, Herke van Hoof, and David Meger, "Addressing Function Approximation Error in Actor-Critic Methods," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2018.
- Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine, "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2018.
- Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine, "Data-Efficient Hierarchical Reinforcement Learning", in Proc. of the Neural Information Processing Systems (NIPS), Dec. 2018.
- Zhang-Wei Hong, Tzu-Yun Shann, Shih-Yang Su, Yi-Hsiang Chang, Tsu-Jui Fu, and Chun-Yi Lee, "Diversity-Driven Exploration Strategy for Deep Reinforcement Learning", in Proc. of the Neural Information Processing Systems (NIPS), Dec. 2018.
- Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, and Mohammad Ghavamzadeh, "A Lyapunov-based Approach to Safe Reinforcement Learning", in Proc. of the Neural Information Processing Systems (NIPS), Dec. 2018.
- Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, Mineui Hong, Jaein Kim, Yong-Lae Park, and Songhwai Oh, "Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots," in Proc. of Robotics: Science and Systems (RSS), Jul. 2020. [Supplementary Material | arXiv preprint]
- Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov, "Exploration by Random Network Distillation", in Proc. of the International Conference on Learning Representations (ICLR), May, 2019.
- Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew J. Johnson, and Sergey Levine, "SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), 2019.
- David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne, "Experience Replay for Continual Learning," Advances in Neural Information Processing Systems (NeurIPS), 2019.
- Scott Fujimoto, David Meger, and Doina Precup, "Off-Policy Deep Reinforcement Learning without Exploration," in Proc. of the International Conference on Machine Learning (ICML), 2019.
- Yilun Du and Karthik Narasimhan, "Task-Agnostic Dynamics Priors for Deep Reinforcement Learning," arXiv preprint arXiv:1905.04819, 2019.
- Jan Matas, Stephen James, and Andrew J. Davison, "Sim-to-Real Reinforcement Learning for Deformable Object Manipulation," arXiv preprint arXiv:1806.07851, 2018.
- Alexey Dosovitskiy and Vladlen Koltun, "Learning to Act by Predicting the Future," arXiv preprint arXiv:1611.01779, 2016.
- Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine, "Conservative Q-Learning for Offline Reinforcement Learning," Advances in Neural Information Processing Systems (NeurIPS), Dec, 2020.
Learning from Demonstration (Behavior Cloning)
- S. Ross and D. Bagnell, “Efficient reductions for imitation learning,” in Proc. of the International Conference on Artificial Intelligence and Statistics, May, 2010.
- S. Ross, G. Gordon, and D. Bagnell, "A reduction of imitation learning and structured prediction to no-regret online learning," in Proc. of the international conference on artificial intelligence and statistics, 2011.
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, Songhwai Oh, "Real-Time Nonparametric Reactive Navigation of Mobile Robots in Dynamic Environments," Robotics and Autonomous Systems, vol. 91, pp. 11–24, May 2017.
- Sungjoon Choi, Eunwoo Kim, and Songhwai Oh, "Real-Time Navigation in Crowded Dynamic Environments Using Gaussian Process Motion Control," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), Jun. 2014. [Video]
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstrations with Mixed Qualities Using Leveraged Gaussian Processes," IEEE Transactions on Robotics, vol. 35, no. 3, pp. 564-576, Jun. 2019.
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, and Songhwai Oh, "Leveraged Non-Stationary Gaussian Process Regression for Autonomous Robot Navigation," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2015.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstration Using Leveraged Gaussian Processes and Sparse-Constrained Optimization," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2016.
- Alessandro Giusti, Jérôme Guzzi, Dan C. Cireşan, Fang-Lin He, Juan P. Rodríguez, Flavio Fontana, Matthias Faessler, et al., "A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots," IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 661-667, 2016.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Scalable Robust Learning from Demonstration with Leveraged Deep Neural Networks," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2017.
- Antonio Loquercio, Ana Isabel Maqueda, Carlos R. Del Blanco, and Davide Scaramuzza, "DroNet: Learning to Fly by Driving," IEEE Robotics and Automation Letters, 2018. (http://rpg.ifi.uzh.ch/dronet.html)
- Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel, "Overcoming Exploration in Reinforcement Learning with Demonstrations," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May, 2018.
- Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, Giulia Vezzani, John Schulman, Emanuel Todorov, and Sergey Levine, "Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations," in Proc. of Robotics: Science and Systems (RSS), Jun, 2018.
- Bingyi Kang, Zequn Jie, and Jiashi Feng, "Policy Optimization with Demonstrations," in Proc. of the International Conference on Machine Learning (ICML), Jul, 2018.
Inverse RL
- A. Y. Ng and S. Russell, “Algorithms for inverse reinforcement learning,” in Proc. of the 17th International Conference on Machine Learning (ICML), Jun, 2000.
- B. Taskar, V. Chatalbashev, D. Koller, and C. Guestrin, “Learning structured prediction models: A large margin approach,” in Proc. of International Conference on Machine learning (ICML), Aug, 2005.
- N. D. Ratliff, J. A. Bagnell, and M. Zinkevich, “Maximum margin planning,” in Proc. of the International Conference on Machine Learning (ICML), Jun, 2006.
- N. D. Ratliff, D. M. Bradley, J. A. Bagnell, and J. E. Chestnutt, “Boosting structured prediction for imitation learning,” Advances in Neural Information Processing Systems (NIPS), Dec, 2007.
- D. Ramachandran and E. Amir, “Bayesian inverse reinforcement learning,” in Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), Jan, 2007.
- B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, “Maximum entropy inverse reinforcement learning,” in Proc. of the 23rd AAAI Conference on Artificial Intelligence (AAAI), July, 2008.
- N. D. Ratliff, D. Silver, and J. A. Bagnell, “Learning to search: Functional gradient techniques for imitation learning,” Autonomous Robots, Jul, 2009.
- K. Dvijotham and E. Todorov, “Inverse optimal control with linearly-solvable MDPs,” in Proc. of the International Conference on Machine Learning (ICML), Jun, 2010.
- B. D. Ziebart, “Modeling purposeful adaptive behavior with the principle of maximum causal entropy,” Ph.D. dissertation, Carnegie Mellon University, 2010.
- S. Levine, Z. Popovic, and V. Koltun, “Nonlinear inverse reinforcement learning with gaussian processes,” Advances in Neural Information Processing Systems (NIPS), Dec, 2011.
- A. Boularias, J. Kober, and J. Peters, “Relative entropy inverse reinforcement learning,” in Proc. of the International Conference on Artificial Intelligence and Statistics, Apr, 2011.
- Sergey Levine, Vladlen Koltun, "Continuous Inverse Optimal Control with Locally Optimal Examples," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2012.
- J. Choi and K. Kim, “Bayesian nonparametric feature construction for inverse reinforcement learning,” in Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), Aug, 2013.
- J. Zheng, S. Liu, and L. M. Ni, “Robust bayesian inverse reinforcement learning with sparse behavior noise,” in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Jul, 2014.
- J. Choi and K.-E. Kim, “Hierarchical bayesian inverse reinforcement learning,” IEEE Transactions on Cybernetics, 2015.
- Chelsea Finn, Sergey Levine, and Pieter Abbeel, "Guided cost learning: Deep inverse optimal control via policy optimization," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- K. Shiarlis, J. Messias, M. van Someren, and S. Whiteson, “Inverse reinforcement learning from failure,” In Proc. of the 2016 International Conference on Autonomous Agents & Multiagent Systems, May, 2016.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Inverse Reinforcement Learning with Leveraged Gaussian Processes," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2016.
- J. Ho, and S. Ermon, "Generative adversarial imitation learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2016. [arXiv]
- Nir Baram, Oron Anschel, Itai Caspi, Shie Mannor, "End-to-End Differentiable Adversarial Imitation Learning," in Proc. of International Conference on Machine learning (ICML), Aug, 2017.
- Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, and J. Andrew Bagnell, "Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction," in Proc. of International Conference on Machine learning (ICML), Aug, 2017.
- Michael Laskey, Jonathan Lee, Roy Fox, Anca Dragan, and Ken Goldberg, "DART: Noise Injection for Robust Imitation Learning", in Proc. of the Conference on Robot Learning (CoRL), Nov. 2017.
- K. Hausman, Y. Chebotar, S. Schaal, G. S. Sukhatme, J. J. Lim, "Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Ziyu Wang, Josh S. Merel, Scott E. Reed, Nando de Freitas, Gregory Wayne, Nicolas Heess, "Robust Imitation of Diverse Behaviors," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Justin Fu, Katie Luo, and Sergey Levine, "Learning Robust Rewards with Adversarial Inverse Reinforcement Learning", in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2018.
- Wonseok Jeon, Seokin Seo, and Kee-Eung Kim, "A Bayesian Approach to Generative Adversarial Imitation Learning," in Proc. of the Neural Information Processing Systems (NIPS), Dec. 2018.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Maximum Causal Tsallis Entropy Imitation Learning", in Proc. of Neural Information Processing Systems (NIPS), Dec. 2018.
- Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine, and Jonathan Tompson, "Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning", in Proc. of the International Conference on Learning Representations (ICLR), May, 2019.
Other Topics
- S. Ross, “Interactive learning for sequential decisions and predictions,” Ph.D. dissertation, Carnegie Mellon University, 2013.
- Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio, "Generative Adversarial Nets," Advances in Neural Information Processing Systems (NIPS), Dec, 2014.
- Y. Gal, "Uncertainty in deep learning," University of Cambridge, 2016.
- Y. Gal, Z. Ghahramani, "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- P. McClure and K. Nikolaus, "Representing inferential uncertainty in deep neural networks through sampling," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- A. Kendall and Y. Gal, "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- B. Lakshminarayanan, A. Pritzel, and C. Blundell, "Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- M. Teye, H. Azizpour, and K. Smith, "Bayesian Uncertainty Estimation for Batch Normalized Deep Networks," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2018.
- Joshua Achiam, Harrison Edwards, Dario Amodei, and Pieter Abbeel, "Variational option discovery algorithms." arXiv, 2018.
