Deep RL Papers
Spring 2024
Learning from Demonstration (Behavior Cloning)
- S. Ross and D. Bagnell, “Efficient reductions for imitation learning,” in Proc. of the International Conference on Artificial Intelligence and Statistics, May, 2010.
- S. Ross, G. Gordon, and D. Bagnell, "A reduction of imitation learning and structured prediction to no-regret online learning," in Proc. of the International Conference on Artificial Intelligence and Statistics, 2011.
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, Songhwai Oh, "Real-Time Nonparametric Reactive Navigation of Mobile Robots in Dynamic Environments," Robotics and Autonomous Systems, vol. 91, pp. 11–24, May 2017.
- Sungjoon Choi, Eunwoo Kim, and Songhwai Oh, "Real-Time Navigation in Crowded Dynamic Environments Using Gaussian Process Motion Control," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), Jun. 2014. [Video]
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstrations with Mixed Qualities Using Leveraged Gaussian Processes," IEEE Transactions on Robotics, vol. 35, no. 3, pp. 564-576, Jun. 2019.
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, and Songhwai Oh, "Leveraged Non-Stationary Gaussian Process Regression for Autonomous Robot Navigation," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2015.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstration Using Leveraged Gaussian Processes and Sparse-Constrained Optimization," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2016.
- Giusti, Alessandro, Jérôme Guzzi, Dan C. Cireşan, Fang-Lin He, Juan P. Rodríguez, Flavio Fontana, Matthias Faessler et al. "A machine learning approach to visual perception of forest trails for mobile robots." IEEE Robotics and Automation Letters 1, no. 2 (2016): 661-667.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Scalable Robust Learning from Demonstration with Leveraged Deep Neural Networks," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2017.
- Loquercio, Antonio, Ana Isabel Maqueda, Carlos R. Del Blanco, and Davide Scaramuzza. "DroNet: Learning to Fly by Driving." IEEE Robotics and Automation Letters (2018). (http://rpg.ifi.uzh.ch/dronet.html)
- Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel, "Overcoming Exploration in Reinforcement Learning with Demonstrations," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May, 2018.
- Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, Giulia Vezzani, John Schulman, Emanuel Todorov, and Sergey Levine, "Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations," in Proc. of Robotics: Science and Systems (RSS), Jun, 2018.
- Bingyi Kang, Zequn Jie, and Jiashi Feng, "Policy Optimization with Demonstrations," in Proc. of the International Conference on Machine Learning (ICML), Jul, 2018.
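Many of the entries above reduce imitation to supervised learning; the DAgger loop of Ross, Gordon, and Bagnell (2011) is simple enough to sketch end-to-end. Everything below (the 1-D drift environment, the threshold expert, and the brute-force learner) is an illustrative stand-in for the paper's general setting, not its actual experiments:

```python
import numpy as np

def expert_action(s):
    return int(s > 0.5)          # toy expert: threshold the scalar state

def rollout(policy, n_steps=20, seed=0):
    """Collect the states visited by `policy` in a toy 1-D drift environment."""
    rng = np.random.default_rng(seed)
    s, states = 0.5, []
    for _ in range(n_steps):
        states.append(s)
        step = 0.1 if policy(s) == 1 else -0.1
        s = float(np.clip(s + step + 0.05 * rng.standard_normal(), 0.0, 1.0))
    return states

def fit_threshold(states, actions):
    """Brute-force 1-D threshold classifier (stands in for any supervised learner)."""
    candidates = np.linspace(0.0, 1.0, 101)
    errors = [np.mean([int(s > t) != a for s, a in zip(states, actions)])
              for t in candidates]
    return float(candidates[int(np.argmin(errors))])

# DAgger: roll out the CURRENT learner, label the visited states with the
# expert, aggregate into one dataset, and refit the learner.
data_s, data_a = [], []
threshold = 1.0                  # initial (bad) learner: always acts 0
for it in range(5):
    visited = rollout(lambda s: int(s > threshold), seed=it)
    data_s += visited
    data_a += [expert_action(s) for s in visited]
    threshold = fit_threshold(data_s, data_a)
```

Because the learner is trained on states from its own rollouts rather than only the expert's, the covariate-shift problem that motivates the paper never compounds.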
Deep RL
- Sergey Levine, Vladlen Koltun, "Guided Policy Search," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2013.
- Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al., "Human-level control through deep reinforcement learning," Nature, 2015.
- T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized Experience Replay," arXiv, 2015.
- J. Schulman, S. Levine, P. Abbeel, M. I. Jordan, and P. Moritz, "Trust Region Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Jul, 2015. [arXiv]
- H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double q-learning," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb, 2016.
- Z. Wang, N. de Freitas, and M. Lanctot, "Dueling Network Architectures for Deep Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- Junhyuk Oh, Valliappa Chockalingam, Satinder P. Singh, Honglak Lee, "Control of Memory, Active Perception, and Action in Minecraft," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, "Asynchronous Methods for Deep Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- Shixiang Gu, Timothy P. Lillicrap, Ilya Sutskever, Sergey Levine, "Continuous Deep Q-Learning with Model-based Acceleration," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy, "Deep Exploration via Bootstrapped DQN," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Aviv Tamar, Sergey Levine, Pieter Abbeel, Yi Wu, Garrett Thomas, "Value Iteration Networks," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Rémi Munos, "Unifying Count-Based Exploration and Intrinsic Motivation," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Rein Houthooft, Xi Chen, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel, "VIME: Variational Information Maximizing Exploration," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Tejas D. Kulkarni*, Karthik R. Narasimhan*, Ardavan Saeedi, Joshua B. Tenenbaum, "Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation." Advances in neural information processing systems (NIPS), Dec, 2016.
- John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, and Pieter Abbeel, "High-Dimensional Continuous Control Using Generalized Advantage Estimation," in Proc. of the International Conference of Learning Representations (ICLR), May, 2016.
- Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, "End-to-End Training of Deep Visuomotor Policies," Journal of Machine Learning Research (JMLR), 2016.
- David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Vedavyas Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis, "Mastering the game of Go with deep neural networks and tree search," Nature, 2016.
- David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis, "Mastering the game of Go without human knowledge," Nature, 2017.
- T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. M. O. Heess, T. Erez, Y. Tassa, and D. P. Wierstra, "Continuous control with deep reinforcement learning," in Proc. of the International Conference on Learning Representations (ICLR), May, 2016. [arXiv]
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
- J. Schulman, P. Abbeel, and X. Chen, "Equivalence between policy gradients and soft Q-Learning," arXiv preprint arXiv:1704.06440, 2017.
- P. H. Richemond, and Brendan Maginnis, "A short variational proof of equivalence between policy gradients and soft Q learning," arXiv preprint arXiv:1712.08650, 2017.
- J. Achiam, D. Held, A. Tamar, P. Abbeel, "Constrained Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017. [arXiv]
- Y. Chebotar, K. Hausman, M. Zhang, G. Sukhatme, S. Schaal, and S. Levine, "Combining model-based and model-free updates for trajectory-centric reinforcement learning," In Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine, "Reinforcement Learning with Deep Energy-Based Policies," In Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu, "FeUdal Networks for Hierarchical Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Z. Wang, V. Bapst, N. Heess, V. Mnih, R. Munos, K. Kavukcuoglu, and N. de Freitas, "Sample efficient actor-critic with experience replay," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- B. O’Donoghue, R. Munos, K. Kavukcuoglu, and V. Mnih, "PGQ: Combining policy gradient and Q-learning," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- S. Gu, T. Lillicrap, Z. Ghahramani, R. E. Turner, and S. Levine, "Q-Prop: Sample efficient policy gradient with an off-policy critic," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- Y. Wu, E. Mansimov, R. B. Grosse, S. Liao, and J. Ba, "Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation," Advances in neural information processing systems (NIPS), Dec, 2017.
- O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans, "Bridging the gap between value and policy based reinforcement learning," Advances in neural information processing systems (NIPS), Dec, 2017.
- Junhyuk Oh, Satinder Singh, Honglak Lee, "Value Prediction Network," Advances in neural information processing systems (NIPS), Dec, 2017.
- Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, Dario Amodei, "Deep Reinforcement Learning from Human Preferences," Advances in neural information processing systems (NIPS), Dec, 2017.
- Marcin Andrychowicz, Dwight Crow, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba, "Hindsight Experience Replay," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Justin Fu, John Co-Reyes, and Sergey Levine, "Ex2: Exploration with exemplar models for deep reinforcement learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Yevgen Chebotar, Mrinal Kalakrishnan, Ali Yahya, Adrian Li, Stefan Schaal, Sergey Levine, "Path integral guided policy search," in Proc. of the International Conference on Robotics and Automation (ICRA), May, 2017.
- O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans, "Trust-PCL: An Off-Policy Trust Region Method for Continuous Control," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2018.
- T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, A. Sendonaris, G. Dulac-Arnold, I. Osband, J. Agapiou, and J. Z. Leibo, "Deep Q-learning from Demonstrations," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), 2018.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1466-1473, Jul. 2018. [Supplementary Material | Video | arXiv preprint]
- Scott Fujimoto, Herke van Hoof, and David Meger, "Addressing Function Approximation Error in Actor-Critic Methods," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2018.
- Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine, "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2018.
- Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine, "Data-Efficient Hierarchical Reinforcement Learning", in Proc. of the Neural Information Processing Systems (NIPS), Dec. 2018.
- Zhang-Wei Hong, Tzu-Yun Shann, Shih-Yang Su, Yi-Hsiang Chang, Tsu-Jui Fu, and Chun-Yi Lee, "Diversity-Driven Exploration Strategy for Deep Reinforcement Learning", in Proc. of the Neural Information Processing Systems (NIPS), Dec. 2018.
- Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, and Mohammad Ghavamzadeh, "A Lyapunov-based Approach to Safe Reinforcement Learning", in Proc. of the Neural Information Processing Systems (NIPS), Dec. 2018.
- Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, Mineui Hong, Jaein Kim, Yong-Lae Park, and Songhwai Oh, "Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots," in Proc. of Robotics: Science and Systems (RSS), Jul. 2020. [Supplementary Material | arXiv preprint]
- Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov, "Exploration by Random Network Distillation", in Proc. of the International Conference on Learning Representations (ICLR), May, 2019.
- Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew J. Johnson, Sergey Levine, "SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning." International Conference on Machine Learning. 2019.
- David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, Gregory Wayne, "Experience replay for continual learning." Advances in Neural Information Processing Systems. 2019.
- Fujimoto, Scott, David Meger, and Doina Precup. "Off-Policy Deep Reinforcement Learning without Exploration." International Conference on Machine Learning. 2019.
- Du, Yilun, and Karthik Narasimhan. "Task-agnostic dynamics priors for deep reinforcement learning." arXiv preprint arXiv:1905.04819 (2019).
- Matas, Jan, Stephen James, and Andrew J. Davison. "Sim-to-real reinforcement learning for deformable object manipulation." arXiv preprint arXiv:1806.07851 (2018).
- Dosovitskiy, Alexey, and Vladlen Koltun. "Learning to act by predicting the future." arXiv preprint arXiv:1611.01779 (2016).
- Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine, "Conservative Q-Learning for Offline Reinforcement Learning," in Proc. of the Neural Information Processing Systems (NIPS), Dec, 2020.
- Tengyu Xu, Yingbin Liang, and Guanghui Lan, "CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee," in Proc. of the International Conference on Machine Learning (ICML), Jul, 2021.
- Homanga Bharadhwaj, Aviral Kumar, Nicholas Rhinehart, Sergey Levine, Florian Shkurti, and Animesh Garg, "Conservative Safety Critics for Exploration," in Proc. of the International Conference on Learning Representations (ICLR), May, 2021.
- Ilya Kostrikov, Ashvin Nair, and Sergey Levine, "Offline Reinforcement Learning with Implicit Q-Learning," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2022.
- Jongmin Lee, Cosmin Paduraru, Daniel J. Mankowitz, Nicolas Heess, Doina Precup, Kee-Eung Kim, and Arthur Guez, "COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2022.
- Linrui Zhang, Li Shen, Long Yang, Shixiang Chen, Bo Yuan, Xueqian Wang, and Dacheng Tao, "Penalized Proximal Policy Optimization for Safe Reinforcement Learning," in Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), Jul, 2022.
- Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo Li, and Ding Zhao, "Constrained Variational Policy Optimization for Safe Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jul, 2022.
- Divyansh Garg, Joey Hejna, Matthieu Geist, and Stefano Ermon, "Extreme Q-Learning: MaxEnt RL without Entropy," in Proc. of the International Conference on Learning Representations (ICLR), May, 2023.
- Dohyeong Kim, Kyungjae Lee, and Songhwai Oh, "Trust Region-Based Safe Distributional Reinforcement Learning for Multiple Constraints," in Proc. of the Neural Information Processing Systems (NIPS), Dec, 2023.
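A recurring theme in the value-based papers above is the construction of the TD target; the double Q-learning entry (van Hasselt et al., 2016) changes only how that target is computed. A minimal numpy sketch, with fixed arrays standing in for the online and target networks and purely illustrative numbers:

```python
import numpy as np

gamma = 0.99
reward = np.array([1.0])
done = np.array([0.0])                    # 0 = non-terminal transition
q_online = np.array([[1.0, 2.0, 0.5]])   # online net's Q(s', .)
q_target = np.array([[1.5, 1.0, 3.0]])   # target net's Q(s', .)

# Vanilla DQN: the target net both SELECTS and EVALUATES the next action;
# the max operator makes this an overestimate under estimation noise.
y_dqn = reward + gamma * (1.0 - done) * q_target.max(axis=1)

# Double DQN: the online net SELECTS the action, the target net EVALUATES it.
a_star = q_online.argmax(axis=1)
y_double = reward + gamma * (1.0 - done) * q_target[np.arange(len(a_star)), a_star]
```

Here the two nets disagree about the best action, so the double-Q target (1.99) is lower than the vanilla max target (3.97), which is the decoupling the paper exploits.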
Distributional RL
- Marc G. Bellemare, Will Dabney, and Rémi Munos, "A Distributional Perspective on Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Will Dabney, Mark Rowland, Marc G. Bellemare, and Rémi Munos, "Distributional Reinforcement Learning with Quantile Regression", in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb, 2018.
- Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, and Timothy Lillicrap, "Distributed Distributional Deterministic Policy Gradients," in Proc. of the International Conference on Learning Representations (ICLR), Feb, 2018.
- Will Dabney*, Georg Ostrovski*, David Silver, and Rémi Munos, "Implicit Quantile Networks for Distributional Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2018.
- Derek Yang, Li Zhao, Zichuan Lin, Tao Qin, Jiang Bian, and Tieyan Liu, "Fully Parameterized Quantile Function for Distributional Reinforcement Learning," in Proc. of the Neural Information Processing Systems (NIPS), Dec, 2019.
- Yecheng Jason Ma, Dinesh Jayaraman, and Osbert Bastani, "Conservative Offline Distributional Reinforcement Learning," in Proc. of the Neural Information Processing Systems (NIPS), Dec, 2021. [arXiv]
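The quantile-based papers in this section (QR-DQN onward) replace a scalar value estimate with a set of quantile estimates trained by an asymmetric "pinball" loss. A minimal sketch of that loss, using fixed return samples and finite-difference gradient descent as illustrative stand-ins for Bellman targets and backprop:

```python
import numpy as np

N = 4
taus = (np.arange(N) + 0.5) / N               # quantile midpoints

def pinball_loss(theta, targets):
    """Asymmetric quantile loss: theta_i is pulled to the tau_i-quantile."""
    u = targets[None, :] - theta[:, None]     # u[i, j] = target_j - theta_i
    return np.where(u > 0, taus[:, None] * u, (taus[:, None] - 1.0) * u).mean()

targets = np.array([0.0, 1.0, 2.0, 3.0])      # toy "return samples"
theta = np.zeros(N)
for _ in range(2000):
    # central-difference gradient, one coordinate at a time
    grad = np.array([(pinball_loss(theta + e, targets) -
                      pinball_loss(theta - e, targets)) / 2e-4
                     for e in 1e-4 * np.eye(N)])
    theta -= 0.05 * grad                      # plain gradient descent
```

After training, theta approximates the empirical quantiles (0, 1, 2, 3) of the target samples, which is exactly the distributional representation these papers learn per state-action pair.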
Meta RL
- Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, and Pieter Abbeel, "RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning", arXiv, 2016.
- Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, and Matt Botvinick, "Learning to Reinforcement Learn", arXiv, 2016.
- Chelsea Finn, Pieter Abbeel, and Sergey Levine, "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel, "A Simple Neural Attentive Meta-Learner", in Proc. of the International Conference on Learning Representations (ICLR), Feb, 2018.
- Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, and Sergey Levine, "Meta-Reinforcement Learning of Structured Exploration Strategies", in Proc. of the Neural Information Processing Systems (NIPS), Dec. 2018.
Inverse RL
- A. Y. Ng and S. Russell, “Algorithms for inverse reinforcement learning,” in Proc. of the 17th International Conference on Machine Learning (ICML), Jun, 2000.
- B. Taskar, V. Chatalbashev, D. Koller, and C. Guestrin, “Learning structured prediction models: A large margin approach,” in Proc. of International Conference on Machine learning (ICML), Aug, 2005.
- N. D. Ratliff, J. A. Bagnell, and M. Zinkevich, “Maximum margin planning,” in Proc. of the International Conference on Machine Learning (ICML), Jun, 2006.
- N. D. Ratliff, D. M. Bradley, J. A. Bagnell, and J. E. Chestnutt, “Boosting structured prediction for imitation learning,” Advances in Neural Information Processing Systems (NIPS), Dec, 2007.
- D. Ramachandran and E. Amir, “Bayesian inverse reinforcement learning,” in Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), Jan, 2007.
- B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, “Maximum entropy inverse reinforcement learning,” in Proc. of the 23rd AAAI Conference on Artificial Intelligence (AAAI), July, 2008.
- N. D. Ratliff, D. Silver, and J. A. Bagnell, “Learning to search: Functional gradient techniques for imitation learning,” Autonomous Robots, Jul, 2009.
- K. Dvijotham and E. Todorov, “Inverse optimal control with linearly-solvable MDPs,” in Proc. of the International Conference on Machine Learning (ICML), Jun, 2010.
- B. D. Ziebart, “Modeling purposeful adaptive behavior with the principle of maximum causal entropy,” Ph.D. dissertation, Carnegie Mellon University, 2010.
- S. Levine, Z. Popovic, and V. Koltun, “Nonlinear inverse reinforcement learning with gaussian processes,” Advances in Neural Information Processing Systems (NIPS), Dec, 2011.
- A. Boularias, J. Kober, and J. Peters, “Relative entropy inverse reinforcement learning,” in Proc. of the International Conference on Artificial Intelligence and Statistics, Apr, 2011.
- Sergey Levine, Vladlen Koltun, "Continuous Inverse Optimal Control with Locally Optimal Examples," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2012.
- J. Choi and K. Kim, “Bayesian nonparametric feature construction for inverse reinforcement learning,” in Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), Aug, 2013.
- J. Zheng, S. Liu, and L. M. Ni, “Robust bayesian inverse reinforcement learning with sparse behavior noise,” in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Jul, 2014.
- J. Choi and K.-E. Kim, “Hierarchical bayesian inverse reinforcement learning,” Cybernetics, IEEE Transactions on, 2015.
- Finn, Chelsea, Sergey Levine, and Pieter Abbeel. "Guided cost learning: Deep inverse optimal control via policy optimization," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- K. Shiarlis, J. Messias, M. van Someren, and S. Whiteson, “Inverse reinforcement learning from failure,” In Proc. of the 2016 International Conference on Autonomous Agents & Multiagent Systems, May, 2016.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Inverse Reinforcement Learning with Leveraged Gaussian Processes," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2016.
- J. Ho, and S. Ermon, "Generative adversarial imitation learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2016. [arXiv]
- Nir Baram, Oron Anschel, Itai Caspi, Shie Mannor, "End-to-End Differentiable Adversarial Imitation Learning," in Proc. of International Conference on Machine learning (ICML), Aug, 2017.
- Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, and J. Andrew Bagnell, "Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction," in Proc. of International Conference on Machine learning (ICML), Aug, 2017.
- Michael Laskey, Jonathan Lee, Roy Fox, Anca Dragan, and Ken Goldberg, "DART: Noise Injection for Robust Imitation Learning", in Proc. of the Conference on Robot Learning (CoRL), Nov. 2017.
- K. Hausman, Y. Chebotar, S. Schaal, G. S. Sukhatme, J. J. Lim, "Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Ziyu Wang, Josh S. Merel, Scott E. Reed, Nando de Freitas, Gregory Wayne, Nicolas Heess, "Robust Imitation of Diverse Behaviors," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Justin Fu, Katie Luo, and Sergey Levine, "Learning Robust Rewards with Adversarial Inverse Reinforcement Learning," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2018.
- Wonseok Jeon, Seokin Seo, and Kee-Eung Kim, "A Bayesian Approach to Generative Adversarial Imitation Learning," in Proc. of the Neural Information Processing Systems (NIPS), Dec. 2018.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Maximum Causal Tsallis Entropy Imitation Learning," in Proc. of Neural Information Processing Systems (NIPS), Dec. 2018.
- Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine, and Jonathan Tompson, "Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning," in Proc. of the International Conference on Learning Representations (ICLR), May, 2019.
Multi-Agent RL
- Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch, "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments," in Proc. of the Neural Information Processing Systems (NIPS), Dec, 2017.
- Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson, "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jul, 2018.
- Jianhao Wang, Zhizhou Ren, Terry Liu, Yang Yu, and Chongjie Zhang, "QPLEX: Duplex Dueling Multi-Agent Q-Learning," in Proc. of the International Conference on Learning Representations (ICLR), May, 2021.
- Jakub Grudzien Kuba, Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, and Yaodong Yang, "Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2022.
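The value-factorisation line above (QMIX, QPLEX) rests on a monotonicity argument: if the joint value is a monotonically increasing function of the per-agent utilities, then each agent acting greedily on its own utility also maximises the joint value. A toy numerical sketch with a linear positive-weight mixer standing in for QMIX's hypernetwork-generated mixer (all numbers illustrative):

```python
import numpy as np

q1 = np.array([0.2, 1.0, 0.4])   # agent 1's utilities over its 3 actions
q2 = np.array([0.7, 0.1])        # agent 2's utilities over its 2 actions
w1, w2 = 0.5, 1.5                # positive mixing weights => monotonic mixer

# Joint value for every joint action (a1, a2)
q_joint = w1 * q1[:, None] + w2 * q2[None, :]

a1_greedy, a2_greedy = int(q1.argmax()), int(q2.argmax())
joint_greedy = tuple(int(i) for i in np.unravel_index(int(q_joint.argmax()),
                                                      q_joint.shape))
```

The decentralised greedy pair coincides with the joint argmax, so execution can be fully decentralised while training remains centralised; with a negative weight this equality can break, which is why QMIX constrains the mixer's weights to be non-negative.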
Applications
MDP
- David Blackwell. "Discounted dynamic programming," The Annals of Mathematical Statistics, 1965.
- Emanuel Todorov. "Linearly-solvable Markov decision problems," Advances in neural information processing systems (NIPS), Dec, 2007.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1466-1473, Jul. 2018. [Supplementary Material | Video | arXiv preprint]
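The discounted setting formalised by Blackwell (1965) admits the standard value-iteration algorithm. A self-contained sketch on an illustrative 2-state, 2-action chain (the transition and reward numbers are made up for the example):

```python
import numpy as np

gamma = 0.9
# P[a, s, s2] = probability of moving from state s to s2 under action a
P = np.array([[[0.9, 0.1],
               [0.1, 0.9]],
              [[0.2, 0.8],
               [0.8, 0.2]]])
R = np.array([[0.0, 1.0],      # R[s, a]
              [2.0, 0.0]])

V = np.zeros(2)
for _ in range(500):
    # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * E_{s'}[V(s')]
    Q = R + gamma * np.einsum('asx,x->sa', P, V)
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)      # greedy policy w.r.t. the converged values
```

Because the backup is a gamma-contraction, the loop converges geometrically to the unique optimal value function; here the greedy policy moves toward and then exploits the rewarding second state.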
RL
- Christopher JCH Watkins, and Peter Dayan, "Q-learning," Machine learning, 1992.
- R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, 1992.
- R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems (NIPS), Nov. 2000.
- Sham M. Kakade, "A natural policy gradient," Advances in Neural Information Processing Systems (NIPS), Dec. 2002.
- Sham M. Kakade, and John Langford, "Approximately optimal approximate reinforcement learning," in Proc. of the International Conference on Machine Learning (ICML), 2002.
- Rasmussen, Carl Edward, and Malte Kuss, "Gaussian Processes in Reinforcement Learning," Advances in Neural Information Processing Systems (NIPS), Dec. 2003.
- Jens Kober, and Jan R. Peters, "Policy search for motor primitives in robotics," Advances in neural information processing systems (NIPS), Dec, 2008.
- Hado V Hasselt, "Double Q-learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2010.
- Peters, Jan, Katharina Mülling, and Yasemin Altun. "Relative Entropy Policy Search," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Jul, 2010.
- N. Heess, D. Silver, and Y. W. Teh, "Actor-critic reinforcement learning with energy-based policies," in Proc. of the European Workshop on Reinforcement Learning, Jun, 2012.
- David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin A. Riedmiller, "Deterministic Policy Gradient Algorithms," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2014.
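Watkins and Dayan's Q-learning (1992, above) is easy to reproduce in tabular form. The 5-state chain below is an illustrative stand-in: action 1 moves right, action 0 moves left, and only the transition into the terminal state 4 pays reward.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma, alpha, epsilon = 5, 0.9, 0.5, 0.3
Q = np.zeros((n_states, 2))

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy behaviour policy
        a = int(rng.integers(2)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # off-policy TD update toward the max of the next state's values
        bootstrap = 0.0 if s_next == n_states - 1 else Q[s_next].max()
        Q[s, a] += alpha * (r + gamma * bootstrap - Q[s, a])
        s = s_next

greedy = Q.argmax(axis=1)        # learned greedy policy
```

The update bootstraps from max_a Q(s', a) regardless of the action the behaviour policy actually takes, which is what makes Q-learning off-policy; the learned values approach gamma^(distance to goal) along the optimal path.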
Other Topics
- S. Ross, “Interactive learning for sequential decisions and predictions,” Ph. D. dissertation, Carnegie Mellon University, 2013.
- Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio, "Generative Adversarial Nets," Advances in Neural Information Processing Systems (NIPS), Dec, 2014.
- Y. Gal, "Uncertainty in deep learning," University of Cambridge, 2016.
- Y. Gal, Z. Ghahramani, "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- P. McClure and K. Nikolaus, "Representing inferential uncertainty in deep neural networks through sampling," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- A. Kendall and Y. Gal, "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- B. Lakshminarayanan, A. Pritzel, and C. Blundell, "Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- M. Teye, H. Azizpour, and K. Smith, "Bayesian Uncertainty Estimation for Batch Normalized Deep Networks," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2018.
- Joshua Achiam, Harrison Edwards, Dario Amodei, and Pieter Abbeel, "Variational option discovery algorithms." arXiv, 2018.
- Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine, "Diversity is all you need: Learning skills without a reward function", in Proc. of the International Conference on Learning Representations (ICLR), May, 2019.
- Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, and Karol Hausman, "Dynamics-Aware Unsupervised Discovery of Skills," in Proc. of the International Conference on Learning Representations (ICLR), May, 2019.
- Hao Liu and Pieter Abbeel, "APS: Active Pretraining with Successor Features," in Proc. of the International Conference on Machine Learning (ICML), Jul, 2021.
- Michael Janner, Yilun Du, Joshua B. Tenenbaum, and Sergey Levine, "Planning with Diffusion for Flexible Behavior Synthesis," in Proc. of the International Conference on Machine Learning (ICML), Jul, 2022.
- Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, et al. "Decision Transformer: Reinforcement Learning via Sequence Modeling," in Proc. of the Neural Information Processing Systems (NIPS), Dec, 2021.
- Kimin Lee, Laura Smith, and Pieter Abbeel, "PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training," in Proc. of the International Conference on Machine Learning (ICML), Jul, 2021.
- Seohong Park, Jongwook Choi, Jaekyeom Kim, Honglak Lee, and Gunhee Kim, "Lipschitz-constrained Unsupervised Skill Discovery," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2022.
- Changyeon Kim, Jongjin Park, Jinwoo Shin, Honglak Lee, Pieter Abbeel, and Kimin Lee, "Preference Transformer: Modeling Human Preferences using Transformers for RL," in Proc. of the International Conference on Learning Representations (ICLR), May, 2023.
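Several of the uncertainty papers above (notably Lakshminarayanan et al., 2017) estimate predictive uncertainty from the disagreement of an ensemble. A minimal sketch with tiny linear models in place of neural networks; bootstrap resampling stands in for the paper's random initialisation, and all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=50)
y_train = 2.0 * x_train + 0.1 * rng.standard_normal(50)   # true slope = 2

# Train K members on bootstrap resamples of the data.
K, slopes = 5, []
for k in range(K):
    idx = rng.integers(0, 50, size=50)
    xs, ys = x_train[idx], y_train[idx]
    slopes.append((xs @ ys) / (xs @ xs))   # least-squares slope, no intercept

def predict(x):
    """Ensemble mean and disagreement (std) at a query point."""
    preds = np.array([m * x for m in slopes])
    return preds.mean(), preds.std()

mean_in, std_in = predict(0.5)      # inside the training range
mean_out, std_out = predict(10.0)   # far outside: members disagree more
```

The ensemble's spread grows away from the training data, giving a crude epistemic-uncertainty signal; the exploration and safe-RL papers in this list use exactly this kind of disagreement to drive or constrain behaviour.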
