Course Information
Instructor: Prof. Songhwai Oh (오성회) | Email: songhwai (at) snu.ac.kr | Office Hours: Friday 2:00-4:00PM | Office: Building 133, Room 405
Course Number: 430.729 (005) | Time: Mon/Wed 11:00AM-12:15PM | Location: Building 302, Room 209
TA: Obin Kwon (권오빈) | Email: obin.kwon (at) rllab.snu.ac.kr | Office: Building 133, Room 610
Course Description
With recent developments in deep learning, deep reinforcement learning has been attracting attention for its ability to solve an increasing number of complex problems, including the classic game of Go, video games, self-driving vehicles, and robot manipulation. In this course, we will review recent advances in deep reinforcement learning. We will first review Markov decision processes (MDPs) and traditional reinforcement learning techniques, and then cover recent developments in robot learning, deep learning, and deep reinforcement learning, including behavior cloning, inverse reinforcement learning, policy gradient methods, deep Q-networks (DQN), generative adversarial networks (GAN), and generative adversarial imitation learning. This is an advanced graduate course with substantial reading and programming assignments. Students are expected to participate actively in class. Lectures will be in English.
Announcements
- List of Deep Reinforcement Learning Papers
- Project Schedule
- [04/19] Paper Presentation Announcement
- [04/12] Homework 5 [source code] (due: 04/18, 23:59 KST; extended to 04/23, 23:59 KST)
- [04/15 Update] The instruction file has been updated. If you are having trouble installing TORCS, please download the instruction file again.
- [04/05] Homework 4 [source code] (due: 04/11, 23:59 KST)
- [03/24] Homework 3 [source code] (due: 03/30, 23:59 KST)
- [03/17] Homework 2 [source code] (due: 03/23, 23:59 KST)
- [03/08] Homework 1 [source code] (due: 03/15, 23:59 KST)
- [Note] If you want to use another deep learning library, such as PyTorch or TensorFlow 2, you must write your own code based on the given source code and submit it.
- [02/23] Please read Ethics of Learning.
Schedule
- Week 1:
- 03/03: Introduction
- Ch. 3 from Reinforcement Learning: An Introduction; Ch. 17 from AIMA
- Week 2:
- 03/08: Review on MDPs (a value-iteration sketch follows this week's readings)
- Deep Learning Short Course
- Ch. 3 from Reinforcement Learning: An Introduction; Ch. 17 from AIMA
- 03/10: Review on POMDPs, RL algorithms
- Ch. 4, Ch. 6 from Reinforcement Learning: An Introduction; Ch. 21 from AIMA
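As a warm-up for the MDP review, the following is a minimal value-iteration sketch in Python/NumPy; the two-state MDP below is an illustrative assumption, not an example from the lectures or homework.

```python
import numpy as np

# Illustrative two-state, two-action MDP (not from the course material).
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.95

V = np.zeros(2)
for _ in range(10000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s').
    Q = R + gamma * (P @ V)      # P @ V contracts over s', giving shape (S, A)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print("V* =", V, " greedy policy =", Q.argmax(axis=1))
```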
- Week 3:
- 03/15: Gaussian process regression (a GP regression sketch follows this week's readings)
- Gaussian process regression (Ch. 2 from Gaussian Processes for Machine Learning)
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, Songhwai Oh, "Real-Time Nonparametric Reactive Navigation of Mobile Robots in Dynamic Environments," Robotics and Autonomous Systems, vol. 91, pp. 11–24, May 2017.
- 03/17: Behavior cloning (leveraged Gaussian process regression)
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstrations with Mixed Qualities Using Leveraged Gaussian Processes," IEEE Transactions on Robotics, vol. 35, no. 3, pp. 564-576, Jun. 2019.
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, and Songhwai Oh, "Leveraged Non-Stationary Gaussian Process Regression for Autonomous Robot Navigation," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2015.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstration Using Leveraged Gaussian Processes and Sparse-Constrained Optimization," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2016.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Scalable Robust Learning from Demonstration with Leveraged Deep Neural Networks," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2017.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstrations with Mixed Qualities Using Leveraged Gaussian Processes," IEEE Transactions on Robotics, vol. 35, no. 3, pp. 564-576, Jun. 2019.
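Below is a minimal Gaussian process regression sketch following the standard posterior equations (Algorithm 2.1 in Gaussian Processes for Machine Learning); the data, kernel hyperparameters, and noise level are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, ell=1.0, sigma_f=1.0):
    """Squared-exponential kernel k(x, x') = sigma_f^2 exp(-(x - x')^2 / (2 ell^2))."""
    return sigma_f**2 * np.exp(-0.5 * (A[:, None] - B[None, :])**2 / ell**2)

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 10)                     # training inputs (illustrative)
y = np.sin(X) + 0.1 * rng.standard_normal(10)  # noisy observations of sin(x)
Xs = np.linspace(-3, 3, 50)                    # test inputs
sigma_n = 0.1                                  # observation noise std

K = rbf_kernel(X, X) + sigma_n**2 * np.eye(len(X))
Ks, Kss = rbf_kernel(Xs, X), rbf_kernel(Xs, Xs)

# Posterior mean and covariance via a Cholesky solve (numerically stable).
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = K^{-1} y
mu = Ks @ alpha                                       # posterior mean at Xs
v = np.linalg.solve(L, Ks.T)
cov = Kss - v.T @ v                                   # posterior covariance

print(mu[:3], np.sqrt(np.diag(cov))[:3])
```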
- Week 4:
- 03/22: DAgger, Behavior cloning applications
- S. Ross, G. Gordon, and D. Bagnell, "A reduction of imitation learning and structured prediction to no-regret online learning," in Proc. of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
- Giusti, Alessandro, Jérôme Guzzi, Dan C. Cireşan, Fang-Lin He, Juan P. Rodríguez, Flavio Fontana, Matthias Faessler et al. "A machine learning approach to visual perception of forest trails for mobile robots." IEEE Robotics and Automation Letters 1, no. 2 (2016): 661-667. [Project Page with Datasets]
- Loquercio, Antonio, Ana Isabel Maqueda, Carlos R. Del Blanco, and Davide Scaramuzza. "DroNet: Learning to Fly by Driving." IEEE Robotics and Automation Letters (2018). [Project Page with Code and Datasets]
- 03/24: Deep Q Learning (a DQN update sketch follows this week's readings)
- Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al., "Human-level control through deep reinforcement learning," Nature, 2015.
- H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double q-learning," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb. 2016.
- Z. Wang, N. de Freitas, and M. Lanctot, "Dueling Network Architectures for Deep Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2016.
- T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized Experience Replay," arXiv, 2015.
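The sketch below shows the core DQN update rule (TD target computed with a frozen target network, per Mnih et al., 2015). To stay self-contained it substitutes a linear Q-function for the deep network and omits the replay buffer; all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions, gamma = 4, 3, 0.99

# A linear Q-function stands in for the deep network: Q(s, .) = W @ phi(s).
W = 0.1 * rng.standard_normal((n_actions, n_features))
W_target = W.copy()   # frozen target network, synced periodically in full DQN

def dqn_update(W, W_target, phi, a, r, phi_next, done, lr=0.01):
    # DQN target: y = r + gamma * max_a' Q_target(s', a') (zero bootstrap if terminal).
    y = r + (0.0 if done else gamma * (W_target @ phi_next).max())
    td_error = y - (W @ phi)[a]
    W[a] += lr * td_error * phi   # semi-gradient step on the chosen action's row
    return W

# One illustrative transition (s, a, r, s'); in DQN these are minibatches
# sampled from a replay buffer to decorrelate updates.
phi, phi_next = rng.standard_normal(n_features), rng.standard_normal(n_features)
W = dqn_update(W, W_target, phi, a=1, r=1.0, phi_next=phi_next, done=False)
```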
- Week 5:
- 03/29: Sparse MDPs
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1466-1473, Jul. 2018. [Supplementary Material | Video | arXiv preprint]
- 03/31: Policy gradient (a REINFORCE sketch follows this week's readings)
- R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, 1992.
- R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems (NIPS), Nov. 2000.
- Baxter, Jonathan, and Peter L. Bartlett. "Reinforcement learning in POMDP's via direct gradient ascent." ICML. 2000.
- J. Baxter and P. L. Bartlett, "Infinite-Horizon Policy-Gradient Estimation," Journal of Artificial Intelligence Research, vol. 15, pp. 319-350, 2001.
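A minimal REINFORCE sketch (Williams, 1992): a softmax policy trained by stochastic gradient ascent on a toy three-armed bandit. The bandit, learning rate, and omission of a baseline are illustrative simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.0, 1.0, 0.5])   # illustrative arm reward means
theta = np.zeros(3)                      # softmax policy parameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(2000):
    pi = softmax(theta)
    a = rng.choice(3, p=pi)
    r = true_means[a] + rng.standard_normal()
    # REINFORCE: grad_theta log pi(a) = e_a - pi for a softmax policy,
    # so theta <- theta + lr * r * grad log pi(a)  (no baseline here).
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta += 0.05 * r * grad_log_pi

print("learned policy:", softmax(theta))  # should concentrate on arm 1
```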
- Week 6:
- 04/05: GPS, TRPO, PPO (a PPO clipped-objective sketch follows this week's readings)
- J. Schulman, S. Levine, P. Abbeel, M. I. Jordan, and P. Moritz, "Trust Region Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Jul. 2015. [arXiv]
- Sham M. Kakade and John Langford, "Approximately optimal approximate reinforcement learning," in Proc. of the International Conference on Machine Learning (ICML), 2002. (http://www.cs.cmu.edu/~./jcl/papers/aoarl/Final.pdf)
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
- Sergey Levine, Vladlen Koltun, "Guided Policy Search," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2013.
- 04/07: Actor-critic
- Mnih, Volodymyr, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. "Asynchronous methods for deep reinforcement learning." In International Conference on Machine Learning, pp. 1928-1937. 2016.
- David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin A. Riedmiller, "Deterministic Policy Gradient Algorithms," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2014.
- Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).
- N. Heess, D. Silver, and Y. W. Teh, "Actor-critic reinforcement learning with energy-based policies," in Proc. of the European Workshop on Reinforcement Learning, Jun. 2012.
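A sketch of the PPO clipped surrogate objective (Schulman et al., 2017), evaluated on hand-picked probability ratios and advantage estimates; the numbers are illustrative only.

```python
import numpy as np

def ppo_clip_loss(ratio, adv, eps=0.2):
    """Clipped surrogate: L = -E[ min(r*A, clip(r, 1-eps, 1+eps)*A) ]."""
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return -np.mean(np.minimum(unclipped, clipped))

# ratio[i] = pi_new(a_i|s_i) / pi_old(a_i|s_i); adv[i] = advantage estimate.
ratio = np.array([0.8, 1.0, 1.3, 1.6])
adv = np.array([1.0, -0.5, 2.0, -1.0])
print(ppo_clip_loss(ratio, adv))  # clipping caps the incentive to move far from pi_old
```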
- Week 7:
- 04/12: Maximum entropy RL (a soft-value sketch follows this week's readings)
- Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine, "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", in Proc. of the International Conference on Machine Learning (ICML), Aug. 2018.
- Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, Mineui Hong, Jaein Kim, Yong-Lae Park, and Songhwai Oh, "Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots," in Proc. of Robotics: Science and Systems (RSS), Jul. 2020. [Supplementary Material | arXiv preprint]
- Jae In Kim, Mineui Hong, Kyungjae Lee, DongWook Kim, Yong-Lae Park, and Songhwai Oh, "Learning to Walk a Tripod Mobile Robot Using Nonlinear Soft Vibration Actuators with Entropy Adaptive Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2317-2324, Apr. 2020. [Supplementary Material | Video]
- 04/14: Inverse reinforcement learning (IRL), GPIRL
- A. Y. Ng and S. Russell, “Algorithms for inverse reinforcement learning,” in Proc. of the 17th International Conference on Machine Learning (ICML), Jun. 2000.
- Abbeel, Pieter, and Andrew Y. Ng. "Apprenticeship learning via inverse reinforcement learning." In Proceedings of the twenty-first international conference on Machine learning, p. 1. ACM, 2004. (supplementary)
- S. Levine, Z. Popovic, and V. Koltun, “Nonlinear inverse reinforcement learning with Gaussian processes,” Advances in Neural Information Processing Systems (NIPS), Dec. 2011.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Inverse Reinforcement Learning with Leveraged Gaussian Processes," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2016.
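A sketch of the soft (entropy-regularized) value and the corresponding Boltzmann policy that underlie maximum entropy RL methods such as soft actor-critic; the Q-values and temperature values are illustrative.

```python
import numpy as np

def soft_value(q, alpha=1.0):
    """Soft value V = alpha * log sum_a exp(Q(s,a)/alpha), via a stable log-sum-exp."""
    z = q / alpha
    m = z.max()
    return alpha * (m + np.log(np.exp(z - m).sum()))

def soft_policy(q, alpha=1.0):
    """Entropy-regularized optimal policy pi(a|s) = exp((Q(s,a) - V(s)) / alpha)."""
    return np.exp((q - soft_value(q, alpha)) / alpha)

q = np.array([1.0, 2.0, 0.5])            # illustrative Q-values for one state
for alpha in (0.1, 1.0, 10.0):
    # Small temperature -> near-greedy; large temperature -> near-uniform.
    print(alpha, soft_policy(q, alpha))
```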
- Week 8:
- 04/19: Maximum entropy IRL
- Ziebart, Brian D., Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. "Maximum Entropy Inverse Reinforcement Learning." In AAAI, vol. 8, pp. 1433-1438. 2008.
- Bagnell, J. Andrew, Nathan Ratliff, and Martin Zinkevich. "Maximum margin planning." In Proceedings of the International Conference on Machine Learning (ICML). 2006.
- 04/21: GAN, GAIL, MCTEIL (a GAIL discriminator sketch follows this week's readings)
- Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio, "Generative Adversarial Nets," Advances in Neural Information Processing Systems (NIPS), Dec. 2014.
- J. Ho, and S. Ermon, "Generative adversarial imitation learning," Advances in Neural Information Processing Systems (NIPS), Dec. 2016. [arXiv]
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Maximum Causal Tsallis Entropy Imitation Learning", in Proc. of Neural Information Processing Systems (NIPS), Dec. 2018. [Supplementary Material | arXiv preprint]
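A sketch of the GAIL discriminator step, roughly following Ho & Ermon (2016): logistic regression distinguishing expert state-action features from policy samples, whose output then defines a surrogate reward. The features, step size, and reward convention are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative state-action features for expert and current-policy samples.
expert = rng.standard_normal((64, 5)) + 1.0
policy = rng.standard_normal((64, 5))
w = np.zeros(5)   # linear discriminator D(x) = sigmoid(w . x)

for _ in range(200):
    # Gradient ascent on E_expert[log D] + E_policy[log(1 - D)].
    d_e, d_p = sigmoid(expert @ w), sigmoid(policy @ w)
    grad = expert.T @ (1.0 - d_e) / len(expert) - policy.T @ d_p / len(policy)
    w += 0.1 * grad

# One common convention: the policy maximizes surrogate reward -log(1 - D(s, a)).
reward = -np.log(1.0 - sigmoid(policy @ w) + 1e-8)
print("mean surrogate reward on policy samples:", reward.mean())
```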
- Week 9: Paper Presentation
- 04/26:
- Andrychowicz, Marcin, Dwight Crow, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba, "Hindsight experience replay," Advances in Neural Information Processing Systems (NIPS), Dec. 2017.
- Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu, "IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures," in Proc. of the International Conference on Machine Learning (ICML), Aug. 2018.
- Steven Kapturowski, Georg Ostrovski, John Quan, Remi Munos, Will Dabney, "Recurrent Experience Replay in Distributed Reinforcement Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2019.
- 04/28:
- Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell, Curiosity-driven Exploration by Self-supervised Prediction, Proceedings of the International Conference on Machine Learning (ICML), 2017.
- Bingyi Kang, Zequn Jie, Jiashi Feng, Policy Optimization with Demonstrations, Proceedings of the International Conference on Machine Learning (ICML), 2018.
- Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel, Overcoming Exploration in Reinforcement Learning with Demonstrations, Proceedings of the International Conference on Robotics and Automation (ICRA), 2018.
- Week 10: Paper Presentation
- 05/03:
- Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel, Constrained Policy Optimization, Proceedings of the International Conference on Machine Learning (ICML), 2017.
- Min Wen, Ufuk Topcu, Constrained Cross-Entropy Method for Safe Reinforcement Learning, Advances in neural information processing systems (NIPS), 2018.
- Richard Cheng, Gabor Orosz, Richard M. Murray, Joel W. Burdick, End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks, in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), 2019.
- 05/05: No Class (Holiday)
- Week 11: Paper Presentation
- 05/10:
- Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew J. Johnson, Sergey Levine, SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning, Proceedings of the International Conference on Machine Learning (ICML), 2019.
- Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson, Learning Latent Dynamics for Planning from Pixels, Proceedings of the International Conference on Machine Learning, 2019.
- Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak, Planning to Explore via Self-Supervised World Models, Proceedings of the International Conference on Machine Learning, 2020.
- 05/12:
- Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski, Model-Based Reinforcement Learning for Atari, in Proc. of the International Conference on Learning Representations (ICLR), 2020.
- Brandon Amos, Ivan Jimenez, Jacob Sacks, Byron Boots, J. Zico Kolter, Differentiable MPC for End-to-end Planning and Control, Advances in Neural Information Processing Systems (NIPS), 2018.
- Week 12: Paper Presentation
- 05/17:
- Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine, Diversity is All You Need: Learning Skills without a Reward Function, in Proc. of the International Conference on Learning Representations (ICLR), 2019.
- Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman, Dynamics-Aware Unsupervised Discovery of Skills, in Proc. of the International Conference on Learning Representations (ICLR), 2020.
- 05/19: No Class (Holiday)
- Week 13: Paper Presentation
- 05/24:
- Josh Merel, Arun Ahuja, Vu Pham, Saran Tunyasuvunakool, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Greg Wayne, Hierarchical Visuomotor Control of Humanoids, in Proc. of the International Conference on Learning Representations (ICLR), 2019.
- Andrew Levy, George Konidaris, Robert Platt, Kate Saenko, Learning Multi-Level Hierarchies with Hindsight, in Proc. of the International Conference on Learning Representations (ICLR), 2019.
- 05/26:
- Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, and Timothy Lillicrap, "Distributed Distributional Deterministic Policy Gradients," in Proc. of the International Conference on Learning Representations (ICLR), Feb. 2018.
- Will Dabney*, Georg Ostrovski*, David Silver, and Rémi Munos, "Implicit Quantile Networks for Distributional Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Aug. 2018.
- Week 14: Paper Presentation
- Week 15: Paper Presentation
- Week 16:
- 06/16: Project Presentation
References
- Reinforcement Learning: An Introduction (2nd edition), Richard S. Sutton and Andrew G. Barto, The MIT Press, 2018.
- Artificial Intelligence: A Modern Approach (3rd edition), Stuart Russell and Peter Norvig, Prentice Hall, 2009. (AIMA Website)
- Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams, The MIT Press, 2006.
Prerequisites
- (430.457) Introduction to Intelligent Systems (지능시스템개론).
- A strong background in algorithms, linear algebra, and probability is also required.