Course Information
- Instructor: Prof. Songhwai Oh (오성회), Email: songhwai (at) snu.ac.kr, Office Hours: Friday 2:00-4:00PM, Office: Building 133 Room 405
- Course Number: 430.729 (003), Time: M/W 11:00-12:15PM, Location: Building 301 Room 102
- TA: Jaegu Choy (최재구), Email: jaegu.choy (at) rllab.snu.ac.kr, Office: Building 133 Room 610
Course Description
With recent developments in deep learning, deep reinforcement learning is attracting attention for its ability to solve an increasing number of complex problems, including the game of Go, video games, self-driving vehicles, and robot manipulation. This course reviews recent advances in deep reinforcement learning. We will first review Markov decision processes (MDPs) and traditional reinforcement learning techniques, and then cover recent developments in robot learning, deep learning, and deep reinforcement learning, including behavior cloning, inverse reinforcement learning, policy gradients, deep Q-networks (DQN), generative adversarial networks (GAN), and generative adversarial imitation learning. This is an advanced graduate course with substantial reading and programming assignments. Students are expected to participate actively in class. Lectures will be in English.
Announcements
- List of Deep Reinforcement Learning Papers
- Class Project
- Project Schedule
- [05/11] Project proposal (1 page, 2 columns)
- [06/15] Project summary: Submit Title, Abstract, and Digest (template) to TA
- [06/17] Project presentation
- [06/19] Project report (6 pages, 2 columns)
- [02/25] Please read Ethics of Learning.
Schedule
- Week 1:
- 03/16: Introduction
- Ch. 3 from Reinforcement Learning: An Introduction; Ch. 17 from AIMA
- 03/18: Review of MDPs (a minimal value iteration sketch follows the readings below)
- Deep Learning Short Course
- Ch. 3 from Reinforcement Learning: An Introduction; Ch. 17 from AIMA
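For orientation, here is a minimal value iteration sketch for a finite MDP (in Python; the transition-model format and function name are illustrative assumptions, not course code):

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.99, tol=1e-6):
    """Tabular value iteration. P[s][a] is a list of (prob, next_state, reward) tuples."""
    V = np.zeros(n_states)
    while True:
        Q = np.zeros((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                # Bellman optimality backup: Q(s, a) = E[r + gamma * V(s')]
                Q[s, a] = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)  # optimal values and a greedy policy
        V = V_new
```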
- Week 2:
- 03/23: Review of POMDPs and RL algorithms
- Ch. 4, Ch. 6 from Reinforcement Learning: An Introduction; Ch. 21 from AIMA
- 03/25: Gaussian process regression
- Gaussian process regression (Ch. 2 from Gaussian Processes for Machine Learning)
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, Songhwai Oh, "Real-Time Nonparametric Reactive Navigation of Mobile Robots in Dynamic Environments," Robotics and Autonomous Systems, vol. 91, pp. 11–24, May 2017.
- Week 3:
- 03/30: Behavior cloning (leveraged Gaussian process regression; a minimal supervised-learning sketch follows the readings below)
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstrations with Mixed Qualities Using Leveraged Gaussian Processes," IEEE Transactions on Robotics, vol. 35, no. 3, pp. 564-576, Jun. 2019.
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, and Songhwai Oh, "Leveraged Non-Stationary Gaussian Process Regression for Autonomous Robot Navigation," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2015.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstration Using Leveraged Gaussian Processes and Sparse-Constrained Optimization," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2016.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Scalable Robust Learning from Demonstration with Leveraged Deep Neural Networks," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2017.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstrations with Mixed Qualities Using Leveraged Gaussian Processes," IEEE Transactions on Robotics, vol. 35, no. 3, pp. 564-576, Jun. 2019.
- 04/01: DAgger, Behavior cloning applications
- S. Ross, G. Gordon, and D. Bagnell, "A reduction of imitation learning and structured prediction to no-regret online learning," in Proc. of the international conference on artificial intelligence and statistics, 2011.
- Giusti, Alessandro, Jérôme Guzzi, Dan C. Cireşan, Fang-Lin He, Juan P. Rodríguez, Flavio Fontana, Matthias Faessler et al. "A machine learning approach to visual perception of forest trails for mobile robots." IEEE Robotics and Automation Letters 1, no. 2 (2016): 661-667. [Project Page with Datasets]
- Loquercio, Antonio, Ana Isabel Maqueda, Carlos R. Del Blanco, and Davide Scaramuzza. "DroNet: Learning to Fly by Driving." IEEE Robotics and Automation Letters (2018). [Project Page with Code and Datasets]
- Week 4:
- 04/06: Deep Q Learning (a minimal loss sketch follows the readings below)
- Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., and Petersen, S. "Human-level control through deep reinforcement learning," Nature, 2015.
- H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double q-learning," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb, 2016.
- Z. Wang, N. de Freitas, and M. Lanctot, "Dueling Network Architectures for Deep Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized Experience Replay," arXiv, 2015.
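As a rough illustration of the DQN update (Mnih et al., 2015): the TD target is computed with a frozen target network, and the loss is taken on the Q-values of the actions stored in the replay buffer. The batch format and function below are assumptions for the sketch.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """DQN loss on a replay-buffer batch (states, actions, rewards, next_states, done flags)."""
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a) for the taken actions
    with torch.no_grad():  # the target network is held fixed during this update
        target = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    return F.smooth_l1_loss(q_sa, target)
```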
- 04/08: Sparse MDPs
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1466-1473, Jul. 2018. [Supplementary Material | Video | arXiv preprint]
- Week 5:
- 04/13: Policy gradient (a minimal REINFORCE sketch follows the readings below)
- R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, 1992.
- R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems (NIPS), Nov. 2000.
- Baxter, Jonathan, and Peter L. Bartlett. "Reinforcement learning in POMDP's via direct gradient ascent." ICML. 2000.
- J. Baxter and P. L. Bartlett. Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, 15:319--350, 2001.
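The REINFORCE estimator (Williams, 1992) weights the log-probability of each taken action by the return that followed it. A minimal per-episode loss sketch is given below; the return normalization is a common variance-reduction choice, not part of the original algorithm.

```python
import torch

def reinforce_loss(log_probs, returns):
    """log_probs: log pi(a_t | s_t) over one episode; returns: discounted returns G_t."""
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # crude baseline/normalization
    return -(log_probs * returns).sum()  # minimizing this ascends the policy-gradient objective
```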
- 04/15: No Class (Election Day)
- Week 6:
- 04/20: Policy gradient
- J. Schulman, S. Levine, P. Abbeel, M. I. Jordan, and P. Moritz, "Trust Region Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Jul, 2015. [arXiv]
- Sham M. Kakade, and John Langford, "Approximately optimal approximate reinforcement learning," in Proc. of the International Conference on Machine Learning (ICML), 2002. (http://www.cs.cmu.edu/~./jcl/papers/aoarl/Final.pdf)
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017
- Sergey Levine, Vladlen Koltun, "Guided Policy Search," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2013.
- 04/22: Actor-critic
- Mnih, Volodymyr, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. "Asynchronous methods for deep reinforcement learning." In International Conference on Machine Learning, pp. 1928-1937. 2016.
- David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin A. Riedmiller, "Deterministic Policy Gradient Algorithms," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2014.
- Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).
- N. Heess, D. Silver, and Y. W. Teh, "Actor-critic reinforcement learning with energy-based policies," in Proc. of the European Workshop on Reinforcement Learning, Jun, 2012.
- Week 7:
- 04/27: Maximum entropy RL (a minimal soft-backup sketch follows the readings below)
- Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine, "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2018.
- Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, and Songhwai Oh, "Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning," arXiv preprint: 1902.00137, 2019.
- Jae In Kim, Mineui Hong, Kyungjae Lee, DongWook Kim, Yong-Lae Park, and Songhwai Oh, "Learning to Walk a Tripod Mobile Robot Using Nonlinear Soft Vibration Actuators with Entropy Adaptive Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2317-2324, Apr. 2020. [Supplementary Material | Video]
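In maximum entropy RL, the hard max over actions in the Bellman backup is replaced by a temperature-scaled log-sum-exp (the "soft" value). Below is a tabular sketch of one soft Q-iteration backup, using the same illustrative transition format assumed earlier.

```python
import numpy as np

def soft_backup(Q, P, gamma=0.99, alpha=1.0):
    """One soft Bellman backup. Q: (n_states, n_actions); P[s][a]: list of (prob, next_state, reward)."""
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))  # soft value: alpha * logsumexp(Q / alpha)
    Q_new = np.zeros_like(Q)
    for s in range(Q.shape[0]):
        for a in range(Q.shape[1]):
            Q_new[s, a] = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
    return Q_new
```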
- 04/29: Inverse reinforcement learning (IRL), GP IRL
- A. Y. Ng and S. Russell, “Algorithms for inverse reinforcement learning,” in Proc. of the 17th International Conference on Machine Learning (ICML), Jun, 2000.
- Abbeel, Pieter, and Andrew Y. Ng. "Apprenticeship learning via inverse reinforcement learning." In Proceedings of the twenty-first international conference on Machine learning, p. 1. ACM, 2004. (supplementary)
- S. Levine, Z. Popovic, and V. Koltun, “Nonlinear inverse reinforcement learning with Gaussian processes,” Advances in Neural Information Processing Systems (NIPS), Dec, 2011.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Inverse Reinforcement Learning with Leveraged Gaussian Processes," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2016.
- Week 8:
- 05/04: Maximum entropy IRL
- Ziebart, Brian D., Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. "Maximum Entropy Inverse Reinforcement Learning." In AAAI, vol. 8, pp. 1433-1438. 2008.
- Bagnell, J. Andrew, Nathan Ratliff, and Martin Zinkevich. "Maximum margin planning." In Proceedings of the International Conference on Machine Learning (ICML). 2006.
- 05/06: GAN, GAIL, MCTEIL (a minimal GAIL sketch follows the readings below)
- Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio, "Generative Adversarial Nets," Advances in Neural Information Processing Systems (NIPS), Dec, 2014.
- J. Ho, and S. Ermon, "Generative adversarial imitation learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2016. [arXiv]
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Maximum Causal Tsallis Entropy Imitation Learning", in Proc. of Neural Information Processing Systems (NIPS), Dec. 2018. [Supplementary Material | arXiv preprint]
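A rough sketch of the GAIL loop (Ho and Ermon, 2016): a discriminator is trained to separate expert state-action pairs from policy ones, and a function of its output serves as a surrogate reward for a policy-gradient learner (TRPO in the original paper). The network, batch shapes, and sign convention below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def discriminator_step(disc, opt, expert_sa, policy_sa):
    """One discriminator update: expert pairs labeled 1, policy pairs labeled 0."""
    logits_e, logits_p = disc(expert_sa), disc(policy_sa)
    loss = (F.binary_cross_entropy_with_logits(logits_e, torch.ones_like(logits_e))
            + F.binary_cross_entropy_with_logits(logits_p, torch.zeros_like(logits_p)))
    opt.zero_grad()
    loss.backward()
    opt.step()

def gail_reward(disc, policy_sa):
    """Surrogate reward for the RL step: high when the policy's pairs look expert-like."""
    with torch.no_grad():
        return -torch.log(1.0 - torch.sigmoid(disc(policy_sa)) + 1e-8)
```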
- Week 9:
- 05/11:
- Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew J. Johnson, Sergey Levine, "SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning." International Conference on Machine Learning. 2019.
- David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, Gregory Wayne, "Experience replay for continual learning." Advances in Neural Information Processing Systems. 2019.
- Fujimoto, Scott, David Meger, and Doina Precup. "Off-Policy Deep Reinforcement Learning without Exploration." International Conference on Machine Learning. 2019.
- 05/13:
- Shixiang Gu, Timothy P. Lillicrap, Ilya Sutskever, Sergey Levine, "Continuous Deep Q-Learning with Model-based Acceleration," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- Aviv Tamar, Sergey Levine, Pieter Abbeel, Yi Wu, Garrett Thomas, "Value Iteration Networks," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine, "Reinforcement Learning with Deep Energy-Based Policies," In Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Week 10:
- 05/18:
- S. Gu, T. Lillicrap, Z. Ghahramani, R. E. Turner, and S. Levine, "Q-Prop: Sample efficient policy gradient with an off-policy critic," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- B. O’Donoghue, R. Munos, K. Kavukcuoglu, and V. Mnih, "PGQ: Combining policy gradient and Q-learning," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- Rein Houthooft, Xi Chen, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel, "VIME: Variational Information Maximizing Exploration," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- 05/20:
- Andrychowicz, Marcin, Dwight Crow, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba, "Hindsight experience replay," Advances in neural information processing systems (NIPS), Dec, 2017.
- Y. Chebotar, K. Hausman, M. Zhang, G. Sukhatme, S. Schaal, and S. Levine, "Combining model-based and model-free updates for trajectory-centric reinforcement learning," In Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Hester T, Vecerik M, Pietquin O, Lanctot M, Schaul T, Piot B, Sendonaris A, Dulac-Arnold G, Osband I, Agapiou J, Leibo JZ, "Deep Q-learning from Demonstrations," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), 2018.
- Week 11:
- 05/25:
- Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy, "Deep Exploration via Bootstrapped DQN," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans, "Bridging the gap between value and policy based reinforcement learning," Advances in neural information processing systems (NIPS), Dec, 2017.
- Scott Fujimoto, Herke van Hoof, and David Meger, "Addressing Function Approximation Error in Actor-Critic Methods," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2018.
- 05/27:
- Du, Yilun, and Karthik Narasimhan. "Task-agnostic dynamics priors for deep reinforcement learning." arXiv preprint arXiv:1905.04819 (2019).
- O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans, "Trust-PCL: An Off-Policy Trust Region Method for Continuous Control," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2018.
- Matas, Jan, Stephen James, and Andrew J. Davison. "Sim-to-real reinforcement learning for deformable object manipulation." arXiv preprint arXiv:1806.07851 (2018).
- Week 12:
- 06/01:
- Dosovitskiy, Alexey, and Vladlen Koltun. "Learning to act by predicting the future." arXiv preprint arXiv:1611.01779 (2016).
- J. Achiam, D. Held, A. Tamar, P. Abbeel, "Constrained Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017. [arXiv]
- Yevgen Chebotar, Mrinal Kalakrishnan, Ali Yahya, Adrian Li, Stefan Schaal, Sergey Levine, "Path integral guided policy search," in Proc. of the International Conference on Robotics and Automation (ICRA), May, 2017
- 06/03:
- Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, "End-to-End Training of Deep Visuomotor Policies," Journal of Machine Learning Research (JMLR), 2016.
- Tejas D. Kulkarni*, Karthik R. Narasimhan*, Ardavan Saeedi, Joshua B. Tenenbaum, "Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation." Advances in neural information processing systems (NIPS), Dec, 2016.
- Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, and Pieter Abbeel, "RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning", arXiv, 2016.
- Week 13:
- 06/08:
- Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine, "Data-Efficient Hierarchical Reinforcement Learning", in Proc. of the Neural Information Processing Systems (NIPS), Dec. 2018.
- Marc G. Bellemare, Will Dabney, and Rémi Munos, "A Distributional Perspective on Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Will Dabney, Mark Rowland, Marc G. Bellemare, and Rémi Munos, "Distributional Reinforcement Learning with Quantile Regression", in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb, 2018.
- 06/10:
- Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, and Timothy Lillicrap, "Distributed Distributional Deterministic Policy Gradients," in Proc. of the International Conference on Learning Representations (ICLR), Feb, 2018.
- Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu, "FeUdal Networks for Hierarchical Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel, "A Simple Neural Attentive Meta-Learner", in Proc. of the International Conference on Learning Representations (ICLR), Feb, 2018.
- Week 14:
- 06/17: Project Presentation and Poster Session
References
- Reinforcement Learning: An Introduction (2nd edition), Richard S. Sutton and Andrew G. Barto, The MIT Press, 2018.
- Artificial Intelligence: A Modern Approach (3rd edition), Stuart Russell and Peter Norvig, Prentice Hall, 2009. (AIMA Website)
- Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams, The MIT Press, 2006.
Prerequisites
- (430.457) Introduction to Intelligent Systems (지능시스템개론).
- A strong background in algorithms, linear algebra, and probability is also required.