Course Information
- Instructor: Prof. Songhwai Oh (오성회), Email: songhwai (at) snu.ac.kr, Office Hours: Friday 2:00-4:00PM, Office: Building 133 Room 405
- Course Number: 430.729 (003), Time: Tue/Thu 11:00AM-12:15PM, Location: Building 301 Room 106
- TA: Kyungjae Lee (이경재), Email: kyungjae.lee (at) cpslab.snu.ac.kr, Office: Building 133 Room 610
Course Description
With recent developments in deep learning, deep reinforcement learning has been attracting attention as it can solve an increasing number of complex problems, including the classic game of Go, video games, self-driving vehicles, and robot manipulation. In this course, we will review recent advances in deep reinforcement learning. We will first review Markov decision processes (MDPs) and traditional reinforcement learning techniques. Then we will review recent developments in robot learning, deep learning, and deep reinforcement learning, including topics such as behavior cloning, inverse reinforcement learning, policy gradient methods, deep Q-networks (DQN), generative adversarial networks (GAN), and generative adversarial imitation learning. This is an advanced graduate course, and substantial reading and programming assignments will be given. Students are expected to participate actively in class. Lectures will be in English.
Announcements
- Deep Reinforcement Learning Papers
- Class Project
- [03/01] Please read Ethics of Learning.
Assignments
- [03/13] Programming Assignment 1 (due: 03/20)
- [04/02] Project Proposal (1-2 pages, 2 columns) (due: 04/12, in class)
- [04/11] Paper Assignment, Question Sheet
- [04/16] Proposal Revision (due: 04/24, in class): Attach the old proposal to the revised proposal and turn in both.
- [04/23] Programming Assignment 2 (due: 05/01)
- [05/02] Programming Assignment 3 (due: 05/10)
- ---------------------------------------------------------------------------------------
- [06/14] Project Report (6-8 pages, 2 columns)
Schedule
- Week 1:
- 03/06: Introduction
- Ch. 3 from Reinforcement Learning: An Introduction; Ch. 17 from AIMA
- 03/08: Review of MDPs (see the value iteration sketch below)
- Ch. 3 from Reinforcement Learning: An Introduction; Ch. 17 from AIMA
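A minimal value iteration sketch in Python to accompany the 03/08 MDP review; the two-state, two-action MDP below is a toy model invented purely for illustration and is not taken from the readings.

```python
import numpy as np

# A toy 2-state, 2-action MDP, invented for illustration:
# P[s, a, s'] are transition probabilities, R[s, a] are expected rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.95

# Value iteration: repeatedly apply the Bellman optimality backup
# V(s) <- max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) V(s') ]
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * P @ V          # Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)          # greedy policy w.r.t. the converged values
print("V* =", V, "greedy policy =", policy)
```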
- Week 2:
- 03/13: Review of POMDPs and RL algorithms (see the Q-learning sketch below)
- Ch. 4, Ch. 6 from Reinforcement Learning: An Introduction; Ch. 21 from AIMA
- 03/15: Gaussian process regression
- Gaussian process regression (Ch. 2 from Gaussian Processes for Machine Learning)
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, Songhwai Oh, "Real-Time Nonparametric Reactive Navigation of Mobile Robots in Dynamic Environments," Robotics and Autonomous Systems, vol. 91, pp. 11–24, May 2017.
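A minimal tabular Q-learning sketch to accompany the 03/13 review of RL algorithms (temporal-difference control, Ch. 6 of Reinforcement Learning: An Introduction); the five-state chain environment and helper functions are toy stand-ins invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 5-state chain, invented for illustration: action 0 moves left,
# action 1 moves right; reaching state 4 gives reward 1 and ends the episode.
n_states, n_actions, goal = 5, 2, 4

def step(s, a):
    s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s_next, (1.0 if s_next == goal else 0.0), s_next == goal

def epsilon_greedy(q_row, epsilon=0.1):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(rng.choice(np.flatnonzero(q_row == q_row.max())))

# Tabular Q-learning update:
# Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.95

for episode in range(500):
    s, done = 2, False
    while not done:
        a = epsilon_greedy(Q[s])
        s_next, r, done = step(s, a)
        td_target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (td_target - Q[s, a])
        s = s_next

print("Greedy policy (0=left, 1=right):", Q.argmax(axis=1))
```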
- Week 3:
- 03/20: Behavior cloning (leveraged Gaussian process regression)
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, and Songhwai Oh, "Leveraged Non-Stationary Gaussian Process Regression for Autonomous Robot Navigation," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2015.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstration Using Leveraged Gaussian Processes and Sparse-Constrained Optimization," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2016.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Scalable Robust Learning from Demonstration with Leveraged Deep Neural Networks," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2017.
- 03/22: DAgger, behavior cloning applications (see the DAgger sketch below)
- S. Ross, G. Gordon, and D. Bagnell, "A reduction of imitation learning and structured prediction to no-regret online learning," in Proc. of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
- Giusti, Alessandro, Jérôme Guzzi, Dan C. Cireşan, Fang-Lin He, Juan P. Rodríguez, Flavio Fontana, Matthias Faessler et al. "A machine learning approach to visual perception of forest trails for mobile robots." IEEE Robotics and Automation Letters 1, no. 2 (2016): 661-667. [Project Page with Datasets]
- Loquercio, Antonio, Ana Isabel Maqueda, Carlos R. Del Blanco, and Davide Scaramuzza. "DroNet: Learning to Fly by Driving." IEEE Robotics and Automation Letters (2018). [Project Page with Code and Datasets]
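A rough sketch of the DAgger data-aggregation loop from the Ross et al. reading for 03/22; the 1-D environment, the hand-coded expert_action, and the nearest-neighbor learner are all stand-ins invented for illustration, and the expert/learner policy mixing of the full algorithm is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting, invented for illustration: states are points in [-1, 1],
# the hand-coded "expert" steers toward 0, and the learner is a
# 1-nearest-neighbor classifier over the aggregated dataset.
def expert_action(s):
    return 1 if s < 0.0 else 0          # 1 = move right, 0 = move left

def rollout(policy, horizon=20):
    """Run the current learner policy and record the states it visits."""
    s, states = rng.uniform(-1, 1), []
    for _ in range(horizon):
        states.append(s)
        a = policy(s)
        s = np.clip(s + (0.1 if a == 1 else -0.1) + rng.normal(0, 0.02), -1, 1)
    return states

def nn_policy(dataset_s, dataset_a):
    """1-nearest-neighbor policy fit to the aggregated dataset."""
    s_arr, a_arr = np.array(dataset_s), np.array(dataset_a)
    return lambda s: int(a_arr[np.abs(s_arr - s).argmin()])

# DAgger loop: roll out the current learner, query the expert on the states
# the learner visits, aggregate, and refit the learner on the growing dataset.
D_s, D_a = [0.0], [expert_action(0.0)]   # seed dataset
policy = nn_policy(D_s, D_a)
for iteration in range(10):
    visited = rollout(policy)
    D_s += visited
    D_a += [expert_action(s) for s in visited]
    policy = nn_policy(D_s, D_a)

print("Learned action at s=-0.5:", policy(-0.5), " at s=+0.5:", policy(0.5))
```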
- Week 4:
- 03/27: Deep Q Learning (see the DQN sketch below)
- Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al., "Human-level control through deep reinforcement learning," Nature, 2015.
- H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double q-learning," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb, 2016.
- Z. Wang, N. de Freitas, and M. Lanctot, "Dueling Network Architectures for Deep Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized Experience Replay," arXiv, 2015.
- 03/29: Policy gradient
- R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, 1992.
- R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems (NIPS), Nov. 2000.
- J. Baxter and P. L. Bartlett. Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, 15:319--350, 2001.
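A minimal sketch of the two ingredients highlighted in the 03/27 DQN readings, experience replay and a separate target network; the linear Q-function and the synthetic replay buffer below are placeholders invented for illustration and stand in for the convolutional network and Atari transitions of Mnih et al.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear Q-function Q(s, .) = W s over a small state vector (a stand-in for
# the deep network of the paper), plus a periodically-synced target network.
state_dim, n_actions, gamma, lr = 4, 2, 0.99, 1e-2
W = rng.normal(0, 0.1, (n_actions, state_dim))        # online Q-network
W_target = W.copy()                                   # target Q-network

def q_values(weights, s):
    return weights @ s                                # vector of Q(s, a)

# Replay buffer of synthetic transitions (s, a, r, s', done), for illustration only.
buffer = [(rng.normal(size=state_dim), rng.integers(n_actions),
           rng.normal(), rng.normal(size=state_dim), rng.random() < 0.1)
          for _ in range(1000)]

for step in range(2000):
    batch = [buffer[i] for i in rng.integers(len(buffer), size=32)]
    for s, a, r, s_next, done in batch:
        # TD target uses the *target* network: y = r + gamma * max_a' Q_target(s', a')
        y = r if done else r + gamma * q_values(W_target, s_next).max()
        td_error = y - q_values(W, s)[a]
        W[a] += lr * td_error * s                     # semi-gradient update on Q(s, a)
    if step % 100 == 0:
        W_target = W.copy()                           # periodic target-network sync

print("Q-values of a random state:", q_values(W, rng.normal(size=state_dim)))
```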
- Week 5:
- 04/03: Robot Learning
- Hyemin Ahn, Yoonseon Oh, Sungjoon Choi, Claire J. Tomlin, and Songhwai Oh, "Online Learning to Approach a Person with No-Regret," IEEE Robotics and Automation Letters, vol. 3, no. 1, pp. 52-59, Jan. 2018. [Supplementary Material | Video]
- Hyemin Ahn, Timothy Ha, Yunho Choi, Hwiyeon Yoo, and Songhwai Oh, "Text2Action: Generative Adversarial Synthesis from Language to Action," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2018. [Video | arXiv preprint | Software]
- Kyunghoon Cho, Junghun Suh, Claire J. Tomlin, and Songhwai Oh, "Cost-Aware Path Planning under Co-Safe Temporal Logic Specifications," IEEE Robotics and Automation Letters, vol. 2, no. 4, pp. 2308-2315, Oct. 2017.
- Kyunghoon Cho and Songhwai Oh, "Learning-Based Model Predictive Control under Signal Temporal Logic Specifications," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2018. [Video]
- 04/05: Sparse MDPs
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1466-1473, Jul. 2018. [Supplementary Material | Video | arXiv preprint]
- Week 6:
- 04/10: Policy gradient
- J. Schulman, S. Levine, P. Abbeel, M. I. Jordan, and P. Moritz, "Trust Region Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Jul, 2015. [arXiv]
- Sergey Levine, Vladlen Koltun, "Guided Policy Search," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2013.
- Sham M. Kakade, and John Langford, "Approximately optimal approximate reinforcement learning," in Proc. of the International Conference on Machine Learning (ICML), 2002. (http://www.cs.cmu.edu/~./jcl/papers/aoarl/Final.pdf)
- 04/12: Actor-critic (see the actor-critic sketch below)
- Mnih, Volodymyr, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. "Asynchronous methods for deep reinforcement learning." In International Conference on Machine Learning, pp. 1928-1937. 2016.
- David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin A. Riedmiller, "Deterministic Policy Gradient Algorithms," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2014.
- Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).
- N. Heess, D. Silver, and Y. W. Teh, "Actor-critic reinforcement learning with energy-based policies," in Proc. of the European Workshop on Reinforcement Learning, Jun, 2012.
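A tabular one-step actor-critic sketch to accompany the 04/12 readings: a softmax policy (the actor) is updated with a policy-gradient step weighted by the TD error of a learned state-value baseline (the critic). The chain environment is a toy invented for illustration; deep variants such as A3C or DDPG replace the tables with neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 5-state chain (invented for illustration): action 0 moves left,
# action 1 moves right; reaching state 4 ends the episode with reward 1.
n_states, n_actions, goal = 5, 2, 4

def step(s, a):
    s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s_next, (1.0 if s_next == goal else 0.0), s_next == goal

# One-step actor-critic (tabular): softmax policy (actor) with parameters
# theta[s, a], state-value baseline (critic) V[s].
theta = np.zeros((n_states, n_actions))
V = np.zeros(n_states)
gamma, alpha_actor, alpha_critic = 0.95, 0.1, 0.1

def policy_probs(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

for episode in range(2000):
    s, done = 2, False
    while not done:
        probs = policy_probs(s)
        a = int(rng.choice(n_actions, p=probs))
        s_next, r, done = step(s, a)
        # The TD error serves as the advantage estimate for the policy-gradient step.
        td_error = r + (0.0 if done else gamma * V[s_next]) - V[s]
        V[s] += alpha_critic * td_error                       # critic update
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0                                 # d log pi(a|s) / d theta[s, .]
        theta[s] += alpha_actor * td_error * grad_log_pi      # actor update
        s = s_next

print("Greedy actions (0=left, 1=right):", theta.argmax(axis=1))
```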
- Week 7:
- 04/17: Inverse reinforcement learning
- A. Y. Ng and S. Russell, “Algorithms for inverse reinforcement learning,” in Proc. of the 17th International Conference on Machine Learning (ICML), Jun, 2000.
- Abbeel, Pieter, and Andrew Y. Ng. "Apprenticeship learning via inverse reinforcement learning." In Proceedings of the twenty-first international conference on Machine learning, p. 1. ACM, 2004. (supplementary)
- 04/19: MaxEnt, MMP
- Ziebart, Brian D., Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. "Maximum Entropy Inverse Reinforcement Learning." In AAAI, vol. 8, pp. 1433-1438. 2008.
- Bagnell, J. Andrew, Nathan Ratliff, and Martin Zinkevich. "Maximum margin planning." In Proceedings of the International Conference on Machine Learning (ICML). 2006.
- Week 8:
- 04/24: GP IRL
- S. Levine, Z. Popovic, and V. Koltun, “Nonlinear inverse reinforcement learning with Gaussian processes,” Advances in Neural Information Processing Systems (NIPS), Dec, 2011.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Inverse Reinforcement Learning with Leveraged Gaussian Processes," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2016.
- 04/26: GAN, GAIL
- Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio, "Generative Adversarial Nets," Advances in Neural Information Processing Systems (NIPS), Dec, 2014.
- J. Ho, and S. Ermon, "Generative adversarial imitation learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2016. [arXiv]
- Week 9:
- 05/01: Exploration
- Paper 1: Paul Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, Dario Amodei, "Deep Reinforcement Learning from Human Preferences," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Paper 2: Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy, "Deep Exploration via Bootstrapped DQN," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Paper 3: Fu, Justin, John Co-Reyes, and Sergey Levine, "Ex2: Exploration with exemplar models for deep reinforcement learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- 05/03: Exploration
- Paper 1: Z. Wang, V. Bapst, N. Heess, V. Mnih, R. Munos, K. Kavukcuoglu, and N. de Freitas, "Sample efficient actor-critic with experience replay," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- Paper 2: Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Rémi Munos, "Unifying Count-Based Exploration and Intrinsic Motivation," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Paper 3: Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel, "VIME: Variational Information Maximizing Exploration," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Week 10:
- 05/08: NAF, CPO, VIN
- Paper 1: Shixiang Gu, Timothy P. Lillicrap, Ilya Sutskever, Sergey Levine, "Continuous Deep Q-Learning with Model-based Acceleration," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- Paper 2: Aviv Tamar, Sergey Levine, Pieter Abbeel, Yi Wu, Garrett Thomas, "Value Iteration Networks," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Paper 3: J. Achiam, D. Held, A. Tamar, P. Abbeel, "Constrained Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017. [arXiv]
- 05/10: Soft Q-Learning, Q-Prop, PGQL
- Paper 1: Haarnoja, T., Tang, H., Abbeel, P., and Levine, S., "Reinforcement Learning with Deep Energy-Based Policies" in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Paper 2: S. Gu, T. Lillicrap, Z. Ghahramani, R. E. Turner, and S. Levine, "Q-Prop: Sample efficient policy gradient with an off-policy critic," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- Paper 3: B. O’Donoghue, R. Munos, K. Kavukcuoglu, and V. Mnih, "PGQ: Combining policy gradient and Q-learning," In Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- Week 11:
- 05/15: PCL, PPO
- Paper 1: O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans, "Bridging the gap between value and policy based reinforcement learning," Advances in neural information processing systems (NIPS), Dec, 2017.
- Paper 2: O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans, "Trust-PCL: An Off-Policy Trust Region Method for Continuous Control," In Proc. of the International Conference on Learning Representations (ICLR), Feb, 2018.
- Paper 3: J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
- 05/17: Policy Search, Applications
- Paper 1: Yevgen Chebotar, Mrinal Kalakrishnan, Ali Yahya, Adrian Li, Stefan Schaal, Sergey Levine, "Path integral guided policy search," in Proc. of the International Conference on Robotics and Automation (ICRA), May, 2017.
- Paper 2: Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, "End-to-End Training of Deep Visuomotor Policies," Journal of Machine Learning Research (JMLR), 2016.
- Paper 3: Alexey Dosovitskiy, Vladlen Koltun, "Learning to Act by Predicting the Future," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- Week 12:
- 05/22: No class
- 05/24: No class
- Week 13:
- 05/29: HER, DQfD
- Paper 1: Andrychowicz, Marcin, Dwight Crow, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba, "Hindsight experience replay," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Paper 2: Y. Chebotar, K. Hausman, M. Zhang, G. Sukhatme, S. Schaal, and S. Levine, "Combining model-based and model-free updates for trajectory-centric reinforcement learning," In Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Paper 3: Hester T, Vecerik M, Pietquin O, Lanctot M, Schaul T, Piot B, Sendonaris A, Dulac-Arnold G, Osband I, Agapiou J, Leibo JZ, "Deep Q-learning from Demonstrations," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), 2018.
- 05/31: Distributional RL
- Paper 1: Marc G. Bellemare, Will Dabney, Rémi Munos, "A Distributional Perspective on Reinforcement Learning," In Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Paper 2: Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, Timothy Lillicrap, "Distributed Distributional Deterministic Policy Gradients," In Proc. of the International Conference on Learning Representations (ICLR), Feb, 2018.
- Paper 3:
- Week 14:
- 06/05:
- 06/07: Project Presentation and Poster Session
- Week 15:
References
- Reinforcement Learning: An Introduction (2nd edition), Richard S. Sutton and Andrew G. Barto, The MIT Press, 2018.
- Artificial Intelligence: A Modern Approach (3rd edition), Stuart Russell and Peter Norvig, Prentice Hall, 2009. (AIMA Website)
- Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams, The MIT Press, 2006.
Prerequisites
- (430.457) Introduction to Intelligent Systems (지능시스템개론).
- A strong background in algorithms, linear algebra, and probability is also required.