Course Information
Robot Learning - Spring 2024
Instructor: Prof. Songhwai Oh (오성회) | Email: songhwai (at) snu.ac.kr | Office Hours: Friday 3:00-4:00PM | Office: Building 133, Room 403
Course Number: M2608.002700 | Time: M/W 3:30-4:45PM | Location: Building 301, Room 106
TA: Jaeyeon Jeong (정재연) | Email: jaeyeon.jeong (at) rllab.snu.ac.kr | Office: Building 133, Room 610
Course Description
Robot learning is a field that combines robotics and artificial intelligence (AI) to study how robots can acquire new skills and knowledge through experience, much as humans do. Through interactions with their surroundings, robots can adapt to and learn from new environments, tasks, and situations. Robot learning encompasses a variety of techniques and approaches for making robots more autonomous and capable of performing tasks without explicit programming for every possible scenario. In this course, we will review recent advances in robot learning, including imitation learning and deep reinforcement learning. We will first review Markov decision processes (MDPs) and reinforcement learning. We will then discuss recent developments in imitation learning, deep learning, and deep reinforcement learning, covering topics such as behavior cloning, inverse reinforcement learning, policy gradient methods, deep Q-networks (DQN), generative adversarial imitation learning, maximum entropy reinforcement learning, safe reinforcement learning, and offline reinforcement learning. This is an advanced graduate course, and substantial reading and programming exercises will be assigned. Students are expected to participate actively in class. Lectures will be in English.
Announcements
- Project Page
Project Schedule
- [05/08] Project proposal (2 pages, 2 columns; no Abstract)
- [06/05] Project summary: submit (1) Title and (2) Abstract to TA
- [06/12] Project presentation and poster session
- [06/14] Project report (minimum 6 pages, 2 columns)
- List of Deep Reinforcement Learning Papers
- [03/03] Please read Ethics of Learning.
Schedule
Week 1:
03/04: Introduction
- Ch. 3 from Reinforcement Learning: An Introduction; Ch. 16 from AIMA
03/06: Review on MDPs and POMDPs
- Deep Learning Short Course
- Ch. 3 from Reinforcement Learning: An Introduction; Ch. 16 from AIMA
Week 2:
03/11: Reinforcement learning
- Ch. 4, Ch. 6 from Reinforcement Learning: An Introduction; Ch. 23 from AIMA
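As a warm-up for the readings, the core temporal-difference control idea (Ch. 6 of Sutton & Barto) fits in a few lines. The toy chain environment and all constants below are illustrative choices, not taken from the readings:

```python
import random

# Tabular Q-learning on a toy 4-state chain: action 1 moves right, action 0
# moves left, and reaching state 3 pays reward 1 and ends the episode.
def step(s, a):
    s2 = min(3, s + 1) if a == 1 else max(0, s - 1)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

alpha, gamma, eps = 0.5, 0.9, 0.3             # illustrative constants
Q = [[0.0, 0.0] for _ in range(4)]

random.seed(0)
for _ in range(500):                          # episodes
    s, done = 0, False
    while not done:
        if random.random() < eps:             # epsilon-greedy exploration
            a = random.randrange(2)
        else:
            a = max(range(2), key=lambda act: Q[s][act])
        s2, r, done = step(s, a)
        td_target = r + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (td_target - Q[s][a])
        s = s2

# The learned greedy policy moves right from every non-terminal state.
print([max(range(2), key=lambda act: Q[s][act]) for s in range(3)])  # [1, 1, 1]
```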
03/13: Review of probability theory; Gaussian process regression
- Gaussian process regression (Ch. 2 from Gaussian Processes for Machine Learning)
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, Songhwai Oh, "Real-Time Nonparametric Reactive Navigation of Mobile Robots in Dynamic Environments," Robotics and Autonomous Systems, vol. 91, pp. 11–24, May 2017.
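The GP regression posterior mean from Ch. 2 of Rasmussen & Williams reduces to one linear solve plus a kernel product; here is a minimal sketch with an RBF kernel, where the training data and hyperparameters are synthetic and purely illustrative:

```python
import numpy as np

# GP regression posterior mean, following Ch. 2 of Rasmussen & Williams:
# mean = K(X*, X) [K(X, X) + sigma_n^2 I]^{-1} y.
def rbf(A, B, ell=1.0):
    sqdist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sqdist / ell**2)

X = np.linspace(0.0, 2.0 * np.pi, 10)[:, None]   # synthetic training inputs
y = np.sin(X[:, 0])                               # noiseless targets
sigma_n = 1e-3                                    # assumed observation noise

K = rbf(X, X) + sigma_n**2 * np.eye(len(X))
alpha = np.linalg.solve(K, y)                     # (K + sigma_n^2 I)^{-1} y

X_test = np.array([[1.0], [4.0]])
mean = rbf(X_test, X) @ alpha                     # posterior mean at X_test
print(np.round(mean, 2))                          # close to sin([1.0, 4.0])
```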
Week 3:
03/18: Behavior cloning (leveraged Gaussian process regression)
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstrations with Mixed Qualities Using Leveraged Gaussian Processes," IEEE Transactions on Robotics, vol. 35, no. 3, pp. 564-576, Jun. 2019.
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, and Songhwai Oh, "Leveraged Non-Stationary Gaussian Process Regression for Autonomous Robot Navigation," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2015.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstration Using Leveraged Gaussian Processes and Sparse-Constrained Optimization," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2016.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Scalable Robust Learning from Demonstration with Leveraged Deep Neural Networks," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2017.
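At its core, behavior cloning is supervised regression from observed states to expert actions; the leveraged-GP papers above extend this to demonstrations of mixed quality. A bare-bones version with a hypothetical linear expert (all data below is synthetic):

```python
import numpy as np

# Behavior cloning as plain supervised regression: fit a policy to
# (state, expert action) pairs. The linear "expert" a = -K s and the
# synthetic demonstrations below are purely illustrative.
rng = np.random.default_rng(0)
K = np.array([[1.0, 0.5]])                    # hypothetical expert gains
states = rng.normal(size=(200, 2))            # demonstrated states
actions = states @ -K.T                       # expert actions (noiseless)

# Least-squares fit of a linear policy a = s @ W to the demonstrations.
W, *_ = np.linalg.lstsq(states, actions, rcond=None)

def policy(s):
    return s @ W

# With noiseless demonstrations, the expert gains are recovered exactly.
print(np.allclose(W.ravel(), [-1.0, -0.5]))   # True
```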
03/20: DAgger, Behavior cloning applications
- S. Ross, G. Gordon, and D. Bagnell, "A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning," in Proc. of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
- Giusti, Alessandro, Jérôme Guzzi, Dan C. Cireşan, Fang-Lin He, Juan P. Rodríguez, Flavio Fontana, Matthias Faessler et al. "A machine learning approach to visual perception of forest trails for mobile robots." IEEE Robotics and Automation Letters 1, no. 2 (2016): 661-667. [Project Page with Datasets]
- Loquercio, Antonio, Ana Isabel Maqueda, Carlos R. Del Blanco, and Davide Scaramuzza. "DroNet: Learning to Fly by Driving." IEEE Robotics and Automation Letters (2018). [Project Page with Code and Datasets]
Week 4:
03/25: Deep Q Learning
- Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al., "Human-level control through deep reinforcement learning," Nature, 2015.
- H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double q-learning," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb. 2016.
- Z. Wang, N. de Freitas, and M. Lanctot, "Dueling Network Architectures for Deep Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2016.
- T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized Experience Replay," arXiv, 2015.
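Two ingredients made DQN stable in Mnih et al.: an experience replay buffer and a periodically synced target network. Both can be illustrated with a table standing in for the deep network; the toy environment and constants below are made up for illustration:

```python
import random

# The two DQN stabilizers -- experience replay and a frozen target network --
# on a toy 2-state problem, with a Q-table standing in for the deep network.
random.seed(0)
n_actions = 2
w = [[0.0] * n_actions for _ in range(2)]     # online Q "network"
w_target = [row[:] for row in w]              # target network copy

buffer = []                                   # experience replay buffer
gamma, alpha = 0.9, 0.1                       # illustrative constants

def env_step(s, a):
    s2 = a                                    # action chooses the next state
    return s2, float(s2 == 1)                 # state 1 pays reward 1

s = 0
for t in range(2000):
    a = random.randrange(n_actions)           # uniform exploration
    s2, r = env_step(s, a)
    buffer.append((s, a, r, s2))
    s = s2
    # Train on a random minibatch, bootstrapping from the frozen target.
    for s_, a_, r_, s2_ in random.sample(buffer, min(4, len(buffer))):
        td_target = r_ + gamma * max(w_target[s2_])
        w[s_][a_] += alpha * (td_target - w[s_][a_])
    if t % 50 == 0:                           # periodic target-network sync
        w_target = [row[:] for row in w]

print(w[0][1] > w[0][0])                      # True: moving to state 1 wins
```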
03/27: Sparse MDPs
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1466-1473, Jul. 2018. [Supplementary Material | Video | arXiv preprint]
Week 5:
04/01: Policy gradient
- R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, 1992.
- R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems (NIPS), Nov. 2000.
- Baxter, Jonathan, and Peter L. Bartlett. "Reinforcement learning in POMDP's via direct gradient ascent." ICML. 2000.
- J. Baxter and P. L. Bartlett. Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, 15:319--350, 2001.
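Williams' REINFORCE estimator, theta += lr * R * grad log pi(a | theta), takes only a few lines for a softmax policy; the two-armed bandit below is an illustrative stand-in for a full MDP:

```python
import math, random

# REINFORCE (Williams, 1992) with a softmax policy on a two-armed bandit:
# theta_k += lr * R * (1[k == a] - pi_k), the score-function gradient.
# The bandit payoffs and learning rate are illustrative.
random.seed(0)
theta = [0.0, 0.0]
payoffs = [0.2, 1.0]                          # arm 1 pays more
lr = 0.1

def pi():
    e = [math.exp(t) for t in theta]
    z = sum(e)
    return [x / z for x in e]

for _ in range(2000):
    p = pi()
    a = 0 if random.random() < p[0] else 1    # sample an action
    r = payoffs[a]
    for k in range(2):                        # grad log pi(a)_k = 1[k==a] - p_k
        theta[k] += lr * r * ((1 if k == a else 0) - p[k])

print(pi()[1])                                # close to 1: arm 1 dominates
```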
04/03: GPS, TRPO, PPO
- J. Schulman, S. Levine, P. Abbeel, M. I. Jordan, and P. Moritz, "Trust Region Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Jul. 2015. [arXiv]
- Sham M. Kakade, and John Langford, "Approximately optimal approximate reinforcement learning," in Proc. of the International Conference on Machine Learning (ICML), 2002. (http://www.cs.cmu.edu/~./jcl/papers/aoarl/Final.pdf)
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
- Sergey Levine, Vladlen Koltun, "Guided Policy Search," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2013.
Week 6:
04/08: Actor-critic
- Mnih, Volodymyr, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. "Asynchronous methods for deep reinforcement learning." In International Conference on Machine Learning, pp. 1928-1937. 2016.
- David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin A. Riedmiller, "Deterministic Policy Gradient Algorithms," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2014.
- Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).
04/10: No Class (Election Day)
Week 7:
04/15: Maximum entropy RL
- Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine, "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", in Proc. of the International Conference on Machine Learning (ICML), Aug. 2018.
- Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, Mineui Hong, Jaein Kim, Yong-Lae Park, and Songhwai Oh, "Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots," in Proc. of Robotics: Science and Systems (RSS), Jul. 2020. [Supplementary Material | arXiv preprint]
- Jae In Kim, Mineui Hong, Kyungjae Lee, DongWook Kim, Yong-Lae Park, and Songhwai Oh, "Learning to Walk a Tripod Mobile Robot Using Nonlinear Soft Vibration Actuators with Entropy Adaptive Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2317-2324, Apr. 2020. [Supplementary Material | Video]
04/17: Inverse reinforcement learning (IRL), GPIRL
- A. Y. Ng and S. Russell, “Algorithms for inverse reinforcement learning,” in Proc. of the 17th International Conference on Machine Learning (ICML), Jun. 2000.
- Abbeel, Pieter, and Andrew Y. Ng. "Apprenticeship learning via inverse reinforcement learning." In Proceedings of the twenty-first international conference on Machine learning, p. 1. ACM, 2004. (supplementary)
- S. Levine, Z. Popovic, and V. Koltun, “Nonlinear inverse reinforcement learning with Gaussian processes,” Advances in Neural Information Processing Systems (NIPS), Dec. 2011.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Inverse Reinforcement Learning with Leveraged Gaussian Processes," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2016.
Week 8:
04/22: Maximum entropy IRL
- Ziebart, Brian D., Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. "Maximum Entropy Inverse Reinforcement Learning." In AAAI, vol. 8, pp. 1433-1438. 2008.
- Bagnell, J. Andrew, Nathan Ratliff, and Martin Zinkevich. "Maximum margin planning." In Proceedings of the International Conference on Machine Learning (ICML). 2006.
04/24: GAN, GAIL, MCTEIL
- Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio, "Generative Adversarial Nets," Advances in Neural Information Processing Systems (NIPS), Dec. 2014.
- J. Ho, and S. Ermon, "Generative adversarial imitation learning," Advances in Neural Information Processing Systems (NIPS), Dec. 2016. [arXiv]
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Maximum Causal Tsallis Entropy Imitation Learning", in Proc. of Neural Information Processing Systems (NIPS), Dec. 2018. [Supplementary Material | arXiv preprint]
Week 9:
04/29: Safe Reinforcement Learning
- Dohyeong Kim and Songhwai Oh, "TRC: Trust Region Conditional Value at Risk for Safe Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2621-2628, Apr. 2022.
- Dohyeong Kim and Songhwai Oh, "Efficient Off-Policy Safe Reinforcement Learning Using Trust Region Conditional Value at Risk," IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 7644-7651, Jul. 2022.
- García, Javier, and Fernando Fernández. "A comprehensive survey on safe reinforcement learning." Journal of Machine Learning Research 16.1 (2015): 1437-1480.
05/01: Offline Reinforcement Learning
- Fujimoto, Scott, David Meger, and Doina Precup. "Off-policy deep reinforcement learning without exploration." ICML, 2019.
- Kumar, Aviral, Justin Fu, Matthew Soh, George Tucker, and Sergey Levine. "Stabilizing off-policy Q-learning via bootstrapping error reduction." NeurIPS, 2019.
- Kumar, Aviral, Aurick Zhou, George Tucker, and Sergey Levine. "Conservative Q-learning for offline reinforcement learning." NeurIPS 2020.
- Yang, Yiqin, Xiaoteng Ma, Li Chenghao, Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, and Qianchuan Zhao. "Believe what you see: Implicit constraint approach for offline multi-agent reinforcement learning." NeurIPS 2021.
- Sinha, Samarth, Ajay Mandlekar, and Animesh Garg. "S4RL: Surprisingly simple self-supervision for offline reinforcement learning in robotics." Conference on Robot Learning (CoRL), 2021.
- Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, and Thorsten Joachims. "MOReL: Model-based offline reinforcement learning." NeurIPS 2020.
- Chen, Lili, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. "Decision transformer: Reinforcement learning via sequence modeling." NeurIPS 2021.
- Mandlekar, Ajay, Danfei Xu, Roberto Martín-Martín, Silvio Savarese, and Li Fei-Fei. "GTI: Learning to generalize across long-horizon tasks from human demonstrations." Robotics: Science and Systems (RSS), 2020.
Week 10: Paper Presentation
05/06: (Holiday)
05/08:
- Will Dabney, Mark Rowland, Marc Bellemare, and Rémi Munos. "Distributional Reinforcement Learning with Quantile Regression," in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2018.
- Yecheng Jason Ma, Dinesh Jayaraman, and Osbert Bastani. "Conservative Offline Distributional Reinforcement Learning," in Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Daniel Wontae Nam, Younghoon Kim, and Chan Y. Park. "GMAC: A Distributional Perspective on Actor-Critic Framework," in Proceedings of the International Conference on Machine Learning (ICML), 2021.
Week 11: Paper Presentation
05/13:
- Michael Janner, Qiyang Li, and Sergey Levine. "Offline Reinforcement Learning as One Big Sequence Modeling Problem," in Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Michael Janner, Yilun Du, Joshua B. Tenenbaum, and Sergey Levine. "Planning with Diffusion for Flexible Behavior Synthesis," in Proceedings of the International Conference on Machine Learning (ICML), 2022.
- Ilya Kostrikov, Ashvin Nair, and Sergey Levine. "Offline Reinforcement Learning with Implicit Q-Learning," in Proceedings of the International Conference on Learning Representations (ICLR), 2022.
05/15: (Holiday)
Week 12: Paper Presentation
05/20:
- Homanga Bharadhwaj, Aviral Kumar, Nicholas Rhinehart, Sergey Levine, Florian Shkurti, and Animesh Garg. "Conservative Safety Critics for Exploration," in Proceedings of the International Conference on Learning Representations (ICLR), 2021.
- Jongmin Lee, Cosmin Paduraru, Daniel J. Mankowitz, Nicolas Heess, Doina Precup, Kee-Eung Kim, and Arthur Guez. "COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation," in Proceedings of the International Conference on Learning Representations (ICLR), 2022.
05/22:
- Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo Li, and Ding Zhao. "Constrained Variational Policy Optimization for Safe Reinforcement Learning," in Proceedings of the International Conference on Machine Learning (ICML), 2022.
- Dohyeong Kim, Kyungjae Lee, and Songhwai Oh. "Trust Region-Based Safe Distributional Reinforcement Learning for Multiple Constraints," in Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2023.
Week 13: Paper Presentation
05/27:
- Hao Liu and Pieter Abbeel. "Behavior From the Void: Unsupervised Active Pre-Training," in Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Hao Liu and Pieter Abbeel. "APS: Active Pretraining with Successor Features," in Proceedings of the International Conference on Machine Learning (ICML), 2021.
05/29:
- Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. "Diversity is All You Need: Learning Skills without a Reward Function," in Proceedings of the International Conference on Learning Representations (ICLR), 2019.
- Seohong Park, Jongwook Choi, Jaekyeom Kim, Honglak Lee, and Gunhee Kim. "Lipschitz-constrained Unsupervised Skill Discovery," in Proceedings of the International Conference on Learning Representations (ICLR), 2022.
Week 14: Paper Presentation
06/03:
- Divyansh Garg, Shuvam Chakraborty, Chris Cundy, Jiaming Song, Matthieu Geist, and Stefano Ermon. "IQ-Learn: Inverse soft-Q Learning for Imitation," in Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments," in Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2017.
- Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson. "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning," in Proceedings of the International Conference on Machine Learning (ICML), 2018.
06/05:
- Kimin Lee, Laura Smith, and Pieter Abbeel. "PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training," in Proceedings of the International Conference on Machine Learning (ICML), 2021.
- Xinran Liang, Katherine Shu, Kimin Lee, and Pieter Abbeel. "Reward Uncertainty for Exploration in Preference-based Reinforcement Learning," in Proceedings of the International Conference on Learning Representations (ICLR), 2022.
- Changyeon Kim, Jongjin Park, Jinwoo Shin, Honglak Lee, Pieter Abbeel, and Kimin Lee. "Preference Transformer: Modeling Human Preferences using Transformers for RL," in Proceedings of the International Conference on Learning Representations (ICLR), 2023.
Week 15:
06/10:
06/12: Poster Presentation
References
- Reinforcement Learning: An Introduction (2018, 2nd Edition) Richard S. Sutton, Andrew G. Barto
- Artificial Intelligence: A Modern Approach (4th edition), Stuart Russell and Peter Norvig, Prentice Hall, 2022. (AIMA Website)
- Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams, The MIT Press, 2006.
Prerequisites
- (430.457) Introduction to Intelligent Systems (지능시스템개론).
- Also requires strong background in algorithms, linear algebra, probability, and programming.
