Robot Learning (M2608.002700) Spring 2025

Instructor: Prof. Songhwai Oh (오성회)
Email: songhwai (at) snu.ac.kr
Office Hours: Friday 3:00-4:00PM
Office: Building 133 Room 503
Course Number: M2608.002700
Time: M/W 3:30-4:45PM
Location: Building 301 Room 106
TA: Hojun Chung (정호준)
Email: hojun.chung (at) rllab.snu.ac.kr
Office: Building 133 Room 610
 

Course Description

An intelligent robotic agent must be able to adapt and learn in new environments, tasks, and situations through interactions with its surroundings. Robot learning is a field that combines robotics and artificial intelligence (AI) to develop technologies that enable robotic agents to acquire new skills and knowledge through experience, much as humans do. It encompasses a variety of techniques and approaches for making robots more autonomous and capable of performing tasks without explicit programming for every possible scenario. In this course, we will review recent advances in robot learning, including imitation learning and deep reinforcement learning. We will first review Markov decision processes (MDPs) and reinforcement learning. We will then discuss recent developments in imitation learning, deep learning, and deep reinforcement learning, covering topics such as behavior cloning, inverse reinforcement learning, policy gradient methods, deep Q-networks (DQN), generative adversarial imitation learning, maximum entropy reinforcement learning, safe reinforcement learning, and offline reinforcement learning. This is an advanced graduate course, and substantial reading and programming projects will be assigned. Students are expected to participate actively in class. Lectures will be given in English.

Announcements

  • Project Schedule

  • [05/07] Project proposal (2 pages, 2 columns; no Abstract; IEEE double-column format)
  • [06/04] Project summary: submit (1) Title and (2) Abstract to TA
  • [06/09] Project video: submit one minute presentation video to TA
  • [06/11] Project presentation and poster session
  • [06/13] Project report (minimum 6 pages, 2 columns; IEEE double-column format)

Schedule

  • Week 1:

  • 03/05: Introduction

  • Week 2:

  • 03/10: Review on MDPs and POMDPs

  • 03/12: Reinforcement learning

  • Week 3:

  • 03/17: Review of probability theory; Gaussian process regression
  • 03/19: Behavior cloning (leveraged Gaussian process regression)

  • Week 4:

  • 03/24: DAgger, Behavior cloning applications

  • 03/26: Deep Q Learning

  • Week 5:

  • 03/31: Sparse MDPs
  • 04/02: Policy gradient

  • Week 6:

  • 04/07: GPS, TRPO, PPO

  • J. Schulman, S. Levine, P. Abbeel, M. I. Jordan, and P. Moritz, "Trust Region Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Jul. 2015. [arXiv]
  • S. Levine and V. Koltun, "Guided Policy Search," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2013.
  • 04/09: Actor-critic

  • V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, "Asynchronous Methods for Deep Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2016.
  • David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin A. Riedmiller, "Deterministic Policy Gradient Algorithms," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2014. 
  • Week 7:

  • 04/14: Maximum entropy RL

  • 04/16: Inverse reinforcement learning (IRL), GPIRL
  • Week 8:

  • 04/21: Maximum entropy IRL

  • J. A. Bagnell, N. Ratliff, and M. Zinkevich, "Maximum Margin Planning," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2006.
  • 04/23: GAN, GAIL, MCTEIL

  • I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio, "Generative Adversarial Nets," in Advances in Neural Information Processing Systems (NIPS), Dec. 2014.
  • Week 9:

  • 04/28: Safe Reinforcement Learning

  • 04/30: Offline Reinforcement Learning

  • Week 10: Paper Presentation

  • 05/05: (Holiday)
  • 05/07: Imitation Learning

  • Week 11: Paper Presentation

  • 05/12: Deep RL

  • 05/14: Deep RL

  • Week 12: Paper Presentation

  • 05/19: Diffusion Models / Flow Matching

  • X. Hu, B. Liu, X. Liu, and Q. Liu, "AdaFlow: Imitation Learning with Variance-Adaptive Flow-Based Policies," in Advances in Neural Information Processing Systems (NeurIPS), Dec. 2024.
  • 05/21: Distributional RL

  • Week 13: Paper Presentation

  • 05/26: Unsupervised RL

  • 05/28: Multi-Agent RL

  • Week 14: Paper Presentation

  • 06/02: Reinforcement Learning from Human Feedback (RLHF)

  • 06/04: Curriculum RL / Environment Design

  • Week 15: Paper & Poster Presentation

  • 06/09: Model-Based RL

  • Location: Room 106 and 1st Floor, Building 301

References

  • Artificial Intelligence: A Modern Approach (4th edition), Stuart Russell and Peter Norvig, Pearson, 2020. (AIMA Website)

Prerequisites

  • (430.457) Introduction to Intelligent Systems (지능시스템개론).
  • A strong background in algorithms, linear algebra, probability, and programming is also required.