Robot Learning (M2608.002700) Spring 2026
- Instructor: Prof. Songhwai Oh (오성회) | Email: songhwai (at) snu.ac.kr | Office Hours: Friday 3:00-4:00PM | Office: Building 133, Room 503
- Course Number: M2608.002700 | Time: M/W 3:30-4:45PM | Location: Building 301, Room 106
- TA: Geunje Cheon (천근제) | Email: geunje.cheon (at) rllab.snu.ac.kr | Office: Building 133, Room 610
Course Description
An intelligent robotic agent must be able to adapt to new environments, tasks, and situations, and to learn from interaction with its surroundings. Robot learning is a field that combines robotics and artificial intelligence (AI) to develop robotic agents that, like humans, acquire new skills and knowledge through experience. It encompasses techniques and approaches that make robots more autonomous and capable of performing tasks without explicit programming for every possible scenario. In this course, we will review recent advances in robot learning, including imitation learning and deep reinforcement learning. We will first review Markov decision processes (MDPs) and reinforcement learning. We will then discuss recent developments in imitation learning, deep learning, and deep reinforcement learning, including topics such as behavior cloning, inverse reinforcement learning, policy gradient methods, deep Q-networks (DQN), generative adversarial imitation learning, maximum entropy reinforcement learning, safe reinforcement learning, and offline reinforcement learning. Recent advances in robot foundation models will be discussed as well. This is an advanced graduate course; substantial reading and programming projects will be assigned, and students are expected to participate actively in class. Lectures will be in English.
Announcements
- Project Schedule
- [05/06] Project proposal (2 pages, 2 columns; no Abstract; IEEE double-column format)
- [06/03] Project summary: submit (1) Title and (2) Abstract to TA
- [06/08] Project video: submit one minute presentation video to TA
- [06/10] Project presentation and poster session
- [06/12] Project report (minimum 6 pages, 2 columns; IEEE double-column format)
- [02/23] Please read Ethics of Learning.
Schedule
- Week 1:
- 03/03: Introduction
- Ch. 3 from Reinforcement Learning: An Introduction; Ch. 16 from AIMA
- Week 2:
- 03/09: Review of MDPs and POMDPs
- Ch. 3 from Reinforcement Learning: An Introduction; Ch. 16 from AIMA
- 03/11: Reinforcement learning
- Ch. 4, Ch. 6 from Reinforcement Learning: An Introduction; Ch. 23 from AIMA
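As a quick refresher on the dynamic-programming material in the Sutton & Barto chapters above, value iteration can be sketched in a few lines. The two-state MDP below is a made-up toy for illustration, not an example from the course:

```python
import numpy as np

# Toy 2-state, 2-action MDP (illustrative only):
# P[s][a] = list of (probability, next_state, reward) transitions
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 5.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 1.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9  # discount factor

def value_iteration(P, gamma, tol=1e-8):
    """Iterate the Bellman optimality backup until the value function converges."""
    V = np.zeros(len(P))
    while True:
        # Q[s, a] = sum over transitions of p * (r + gamma * V[s'])
        Q = np.array([[sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                       for a in P[s]] for s in P])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # optimal values and greedy policy
        V = V_new

V, policy = value_iteration(P, gamma)
```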
- Week 3:
- HW1: Deep learning tutorial (03/16 ~ 03/23 23:59KST)
- Gaussian process regression (Ch. 2 from Gaussian Processes for Machine Learning)
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, Songhwai Oh, "Real-Time Nonparametric Reactive Navigation of Mobile Robots in Dynamic Environments," Robotics and Autonomous Systems, vol. 91, pp. 11–24, May 2017.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstrations with Mixed Qualities Using Leveraged Gaussian Processes," IEEE Transactions on Robotics, vol. 35, no. 3, pp. 564-576, Jun. 2019.
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, and Songhwai Oh, "Leveraged Non-Stationary Gaussian Process Regression for Autonomous Robot Navigation," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2015.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstration Using Leveraged Gaussian Processes and Sparse-Constrained Optimization," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2016.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Scalable Robust Learning from Demonstration with Leveraged Deep Neural Networks," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2017.
- Week 4:
- HW2: Imitation learning (03/23 ~ 03/30 23:59KST)
- S. Ross, G. Gordon, and D. Bagnell, "A reduction of imitation learning and structured prediction to no-regret online learning," in Proc. of the international conference on artificial intelligence and statistics, 2011.
- Giusti, Alessandro, Jérôme Guzzi, Dan C. Cireşan, Fang-Lin He, Juan P. Rodríguez, Flavio Fontana, Matthias Faessler et al. "A machine learning approach to visual perception of forest trails for mobile robots." IEEE Robotics and Automation Letters 1, no. 2 (2016): 661-667. [Project Page with Datasets]
- Loquercio, Antonio, Ana Isabel Maqueda, Carlos R. Del Blanco, and Davide Scaramuzza. "DroNet: Learning to Fly by Driving." IEEE Robotics and Automation Letters (2018). [Project Page with Code and Datasets]
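The DAgger reduction in Ross et al. above is a short loop: roll out the current learner, have the expert relabel the visited states, and retrain on the aggregated dataset. A minimal sketch, with `expert`, `rollout`, and `train` left as user-supplied callables (all names illustrative):

```python
import numpy as np

def dagger(expert, rollout, train, n_iters=5):
    """DAgger: iteratively roll out the learner, relabel visited states
    with expert actions, and retrain on the aggregated dataset."""
    states, actions, policy = [], [], None
    for _ in range(n_iters):
        visited = rollout(policy)                    # states the learner visits
        states.extend(visited)
        actions.extend(expert(s) for s in visited)   # expert relabels them
        policy = train(np.array(states), np.array(actions))
    return policy
```

With a linear least-squares `train` and an expert that doubles its input, the learned policy recovers the expert exactly on this toy problem.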
- 03/25: Deep Q Learning
- Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., and Petersen, S. "Human-level control through deep reinforcement learning," Nature, 2015.
- H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double q-learning," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb. 2016.
- Z. Wang, N. de Freitas, and M. Lanctot, "Dueling Network Architectures for Deep Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2016.
- T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized Experience Replay," arXiv, 2015.
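The readings above share one core computation, the bootstrapped TD target. The sketch below contrasts the standard DQN target with the Double DQN variant, using tabular Q arrays in place of networks (shapes and names are illustrative):

```python
import numpy as np

def dqn_targets(q_target, rewards, next_states, dones, gamma=0.99):
    """Standard DQN target: r + gamma * max_a' Q_target(s', a')."""
    max_next = q_target[next_states].max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next

def double_dqn_targets(q_online, q_target, rewards, next_states, dones, gamma=0.99):
    """Double DQN (van Hasselt et al.): the online net selects the action,
    the target net evaluates it, reducing maximization bias."""
    best_actions = q_online[next_states].argmax(axis=1)
    evaluated = q_target[next_states, best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated
```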
- Week 5:
- HW3: Deep Q learning (03/30 ~ 04/06 23:59KST)
- 03/30: Sparse MDPs
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1466-1473, Jul. 2018. [Supplementary Material | Video | arXiv preprint]
- 04/01: Policy gradient
- R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, 1992.
- R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems (NIPS), Nov. 2000.
- Baxter, Jonathan, and Peter L. Bartlett. "Reinforcement learning in POMDP's via direct gradient ascent." ICML. 2000.
- J. Baxter and P. L. Bartlett, "Infinite-Horizon Policy-Gradient Estimation," Journal of Artificial Intelligence Research, vol. 15, pp. 319-350, 2001.
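For a softmax policy, Williams' REINFORCE estimator above has a closed-form score function: the gradient of log π(a) with respect to the logits is a one-hot vector for the chosen action minus π. A toy two-armed-bandit sketch (setup and names are illustrative, not course code):

```python
import numpy as np

def softmax(theta):
    e = np.exp(theta - theta.max())  # shift for numerical stability
    return e / e.sum()

def reinforce_update(theta, lr, n_episodes, pull, rng):
    """REINFORCE on a bandit: theta += lr * r * grad log pi(a),
    where grad log pi(a) = one_hot(a) - pi for a softmax policy."""
    for _ in range(n_episodes):
        pi = softmax(theta)
        a = rng.choice(len(theta), p=pi)  # sample an arm from the policy
        r = pull(a)                        # observe its reward
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0
        theta = theta + lr * r * grad_log_pi
    return theta
```

On a bandit whose first arm pays 1 and second pays 0, the policy concentrates on the first arm after a few thousand episodes.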
- Week 6:
- HW4: Proximal policy optimization (04/06 ~ 04/13 23:59KST)
- 04/06: GPS, TRPO, PPO
- J. Schulman, S. Levine, P. Abbeel, M. I. Jordan, and P. Moritz, "Trust Region Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Jul. 2015. [arXiv]
- Sham M. Kakade, and John Langford, "Approximately optimal approximate reinforcement learning," in Proc. of the International Conference on Machine Learning (ICML), 2002. (http://www.cs.cmu.edu/~./jcl/papers/aoarl/Final.pdf)
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
- Sergey Levine, Vladlen Koltun, "Guided Policy Search," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2013.
- 04/08: Actor-critic
- Mnih, Volodymyr, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. "Asynchronous methods for deep reinforcement learning." In International Conference on Machine Learning, pp. 1928-1937. 2016.
- David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin A. Riedmiller, "Deterministic Policy Gradient Algorithms," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2014.
- Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).
- Week 7:
- HW5: Deep deterministic policy gradient (04/13 ~ 04/20 23:59KST)
- 04/13: Maximum entropy RL
- Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine, "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", in Proc. of the International Conference on Machine Learning (ICML), Aug. 2018.
- Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, Mineui Hong, Jaein Kim, Yong-Lae Park, and Songhwai Oh, "Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots," in Proc. of Robotics: Science and Systems (RSS), Jul. 2020. [Supplementary Material | arXiv preprint]
- Jae In Kim, Mineui Hong, Kyungjae Lee, DongWook Kim, Yong-Lae Park, and Songhwai Oh, "Learning to Walk a Tripod Mobile Robot Using Nonlinear Soft Vibration Actuators with Entropy Adaptive Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2317-2324, Apr. 2020. [Supplementary Material | Video]
- A. Y. Ng and S. Russell, “Algorithms for inverse reinforcement learning,” in Proc. of the 17th International Conference on Machine Learning (ICML), Jun. 2000.
- Abbeel, Pieter, and Andrew Y. Ng. "Apprenticeship learning via inverse reinforcement learning." In Proceedings of the twenty-first international conference on Machine learning, p. 1. ACM, 2004. (supplementary)
- S. Levine, Z. Popovic, and V. Koltun, “Nonlinear inverse reinforcement learning with Gaussian processes,” Advances in Neural Information Processing Systems (NIPS), Dec. 2011.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Inverse Reinforcement Learning with Leveraged Gaussian Processes," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2016.
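In the maximum entropy RL papers above (e.g., Soft Actor-Critic), the hard max over actions in the Bellman backup is replaced by a temperature-scaled log-sum-exp "soft" value, V(s) = α log Σ_a exp(Q(s,a)/α), which recovers max_a Q(s,a) as α → 0. A numerically stable sketch (names chosen for illustration):

```python
import numpy as np

def soft_value(q_values, alpha):
    """Soft value: alpha * log sum_a exp(Q(s,a)/alpha).
    Subtracting the max before exponentiating avoids overflow."""
    z = q_values / alpha
    m = z.max()
    return alpha * (m + np.log(np.exp(z - m).sum()))
```

The soft value always upper-bounds the hard max (by the entropy bonus) and approaches it as the temperature `alpha` shrinks.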
- Week 8:
- 04/20: Maximum entropy IRL; GAN, GAIL, MCTEIL
- Ziebart, Brian D., Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. "Maximum Entropy Inverse Reinforcement Learning." In AAAI, vol. 8, pp. 1433-1438. 2008.
- Bagnell, J. Andrew, Nathan Ratliff, and Martin Zinkevich. "Maximum margin planning." In Proceedings of the International Conference on Machine Learning (ICML). 2006.
- Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio, "Generative Adversarial Nets," Advances in Neural Information Processing Systems (NIPS), Dec. 2014.
- J. Ho, and S. Ermon, "Generative adversarial imitation learning," Advances in Neural Information Processing Systems (NIPS), Dec. 2016. [arXiv]
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Maximum Causal Tsallis Entropy Imitation Learning", in Proc. of Neural Information Processing Systems (NIPS), Dec. 2018. [Supplementary Material | arXiv preprint]
- 04/22: Safe Reinforcement Learning
- Dohyeong Kim and Songhwai Oh, "TRC: Trust Region Conditional Value at Risk for Safe Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2621-2628, Apr. 2022.
- Dohyeong Kim and Songhwai Oh, "Efficient Off-Policy Safe Reinforcement Learning Using Trust Region Conditional Value at Risk," IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 7644-7651, Jul. 2022.
- Dohyeong Kim, Kyungjae Lee, and Songhwai Oh, "Trust Region-Based Safe Distributional Reinforcement Learning for Multiple Constraints," in Proc. of Neural Information Processing Systems (NeurIPS), Dec. 2023.
- Dohyeong Kim, Taehyun Cho, Seungyub Han, Hojun Chung, Kyungjae Lee, and Songhwai Oh, "Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees," in Proc. of Neural Information Processing Systems (NeurIPS), Dec. 2024.
- García, Javier, and Fernando Fernández. "A comprehensive survey on safe reinforcement learning." Journal of Machine Learning Research 16.1 (2015): 1437-1480.
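The conditional value at risk (CVaR) objective used in the TRC papers above is, empirically, just the mean cost over the worst α-fraction of outcomes. A simple sample-based estimator (illustrative; it ignores the fractional tail weight at the quantile boundary):

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """CVaR_alpha: expected cost over the worst alpha-fraction of samples."""
    costs = np.sort(np.asarray(costs, dtype=float))[::-1]  # worst (largest) first
    k = max(1, int(np.ceil(alpha * len(costs))))           # tail sample count
    return costs[:k].mean()
```

With α = 1 this reduces to the ordinary expected cost; small α focuses the constraint on rare, high-cost outcomes.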
- Week 9:
- Fujimoto, Scott, David Meger, and Doina Precup. "Off-policy deep reinforcement learning without exploration." ICML, 2019.
- Kumar, Aviral, Justin Fu, Matthew Soh, George Tucker, and Sergey Levine. "Stabilizing off-policy Q-learning via bootstrapping error reduction." NeurIPS, 2019.
- Kumar, Aviral, Aurick Zhou, George Tucker, and Sergey Levine. "Conservative Q-learning for offline reinforcement learning." NeurIPS 2020.
- Yang, Yiqin, Xiaoteng Ma, Li Chenghao, Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, and Qianchuan Zhao. "Believe what you see: Implicit constraint approach for offline multi-agent reinforcement learning." NeurIPS 2021.
- Sinha, Samarth, Ajay Mandlekar, and Animesh Garg. "S4RL: Surprisingly simple self-supervision for offline reinforcement learning in robotics." Conference on Robot Learning (CoRL), 2021.
- Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, and Thorsten Joachims. "MOReL: Model-based offline reinforcement learning." NeurIPS 2020.
- Chen, Lili, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. "Decision transformer: Reinforcement learning via sequence modeling." NeurIPS 2021.
- Michael Janner, Qiyang Li, and Sergey Levine, "Offline Reinforcement Learning as One Big Sequence Modeling Problem", Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Mandlekar, Ajay, Danfei Xu, Roberto Martín-Martín, Silvio Savarese, and Li Fei-Fei. “GTI: Learning to generalize across long-horizon tasks from human demonstrations." Robotics: Science and Systems (RSS), 2020.
- A. Brohan, et al., “RT-1: Robotics Transformer for Real-World Control at Scale,” in Proc. of Robotics: Science and Systems (RSS), Jul. 2023.
- A. Brohan, et al., “RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control,” in Proc. of the Conference on Robot Learning (CoRL), Nov. 2023.
- A. O’Neill, et al., “Open X-Embodiment: Robotic Learning Datasets and RT-X Models,” in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2024.
- D. Ghosh, et al., “Octo: An Open-Source Generalist Robot Policy,” in Proc. of Robotics: Science and Systems (RSS), Jul. 2024.
- M. Kim, et al., “OpenVLA: An Open-Source Vision-Language-Action Model,” in Proc. of the Conference on Robot Learning (CoRL), Nov. 2024.
- Q. Li, et al., “CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation,” arXiv:2411.19650, 2024.
- K. Black, et al., “π_0: A Vision-Language-Action Flow Model for General Robot Control,” in Proc. of the Conference on Robot Learning (CoRL), Sep. 2025.
- NVIDIA, “GR00T N1: An Open Foundation Model for Generalist Humanoid Robots,” arXiv:2503.14734, 2025.
- Q. Zhao, et al., “CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2025.
- X. Li, et al., “What Matters in Building Vision-Language-Action Models for Generalist Robots,” Nature Machine Intelligence, vol. 8, pp. 158-172, 2026.
- NVIDIA, “Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail,” arXiv:2511.00088, 2026.
- Week 10: Paper Presentation
- 05/04: Imitation Learning
- Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, and Shao-Hua Sun, "Diffusion-Reward Adversarial Imitation Learning", Advances in Neural Information Processing Systems (NeurIPS), 2024.
- Thomas Rupf, Marco Bagatella, Nico Gürtler, Jonas Frey, and Georg Martius, "Zero-Shot Offline Imitation Learning via Optimal Transport", in Proc. of the International Conference on Machine Learning (ICML), 2025.
- 05/06: RL Algorithm
- Allen Z. Ren, Justin Lidard, Lars L. Ankile, Anthony Simeonov, Pulkit Agrawal, Anirudha Majumdar, Benjamin Burchfiel, Hongkai Dai, and Max Simchowitz, "Diffusion Policy Policy Optimization", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
- Physical Intelligence et al., "$π^{*}_{0.6}$: a VLA That Learns From Experience", arXiv preprint, 2025.
- Week 11: Paper Presentation
- 05/11: Offline RL
- Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Bechtle, Steven Kapturowski, Roland Hafner, Nicolas Heess, and Martin Riedmiller, "Offline Actor-Critic Reinforcement Learning Scales to Large Models", in Proc. of the International Conference on Machine Learning (ICML), 2024.
- Huayu Chen, Kaiwen Zheng, Hang Su, and Jun Zhu, "Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control", Advances in Neural Information Processing Systems (NeurIPS), 2024.
- 05/13: Offline RL / Safe RL
- Shiyuan Zhang, Weitong Zhang, and Quanquan Gu, "Energy-Weighted Flow Matching for Offline Reinforcement Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
- Yarden As, Bhavya Sukhija, Lenart Treven, Carmelo Sferrazza, Stelian Coros, and Andreas Krause, "ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), 2025.
- Week 12: Paper Presentation
- 05/18: Safe RL / Distributional RL
- Dohyeong Kim, Mineui Hong, Jeongho Park, and Songhwai Oh, "Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
- Leif Döring, Benedikt Wille, Maximilian Birr, Mihail Bîrsan, and Martin Slowik, "ADDQ: Adaptive Distributional Double Q-Learning", in Proc. of the International Conference on Machine Learning (ICML), 2025.
- 05/20: Unsupervised RL
- Seohong Park, Tobias Kreiman, and Sergey Levine, "Foundation Policies with Hilbert Representations", in Proc. of the International Conference on Machine Learning (ICML), 2024.
- Andrea Tirinzoni, Ahmed Touati, Jesse Farebrother, Mateusz Guzek, Anssi Kanervisto, Yingchen Xu, Alessandro Lazaric, and Matteo Pirotta, "Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
- Week 13: Paper Presentation
- 05/25: (Holiday)
- 05/27: Unsupervised RL / Multi-Agent RL
- Marco Bagatella, Matteo Pirotta, Ahmed Touati, Alessandro Lazaric, and Andrea Tirinzoni, "TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2026.
- Yueheng Li, Guangming Xie, and Zongqing Lu, "Revisiting Cooperative Off-Policy Multi-Agent Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), 2025.
- Week 14: Paper Presentation
- 06/01: RLHF
- Zibin Dong, Yifu Yuan, Jianye Hao, Fei Ni, Yao Mu, Yan Zheng, Yujing Hu, Tangjie Lv, Changjie Fan, and Zhipeng Hu, "AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model", in Proc. of the International Conference on Learning Representations (ICLR), 2024.
- Teng Xiao, Yige Yuan, Mingxiao Li, Zhengyu Chen, and Vasant G. Honavar, "On a Connection Between Imitation Learning and RLHF", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
- 06/03: (Holiday)
- Week 15
- 06/08: Model-Based RL
- Nicklas Hansen, Hao Su, and Xiaolong Wang, "TD-MPC2: Scalable, Robust World Models for Continuous Control", in Proc. of the International Conference on Learning Representations (ICLR), 2024.
- Kwanyoung Park and Youngwoon Lee, "Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning", in Proc. of the International Conference on Learning Representations (ICLR), 2025.
- 06/10: Poster Presentation
References
- Reinforcement Learning: An Introduction (2nd edition), Richard S. Sutton and Andrew G. Barto, The MIT Press, 2018.
- Artificial Intelligence: A Modern Approach (4th edition), Stuart Russell and Peter Norvig, Prentice Hall, 2022. (AIMA Website)
- Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams, The MIT Press, 2006.
Prerequisites
- Requires a strong background in algorithms, linear algebra, probability, and programming.
