Two papers from RLLAB are accepted to IROS 2023
[2023.06.22]
The following papers have been accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023):
Object Rearrangement Planning for Target Retrieval in a Confined Space with Lateral View by Minjae Kang, Junseok Kim, Hogun Kee, and Songhwai Oh
- Abstract: In this paper, we perform an object rearrangement task for target retrieval in an environment with a confined space and limited observation directions. The agent must create a collision-free path to bring out the target object by relocating the surrounding objects using a prehensile action, i.e., pick-and-place. Object rearrangement in a confined space is a non-monotone problem, and finding a valid plan within a reasonable time is challenging. We propose a novel algorithm that divides the target retrieval task, which requires a long sequence of actions, into sequential sub-problems and explores each solution through subgoal-conditioned Monte Carlo tree search (MCTS). In the experiments, we verify that the proposed algorithm finds safe rearrangement plans for various objects more efficiently than existing planning methods. Furthermore, we show that the proposed method can be transferred to a real robot without additional training.
- Video
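To give a feel for the planning component, the sketch below runs a plain MCTS (UCB selection, random rollouts) on a toy "confined shelf" rearrangement problem: blockers in a single lane must be relocated to side buffers before the target at the back becomes reachable. This is only an illustrative skeleton of Monte Carlo tree search on a pick-and-place state space; the paper's subgoal decomposition and the assumed state/action encoding here are not the authors' implementation.

```python
import math
import random

# Toy "confined shelf": the target sits at the back of a single lane.
# Blockers occupy lane slots; free buffer slots sit to the side. Each
# action relocates the frontmost blocker to a free buffer (pick-and-place).
# The task is solved when the lane is empty. Illustrative sketch only.

def actions(state):
    lane, buffers = state
    if not lane:
        return []
    free = [i for i, b in enumerate(buffers) if b is None]
    return [("place", i) for i in free]  # move front blocker to buffer i

def step(state, action):
    lane, buffers = state
    _, i = action
    new_buffers = list(buffers)
    new_buffers[i] = lane[0]
    return (lane[1:], tuple(new_buffers))

def is_goal(state):
    return len(state[0]) == 0  # lane cleared, target retrievable

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.value = [], 0, 0.0
        self.untried = actions(state)

def rollout(state, depth=10):
    # Random simulation: 1.0 if the lane gets cleared, else 0.0.
    for _ in range(depth):
        if is_goal(state):
            return 1.0
        acts = actions(state)
        if not acts:
            return 0.0
        state = step(state, random.choice(acts))
    return 1.0 if is_goal(state) else 0.0

def mcts(root_state, iters=200, c=1.4):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # Selection: descend via UCB while fully expanded.
        while not node.untried and node.children:
            node = max(node.children,
                       key=lambda n: n.value / n.visits
                       + c * math.sqrt(math.log(node.visits) / n.visits))
        # Expansion: try one untried action.
        if node.untried:
            a = node.untried.pop()
            child = Node(step(node.state, a), node, a)
            node.children.append(child)
            node = child
        # Simulation + backpropagation.
        r = rollout(node.state)
        while node:
            node.visits += 1
            node.value += r
            node = node.parent
    return max(root.children, key=lambda n: n.visits).action
```

Repeatedly calling `mcts` on the successor state yields a full relocation sequence; the paper instead chains subgoal-conditioned searches to keep long-horizon, non-monotone instances tractable.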
Dual Variable Actor-Critic for Adaptive Safe Reinforcement Learning by Junseo Lee, Jaeseok Heo, Dohyeong Kim, Gunmin Lee, and Songhwai Oh
- Abstract: Satisfying safety constraints in reinforcement learning (RL) is an important issue, especially in real-world applications. Many studies have approached safe RL with the Lagrangian method, which introduces dual variables. The dual variables represent a weighted balance between safety and performance. However, since the optimal value of the dual variable depends on the environmental setting, such as the obstacle density, applying a policy trained with one optimal dual variable to a new environment can be hazardous. To address this, we propose a new framework, dual variable actor-critic (DVAC), that solves the safe RL problem by simultaneously training a single policy over different safety levels. We introduce a universal policy and a universal Q-function, which take a dual variable as an argument. We then extend soft actor-critic so that the universal policy is guaranteed to converge to the set of Pareto optimal policies. Under the proposed safe RL framework, the balance between safety and performance can be controlled after training by choosing an appropriate value for the dual variable. We evaluate the proposed method in simulation and real-world environments. The proposed method learns a universal policy that ranges from extremely safe to high-performing according to the dual variable. The universal policy learned with the proposed method is nearly Pareto optimal compared to policies learned with the baseline methods. In addition, the agent is able to adapt to environments with unseen state distributions by identifying a suitable dual variable using the proposed method.
- Video
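The role of the dual variable can be illustrated with a tiny numeric example: under the Lagrangian objective, a behavior is scored by its expected return minus the dual variable times its expected safety cost, so sweeping the dual variable traces out the safety-performance trade-off that a single dual-variable-conditioned ("universal") policy would cover. The behaviors and numbers below are invented for the sketch and are not from the paper.

```python
# Illustrative only: how a dual variable lam trades return against cost.
# Each candidate behavior has an expected return and an expected safety
# cost; the Lagrangian objective is return - lam * cost. Larger lam
# selects safer behaviors. Names and values are hypothetical.

BEHAVIORS = {
    "aggressive": {"ret": 10.0, "cost": 4.0},
    "moderate":   {"ret": 7.0,  "cost": 1.5},
    "cautious":   {"ret": 4.0,  "cost": 0.2},
}

def best_behavior(lam):
    """Behavior maximizing the Lagrangian objective return - lam * cost."""
    return max(BEHAVIORS,
               key=lambda b: BEHAVIORS[b]["ret"] - lam * BEHAVIORS[b]["cost"])
```

For lam = 0 the objective ignores cost and picks the aggressive behavior, while large lam picks the cautious one; DVAC's universal policy makes this knob available at deployment time instead of fixing it during training.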
