Seven papers from RLLAB have been accepted to IROS 2022


The following papers have been accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022):

  • SafeTAC: Safe Tsallis Actor-Critic Reinforcement Learning for Safer Exploration by Dohyeong Kim, Jaeseok Heo, and Songhwai Oh
    • Abstract: Satisfying safety constraints is the top priority in safe reinforcement learning (RL). However, without proper exploration, an overly conservative policy, such as one that freezes at the same position, can be generated. To this end, we utilize maximum entropy RL methods for exploration. In particular, an RL method with Tsallis entropy maximization, called Tsallis actor-critic (TAC), is used to synthesize policies that can explore with more promising actions. In this paper, we propose a Tsallis entropy-regularized safe RL method for safer exploration, called SafeTAC. For more expressiveness, we extend TAC to use a Gaussian mixture model policy, which improves the safety performance. In addition, to stabilize the training process, retrace estimators for the safety critics are formulated and a safe policy update rule using a trust region method is proposed. SafeTAC is evaluated in simulation and real-world experiments, showing high returns from safer exploration while satisfying safety constraints.
    • Video
  • Towards Defensive Autonomous Driving: Collecting and Probing Driving Demonstrations of Mixed Qualities by Jeongwoo Oh, Gunmin Lee, Jeongeun Park, Wooseok Oh, Jaeseok Heo, Hojun Chung, Do Hyung Kim, Byungkyu Park, Chang-Gun Lee, Sungjoon Choi, and Songhwai Oh
    • Abstract: Designing or learning an autonomous driving policy is undoubtedly a challenging task, as the policy has to maintain its safety in all corner cases. In order to secure safety in autonomous driving, the ability to detect hazardous situations, which can be seen as an out-of-distribution (OOD) detection problem, becomes crucial. However, conventional datasets often contain only expert driving demonstrations, although non-expert or uncommon driving behavior data are needed to implement a safety-guaranteed autonomous driving platform. To this end, we present a dataset called the R3 Driving Dataset, composed of driving data of different qualities. The dataset categorizes abnormal driving behaviors into eight categories and 369 different detailed situations, including dangerous lane changes and near-collision situations. To further illustrate how these abnormal driving behaviors can be detected, we apply different uncertainty estimation and anomaly detection methods to the proposed dataset. The experimental results indicate that, by using both uncertainty estimation and anomaly detection, most of the abnormal cases in the proposed dataset can be discriminated. The dataset of this paper can be downloaded from
    • Video
  • Safety Guided Policy Optimization by Dohyeong Kim, Yunho Kim, Kyungjae Lee, and Songhwai Oh
    • Abstract: In reinforcement learning (RL), exploration is essential to achieve a globally optimal policy, but unconstrained exploration can cause damage to robots and nearby people. To handle this safety issue in exploration, safe RL has been proposed to keep the agent under specified safety constraints while maximizing cumulative rewards. While a number of safe RL methods have been introduced, it is difficult to develop one that can be applied to real robots with various dynamics while satisfying the constraints as quickly as possible. This paper introduces a new safe RL method which can be applied to robots so that they operate under the safety constraints while learning. The key component of the proposed method is the safeguard module. The safeguard predicts the constraints in the near future and corrects actions such that the predicted constraints are not violated. Since actions are safely modified by the safeguard during exploration and policies are trained to imitate the corrected actions, the agent can explore safely. Additionally, the safeguard is sample efficient as it does not require long-horizon trajectories for training, so constraints can be satisfied within a short number of time steps. The proposed method is extensively evaluated in simulation and in experiments using a real robot. The results show that the proposed method achieves the best performance while satisfying safety constraints with minimal interaction with the environment in all experiments.
    • Video
  • Grasp Planning for Occluded Objects in a Confined Space with Lateral View Using Monte Carlo Tree Search by Minjae Kang, Hogun Kee, Junseok Kim, and Songhwai Oh
    • Abstract: In a lateral access environment, the robot's behavior should be planned considering surrounding objects and obstacles because object observation directions and approach angles are limited. To safely retrieve a partially occluded target object in such environments, we have to relocate objects using prehensile actions to create a collision-free path to the target. We propose a learning-based method for object rearrangement planning applicable to objects of various types and sizes in the lateral environment. We plan the optimal rearrangement sequence by considering both collisions and the approach angles at which objects can be grasped. The proposed method finds the grasping order through Monte Carlo tree search, significantly reducing the tree search cost by using point cloud states. We process depth images into point cloud observations, then transform and merge them to create the new observations required for tree search. In the experiments, the proposed method shows the best and most stable performance in various scenarios compared to existing task and motion planning (TAMP) methods. In addition, we confirm that the proposed method trained in simulation can be easily applied to a real robot without additional fine-tuning, showing the robustness of the proposed method.
    • Video
  • RIANet: Road Graph and Image Attention Network for Urban Autonomous Driving by Timothy Ha, Jeongwoo Oh, Hojun Chung, Gunmin Lee, and Songhwai Oh
    • Abstract: In this paper, we present a novel autonomous driving framework, called a road graph and image attention network (RIANet), which computes the attention scores of objects in the image using the road graph feature. The process of the proposed method is as follows: First, the feature encoder module encodes the road graph, image, and additional features of the scene. The attention network module then incorporates the encoded features and computes the scene context feature via the attention mechanism. Finally, the low-level controller module drives the ego-vehicle based on the scene context feature. In the experiments, we use an urban scene driving simulator named CARLA to train and test the proposed method. The results show that the proposed method outperforms existing autonomous driving methods.
    • Video
  • Efficient Off-Policy Safe Reinforcement Learning Using Trust Region Conditional Value at Risk by Dohyeong Kim and Songhwai Oh
    • Abstract: This paper aims to solve a safe reinforcement learning (RL) problem with risk measure-based constraints. As risk measures, such as conditional value at risk (CVaR), focus on the tail of the cost distribution, constraining risk measures can effectively prevent failure in the worst case. An on-policy safe RL method, called TRC, deals with a CVaR-constrained RL problem using a trust region method and can generate policies with almost zero constraint violations and high returns. However, to achieve outstanding performance in complex environments and satisfy safety constraints quickly, RL methods need to be sample efficient. To this end, we propose an off-policy safe RL method with CVaR constraints, called off-policy TRC. If off-policy data from replay buffers is directly used to train TRC, the estimation error caused by the distributional shift results in performance degradation. To resolve this issue, we propose novel surrogate functions, in which the effect of the distributional shift can be reduced, and introduce an adaptive trust-region constraint to ensure that the policy does not deviate far from the replay buffer data. The proposed method has been evaluated in simulation and real-world environments and satisfies safety constraints within a few steps while achieving high returns even in complex robotic tasks.
    • Video
  • Unsupervised 3D Link Segmentation of Articulated Objects with a Mixture of Coherent Point Drift by JaeGoo Choy, Geonho Cha, and Songhwai Oh
    • Abstract: In this paper, we address the 3D link segmentation problem of articulated objects using multiple point sets with different configurations. We are motivated by the fact that a point set of an object can be aligned to point sets with different configurations by applying rigid transformations to its links. Since existing 3D part segmentation datasets are annotated from the perspective of a human, we propose a novel dataset of articulated objects annotated based on their kinematic models. We define the point set alignment process as a probability density estimation problem and find the optimal decomposition of the point set and deformations using the EM algorithm. In addition, to improve the segmentation performance, we propose a regularization loss designed with a physical prior on the decomposition. We evaluate the proposed method on our dataset, demonstrating that it achieves state-of-the-art performance compared to baseline methods. Finally, we also propose an effective target manipulating point proposer, which can be applied to collect multiple point sets with different configurations from an unknown object to better solve the 3D link segmentation problem.
    • Video
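
For readers unfamiliar with the Tsallis entropy that SafeTAC regularizes with, the sketch below illustrates the quantity for a discrete action distribution (a minimal illustration, not the authors' implementation; the entropic index q = 2 and the discrete setting are assumptions for exposition):

```python
import numpy as np

def tsallis_entropy(probs, q=2.0):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    Recovers the Shannon entropy in the limit q -> 1, so q acts as
    a knob on how exploration is rewarded.
    """
    probs = np.asarray(probs, dtype=float)
    if abs(q - 1.0) < 1e-8:
        # Shannon limit: -sum p log p (clip avoids log(0))
        return float(-np.sum(probs * np.log(np.clip(probs, 1e-12, 1.0))))
    return float((1.0 - np.sum(probs ** q)) / (q - 1.0))

# A uniform policy over 4 actions maximizes the entropy bonus...
uniform = tsallis_entropy([0.25] * 4, q=2.0)              # -> 0.75
# ...while a nearly deterministic policy earns almost none.
peaked = tsallis_entropy([0.97, 0.01, 0.01, 0.01], q=2.0)
```

Maximizing such an entropy term during training discourages the overly conservative, near-deterministic policies the abstract warns about.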
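
Similarly, the conditional value at risk (CVaR) constrained by TRC and off-policy TRC can be illustrated with a small empirical estimator (a sketch assuming a finite sample of episode costs, not the estimator used in the paper):

```python
import numpy as np

def cvar(costs, alpha=0.1):
    """Empirical CVaR_alpha: the mean of the worst (highest-cost)
    alpha fraction of samples.

    Constraining this quantity, rather than the mean cost, targets
    the tail of the cost distribution, i.e., the worst-case failures.
    """
    costs = np.sort(np.asarray(costs, dtype=float))
    k = max(1, int(np.ceil(alpha * len(costs))))  # at least one sample
    return float(costs[-k:].mean())

samples = [0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0, 9.0]
worst10 = cvar(samples, alpha=0.1)  # -> 9.0: the single worst cost dominates
```

Note that the mean of the sample above is only 3.0, so a mean-cost constraint could be satisfied even though the tail contains a dangerous outlier; the CVaR constraint is what rules that outlier out.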