Three papers from RLLAB are accepted to IROS 2025
[2025.06.16]
The following papers have been accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025):
Automatic Real-to-Sim-to-Real System through Iterative Interactions for Robust Robot Manipulation Policy Learning with Unseen Objects by Minjae Kang, Hogun Kee, Hosung Lee, and Songhwai Oh
- Abstract: Real-to-sim-to-real systems have been studied to overcome the challenges of robot policy learning in the real world by creating a virtual environment that replicates the actual workspace. However, previous studies have limitations, requiring human assistance such as observing the workspace with a hand-held camera or manipulating objects by hand. To address these limitations, we propose a novel real-to-sim-to-real framework, ARIC, that operates without human help. First, ARIC observes real objects by repeatedly changing the object poses using a robot policy pre-trained via reinforcement learning. Through iterative interactions between the robot and the environment, ARIC gradually improves the accuracy of 3D object reconstruction. Next, ARIC learns task-specific robot policies in simulation using the replicated objects and applies the policies to real-world scenarios without fine-tuning. We confirm that ARIC efficiently learns robotic tasks, achieving an average success rate of 83.3% across three manipulation tasks in real-world experiments. Detailed experiment videos can be found in the attached material.
- Video
Context-Aware Multi-Agent Trajectory Transformer by Jeongho Park and Songhwai Oh
- Abstract: Transformer-based sequence models have proven effective in offline reinforcement learning for modeling agent trajectories using large-scale datasets. However, applying these models directly to multi-agent offline reinforcement learning introduces additional challenges, especially in managing the complex inter-agent dynamics that arise as multiple agents interact with both their environment and each other. To overcome these issues, we propose the context-aware multi-agent trajectory transformer (COMAT), a novel model designed for offline multi-agent reinforcement learning tasks, which predicts the future trajectory of each agent by incorporating the history of adjacent agents—referred to as context—into its sequence modeling. COMAT consists of three key modules: the transformer module to process input trajectories, the context encoder to extract relevant information from adjacent agents’ histories, and the context aggregator to integrate this information into the agent’s trajectory prediction process. Built upon these modules, COMAT predicts the agents’ future trajectories and actively leverages this capability as a tool for planning, enabling the search for optimal actions in multi-agent environments. We evaluate COMAT on multi-agent MuJoCo and StarCraft Multi-Agent Challenge tasks, on which it demonstrates superior performance compared to existing baselines.
- Video
Language-Guided Hierarchical Planning with Scene Graphs for Tabletop Object Rearrangement by Wooseok Oh, Hogun Kee, and Songhwai Oh
- Abstract: Spatial relationships between objects are key to achieving well-arranged scenes. In this paper, we address the robotic rearrangement task by leveraging these relationships to reach configurations that are well-arranged and satisfy the given language goal. We propose a hierarchical planning framework that bridges the gap between abstract language inputs and concrete robotic actions. A scene graph is central to this approach, serving as both an intermediate representation and the state for high-level planning, capturing the relationships among objects effectively and reducing planning complexity. This also enables the proposed method to handle more general language goals. To achieve this, we leverage a large language model (LLM) to convert language goals into a scene graph, which becomes the goal for high-level planning. In high-level planning, we plan transitions from the current scene graph to the goal scene graph. To integrate high-level and low-level planning, we introduce a network that generates a physical configuration of objects from a scene graph. Low-level planning then verifies the high-level plan’s feasibility, ensuring it can be executed through robotic manipulation. Through experiments, we show that the proposed method handles general language goals effectively and produces human-preferred rearrangements compared to other approaches, demonstrating its applicability on real robots without requiring sim-to-real adjustments.
