[Invited Talk] Vision-Language Guided Program Search for Language-Conditioned Diffusion Driving Policies

Presenter: Mineui Hong, Carnegie Mellon University; Time: 4:30pm, Monday (2025/12/22); Location: Building 133 Room 205

Abstract

We present a scalable framework for constructing a large, language-annotated driving dataset by combining vision-language models (VLMs) with a diffusion-based planner. The VLM generates reward functions for maneuver-level natural language commands, and the diffusion planner uses these rewards to autonomously produce diverse trajectories that represent the intended behaviors. Through iterative generation and curation, the framework yields a rich set of language-annotated trajectories without manual labeling. Using this synthetic dataset, we train both maneuver-specific planners and a general language-conditioned driving agent capable of executing complex instructions. Experiments show that our synthetic data improves performance on language-following tasks, demonstrating the effectiveness of the approach for scalable data generation and autonomous driving policy learning.
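The generate-and-curate loop in the abstract can be sketched roughly as follows. This is a hypothetical illustration, not the presented system: `vlm_reward_for` and `diffusion_planner_sample` are stand-ins for the VLM reward generator and the diffusion planner, and the toy "keep left" reward is invented for the example.

```python
import random

# Hypothetical sketch of the iterative generate-and-curate loop described
# in the abstract. All components are stand-ins: a real system would query
# a VLM for the reward function and sample from a trained diffusion planner.

def vlm_reward_for(command):
    """Stand-in for a VLM mapping a maneuver-level command to a reward.

    Toy example: 'keep left' rewards trajectories whose mean lateral
    position is negative (left of center)."""
    if command == "keep left":
        return lambda traj: -sum(x for x, _ in traj) / len(traj)
    raise ValueError(f"unknown command: {command}")

def diffusion_planner_sample(n_trajectories, horizon, rng):
    """Stand-in for a diffusion planner: random (x, y) waypoint sequences."""
    return [
        [(rng.uniform(-1.0, 1.0), t * 0.1) for t in range(horizon)]
        for _ in range(n_trajectories)
    ]

def generate_annotated_data(command, rounds=3, per_round=32, keep_top=4):
    """Iteratively sample trajectories, score them with the VLM-derived
    reward, and keep the best as (command, trajectory) training pairs."""
    rng = random.Random(0)
    reward = vlm_reward_for(command)
    dataset = []
    for _ in range(rounds):
        candidates = diffusion_planner_sample(per_round, horizon=20, rng=rng)
        candidates.sort(key=reward, reverse=True)
        dataset.extend((command, traj) for traj in candidates[:keep_top])
    return dataset

data = generate_annotated_data("keep left")
print(len(data))  # rounds * keep_top = 12 curated (command, trajectory) pairs
```

The curated pairs would then serve as supervised data for the maneuver-specific planners and the general language-conditioned agent mentioned in the abstract.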

Biography

Mineui Hong is a Postdoctoral Researcher at the Robotics Institute, Carnegie Mellon University. He received his Ph.D. and B.S. degrees in Electrical and Computer Engineering from Seoul National University under the supervision of Professor Songhwai Oh. His research focuses on learning-based approaches for robotics, particularly latent dynamics modeling, visual planning, and reinforcement learning for robotic control.