DriveLM is a research-oriented framework and dataset for exploring how vision-language models can be integrated into autonomous driving systems. The project introduces Graph Visual Question Answering (GVQA), a paradigm that structures reasoning about driving scenes as interconnected question-answering tasks spanning perception, prediction, planning, and motion. Instead of treating autonomous driving as a purely sensor-driven pipeline, DriveLM frames it as a reasoning problem in which models answer structured questions about the environment to guide decision making. The project includes DriveLM-Data, a dataset built on the real-world nuScenes dataset and the CARLA simulator, in which human-written reasoning steps connect the different layers of driving tasks. This design lets models learn the relationships between objects, behaviors, and navigation decisions through graph-structured logic.
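The graph-structured reasoning idea can be illustrated with a minimal sketch: each question-answer pair is a node tagged with a driving stage, and edges point from the questions it depends on. The field names below (`qa_id`, `stage`, `parents`) are hypothetical and do not reflect DriveLM's actual data schema.

```python
# Minimal sketch of a graph-structured VQA record for one driving scene.
# Field names (qa_id, stage, parents) are illustrative, not DriveLM's schema.
from collections import deque

scene_qas = [
    {"qa_id": "p1", "stage": "perception",
     "question": "What objects are ahead of the ego vehicle?",
     "answer": "A pedestrian at the crosswalk and a parked van.",
     "parents": []},
    {"qa_id": "pr1", "stage": "prediction",
     "question": "Will the pedestrian cross the road?",
     "answer": "Yes, the pedestrian is likely to cross.",
     "parents": ["p1"]},
    {"qa_id": "pl1", "stage": "planning",
     "question": "What should the ego vehicle do?",
     "answer": "Slow down and yield to the pedestrian.",
     "parents": ["p1", "pr1"]},
]

def reasoning_order(qas):
    """Topologically order QA nodes so each question follows its parents."""
    by_id = {qa["qa_id"]: qa for qa in qas}
    indeg = {qa["qa_id"]: len(qa["parents"]) for qa in qas}
    children = {qa["qa_id"]: [] for qa in qas}
    for qa in qas:
        for parent in qa["parents"]:
            children[parent].append(qa["qa_id"])
    queue = deque(q for q, d in indeg.items() if d == 0)
    order = []
    while queue:
        q = queue.popleft()
        order.append(q)
        for child in children[q]:
            indeg[child] -= 1
            if indeg[child] == 0:
                queue.append(child)
    return [by_id[q] for q in order]

for qa in reasoning_order(scene_qas):
    print(qa["stage"], "->", qa["question"])
```

Ordering the nodes this way mirrors how a model would be prompted: perception answers feed the prediction question, and both feed the planning question.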
## Features
- Graph visual question answering framework for driving reasoning
- Dataset supporting perception, prediction, planning, and motion tasks
- Data built on the real-world nuScenes dataset and the CARLA simulator
- Human-written reasoning annotations connecting driving subtasks
- Benchmark metrics and evaluation tools for vision-language driving models
- Baseline agents demonstrating language-driven autonomous driving
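Since the benchmark compares free-form model answers against reference answers, a toy scorer helps show what such evaluation involves. The `token_f1` function below is a hypothetical stand-in, not DriveLM's actual evaluation tooling, which is more involved.

```python
# Toy token-overlap F1 between a predicted and a reference answer.
# A hypothetical stand-in for DriveLM's actual evaluation tools.
def token_f1(pred: str, ref: str) -> float:
    pred_tokens = pred.lower().split()
    ref_tokens = ref.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Count matched tokens, respecting multiplicity in the reference.
    ref_counts = {}
    for t in ref_tokens:
        ref_counts[t] = ref_counts.get(t, 0) + 1
    common = 0
    for t in pred_tokens:
        if ref_counts.get(t, 0) > 0:
            common += 1
            ref_counts[t] -= 1
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

examples = [
    ("slow down and yield", "slow down and yield to the pedestrian"),
    ("turn left", "keep straight"),
]
for pred, ref in examples:
    print(f"{token_f1(pred, ref):.2f}")
```

Word-overlap scores like this are only a rough proxy; language-based driving benchmarks typically combine several metrics to judge answer quality.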