# Impromptu-VLA

This repository contains the code for the following work:

> Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models

## [Project Page](http://Impromptu-VLA.c7w.tech/)

Haohan Chi\*¹, [Huan-ang Gao\*¹](https://c7w.tech/), Ziming Liu†², Jianing Liu¹, Chenyu Liu¹, Jinwei Li¹, Kaisen Yang¹, Yangcheng Yu¹, Zeda Wang¹, Wenyi Li¹, Leichen Wang², Xingtao Hu², Hao Sun², [Hang Zhao³](https://hangzhaomit.github.io/), [Hao Zhao¹†](https://sites.google.com/view/fromandto/)

¹AIR, Tsinghua University, ²Bosch Research, ³IIIS, Tsinghua University
\*Equal contribution, †Corresponding author


## Introductory Video

Our dataset can be accessed on [Hugging Face](https://huggingface.co/datasets/aaaaaap/unstructed).

If you want to create our benchmark QA data from scratch:

1. First, organize the downloaded data following the layout in `data_raw`.
2. Parse the data according to the code and instructions in that folder (for the `waymo` and `mapillary_sls` datasets).
3. Enter the main directory and create a symbolic link for `navsim`:

   ```bash
   ln -s /data_raw/navsim /data_qa_generate/data_engine/data_storage/external_datasets/navsim
   ```

4. After the data is successfully organized, run the following script:

   ```bash
   bash scripts/data_qa_generate.sh
   ```

---

### ✨ Environment Configuration

We leverage several powerful open-source libraries in this project. To ensure a smooth experience, please configure your environment by referring to their official documentation. Here are the key players:

* **sglang**: Your go-to for efficient large language model serving. Check out the setup guide here: [sglang](https://github.com/sgl-project/sglang) ✨
* **LLaMA-Factory**: A comprehensive and user-friendly framework for fine-tuning large language models. Dive into the documentation for installation details: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) 🛠️
* **vLLM**: For high-throughput and low-latency inference. Find out how to get it running here: [vllm](https://github.com/vllm-project/vllm) ⚡

**Pro Tip:** We highly recommend creating a dedicated virtual environment (using tools like `conda` or `venv`) to manage the dependencies for this project. This keeps your workspace clean and avoids conflicts with other Python projects. A minimal setup sketch is shown below. Happy configuring! 👩‍💻
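As a concrete starting point, here is a minimal, hedged setup sketch. It assumes a conda-based workflow; the environment name, Python version, and package extras are illustrative and are not pinned by this repository, so defer to the official installation guides linked above.

```bash
# Minimal environment sketch (assumed setup; versions/extras are illustrative, not pinned by this repo).
conda create -n impromptu-vla python=3.10 -y
conda activate impromptu-vla

# Serving / inference backends.
pip install "sglang[all]"
pip install vllm

# LLaMA-Factory for fine-tuning (editable install, as recommended upstream).
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
```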

### 📊 Results

**Open-loop trajectory prediction L2 errors (m) on the nuScenes dataset.**

| Method | 1s | 2s | 3s | Avg. |
| --- | --- | --- | --- | --- |
| **Closed-source API-only Models** | | | | |
| GPT-4o¹ | 0.28 | 0.93 | 2.02 | 1.07 |
| Claude-3.5-Sonnet¹ | 0.29 | 0.98 | 2.12 | 1.13 |
| Claude-3.7-Sonnet¹ | 0.28 | 0.94 | 2.04 | 1.09 |
| Gemini-2.0-Flash¹ | 0.31 | 1.08 | 2.36 | 1.25 |
| Gemini-2.5-Pro¹ | 0.37 | 1.35 | 2.96 | 1.56 |
| **Open-source Generalist VLMs** | | | | |
| LLaVA-1.6-Mistral-7B² | 1.49 | 3.38 | 4.09 | 2.98 |
| Llama-3.2-11B-Vision-Instruct² | 1.54 | 3.31 | 3.91 | 2.92 |
| Qwen2-VL-7B-Instruct² | 1.45 | 3.21 | 3.76 | 2.81 |
| DeepSeek-VL2-16B¹ | 0.66 | 1.68 | 2.92 | 1.75 |
| DeepSeek-VL2-28B¹ | 0.37 | 1.35 | 2.96 | 1.56 |
| LLaMA-3.2-11B-Vision-Instruct¹ | 0.52 | 1.42 | 2.68 | 1.54 |
| LLaMA-3.2-90B-Vision-Instruct¹ | 0.66 | 1.71 | 3.01 | 1.79 |
| Qwen-2.5-VL-7B-Instruct¹ | 0.46 | 1.33 | 2.55 | 1.45 |
| **Training-based Driving Specialists (Existing Methods)** | | | | |
| UniAD³ | 0.42 | 0.64 | 0.91 | 0.66 |
| VAD³ | 0.17 | 0.34 | 0.60 | 0.37 |
| BEV-Planner³ | 0.16 | 0.32 | 0.57 | 0.35 |
| Ego-MLP³* | 0.15 | 0.32 | 0.59 | 0.35 |
| **Ours and Key Competitors (Specialized Driving Models)** | | | | |
| DriveVLM³ | 0.18 | 0.34 | 0.68 | 0.40 |
| OmniDrive³ | 0.14 | 0.29 | 0.55 | 0.33 |
| DriveVLM-Dual³ | 0.15 | 0.29 | 0.48 | 0.31 |
| EMMA (random init)³ | 0.15 | 0.33 | 0.63 | 0.37 |
| EMMA³ | 0.14 | 0.29 | 0.54 | 0.32 |
| EMMA+³ | 0.13 | 0.27 | 0.48 | 0.29 |
| 3B Base+nuScenes | 0.14 | 0.30 | 0.58 | 0.34 |
| 3B Base+Impromptu+nuScenes | 0.13 | 0.27 | 0.52 | 0.30 |
| 7B Base+nuScenes | 0.13 | 0.28 | 0.55 | 0.32 |
| 7B Base+Impromptu+nuScenes | 0.13 | 0.27 | 0.53 | 0.30 |

*Note: Best results within each category are in bold, second best are underlined. ¹ from LightEMMA, ² from OpenEMMA, ³ from EMMA.*
**Results on NeuroNCAP** (NeuroNCAP score ↑, collision rate % ↓).

| Source | Method | Score Avg. ↑ | Score Stat. ↑ | Score Frontal ↑ | Score Side ↑ | Coll. Avg. (%) ↓ | Coll. Stat. (%) ↓ | Coll. Frontal (%) ↓ | Coll. Side (%) ↓ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CVPR 2023 | UniAD² | 0.73 | 0.84 | 0.10 | 1.26 | 88.6 | 87.8 | 98.4 | 79.6 |
| ICCV 2023 | VAD² | 0.66 | 0.47 | 0.04 | 1.45 | 92.5 | 96.2 | 99.6 | 81.6 |
| ICRA 2025 | SparseDrive¹ | 0.92 | - | - | - | 93.9 | - | - | - |
| CVPR 2025 | BridgeAD-S¹ | 1.52 | - | - | - | 76.2 | - | - | - |
| CVPR 2025 | BridgeAD-B¹ | 1.60 | - | - | - | 72.6 | - | - | - |
| - | Base+nuScenes | 1.77 | 1.80 | 1.67 | 1.75 | 72.5 | 68.0 | 73.0 | 71.5 |
| - | Base+Impromptu+nuScenes | 2.15 | 1.77 | 2.31 | 2.10 | 65.5 | 70.0 | 59.0 | 65.0 |

*Note: Best scores in each category are in bold, second best are underlined. ¹ from BridgeAD, ² from NeuRAD.*
The improvement in the overall NeuroNCAP score and, crucially, the reduction in collision rates suggest that our dataset helps the model develop a more nuanced understanding of complex road interactions, leading to more robust and safer driving policies.
### 📥 Download Pre-trained Models

| Method | Download |
| --- | --- |
| 3B Base+nuScenes | HF Hub |
| 3B Base+Impromptu | HF Hub |
| 3B Base+Impromptu+nuScenes | HF Hub |
| 7B Base+nuScenes | HF Hub |
| 7B Base+Impromptu | HF Hub |
| 7B Base+Impromptu+nuScenes | HF Hub |
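If you prefer fetching checkpoints from the command line, the following is a hedged sketch using the Hugging Face CLI. Here `<hf_repo_id>` is a placeholder for the model's Hub repository ID (taken from the corresponding "HF Hub" link above), and the local directory name is illustrative.

```bash
# Hedged sketch: <hf_repo_id> is a placeholder for the checkpoint's Hugging Face repo ID;
# the local directory is illustrative and not prescribed by this repository.
pip install -U "huggingface_hub[cli]"
huggingface-cli download <hf_repo_id> --local-dir checkpoints/7B_base_impromptu_nuscenes
```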
### 🚀 Model Training

To start training, simply run the following command:

```bash
llamafactory-cli train <config_path>
```

Replace `<config_path>` with the path to your training configuration file. For example:

```bash
llamafactory-cli train train/Qwen2_5-VL/QA_train_sub_fin_nu/3B_full_QA_train_bs8.yaml
```

This command launches the training process based on the settings specified in your YAML config file. Make sure the path is correct and all necessary parameters are properly configured.

Training and testing data for nuScenes can be found in [nuscenes_train.json](nuscenes_train.json) and [nuscenes_test.json](nuscenes_test.json), respectively.

### 🧠 Inference

To run inference with a fine-tuned model, use the following command:

```bash
python train/inference_scripts/sglang_infer.py \
    --model_name_or_path <model_name_or_path> \
    --dataset <dataset> \
    --save_name <save_name> \
    --template qwen2_vl \
    --tensor_parallel_size 1 \
    --data_parallel_size 1
```

Replace the placeholders with your actual paths:

* `<model_name_or_path>`: name or path of the original pretrained model (e.g., Qwen2-VL-3B-Instruct)
* `<dataset>`: dataset name registered in `dataset_info.json`, following [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)
* `<save_name>`: path to save inference results

### 🎯 Prompts

The prompts we use can be found in [prompts](prompts.md).

### 📊 Close-loop Evaluation with NeuroNCAP

To understand the system's performance within a closed-loop simulation environment, see the details of our NeuroNCAP-based evaluation: [Close-loop Evaluation](neuroncap_evaluation/evaluation.md) 🎮

### 🎬 Video Gallery

The videos compare the driving behavior of the two models in three representative challenging scenarios: stationary, frontal, and side. For each scenario, **the left column shows the behavior of the base model, which is fine-tuned on nuScenes; the right column shows the performance of the model trained on a subset of our proposed dataset and then fine-tuned on nuScenes**. Compared to the base model, the model trained with our data avoids vehicles more effectively, e.g., by turning or slowing down.

#### Stationary

Base+nuScenes | Base+Impromptu+nuScenes

#### Side

Base+nuScenes | Base+Impromptu+nuScenes

#### Frontal

Base+nuScenes | Base+Impromptu+nuScenes