# Impromptu-VLA
**Repository Path**: flashdxy/Impromptu-VLA
## Basic Information
- **Project Name**: Impromptu-VLA
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: CC-BY-SA-4.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-06-04
- **Last Updated**: 2025-06-06
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Impromptu-VLA
This repository contains the code for the following work:
> Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models
## [Project Page](http://Impromptu-VLA.c7w.tech/)
Haohan Chi*,¹, [Huan-ang Gao*,¹](https://c7w.tech/), Ziming Liu†,², Jianing Liu¹, Chenyu Liu¹, Jinwei Li¹, Kaisen Yang¹, Yangcheng Yu¹, Zeda Wang¹, Wenyi Li¹, Leichen Wang², Xingtao Hu², Hao Sun², [Hang Zhao³](https://hangzhaomit.github.io/), [Hao Zhao¹,†](https://sites.google.com/view/fromandto/)
¹AIR, Tsinghua University, ²Bosch Research, ³IIIS, Tsinghua University, *Equal contribution, †Corresponding author
## Introductory Video
Our dataset can be accessed on [Hugging Face](https://huggingface.co/datasets/aaaaaap/unstructed).
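For convenience, the dataset can also be fetched with the Hugging Face CLI; a minimal sketch (the `--local-dir` target below is only a suggestion, place the files wherever your `data_raw` layout expects them):

```bash
# Requires the Hugging Face CLI: pip install -U "huggingface_hub[cli]"
# The --local-dir below is a suggested location, not a path mandated by this repo.
huggingface-cli download aaaaaap/unstructed --repo-type dataset --local-dir data_raw/impromptu
```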
If you want to create our benchmark QA data from scratch:
1. First, download and organize the raw data following the layout described in `data_raw`.
2. Parse the data using the code and instructions in that folder (required for the `waymo` and `mapillary_sls` datasets).
3. Enter the main directory and create a symbolic link for `navsim`:
```bash
ln -s /data_raw/navsim /data_qa_generate/data_engine/data_storage/external_datasets/navsim
```
4. After the data is successfully organized, run the following script:
```bash
bash scripts/data_qa_generate.sh
```
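Before running the script, it can help to sanity-check the layout. Below is a sketch, assuming the three datasets mentioned above sit directly under `data_raw/` (the exact structure is defined by the instructions in `data_raw`):

```bash
# Assumed layout (illustrative only; follow the data_raw instructions for the exact structure):
#   data_raw/
#   ├── waymo/
#   ├── mapillary_sls/
#   └── navsim/
#
# Verify that the navsim symlink from step 3 resolves to the raw data:
ls -l /data_qa_generate/data_engine/data_storage/external_datasets/navsim
```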
---
### ✨ Environment Configuration
We leverage some powerful open-source libraries to make this project shine. To ensure a smooth experience, please configure your environment by referring to their official documentation.
Here are the key players:
* **sglang**: Your go-to for efficient large language model serving. Check out their setup guide here: [sglang](https://github.com/sgl-project/sglang) ✨
* **LLaMA-Factory**: A comprehensive and user-friendly framework for fine-tuning large language models. Dive into their documentation for installation details: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) 🛠️
* **vLLM**: For high-throughput and low-latency inference. Find out how to get it running here: [vllm](https://github.com/vllm-project/vllm) ⚡
**Pro Tip:** We highly recommend creating a dedicated virtual environment (using tools like `conda` or `venv`) to manage the dependencies for this project. This helps keep your workspace clean and avoids conflicts with other Python projects. Happy configuring! 👩‍💻
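As a rough starting point, a minimal environment sketch is shown below; the package names and extras follow each project's install docs at the time of writing, and those docs remain the authoritative reference:

```bash
# Minimal sketch -- adjust the Python version and CUDA wheels to your system.
conda create -n impromptu-vla python=3.10 -y
conda activate impromptu-vla

# Serving / inference backends. Note: sglang and vLLM may pin different torch builds;
# if you hit dependency conflicts, install only the backend you plan to use (or use separate envs).
pip install "sglang[all]"
pip install vllm

# Fine-tuning framework (editable install from source, per the LLaMA-Factory README).
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
cd ..
```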
### 📊 Results
Open-loop trajectory prediction L2 errors (m) on the nuScenes dataset.
| Method | 1s | 2s | 3s | Avg. |
| --- | --- | --- | --- | --- |
| **Closed-source API-only Models** | | | | |
| GPT-4o¹ | 0.28 | 0.93 | 2.02 | 1.07 |
| Claude-3.5-Sonnet¹ | 0.29 | 0.98 | 2.12 | 1.13 |
| Claude-3.7-Sonnet¹ | 0.28 | 0.94 | 2.04 | 1.09 |
| Gemini-2.0-Flash¹ | 0.31 | 1.08 | 2.36 | 1.25 |
| Gemini-2.5-Pro¹ | 0.37 | 1.35 | 2.96 | 1.56 |
| **Open-source Generalist VLMs** | | | | |
| LLaVA-1.6-Mistral-7B² | 1.49 | 3.38 | 4.09 | 2.98 |
| Llama-3.2-11B-Vision-Instruct² | 1.54 | 3.31 | 3.91 | 2.92 |
| Qwen2-VL-7B-Instruct² | 1.45 | 3.21 | 3.76 | 2.81 |
| DeepSeek-VL2-16B¹ | 0.66 | 1.68 | 2.92 | 1.75 |
| DeepSeek-VL2-28B¹ | 0.37 | 1.35 | 2.96 | 1.56 |
| LLaMA-3.2-11B-Vision-Instruct¹ | 0.52 | 1.42 | 2.68 | 1.54 |
| LLaMA-3.2-90B-Vision-Instruct¹ | 0.66 | 1.71 | 3.01 | 1.79 |
| Qwen-2.5-VL-7B-Instruct¹ | 0.46 | 1.33 | 2.55 | 1.45 |
| **Training-based Driving Specialists (Existing Methods)** | | | | |
| UniAD³ | 0.42 | 0.64 | 0.91 | 0.66 |
| VAD³ | 0.17 | 0.34 | 0.60 | 0.37 |
| BEV-Planner³ | 0.16 | 0.32 | 0.57 | 0.35 |
| Ego-MLP³\* | 0.15 | 0.32 | 0.59 | 0.35 |
| **Ours and Key Competitors (Specialized Driving Models)** | | | | |
| DriveVLM³ | 0.18 | 0.34 | 0.68 | 0.40 |
| OmniDrive³ | 0.14 | 0.29 | 0.55 | 0.33 |
| DriveVLM-Dual³ | 0.15 | 0.29 | 0.48 | 0.31 |
| EMMA (random init)³ | 0.15 | 0.33 | 0.63 | 0.37 |
| EMMA³ | 0.14 | 0.29 | 0.54 | 0.32 |
| EMMA+³ | 0.13 | 0.27 | 0.48 | 0.29 |
| 3B Base+nuScenes | 0.14 | 0.30 | 0.58 | 0.34 |
| 3B Base+Impromptu+nuScenes | 0.13 | 0.27 | 0.52 | 0.30 |
| 7B Base+nuScenes | 0.13 | 0.28 | 0.55 | 0.32 |
| 7B Base+Impromptu+nuScenes | 0.13 | 0.27 | 0.53 | 0.30 |

Note: Best results within each category are in bold, second best are underlined.
¹ Results from LightEMMA; ² from OpenEMMA; ³ from EMMA.
Results on NeuroNCAP. The first four numeric columns report the NeuroNCAP score (↑ higher is better); the last four report the collision rate in % (↓ lower is better). "Stat." abbreviates the stationary scenario.

| Source | Method | Score (Avg.) | Score (Stat.) | Score (Frontal) | Score (Side) | Collision (Avg.) | Collision (Stat.) | Collision (Frontal) | Collision (Side) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CVPR 2023 | UniAD² | 0.73 | 0.84 | 0.10 | 1.26 | 88.6 | 87.8 | 98.4 | 79.6 |
| ICCV 2023 | VAD² | 0.66 | 0.47 | 0.04 | 1.45 | 92.5 | 96.2 | 99.6 | 81.6 |
| ICRA 2025 | SparseDrive¹ | 0.92 | - | - | - | 93.9 | - | - | - |
| CVPR 2025 | BridgeAD-S¹ | 1.52 | - | - | - | 76.2 | - | - | - |
| CVPR 2025 | BridgeAD-B¹ | 1.60 | - | - | - | 72.6 | - | - | - |
| - | Base+nuScenes | 1.77 | 1.80 | 1.67 | 1.75 | 72.5 | 68.0 | 73.0 | 71.5 |
| - | Base+Impromptu+nuScenes | 2.15 | 1.77 | 2.31 | 2.10 | 65.5 | 70.0 | 59.0 | 65.0 |

Note: Best scores in each category are in bold, second best are underlined.
¹ Results from BridgeAD; ² from NeuRAD.
The improvement in the overall NeuroNCAP score and, crucially, the reduction in collision rates suggest that our dataset helps the model develop a more nuanced understanding of complex road interactions, leading to more robust and safer driving policies.
### 📥 Download Pre-trained Models
Download links for the pre-trained models:

| Method | Download |
| --- | --- |
| 3B Base+nuScenes | HF Hub |
| 3B Base+Impromptu | HF Hub |
| 3B Base+Impromptu+nuScenes | HF Hub |
| 7B Base+nuScenes | HF Hub |
| 7B Base+Impromptu | HF Hub |
| 7B Base+Impromptu+nuScenes | HF Hub |
### 🚀 Model Training
To start training, simply run the following command:
```bash
llamafactory-cli train <config_path>
```
Replace `<config_path>` with the path to your training configuration file. For example:
```bash
llamafactory-cli train train/Qwen2_5-VL/QA_train_sub_fin_nu/3B_full_QA_train_bs8.yaml
```
This command will launch the training process based on the settings specified in your YAML config file. Make sure the path is correct and all necessary parameters are properly configured.
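For multi-GPU fine-tuning, LLaMA-Factory can launch the same config through `torchrun`; a sketch is given below (the GPU ids are placeholders, and the LLaMA-Factory documentation is the authoritative reference for launcher options):

```bash
# Sketch: multi-GPU launch via LLaMA-Factory's torchrun wrapper.
# Adjust CUDA_VISIBLE_DEVICES to the GPUs available on your machine.
CUDA_VISIBLE_DEVICES=0,1,2,3 FORCE_TORCHRUN=1 \
    llamafactory-cli train train/Qwen2_5-VL/QA_train_sub_fin_nu/3B_full_QA_train_bs8.yaml
```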
Training and testing data for nuScenes can be found in [nuscenes_train.json](nuscenes_train.json) and [nuscenes_test.json](nuscenes_test.json) respectively.
### 🧠 Inference
To run inference with a fine-tuned model, you need to use the following command:
```bash
python train/inference_scripts/sglang_infer.py \
    --model_name_or_path <model_name_or_path> \
    --dataset <dataset> \
    --save_name <save_name> \
    --template qwen2_vl \
    --tensor_parallel_size 1 \
    --data_parallel_size 1
```
Replace the placeholders with your actual paths:
* `<model_name_or_path>`: Name or path of the original pretrained model (e.g., Qwen2-VL-3B-Instruct)
* `<dataset>`: Dataset name registered in `dataset_info.json`, following [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)
* `<save_name>`: Path where the inference results will be saved
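As a concrete illustration, a filled-in invocation might look like the following (the model path, dataset name, and output path are hypothetical; substitute your own):

```bash
# All paths and names below are hypothetical examples.
python train/inference_scripts/sglang_infer.py \
    --model_name_or_path saves/qwen2_5-vl-3b-full-qa \
    --dataset nuscenes_test \
    --save_name results/nuscenes_test_pred.json \
    --template qwen2_vl \
    --tensor_parallel_size 1 \
    --data_parallel_size 1
```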
### 🎯 Prompts
The prompts we use can be found in [prompts](prompts.md).
### 📊 Closed-loop Evaluation with NeuroNCAP
To understand the system's performance in a closed-loop simulation environment, see the details of our NeuroNCAP-based evaluation: [Closed-loop Evaluation](neuroncap_evaluation/evaluation.md) 🎮
### 🎬 Video Gallery
The videos compare the driving behavior of the two models in three representative challenging scenarios: stationary, frontal, and side. For each scenario, **the left column shows the behavior of the base model, which is fine-tuned on nuScenes. The right column shows the performance of the model trained on a subset of our proposed dataset and then fine-tuned on nuScenes.** Compared to the base model, the model trained with our data avoids vehicles more effectively, for example by steering around them or slowing down.
#### Stationary
Left: Base+nuScenes · Right: Base+Impromptu+nuScenes
#### Side
Left: Base+nuScenes · Right: Base+Impromptu+nuScenes
#### Frontal
Left: Base+nuScenes · Right: Base+Impromptu+nuScenes
