
# Epona: Autoregressive Diffusion World Model for Autonomous Driving

ICCV 2025

Kaiwen Zhang*, Zhenyu Tang*, Xiaotao Hu, Xingang Pan,
Xiaoyang Guo, Yuan Liu, Jingwei Huang, Yuan Li, Qian Zhang,
Xiaoxiao Long, Xun Cao, Wei Yin§

*Equal Contribution · Project Adviser · §Project Lead, Corresponding Author

arXiv · Project Page · Huggingface

Versatile capabilities of Epona: given historical driving context, Epona can generate consistent, minutes-long driving videos at high resolution (A). Generation can be controlled by diverse trajectories (B), and the model understands real-world traffic knowledge (C). In addition, our world model can predict future trajectories and serve as an end-to-end real-time motion planner (D).

## 🚀 Getting Started

### Installation

```bash
conda create -n epona python=3.10
conda activate epona
pip install -r requirements.txt
```

To run the code with CUDA properly, you can comment out `torch` and `torchvision` in `requirements.txt`, and install appropriate versions of `torch>=2.1.0+cu121` and `torchvision>=0.16.0+cu121` following the instructions on [PyTorch](https://pytorch.org/get-started/locally/).

### Data Preparation

Please refer to [data preparation](./data_preparation/README.md) for details on preparing and preprocessing the data. After preprocessing, change `datasets_paths` in the config files (under the `configs` folder) to your own data path.

### Inference

First, download our pre-trained models (including the world models and the finetuned temporal-aware DCAE) from [Huggingface](https://huggingface.co/Kevin-thu/Epona). In addition to our finetuned temporal-aware DCAE, you may also experiment with the original [DCAEs](https://github.com/mit-han-lab/efficientvit/blob/master/applications/dc_ae/README.md) provided by MIT Han Lab as the autoencoder: [dc-ae-f32c32-mix-1.0](https://huggingface.co/mit-han-lab/dc-ae-f32c32-mix-1.0) and [dc-ae-f32c32-sana-1.1](https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.1). After downloading, change `vae_ckpt` in the config files to your own autoencoder checkpoint path.

Then, you can run the different scripts in the `scripts/test` folder to test *Epona* for different uses:

| Script Name | Dataset | Trajectory Type | Video Length | Use Case Description |
| --- | --- | --- | --- | --- |
| `test_nuplan.py` | NuPlan | Fixed (from dataset) | Fixed | Evaluation on the NuPlan test set with a fixed setup. |
| `test_free.py` | NuPlan | Self-predicted | Variable (free) | **Long-term video generation** with autonomous predictions. |
| `test_ctrl.py` | NuPlan | User-provided (`poses`, `yaws`) | Variable (free) | **Trajectory-controlled video generation**; requires manual inputs in the script. |
| `test_traj.py` | NuPlan | Prediction only | N/A | Evaluates the model's **trajectory prediction** accuracy. |
| `test_nuscenes.py` | NuScenes | Fixed (from dataset) | Fixed | Evaluation on the nuScenes validation set with a fixed setup. |
| `test_demo.py` | Custom input | Self-predicted | Variable (free) | Run *Epona* on your own input data. |

For example, to test the model on the NuPlan test set, you can run:

```bash
python3 scripts/test/test_nuplan.py \
    --exp_name "test-nuplan" \
    --start_id 0 --end_id 100 \
    --resume_path "pretrained/epona_nuplan.pkl" \
    --config configs/dit_config_dcae_nuplan.py
```

where:

- `exp_name` is the name of the experiment;
- `start_id` and `end_id` define the range of test samples;
- `resume_path` is the path to the pre-trained world model;
- `config` is the path to the config file.

All the inference scripts can be run on a single NVIDIA 4090 GPU.
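The other test scripts can be launched in a similar way. As a hedged sketch, the invocation below for long-term free generation simply mirrors the flags of the documented `test_nuplan.py` example; the exact arguments accepted by `test_free.py` may differ, so check the script's argument parser before running:

```bash
# Assumption: test_free.py accepts the same flags as test_nuplan.py.
python3 scripts/test/test_free.py \
    --exp_name "test-free" \
    --start_id 0 --end_id 10 \
    --resume_path "pretrained/epona_nuplan.pkl" \
    --config configs/dit_config_dcae_nuplan.py
```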
### Training / Finetuning

We also provide a simple script, `scripts/train_deepspeed.py`, for training or finetuning the world model with DeepSpeed. For example, to train the world model on the NuPlan dataset, you can run:

```bash
export NODES_NUM=4
export GPUS_NUM=8
# Set `--resume_path` to resume training from a previous checkpoint.
torchrun --nnodes=$NODES_NUM --nproc_per_node=$GPUS_NUM \
    scripts/train_deepspeed.py \
    --batch_size 2 \
    --lr 2e-5 \
    --exp_name "train-nuplan" \
    --config configs/dit_config_dcae_nuplan.py \
    --resume_path "pretrained/epona_nuplan.pkl" \
    --eval_steps 2000
```

You can customize the configuration files in the `configs` folder (e.g., adjust the image resolution, number of condition frames, model size, etc.). Additionally, you can finetune our base world model on your own dataset by implementing a custom dataset class in the `dataset` folder.

## ❤️ Acknowledgement

Our implementation is based on [DrivingWorld](https://github.com/YvanYin/DrivingWorld), [Flux](https://github.com/black-forest-labs/flux) and [DCAE](https://github.com/mit-han-lab/efficientvit/tree/master/applications/dc_ae). Thanks for these great open-source works!

## 📌 Citation

If any part of our paper or code is helpful to your research, please consider citing our work 📝 and giving us a star ⭐. Thanks for your support!

```bibtex
@inproceedings{zhang2025epona,
  author    = {Zhang, Kaiwen and Tang, Zhenyu and Hu, Xiaotao and Pan, Xingang and Guo, Xiaoyang and Liu, Yuan and Huang, Jingwei and Yuan, Li and Zhang, Qian and Long, Xiao-Xiao and Cao, Xun and Yin, Wei},
  title     = {Epona: Autoregressive Diffusion World Model for Autonomous Driving},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025}
}
```