# TurboDiffusion
*(Side-by-side demo videos; end-to-end generation times summarized below.)*

| Method | E2E Time |
| --- | --- |
| Original | 4549s |
| TurboDiffusion | 38s |
### Wan-2.1-T2V-1.3B-480P
*(Side-by-side demo videos.)*

| Method | E2E Time |
| --- | --- |
| Original | 184s |
| FastVideo | 5.3s |
| TurboDiffusion | 1.9s |
### Wan-2.1-T2V-14B-720P
*(Side-by-side demo videos.)*

| Method | E2E Time |
| --- | --- |
| Original | 4767s |
| FastVideo | 72.6s |
| TurboDiffusion | 24s |
### Wan-2.1-T2V-14B-480P
*(Side-by-side demo videos.)*

| Method | E2E Time |
| --- | --- |
| Original | 1676s |
| FastVideo | 26.3s |
| TurboDiffusion | 9.9s |
## Training
In this repo, we provide training code based on Wan2.1 and its synthetic data. The training builds on the rCM codebase (https://github.com/NVlabs/rcm), with infrastructure support including FSDP2, Ulysses context parallelism (CP), and selective activation checkpointing (SAC). For rCM training instructions, please refer to the original rCM repository; SLA training guidance is provided below.
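For readers unfamiliar with FSDP2, the sketch below illustrates its per-block sharding pattern on a generic toy model; it is illustrative only (it assumes PyTorch >= 2.6 and a `torchrun` launch on GPUs) and is not this repo's actual model or training loop:
```python
# Toy FSDP2 sketch (assumes PyTorch >= 2.6, launched with torchrun on GPUs);
# illustrative only, not this repo's training code.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import fully_shard

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Stand-in for a DiT: a stack of transformer blocks.
model = nn.Sequential(
    *[nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True) for _ in range(4)]
).cuda()

for block in model:  # shard each block's parameters individually ...
    fully_shard(block)
fully_shard(model)   # ... then the root module, which picks up any leftovers

# After sharding, the model is used like a normal nn.Module in the train loop.
```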
#### Additional Installation
For rCM/SLA training, additionally run:
```bash
pip install megatron-core hydra-core wandb webdataset
pip install --no-build-isolation transformer_engine[pytorch]
```
#### Checkpoints Downloading
Download the Wan2.1 pretrained checkpoints in `.pth` format and VAE/text encoder to `assets/checkpoints`:
```bash
# make sure git lfs is installed
git clone https://huggingface.co/worstcoder/Wan assets/checkpoints
```
FSDP2 relies on [Distributed Checkpoint (DCP)](https://docs.pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html) for loading and saving checkpoints. Before training, convert `.pth` teacher checkpoints to `.dcp` first:
```bash
python -m torch.distributed.checkpoint.format_utils torch_to_dcp assets/checkpoints/Wan2.1-T2V-1.3B.pth assets/checkpoints/Wan2.1-T2V-1.3B.dcp
```
After training, the saved `.dcp` checkpoints can be converted back to `.pth` using the script `scripts/dcp_to_pth.py`.
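For reference, PyTorch also ships a helper for the reverse direction, which the bundled script presumably wraps; a minimal sketch (both paths are illustrative):
```python
# Convert a DCP checkpoint directory back to a single .pth file using
# PyTorch's built-in helper; both paths below are illustrative.
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

dcp_to_torch_save(
    "outputs/my_experiment/checkpoints/iter_000002000",  # hypothetical .dcp dir
    "assets/checkpoints/sla_tuned.pth",                  # output .pth file
)
```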
#### Dataset Downloading
We provide Wan2.1-14B-synthesized datasets. Download to `assets/datasets` using:
```bash
# make sure git lfs is installed
git clone https://huggingface.co/datasets/worstcoder/Wan_datasets assets/datasets
```
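The shards are standard WebDataset tars. A quick way to sanity-check a download and see which fields each sample carries (the shard names and keys below are not guaranteed, so print them rather than guessing):
```python
# Peek at the first sample of the first shard; keys are dataset-specific,
# so inspect them instead of assuming field names.
import glob

import webdataset as wds

shards = sorted(glob.glob("assets/datasets/**/shard*.tar", recursive=True))
for sample in wds.WebDataset(shards[0]):
    print(sorted(sample.keys()))
    break
```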
#### Start Training
We implement white-box SLA training by aligning the predictions of the SLA-enabled model with those of the full-attention pretrained model. Unlike the black-box training in the original paper, which tunes the pretrained model with a diffusion loss, white-box training mitigates distribution shift and is less sensitive to the training data.
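Conceptually, the objective looks like the sketch below: a frozen full-attention teacher and an SLA student see the same noisy latent, and the student is regressed onto the teacher's prediction. This is a simplified illustration with made-up names and shapes, not the repo's actual loss code:
```python
# Simplified white-box SLA objective (illustrative, rectified-flow-style
# noising; `student`, `teacher`, and tensor shapes are hypothetical).
import torch
import torch.nn.functional as F

def white_box_sla_loss(student, teacher, x0, t, text_emb):
    noise = torch.randn_like(x0)
    x_t = (1 - t) * x0 + t * noise           # noisy latent at time t
    with torch.no_grad():
        target = teacher(x_t, t, text_emb)   # frozen full-attention prediction
    pred = student(x_t, t, text_emb)         # SLA-enabled prediction
    return F.mse_loss(pred, target)
```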
Single-node training example:
```bash
WORKDIR="/your/path/to/turbodiffusion"
cd $WORKDIR
export PYTHONPATH=turbodiffusion
# IMAGINAIRE_OUTPUT_ROOT is the directory where experiment outputs are saved
export IMAGINAIRE_OUTPUT_ROOT=${WORKDIR}/outputs
CHECKPOINT_ROOT=${WORKDIR}/assets/checkpoints
DATASET_ROOT=${WORKDIR}/assets/datasets/Wan2.1_14B_480p_16:9_Euler-step100_shift-3.0_cfg-5.0_seed-0_250K
# your Weights & Biases credentials
export WANDB_API_KEY=xxx
export WANDB_ENTITY=xxx
registry=registry_sla
experiment=wan2pt1_1pt3B_res480p_t2v_SLA
torchrun --nproc_per_node=8 \
    -m scripts.train --config=rcm/configs/${registry}.py -- experiment=${experiment} \
    model.config.teacher_ckpt=${CHECKPOINT_ROOT}/Wan2.1-T2V-1.3B.dcp \
    model.config.tokenizer.vae_pth=${CHECKPOINT_ROOT}/Wan2.1_VAE.pth \
    model.config.text_encoder_path=${CHECKPOINT_ROOT}/models_t5_umt5-xxl-enc-bf16.pth \
    model.config.neg_embed_path=${CHECKPOINT_ROOT}/umT5_wan_negative_emb.pt \
    dataloader_train.tar_path_pattern=${DATASET_ROOT}/shard*.tar
```
Please refer to `turbodiffusion/rcm/configs/experiments/sla/wan2pt1_t2v.py` for the 14B config, or modify it as needed.
#### Model Merging
The parameter updates from SLA training can be merged into rCM checkpoints using `turbodiffusion/scripts/merge_models.py`, enabling rCM models to perform sparse-attention inference. Specify `--base` as the rCM model, `--diff_base` as the pretrained model, and `--diff_target` as the SLA-tuned model.
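The merge is plain parameter-delta arithmetic: add the SLA update (`diff_target - diff_base`) onto the rCM weights. A minimal sketch of the idea (file names are illustrative; the actual script may handle keys, dtypes, and partial overlaps differently):
```python
# merged = base + (diff_target - diff_base), per parameter tensor.
# File names are illustrative; assumes all three state dicts share keys.
import torch

base = torch.load("rcm_model.pth", map_location="cpu")           # --base
diff_base = torch.load("pretrained.pth", map_location="cpu")     # --diff_base
diff_target = torch.load("sla_tuned.pth", map_location="cpu")    # --diff_target

merged = {
    k: v + (diff_target[k] - diff_base[k]) if k in diff_base else v
    for k, v in base.items()
}
torch.save(merged, "rcm_sla_merged.pth")
```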
## ComfyUI Integration
We thank the community effort [Comfyui_turbodiffusion](https://github.com/anveshane/Comfyui_turbodiffusion) for integrating TurboDiffusion into ComfyUI.
## Roadmap
We're actively working on the following features and improvements:
- [x] Organize and release training code
- [ ] Optimize infrastructure for better parallelism
- [ ] vLLM-Omni integration
- [ ] Support for more video generation models
- [ ] Support for autoregressive video generation models
- [ ] More hardware-level operator optimizations
We welcome community members to help maintain and extend TurboDiffusion. You are welcome to join the TurboDiffusion Team and contribute!
## Citation
**If you use this code or find our work valuable, please cite:**
```bibtex
@article{zhang2025turbodiffusion,
  title={TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times},
  author={Zhang, Jintao and Zheng, Kaiwen and Jiang, Kai and Wang, Haoxu and Stoica, Ion and Gonzalez, Joseph E and Chen, Jianfei and Zhu, Jun},
  journal={arXiv preprint arXiv:2512.16093},
  year={2025}
}

@software{turbodiffusion2025,
  title={TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times},
  author={The TurboDiffusion Team},
  url={https://github.com/thu-ml/TurboDiffusion},
  year={2025}
}

@inproceedings{zhang2025sageattention,
  title={SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration},
  author={Zhang, Jintao and Wei, Jia and Zhang, Pengle and Zhu, Jun and Chen, Jianfei},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}

@article{zhang2025sla,
  title={SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention},
  author={Zhang, Jintao and Wang, Haoxu and Jiang, Kai and Yang, Shuo and Zheng, Kaiwen and Xi, Haocheng and Wang, Ziteng and Zhu, Hongzhou and Zhao, Min and Stoica, Ion and others},
  journal={arXiv preprint arXiv:2509.24006},
  year={2025}
}

@article{zheng2025rcm,
  title={Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency},
  author={Zheng, Kaiwen and Wang, Yuji and Ma, Qianli and Chen, Huayu and Zhang, Jintao and Balaji, Yogesh and Chen, Jianfei and Liu, Ming-Yu and Zhu, Jun and Zhang, Qinsheng},
  journal={arXiv preprint arXiv:2510.08431},
  year={2025}
}

@inproceedings{zhang2024sageattention2,
  title={SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-Thread INT4 Quantization},
  author={Zhang, Jintao and Huang, Haofeng and Zhang, Pengle and Wei, Jia and Zhu, Jun and Chen, Jianfei},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2025}
}
```