1Nanyang Technological University
2Shanghai AI Laboratory
*Equal Contribution †Corresponding Authors
> **TL;DR**: 4DNeX is a feed-forward framework for generating 4D scene representations from a single image by fine-tuning a video diffusion model. It produces high-quality dynamic point clouds and enables downstream tasks such as novel-view video synthesis with strong generalizability.
## 🌟 Abstract
We present **4DNeX**, the first feed-forward framework for generating 4D (i.e., dynamic 3D) scene representations from a single image. In contrast to existing methods that rely on computationally intensive optimization or require multi-frame video inputs, 4DNeX enables efficient, end-to-end image-to-4D generation by fine-tuning a pretrained video diffusion model. Specifically, **1)** To alleviate the scarcity of 4D data, we construct 4DNeX-10M, a large-scale dataset with high-quality 4D annotations generated using advanced reconstruction approaches. **2)** We introduce a unified 6D video representation that jointly models RGB and XYZ sequences, facilitating structured learning of both appearance and geometry. **3)** We propose a set of simple yet effective adaptation strategies to repurpose pretrained video diffusion models for the 4D generation task. 4DNeX produces high-quality dynamic point clouds that enable novel-view video synthesis. Extensive experiments demonstrate that 4DNeX achieves competitive performance compared to existing 4D generation approaches, offering a scalable and generalizable solution for single-image-based 4D scene generation.
## 🚧 TODO List
- [x] Data Preprocessing Scripts
- [x] Training Scripts
- [x] Inference Scripts
- [x] Pointmap Registration Scripts
- [x] Visualization Scripts
## 🚀 Quick Start
### Environment Setup
We use Anaconda or Miniconda to manage the Python environment:
```bash
conda create -n "4dnex" python=3.10 -y
conda activate 4dnex
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
# git lfs and rerun
conda install -c conda-forge git-lfs
conda install -c conda-forge rerun-sdk
```
### Pretrained Model
Our model is developed on top of [Wan2.1 I2V 14B](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P-Diffusers). Please download the pretrained model from Hugging Face and place it in the `pretrained` directory with the following structure:
```
4DNeX/
└── pretrained/
└── Wan2.1-I2V-14B-480P-Diffusers/
├── model_index.json
├── scheduler/
        ├── transformer/
├── vae/
├── text_encoder/
├── tokenizer/
└── ...
```
Then, you may download our pretrained LoRA weights from Hugging Face [here](https://huggingface.co/FrozenBurning/4DNex-Lora) and place them in the `./pretrained` directory:
```bash
cd pretrained
mkdir 4dnex-lora
cd 4dnex-lora
huggingface-cli download FrozenBurning/4DNex-Lora --local-dir .
cd ../..
export PRETRAINED_LORA_PATH=./pretrained/4dnex-lora
```
### Inference
After setting up the environment and pretrained models, you can run the command below to generate 4D scene representations from a single image. The output video and point map will be saved in the `OUTPUT_DIR` directory. Assuming we want to save the results in the `./results` directory:
```bash
export OUTPUT_DIR=./results
python inference.py --prompt ./example/prompt.txt --image ./example/image.txt --out $OUTPUT_DIR --sft_path ./pretrained/Wan2.1-I2V-14B-480P-Diffusers/transformer --type i2vwbw-demb-samerope --mode xyzrgb --lora_path $PRETRAINED_LORA_PATH --lora_rank 64
```
For inference, the image path is stored in the `./example/image.txt` file and the prompt in the `./example/prompt.txt` file. Feel free to modify the prompt and image path to generate your own 4D scene representations.
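Both files are plain text. A minimal sketch of creating your own inputs follows; the image path and prompt below are hypothetical placeholders, and the one-entry-per-file format is an assumption based on the provided examples:

```shell
# Hypothetical example inputs; replace with your own image path and prompt.
# One entry per file is an assumption inferred from ./example.
mkdir -p example
printf '%s\n' './example/cat.png' > ./example/image.txt
printf '%s\n' 'A cat stretches on a sunny windowsill.' > ./example/prompt.txt
```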
### Visualization
To visualize the generated 4D scene representations, you may first perform pointmap registration using the following command:
```bash
python pm_registration.py --pkl_dir $OUTPUT_DIR
```
Then, you may visualize the pointmap registration results using [Rerun](https://github.com/rerun-io/rerun) as follows:
```bash
python rerun_vis.py --rr_recording test_log.rrd --pkl_dir $OUTPUT_DIR
rerun test_log.rrd --web-viewer
```

## 🔥 Training
### Prepare Data
Please check out our 4DNeX-10M dataset from [here](https://huggingface.co/datasets/3DTopia/4DNeX-10M) and place it in the `./data` directory.
The data can be organized in the following structure:
```
data/
├── dynamic/
│ ├── dynamic_1/
│ ├── dynamic_2/
│ └── dynamic_3/
├── static/
│ ├── static_1/
│ └── static_2/
├── caption/
│ ├── dynamic_1_with_caption_upload.csv
│ ├── dynamic_2_with_caption_upload.csv
│ ├── dynamic_3_with_caption_upload.csv
│ ├── static_1_with_caption_upload.csv
│ └── static_2_with_caption_upload.csv
└── raw/
├── dynamic/
│ ├── dynamic_1/
│ ├── dynamic_2/
│ └── dynamic_3/
└── static/
├── static_1/
└── static_2/
```
Run the command below to preprocess it:
```bash
python build_wan_dataset.py \
--data_dir ./data \
--out ./data/wan21
```
Once preprocessing is finished, the output directory will be organized as follows:
```
wan21/
├── cache/
├── videos/
├── first_frames/
├── pointmap/
├── pointmap_latents/
├── prompts.txt
├── videos.txt
└── generated_datalist.txt
```
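As a quick sanity check on the preprocessed output, you can verify that `prompts.txt` and `videos.txt` line up one-to-one. The one-prompt-per-video pairing and the toy entries below are assumptions for illustration, not guaranteed output of `build_wan_dataset.py`:

```shell
# Toy stand-in for a real ./data/wan21 output (hypothetical entries);
# the one-prompt-per-video pairing is an assumption.
WAN21_DIR=$(mktemp -d)
printf 'a dog runs\na cat jumps\n' > "$WAN21_DIR/prompts.txt"
printf 'videos/0001.mp4\nvideos/0002.mp4\n' > "$WAN21_DIR/videos.txt"

# Sanity check: both files should have the same number of lines.
prompts=$(wc -l < "$WAN21_DIR/prompts.txt")
videos=$(wc -l < "$WAN21_DIR/videos.txt")
[ "$prompts" -eq "$videos" ] && echo "OK: $prompts prompt/video pairs"
```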
### Launch Training
To launch training, assuming all preprocessed data are in the `./data/wan21` directory, run the following command:
```bash
bash scripts/finetune.sh
```
### Convert Zero Checkpoint to FP32
After training, you may convert the DeepSpeed ZeRO checkpoint to an FP32 checkpoint for inference. For example, the command below saves the converted checkpoint in the `./training/4dnex/5000-out` directory:
```bash
python scripts/zero_to_fp32.py ./training/4dnex/checkpoint-5000 ./training/4dnex/5000-out --safe_serialization
```
## 📚 Citation
If you find our work useful for your research, please consider citing our paper:
```bibtex
@article{chen20254dnex,
  title={4DNeX: Feed-Forward 4D Generative Modeling Made Easy},
  author={Chen, Zhaoxi and Liu, Tianqi and Zhuo, Long and Ren, Jiawei and Tao, Zeng and Zhu, He and Hong, Fangzhou and Pan, Liang and Liu, Ziwei},
  journal={arXiv preprint arXiv:2508.13154},
  year={2025}
}
```