1Nanyang Technological University
2Shanghai AI Laboratory
*Equal Contribution †Corresponding Authors
> **TL;DR**: 4DNeX is a feed-forward framework for generating 4D scene representations from a single image by fine-tuning a video diffusion model. It produces high-quality dynamic point clouds and enables downstream tasks such as novel-view video synthesis with strong generalizability.
## 🌟 Abstract
We present **4DNeX**, the first feed-forward framework for generating 4D (i.e., dynamic 3D) scene representations from a single image. In contrast to existing methods that rely on computationally intensive optimization or require multi-frame video inputs, 4DNeX enables efficient, end-to-end image-to-4D generation by fine-tuning a pretrained video diffusion model. Specifically, **1)** To alleviate the scarcity of 4D data, we construct 4DNeX-10M, a large-scale dataset with high-quality 4D annotations generated using advanced reconstruction approaches. **2)** We introduce a unified 6D video representation that jointly models RGB and XYZ sequences, facilitating structured learning of both appearance and geometry. **3)** We propose a set of simple yet effective adaptation strategies to repurpose pretrained video diffusion models for the 4D generation task. 4DNeX produces high-quality dynamic point clouds that enable novel-view video synthesis. Extensive experiments demonstrate that 4DNeX achieves competitive performance compared to existing 4D generation approaches, offering a scalable and generalizable solution for single-image-based 4D scene generation.
## 🚧 TODO List
- [x] Data Preprocessing Scripts
- [x] Training Scripts
- [x] Inference Scripts
- [x] Pointmap Registration Scripts
- [x] Visualization Scripts
## 🚀 Quick Start
### Environment Setup
We use Anaconda or Miniconda to manage the Python environment:
```bash
conda create -n "4dnex" python=3.10 -y
conda activate 4dnex
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
# git lfs and rerun
conda install -c conda-forge git-lfs
conda install -c conda-forge rerun-sdk
```
### Pretrained Model
Our model is developed on top of [Wan2.1 I2V 14B](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P-Diffusers). Please download the pretrained model from Hugging Face and place it in the `pretrained` directory with the following structure:
```
4DNeX/
└── pretrained/
└── Wan2.1-I2V-14B-480P-Diffusers/
├── model_index.json
├── scheduler/
        ├── transformer/
├── vae/
├── text_encoder/
├── tokenizer/
└── ...
```
Then, you may download our pretrained LoRA weights from Hugging Face [here](https://huggingface.co/FrozenBurning/4DNex-Lora) and place them in the `./pretrained` directory:
```bash
cd pretrained
mkdir 4dnex-lora
cd 4dnex-lora
huggingface-cli download FrozenBurning/4DNex-Lora --local-dir .
cd ../..
export PRETRAINED_LORA_PATH=./pretrained/4dnex-lora
```
### Inference
After setting up the environment and pretrained models, you can run the command below to generate 4D scene representations from a single image. The output video and point map will be saved in the `OUTPUT_DIR` directory. Assuming we want to save the results in the `./results` directory:
```bash
export OUTPUT_DIR=./results
python inference.py --prompt ./example/prompt.txt --image ./example/image.txt --out $OUTPUT_DIR --sft_path ./pretrained/Wan2.1-I2V-14B-480P-Diffusers/transformer --type i2vwbw-demb-samerope --mode xyzrgb --lora_path $PRETRAINED_LORA_PATH --lora_rank 64
```
For inference, the image path is stored in the `./example/image.txt` file and the prompt in the `./example/prompt.txt` file. Feel free to modify the prompt and image path to generate your own 4D scene representations.
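Both files are plain text. A minimal sketch of creating your own inputs follows; the image path and prompt below are hypothetical placeholders, and the one-entry-per-file format is an assumption based on the provided examples:

```shell
# Hypothetical example inputs; replace with your own image path and prompt.
# One entry per file is an assumption inferred from ./example.
mkdir -p example
printf '%s\n' './example/cat.png' > ./example/image.txt
printf '%s\n' 'A cat stretches on a sunny windowsill.' > ./example/prompt.txt
```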
### Visualization
To visualize the generated 4D scene representations, you may first perform pointmap registration using the following command:
```bash
python pm_registration.py --pkl_dir $OUTPUT_DIR
```
Then, you may visualize the pointmap registration results using [Rerun](https://github.com/rerun-io/rerun) as follows:
```bash
python rerun_vis.py --rr_recording test_log.rrd --pkl_dir $OUTPUT_DIR
rerun test_log.rrd --web-viewer
```

## 🔥 Training
### Prepare Data
Please check out our 4DNeX-10M dataset from [here](https://huggingface.co/datasets/3DTopia/4DNeX-10M) and place it in the `./data` directory.
The data can be organized in the following structure:
```
data/
├── dynamic/
│ ├── dynamic_1/
│ ├── dynamic_2/
│ └── dynamic_3/
├── static/
│ ├── static_1/
│ └── static_2/
├── caption/
│ ├── dynamic_1_with_caption_upload.csv
│ ├── dynamic_2_with_caption_upload.csv
│ ├── dynamic_3_with_caption_upload.csv
│ ├── static_1_with_caption_upload.csv
│ └── static_2_with_caption_upload.csv
└── raw/
├── dynamic/
│ ├── dynamic_1/
│ ├── dynamic_2/
│ └── dynamic_3/
└── static/
├── static_1/
└── static_2/
```
Run the command below to preprocess it:
```bash
python build_wan_dataset.py \
--data_dir ./data \
--out ./data/wan21
```
Once preprocessing is finished, the output directory will be organized as follows:
```
wan21/
├── cache/
├── videos/
├── first_frames/
├── pointmap/
├── pointmap_latents/
├── prompts.txt
├── videos.txt
└── generated_datalist.txt
```
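As a quick sanity check on the preprocessed output, you can verify that `prompts.txt` and `videos.txt` line up one-to-one. The one-prompt-per-video pairing and the toy entries below are assumptions for illustration, not guaranteed output of `build_wan_dataset.py`:

```shell
# Toy stand-in for a real ./data/wan21 output (hypothetical entries);
# the one-prompt-per-video pairing is an assumption.
WAN21_DIR=$(mktemp -d)
printf 'a dog runs\na cat jumps\n' > "$WAN21_DIR/prompts.txt"
printf 'videos/0001.mp4\nvideos/0002.mp4\n' > "$WAN21_DIR/videos.txt"

# Sanity check: both files should have the same number of lines.
prompts=$(wc -l < "$WAN21_DIR/prompts.txt")
videos=$(wc -l < "$WAN21_DIR/videos.txt")
[ "$prompts" -eq "$videos" ] && echo "OK: $prompts prompt/video pairs"
```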
### Launch Training
To launch training, assuming all preprocessed data are in the `./data/wan21` directory, run the following command:
```bash
bash scripts/finetune.sh
```
### Convert Zero Checkpoint to FP32
After training, you may convert the DeepSpeed ZeRO checkpoint to an FP32 checkpoint for inference. For example, the command below saves the converted checkpoint in the `./training/4dnex/5000-out` directory:
```bash
python scripts/zero_to_fp32.py ./training/4dnex/checkpoint-5000 ./training/4dnex/5000-out --safe_serialization
```
## 📚 Citation
If you find our work useful for your research, please consider citing our paper:
```bibtex
@article{chen20254dnex,
  title={4DNeX: Feed-Forward 4D Generative Modeling Made Easy},
  author={Chen, Zhaoxi and Liu, Tianqi and Zhuo, Long and Ren, Jiawei and Tao, Zeng and Zhu, He and Hong, Fangzhou and Pan, Liang and Liu, Ziwei},
  journal={arXiv preprint arXiv:2508.13154},
  year={2025}
}
```