# Tesseract
**Repository Path**: droliven/tesseract
## Basic Information
- **Project Name**: Tesseract
- **Description**: https://github.com/UMass-Embodied-AGI/TesserAct
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: https://github.com/UMass-Embodied-AGI/TesserAct
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-06-26
- **Last Updated**: 2025-06-27
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
TesserAct: Learning 4D Embodied World Models
arXiv 2025
Haoyu Zhen\*, Qiao Sun\*, Hongxin Zhang, Junyan Li, Siyuan Zhou, Yilun Du, Chuang Gan
We propose TesserAct, **the first open-source and generalized 4D World Model for robotics**, which takes input images and text instructions to generate RGB, depth, and normal videos, reconstructing a 4D scene and predicting actions.
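Reconstructing a 4D scene starts from turning per-frame depth into 3D geometry. As a rough illustration of that single step, the sketch below lifts a depth map to camera-frame points with the standard pinhole model. It assumes metric depth and known intrinsics `fx, fy, cx, cy`, neither of which is guaranteed for generated videos. TesserAct's own reconstruction, which also uses the normal maps, is not shown here, and the function names are hypothetical. Applying the function frame by frame gives one point cloud per frame, a simple 4D representation when the camera is static.

```python
def backproject_depth(depth, fx, fy, cx, cy):
    """Lift a depth map (list of rows, depth Z per pixel) to camera-frame 3D points.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def backproject_video(depth_frames, fx, fy, cx, cy):
    """One point cloud per frame: a 4D (time + 3D) sequence, assuming a static camera."""
    return [backproject_depth(frame, fx, fy, cx, cy) for frame in depth_frames]
```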
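
Table of Contents
- [Installation](#installation)
- [Data Preparation](#data-preparation)
- [Training](#training)
- [Inference](#inference)
- [Citation](#citation)
- [Acknowledgements](#acknowledgements)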
## News
- [2025-06-19] We provide an efficient RGB+Depth+Normal LoRA fine-tuning script for custom datasets.
- [2025-06-18] We provide an RGB-only LoRA inference script that achieves the best generalization ability for robotics video generation.
- [2025-06-06] We have released the training code and data generation scripts!
- [2025-05-05] We have updated the gallery and added more results on the [project website](https://tesseractworld.github.io).
- [2025-05-04] We added [USAGE.MD](doc/usage.md) to provide more details about the models and how to use them on your own data!
- [2025-04-29] We have released the inference code and TesserAct-v0.1 model weights!
## Installation
Create a conda environment and install the required packages:
```bash
conda create -n tesseract python=3.9
conda activate tesseract
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=11.8 -c pytorch -c nvidia
# requirements.txt lives in the repository, so clone it first
git clone https://github.com/UMass-Embodied-AGI/TesserAct.git
cd TesserAct
pip install -r requirements.txt
pip install -e .
```
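After installation, a quick check like the one below can confirm that the CUDA build of PyTorch is active and that the editable `tesseract` package is importable. This is a hypothetical helper, not part of the repository. With the commands above you would expect `torch` to report 2.5.1, `cuda_build` to report 11.8, and `cuda_available` to be `True` on a machine with a working NVIDIA driver.

```python
import importlib.util
import sys


def check_environment():
    """Collect basic facts about the active environment (hypothetical helper)."""
    report = {
        "python": ".".join(str(p) for p in sys.version_info[:3]),
        # True once `pip install -e .` has made the `tesseract` package importable
        "tesseract_installed": importlib.util.find_spec("tesseract") is not None,
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```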
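## Full Environment List
For reference, below is the full `conda list` output of one environment set up for this project (Python 3.10.18, PyTorch 2.5.1 with CUDA 11.8). Note that it uses Python 3.10 rather than the 3.9 in the commands above.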
```
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
_openmp_mutex 4.5 2_gnu https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
absl-py 2.3.0 pypi_0 pypi
accelerate 1.8.1 pypi_0 pypi
aiohappyeyeballs 2.6.1 pypi_0 pypi
aiohttp 3.12.13 pypi_0 pypi
aiosignal 1.3.2 pypi_0 pypi
annotated-types 0.7.0 pypi_0 pypi
aom 3.6.1 h59595ed_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
async-timeout 5.0.1 pypi_0 pypi
attrs 25.3.0 pypi_0 pypi
bitsandbytes 0.46.0 pypi_0 pypi
blas 1.0 mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
brotli-python 1.1.0 py310hf71b8c6_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
bzip2 1.0.8 h4bc722e_7 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
ca-certificates 2025.6.15 hbd8a1cb_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
certifi 2025.6.15 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
cffi 1.17.1 py310h8deb56e_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
charset-normalizer 3.4.2 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
click 8.2.1 pypi_0 pypi
cpython 3.10.18 py310hd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
cuda-cudart 11.8.89 0 nvidia
cuda-cupti 11.8.87 0 nvidia
cuda-libraries 11.8.0 0 nvidia
cuda-nvrtc 11.8.89 0 nvidia
cuda-nvtx 11.8.86 0 nvidia
cuda-opencl 12.9.19 0 nvidia
cuda-runtime 11.8.0 0 nvidia
cuda-version 12.9 3 nvidia
datasets 3.3.2 pypi_0 pypi
decord 0.6.0 pypi_0 pypi
diffusers 0.32.1 pypi_0 pypi
dill 0.3.8 pypi_0 pypi
ffmpeg 4.4.2 gpl_hdf48244_113 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
filelock 3.18.0 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
font-ttf-ubuntu 0.83 h77eed37_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
fontconfig 2.15.0 h7e30c49_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
fonts-conda-ecosystem 1 0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
fonts-conda-forge 1 0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
freetype 2.13.3 ha770c72_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
frozenlist 1.7.0 pypi_0 pypi
fsspec 2024.12.0 pypi_0 pypi
giflib 5.2.2 hd590300_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
gitdb 4.0.12 pypi_0 pypi
gitpython 3.1.44 pypi_0 pypi
gmp 6.3.0 hac33072_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
gmpy2 2.2.1 py310he8512ff_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
gnutls 3.7.9 hb077bed_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
grpcio 1.73.1 pypi_0 pypi
h2 4.2.0 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
hf-transfer 0.1.9 pypi_0 pypi
hf-xet 1.1.5 pypi_0 pypi
hpack 4.1.0 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
huggingface-hub 0.33.1 pypi_0 pypi
hyperframe 6.1.0 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
icu 75.1 he02047a_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
idna 3.10 pyhd8ed1ab_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
imageio 2.37.0 pypi_0 pypi
imageio-ffmpeg 0.6.0 pypi_0 pypi
importlib-metadata 8.7.0 pypi_0 pypi
intel-openmp 2022.0.1 h06a4308_3633 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jinja2 3.1.6 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
jsonlines 4.0.0 pypi_0 pypi
kornia 0.8.1 pypi_0 pypi
kornia-rs 0.1.9 pypi_0 pypi
lame 3.100 h166bdaf_1003 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
lcms2 2.17 h717163a_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
ld_impl_linux-64 2.43 h1423503_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
lerc 4.0.0 h0aef613_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libasprintf 0.24.1 h8e693c7_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libblas 3.9.0 16_linux64_mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libcblas 3.9.0 16_linux64_mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libcublas 11.11.3.6 0 nvidia
libcufft 10.9.0.58 0 nvidia
libcufile 1.14.1.1 4 nvidia
libcurand 10.3.10.19 0 nvidia
libcusolver 11.4.1.48 0 nvidia
libcusparse 11.7.5.86 0 nvidia
libdeflate 1.24 h86f0d12_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libdrm 2.4.125 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libegl 1.7.0 ha4b6fd6_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libexpat 2.7.0 h5888daf_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libffi 3.4.6 h2dba641_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libfreetype 2.13.3 ha770c72_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libfreetype6 2.13.3 h48d6fc4_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgcc 15.1.0 h767d61c_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgcc-ng 15.1.0 h69a702a_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgettextpo 0.24.1 h5888daf_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgl 1.7.0 ha4b6fd6_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libglvnd 1.7.0 ha4b6fd6_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libglx 1.7.0 ha4b6fd6_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgomp 15.1.0 h767d61c_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libiconv 1.18 h4ce23a2_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libidn2 2.3.8 ha4ef2c3_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libjpeg-turbo 3.1.0 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
liblapack 3.9.0 16_linux64_mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
liblzma 5.8.1 hb9d3cd8_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libnpp 11.8.0.86 0 nvidia
libnsl 2.0.1 hb9d3cd8_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libnvjitlink 12.1.105 0 nvidia
libnvjpeg 11.9.0.86 0 nvidia
libpciaccess 0.18 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libpng 1.6.49 h943b412_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libsqlite 3.50.1 h6cd9bfd_6 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libstdcxx 15.1.0 h8f9b012_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libstdcxx-ng 15.1.0 h4852527_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libtasn1 4.20.0 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libtiff 4.7.0 hf01ce69_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libunistring 0.9.10 h7f98852_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libuuid 2.38.1 h0b41bf4_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libva 2.22.0 h4f16b4b_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libvpx 1.13.1 h59595ed_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libwebp 1.5.0 hae8dbeb_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libwebp-base 1.5.0 h851e524_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libxcb 1.17.0 h8a09558_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libxcrypt 4.4.36 hd590300_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libxml2 2.13.8 h4bc477f_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libzlib 1.3.1 hb9d3cd8_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
llvm-openmp 15.0.7 h0cdce71_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
markdown 3.8.2 pypi_0 pypi
markupsafe 3.0.2 py310h89163eb_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
mkl 2022.1.0 hc2b9512_224 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
modelscope 1.27.1 pypi_0 pypi
mpc 1.3.1 h24ddda3_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
mpfr 4.2.1 h90cbb55_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
mpmath 1.3.0 pyhd8ed1ab_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
multidict 6.5.1 pypi_0 pypi
multiprocess 0.70.16 pypi_0 pypi
natsort 8.4.0 pypi_0 pypi
ncurses 6.5 h2d0b736_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
nettle 3.9.1 h7ab15ed_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
networkx 3.4.2 pyh267e887_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
numpy 2.2.6 py310hefbff90_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
nvidia-ml-py 12.575.51 pypi_0 pypi
nvitop 1.5.1 pypi_0 pypi
ocl-icd 2.3.3 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
opencl-headers 2025.06.13 h5888daf_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
opencv-python 4.11.0.86 pypi_0 pypi
openh264 2.3.1 hcb278e6_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
openjpeg 2.5.3 h5fbd93e_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
openssl 3.5.0 h7b32b05_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
p11-kit 0.24.1 hc5aa10d_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
packaging 25.0 pypi_0 pypi
pandas 2.3.0 pypi_0 pypi
peft 0.15.2 pypi_0 pypi
pillow 11.2.1 py310h7e6dc6c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
pip 25.1.1 pyh8b19718_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
platformdirs 4.3.8 pypi_0 pypi
propcache 0.3.2 pypi_0 pypi
protobuf 6.31.1 pypi_0 pypi
psutil 7.0.0 pypi_0 pypi
pthread-stubs 0.4 hb9d3cd8_1002 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
pyarrow 20.0.0 pypi_0 pypi
pycparser 2.22 pyh29332c3_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
pydantic 2.11.7 pypi_0 pypi
pydantic-core 2.33.2 pypi_0 pypi
pysocks 1.7.1 pyha55dd90_7 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
python 3.10.18 hd6af730_0_cpython https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
python-dateutil 2.9.0.post0 pypi_0 pypi
python_abi 3.10 7_cp310 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
pytorch 2.5.1 py3.10_cuda11.8_cudnn9.1.0_0 pytorch
pytorch-cuda 11.8 h7e8668a_6 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2025.2 pypi_0 pypi
pyyaml 6.0.2 py310h89163eb_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
readline 8.2 h8c095d6_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
regex 2024.11.6 pypi_0 pypi
requests 2.32.4 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
ruff 0.9.10 pypi_0 pypi
safetensors 0.5.3 pypi_0 pypi
sentencepiece 0.2.0 pypi_0 pypi
sentry-sdk 2.31.0 pypi_0 pypi
setproctitle 1.3.6 pypi_0 pypi
setuptools 80.9.0 pyhff2d567_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
six 1.17.0 pypi_0 pypi
smmap 5.0.2 pypi_0 pypi
svt-av1 1.4.1 hcb278e6_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
sympy 1.13.1 pypi_0 pypi
tensorboard 2.19.0 pypi_0 pypi
tensorboard-data-server 0.7.2 pypi_0 pypi
tesseract 0.1 pypi_0 pypi
tk 8.6.13 noxft_hd72426e_102 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
tokenizers 0.21.2 pypi_0 pypi
torchao 0.11.0 pypi_0 pypi
torchaudio 2.5.1 py310_cu118 pytorch
torchdata 0.10.1 pypi_0 pypi
torchtriton 3.1.0 py310 pytorch
torchvision 0.20.1 py310_cu118 pytorch
tqdm 4.67.1 pypi_0 pypi
transformers 4.52.4 pypi_0 pypi
typing-inspection 0.4.1 pypi_0 pypi
typing_extensions 4.14.0 pyhe01879c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
tzdata 2025.2 pypi_0 pypi
urllib3 2.5.0 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
wandb 0.20.1 pypi_0 pypi
wayland 1.23.1 h3e06ad9_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
wayland-protocols 1.45 hd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
werkzeug 3.1.3 pypi_0 pypi
wheel 0.45.1 pyhd8ed1ab_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
x264 1!164.3095 h166bdaf_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
x265 3.5 h924138e_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xorg-libx11 1.8.12 h4f16b4b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xorg-libxau 1.0.12 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xorg-libxdmcp 1.1.5 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xorg-libxext 1.3.6 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xorg-libxfixes 6.0.1 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xxhash 3.5.0 pypi_0 pypi
yaml 0.2.5 h7f98852_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
yarl 1.20.1 pypi_0 pypi
zipp 3.23.0 pypi_0 pypi
zstandard 0.23.0 py310ha75aee5_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
zstd 1.5.7 hb8e6e7a_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
```
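Because unexpected outputs are often traced back to package versions, it can help to compare your environment against the list above. The sketch below is a hypothetical helper: it parses `conda list`-style text and reports Python packages whose installed version differs. It relies on `importlib.metadata`, so it only sees Python distributions; non-Python conda packages such as `ffmpeg` or `libgcc` will always show up as not installed. Conda's `pytorch` is mapped to the `torch` distribution, and local version tags such as `+cu118` are stripped before comparing.

```python
import importlib.metadata

# conda package names that differ from the installed Python distribution name
CONDA_TO_DIST = {"pytorch": "torch"}


def parse_conda_list(text):
    """Map package name -> version from `conda list` output, skipping comment lines."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#"):
            versions[fields[0]] = fields[1]
    return versions


def compare_installed(expected, names):
    """Return {name: (listed, installed)} for every package whose version differs."""
    mismatches = {}
    for name in names:
        try:
            installed = importlib.metadata.version(CONDA_TO_DIST.get(name, name))
            installed = installed.split("+")[0]  # drop local tags such as "+cu118"
        except importlib.metadata.PackageNotFoundError:
            installed = None
        if installed != expected.get(name):
            mismatches[name] = (expected.get(name), installed)
    return mismatches


if __name__ == "__main__":
    # Excerpt of the listing above; paste the full listing to check every package.
    listing = """
    # Name Version Build Channel
    pytorch 2.5.1 py3.10_cuda11.8_cudnn9.1.0_0 pytorch
    diffusers 0.32.1 pypi_0 pypi
    transformers 4.52.4 pypi_0 pypi
    peft 0.15.2 pypi_0 pypi
    """
    expected = parse_conda_list(listing)
    for name, (listed, installed) in compare_installed(expected, expected).items():
        print(f"{name}: listed {listed}, installed {installed}")
```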
## Data Preparation
Please refer to [DATA.md](DATA.md) for data generation scripts and dataset preparation.
## Training
### Pre-training or Full Fine-tuning
To pre-train the full TesserAct model from CogVideoX, we provide a training script based on [Finetrainers](https://github.com/a-r-r-o-w/finetrainers). The training code supports distributed training on multiple GPUs or multiple nodes.
To pre-train our TesserAct model, run the following command:
```bash
bash train_i2v_depth_normal_sft.sh
```
To fine-tune our released TesserAct model, modify the model loading code in [tesseract/i2v_depth_normal_sft.py](tesseract/i2v_depth_normal_sft.py):
```python
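# Load the released TesserAct weights instead of the base CogVideoX checkpoint
transformer = CogVideoXTransformer3DModel.from_pretrained_modify(
    "anyeZHY/tesseract",
    subfolder="tesseract_v01e_rgbdn_sft",
    # ... (keep the remaining arguments as in the original script)
)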
```
### LoRA Fine-tuning
You can efficiently fine-tune our TesserAct model on your own data (~100 videos) using LoRA (Low-Rank Adaptation). This approach requires about **30 GB of GPU memory** and trains on a custom dataset in roughly 2 days.
To fine-tune using LoRA, run the following command:
```bash
bash train_i2v_depth_normal_lora.sh
```
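As background for the memory figure above: LoRA freezes a pretrained weight `W` of shape `d_out x d_in` and trains only a low-rank update, so the adapted layer computes `W x + (alpha / r) * B (A x)` with `B` of shape `d_out x r` and `A` of shape `r x d_in`. Gradients and optimizer states are kept only for `A` and `B`. The arithmetic below counts the trainable parameters. The 3072 x 3072 layer and rank 64 are illustrative assumptions, not the settings used in `train_i2v_depth_normal_lora.sh`; for that example the dense layer has 9,437,184 parameters while the adapter has 393,216 (about 4.17%).

```python
def lora_trainable_params(d_out, d_in, rank):
    """Trainable parameters of a LoRA adapter: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def lora_fraction(d_out, d_in, rank):
    """Share of the frozen dense weight's parameter count that LoRA actually trains."""
    return lora_trainable_params(d_out, d_in, rank) / (d_out * d_in)


if __name__ == "__main__":
    # Hypothetical 3072 x 3072 projection with rank 64 (illustrative values only)
    dense = 3072 * 3072
    lora = lora_trainable_params(3072, 3072, 64)
    print(f"dense: {dense:,}  LoRA: {lora:,}  fraction: {lora_fraction(3072, 3072, 64):.2%}")
```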
> [!WARNING]
> LoRA fine-tuning is experimental and not fully tested yet.

> [!NOTE]
> We will give a detailed training guide in the future, covering why TesserAct generalizes better, how to set the hyperparameters, and how performance differs between training methods (SFT vs. LoRA).
>
> We don't have a clear plan for releasing the whole dataset yet, because depth data is usually stored as floats, which takes up a lot of space and makes uploading to Hugging Face very difficult. However, we will provide scripts later on to show how to prepare the data.
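A quick calculation shows why float depth is bulky. The sketch below computes the uncompressed size of a depth clip and shows one common lossy workaround: storing depth as 16-bit integer millimeters (1 mm resolution, clipped at 65.535 m). The 49-frame, 480 x 640 clip is a hypothetical example, and this is not necessarily the format used by the scripts in DATA.md. For that clip, float32 depth takes 60,211,200 bytes (about 60.2 MB) before compression, and the uint16 version takes half as much.

```python
def raw_size_bytes(height, width, frames, bytes_per_value):
    """Uncompressed size of a single-channel video stored as a dense array."""
    return height * width * frames * bytes_per_value


def depth_to_mm_uint16(depth_m):
    """Quantize metric depth (meters) to integer millimeters in the uint16 range.

    Lossy: 1 mm resolution, values above 65.535 m are clipped, negatives become 0.
    """
    return [min(max(round(d * 1000), 0), 65535) for d in depth_m]


def mm_uint16_to_depth(depth_mm):
    """Convert integer millimeters back to meters."""
    return [d / 1000 for d in depth_mm]


if __name__ == "__main__":
    # Hypothetical clip: 49 frames at 480 x 640 (not TesserAct's stated resolution)
    f32 = raw_size_bytes(480, 640, 49, 4)
    u16 = raw_size_bytes(480, 640, 49, 2)
    print(f"float32: {f32 / 1e6:.1f} MB, uint16 mm: {u16 / 1e6:.1f} MB")
```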
## Inference
TesserAct currently provides the following models. Model names follow the format `anyeZHY/tesseract/` (Hugging Face repo name) + `<model_name>_<version>_<modality>_<training_method>`. In `<version>`, the postfix `p` indicates that the model is production-ready and `e` indicates that it is experimental. We will keep updating the model weights and scaling the dataset to improve the models' performance.
```
anyeZHY/tesseract/tesseract_v01e_rgbdn_sft
anyeZHY/tesseract/tesseract_v01e_rgb_lora
```
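The naming convention above can be parsed mechanically. The helper below is a hypothetical sketch (the field labels are ours) that splits a weights path into the Hugging Face repo id and subfolder and decodes the remaining fields. The repo id and subfolder are, for example, what `huggingface_hub.snapshot_download(repo_id=..., allow_patterns=[f"{subfolder}/*"])` would need to fetch a single model; we have not checked whether the inference scripts download weights this way.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    repo_id: str    # Hugging Face repo, e.g. "anyeZHY/tesseract"
    subfolder: str  # e.g. "tesseract_v01e_rgbdn_sft"
    model: str      # e.g. "tesseract"
    version: str    # e.g. "v01e"
    stage: str      # "production" for a "p" postfix, "experimental" for "e"
    modality: str   # e.g. "rgbdn" or "rgb"
    method: str     # e.g. "sft" or "lora"


def parse_weights_path(weights_path):
    """Split '<owner>/<repo>/<model>_<version>_<modality>_<method>' into its fields."""
    owner, repo, subfolder = weights_path.split("/")  # raises ValueError on other shapes
    model, version, modality, method = subfolder.split("_")
    stage = {"p": "production", "e": "experimental"}[version[-1]]  # KeyError if unknown
    return ModelName(f"{owner}/{repo}", subfolder, model, version, stage, modality, method)
```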
> [!IMPORTANT]
> It is recommended to read [USAGE.MD](doc/usage.md) for more details **before running the inference code on your own data.**
> We provide a guide on how to prepare inputs, such as text prompts. We also analyze the model's limitations and performance, including:
>
> - Tasks that the model can reliably accomplish.
>
> - Tasks that are achievable, but only with a certain success rate. This may improve in the future with techniques such as test-time scaling.
>
> - Tasks that are currently beyond the model's capabilities.

You can run the inference code with the following command (optional flag: `--memory_efficient`):
```bash
python inference/inference_rgbdn_sft.py \
--weights_path anyeZHY/tesseract/tesseract_v01e_rgbdn_sft \
--image_path asset/images/fruit_vangogh.png \
--prompt "pick up the apple google robot"
```
This command generates a video of the Google robot picking up the apple in the Van Gogh painting. Try other prompts, such as `pick up the pear Franka Emika Panda`, or use `asset/images/majo.jpg` with the prompt `Move the cup near bottle Franka Emika Panda`!
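To try several image/prompt pairs in one go, a small driver can build the same command lines shown above. This is a hypothetical sketch: it uses only the flags documented here, assumes `--memory_efficient` is a switch that takes no value, pairs the pear prompt with the Van Gogh image (our reading of the text above), and must be run from the repository root. By default `run_examples` only prints the commands; pass `dry_run=False` to execute them with `subprocess.run`.

```python
import shlex
import subprocess
import sys

# Image/prompt pairs taken from the examples in this README
EXAMPLES = [
    ("asset/images/fruit_vangogh.png", "pick up the apple google robot"),
    ("asset/images/fruit_vangogh.png", "pick up the pear Franka Emika Panda"),
    ("asset/images/majo.jpg", "Move the cup near bottle Franka Emika Panda"),
]


def build_inference_cmd(script, weights_path, image_path, prompt, memory_efficient=False):
    """Build the documented command line as an argument list (no shell quoting needed)."""
    cmd = [sys.executable, script,
           "--weights_path", weights_path,
           "--image_path", image_path,
           "--prompt", prompt]
    if memory_efficient:
        cmd.append("--memory_efficient")
    return cmd


def run_examples(script, weights_path, memory_efficient=False, dry_run=True):
    """Print (dry_run=True) or execute one inference call per example pair."""
    for image_path, prompt in EXAMPLES:
        cmd = build_inference_cmd(script, weights_path, image_path, prompt, memory_efficient)
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stops on the first failing run


if __name__ == "__main__":
    run_examples("inference/inference_rgbdn_sft.py",
                 "anyeZHY/tesseract/tesseract_v01e_rgbdn_sft")
```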
For RGB-only generation using the LoRA model, you can use:
```bash
python inference/inference_rgb_lora.py \
--weights_path anyeZHY/tesseract/tesseract_v01e_rgb_lora \
--image_path asset/images/fruit_vangogh.png \
--prompt "pick up the apple google robot"
```
The RGB LoRA model offers the best generalization quality for RGB video generation, making it ideal for diverse robotic manipulation tasks.
For RGB+Depth+Normal generation using the LoRA model, you can use:
```bash
python inference/inference_rgbdn_lora.py \
--base_weights_path anyeZHY/tesseract/tesseract_v01e_rgbdn_sft \
--lora_weights_path ./your_local_lora_weights \
--image_path asset/images/fruit_vangogh.png \
--prompt "pick up the apple google robot"
```
You can find the output videos in the `results` folder.
Note: when we tested the model on a different server, the results were exactly the same as those we uploaded to GitHub. If your results differ, for example if you get noisy videos, please check your environment and the versions of the packages you are using.
> [!WARNING]
> Because the RT-1 and Bridge normal data were generated with [Temporal Marigold](https://huggingface.co/docs/diffusers/en/using-diffusers/marigold_usage#frame-by-frame-video-processing-with-temporal-consistency), the normal outputs are sometimes imperfect. We are working on improving the data using [NormalCrafter](https://github.com/Binyr/NormalCrafter).

Below is a list of TODOs for the inference part.
- [x] LoRA inference code
- [ ] Blender rendering code (see the [PyBlend](https://github.com/anyeZHY/PyBlend) package!)
- [ ] Normal Integration
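Normal integration, listed as a TODO above, recovers a surface from its normals. As a toy illustration only (not BiNI and not the planned TesserAct code), the sketch below assumes normal frames use the common 8-bit encoding `n = 2 * c / 255 - 1` and an orthographic camera whose z axis points toward the viewer. Under those assumptions a height field `z(x, y)` has a normal proportional to `(-dz/dx, -dz/dy, 1)`, so `dz/dx = -n_x / n_z`, and summing that slope along a row gives relative depth. Real integrators such as BiNI solve a global least-squares problem and handle depth discontinuities; naive path summation like this accumulates errors.

```python
import math


def decode_normal(rgb):
    """Decode one 8-bit RGB pixel into a unit normal, assuming n = 2 * c / 255 - 1."""
    n = [2.0 * c / 255.0 - 1.0 for c in rgb]
    length = math.sqrt(sum(x * x for x in n)) or 1.0  # guard against a zero vector
    return [x / length for x in n]


def integrate_row(normals, z0=0.0, step=1.0):
    """Naive orthographic integration along one row using dz/dx = -n_x / n_z.

    Returns depth relative to z0 for every pixel in the row; n_z must be nonzero
    (the surface must not be viewed exactly edge-on).
    """
    depths = [z0]
    for nx, _ny, nz in normals[:-1]:
        depths.append(depths[-1] - step * nx / nz)
    return depths
```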
## Citation
If you find our work useful, please consider citing:
```bibtex
@article{zhen2025tesseract,
title={TesserAct: Learning 4D Embodied World Models},
author={Haoyu Zhen and Qiao Sun and Hongxin Zhang and Junyan Li and Siyuan Zhou and Yilun Du and Chuang Gan},
year={2025},
eprint={2504.20995},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.20995},
}
```
## Acknowledgements
We would like to thank the following works for their code and models:
- Training: [CogVideo](https://github.com/THUDM/CogVideo), [Finetrainers](https://github.com/a-r-r-o-w/finetrainers) and [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun)
- Data generation: [RollingDepth](https://github.com/prs-eth/rollingdepth), [Marigold](https://github.com/prs-eth/Marigold) and [DSINE](https://github.com/baegwangbin/DSINE)
- Datasets: [OpenX](https://robotics-transformer-x.github.io/), [RLBench](https://github.com/stepjam/RLBench), [Hiveformer](https://github.com/vlc-robot/hiveformer) and [Colosseum](https://github.com/robot-colosseum/robot-colosseum)
- Why normals: [BiNI](https://github.com/xucao-42/NormalIntegration), [ICON](https://github.com/YuliangXiu/ICON), [StableNormal](https://github.com/Stable-X/StableNormal) and [NormalCrafter](https://github.com/Binyr/NormalCrafter)
We are extremely grateful to Pengxiao Han for assistance with the baseline code, and to Yuncong Yang, Sunli Chen,
Jiaben Chen, Zeyuan Yang, Zixin Wang, Lixing Fang, and many other friends in our [Embodied AGI Lab](https://embodied-agi.cs.umass.edu/)
for their helpful feedback and insightful discussions.