# Tesseract

**Repository Path**: droliven/tesseract

## Basic Information

- **Project Name**: Tesseract
- **Description**: https://github.com/UMass-Embodied-AGI/TesserAct
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: https://github.com/UMass-Embodied-AGI/TesserAct
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-06-26
- **Last Updated**: 2025-06-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

TesserAct: Learning 4D Embodied World Models

arXiv 2025

Haoyu Zhen*, Qiao Sun*, Hongxin Zhang, Junyan Li, Siyuan Zhou, Yilun Du, Chuang Gan

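[Paper](https://arxiv.org/abs/2504.20995) | [Project Page](https://tesseractworld.github.io) | [Model on Hugging Face](https://huggingface.co/anyeZHY/tesseract)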

We propose TesserAct, **the first open-source and generalized 4D World Model for robotics**, which takes input images and text instructions to generate RGB, depth, and normal videos, reconstructing a 4D scene and predicting actions.



Table of Contents
  1. Installation
  2. Data Preparation
  3. Training
  4. Inference
  5. Citation
  6. Acknowledgements
## News

- [2025-06-19] We provide an efficient RGB+Depth+Normal LoRA fine-tuning script for custom datasets.
- [2025-06-18] We provide an RGB-only LoRA inference script that achieves the best generalization ability for robotics video generation.
- [2025-06-06] We have released the training code and data generation scripts!
- [2025-05-05] We have updated the gallery and added more results on the [project website](https://tesseractworld.github.io).
- [2025-05-04] We added [USAGE.MD](doc/usage.md) to provide more details about the models and how to use them on your own data!
- [2025-04-29] We have released the inference code and TesserAct-v0.1 model weights!

## Installation

Create a conda environment and install the required packages:

```bash
conda create -n tesseract python=3.9
conda activate tesseract
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt

git clone https://github.com/UMass-Embodied-AGI/TesserAct.git
cd TesserAct
pip install -e .
```

## Full environment list

```
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
_openmp_mutex 4.5 2_gnu https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
absl-py 2.3.0 pypi_0 pypi
accelerate 1.8.1 pypi_0 pypi
aiohappyeyeballs 2.6.1 pypi_0 pypi
aiohttp 3.12.13 pypi_0 pypi
aiosignal 1.3.2 pypi_0 pypi
annotated-types 0.7.0 pypi_0 pypi
aom 3.6.1 h59595ed_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
async-timeout 5.0.1 pypi_0 pypi
attrs 25.3.0 pypi_0 pypi
bitsandbytes 0.46.0 pypi_0 pypi
blas 1.0 mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
brotli-python 1.1.0 py310hf71b8c6_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
bzip2 1.0.8 h4bc722e_7 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
ca-certificates 2025.6.15 hbd8a1cb_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
certifi 2025.6.15 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
cffi 1.17.1 py310h8deb56e_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
charset-normalizer 3.4.2 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
click 8.2.1 pypi_0 pypi
cpython 3.10.18 py310hd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
cuda-cudart 11.8.89 0 nvidia
cuda-cupti 11.8.87 0 nvidia
cuda-libraries 11.8.0 0 nvidia
cuda-nvrtc 11.8.89 0 nvidia
cuda-nvtx 11.8.86 0 nvidia
cuda-opencl 12.9.19 0 nvidia
cuda-runtime 11.8.0 0 nvidia
cuda-version 12.9 3 nvidia
datasets 3.3.2 pypi_0 pypi
decord 0.6.0 pypi_0 pypi
diffusers 0.32.1 pypi_0 pypi
dill 0.3.8 pypi_0 pypi
ffmpeg 4.4.2 gpl_hdf48244_113 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
filelock 3.18.0 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
font-ttf-ubuntu 0.83 h77eed37_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
fontconfig 2.15.0 h7e30c49_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
fonts-conda-ecosystem 1 0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
fonts-conda-forge 1 0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
freetype 2.13.3 ha770c72_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
frozenlist 1.7.0 pypi_0 pypi
fsspec 2024.12.0 pypi_0 pypi
giflib 5.2.2 hd590300_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
gitdb 4.0.12 pypi_0 pypi
gitpython 3.1.44 pypi_0 pypi
gmp 6.3.0 hac33072_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
gmpy2 2.2.1 py310he8512ff_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
gnutls 3.7.9 hb077bed_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
grpcio 1.73.1 pypi_0 pypi
h2 4.2.0 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
hf-transfer 0.1.9 pypi_0 pypi
hf-xet 1.1.5 pypi_0 pypi
hpack 4.1.0 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
huggingface-hub 0.33.1 pypi_0 pypi
hyperframe 6.1.0 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
icu 75.1 he02047a_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
idna 3.10 pyhd8ed1ab_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
imageio 2.37.0 pypi_0 pypi
imageio-ffmpeg 0.6.0 pypi_0 pypi
importlib-metadata 8.7.0 pypi_0 pypi
intel-openmp 2022.0.1 h06a4308_3633 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jinja2 3.1.6 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
jsonlines 4.0.0 pypi_0 pypi
kornia 0.8.1 pypi_0 pypi
kornia-rs 0.1.9 pypi_0 pypi
lame 3.100 h166bdaf_1003 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
lcms2 2.17 h717163a_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
ld_impl_linux-64 2.43 h1423503_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
lerc 4.0.0 h0aef613_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libasprintf 0.24.1 h8e693c7_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libblas 3.9.0 16_linux64_mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libcblas 3.9.0 16_linux64_mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libcublas 11.11.3.6 0 nvidia
libcufft 10.9.0.58 0 nvidia
libcufile 1.14.1.1 4 nvidia
libcurand 10.3.10.19 0 nvidia
libcusolver 11.4.1.48 0 nvidia
libcusparse 11.7.5.86 0 nvidia
libdeflate 1.24 h86f0d12_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libdrm 2.4.125 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libegl 1.7.0 ha4b6fd6_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libexpat 2.7.0 h5888daf_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libffi 3.4.6 h2dba641_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libfreetype 2.13.3 ha770c72_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libfreetype6 2.13.3 h48d6fc4_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgcc 15.1.0 h767d61c_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgcc-ng 15.1.0 h69a702a_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgettextpo 0.24.1 h5888daf_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgl 1.7.0 ha4b6fd6_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libglvnd 1.7.0 ha4b6fd6_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libglx 1.7.0 ha4b6fd6_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libgomp 15.1.0 h767d61c_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libiconv 1.18 h4ce23a2_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libidn2 2.3.8 ha4ef2c3_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libjpeg-turbo 3.1.0 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
liblapack 3.9.0 16_linux64_mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
liblzma 5.8.1 hb9d3cd8_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libnpp 11.8.0.86 0 nvidia
libnsl 2.0.1 hb9d3cd8_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libnvjitlink 12.1.105 0 nvidia
libnvjpeg 11.9.0.86 0 nvidia
libpciaccess 0.18 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libpng 1.6.49 h943b412_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libsqlite 3.50.1 h6cd9bfd_6 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libstdcxx 15.1.0 h8f9b012_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libstdcxx-ng 15.1.0 h4852527_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libtasn1 4.20.0 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libtiff 4.7.0 hf01ce69_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libunistring 0.9.10 h7f98852_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libuuid 2.38.1 h0b41bf4_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libva 2.22.0 h4f16b4b_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libvpx 1.13.1 h59595ed_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libwebp 1.5.0 hae8dbeb_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libwebp-base 1.5.0 h851e524_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libxcb 1.17.0 h8a09558_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libxcrypt 4.4.36 hd590300_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libxml2 2.13.8 h4bc477f_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
libzlib 1.3.1 hb9d3cd8_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
llvm-openmp 15.0.7 h0cdce71_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
markdown 3.8.2 pypi_0 pypi
markupsafe 3.0.2 py310h89163eb_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
mkl 2022.1.0 hc2b9512_224 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
modelscope 1.27.1 pypi_0 pypi
mpc 1.3.1 h24ddda3_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
mpfr 4.2.1 h90cbb55_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
mpmath 1.3.0 pyhd8ed1ab_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
multidict 6.5.1 pypi_0 pypi
multiprocess 0.70.16 pypi_0 pypi
natsort 8.4.0 pypi_0 pypi
ncurses 6.5 h2d0b736_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
nettle 3.9.1 h7ab15ed_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
networkx 3.4.2 pyh267e887_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
numpy 2.2.6 py310hefbff90_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
nvidia-ml-py 12.575.51 pypi_0 pypi
nvitop 1.5.1 pypi_0 pypi
ocl-icd 2.3.3 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
opencl-headers 2025.06.13 h5888daf_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
opencv-python 4.11.0.86 pypi_0 pypi
openh264 2.3.1 hcb278e6_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
openjpeg 2.5.3 h5fbd93e_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
openssl 3.5.0 h7b32b05_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
p11-kit 0.24.1 hc5aa10d_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
packaging 25.0 pypi_0 pypi
pandas 2.3.0 pypi_0 pypi
peft 0.15.2 pypi_0 pypi
pillow 11.2.1 py310h7e6dc6c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
pip 25.1.1 pyh8b19718_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
platformdirs 4.3.8 pypi_0 pypi
propcache 0.3.2 pypi_0 pypi
protobuf 6.31.1 pypi_0 pypi
psutil 7.0.0 pypi_0 pypi
pthread-stubs 0.4 hb9d3cd8_1002 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
pyarrow 20.0.0 pypi_0 pypi
pycparser 2.22 pyh29332c3_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
pydantic 2.11.7 pypi_0 pypi
pydantic-core 2.33.2 pypi_0 pypi
pysocks 1.7.1 pyha55dd90_7 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
python 3.10.18 hd6af730_0_cpython https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
python-dateutil 2.9.0.post0 pypi_0 pypi
python_abi 3.10 7_cp310 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
pytorch 2.5.1 py3.10_cuda11.8_cudnn9.1.0_0 pytorch
pytorch-cuda 11.8 h7e8668a_6 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2025.2 pypi_0 pypi
pyyaml 6.0.2 py310h89163eb_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
readline 8.2 h8c095d6_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
regex 2024.11.6 pypi_0 pypi
requests 2.32.4 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
ruff 0.9.10 pypi_0 pypi
safetensors 0.5.3 pypi_0 pypi
sentencepiece 0.2.0 pypi_0 pypi
sentry-sdk 2.31.0 pypi_0 pypi
setproctitle 1.3.6 pypi_0 pypi
setuptools 80.9.0 pyhff2d567_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
six 1.17.0 pypi_0 pypi
smmap 5.0.2 pypi_0 pypi
svt-av1 1.4.1 hcb278e6_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
sympy 1.13.1 pypi_0 pypi
tensorboard 2.19.0 pypi_0 pypi
tensorboard-data-server 0.7.2 pypi_0 pypi
tesseract 0.1 pypi_0 pypi
tk 8.6.13 noxft_hd72426e_102 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
tokenizers 0.21.2 pypi_0 pypi
torchao 0.11.0 pypi_0 pypi
torchaudio 2.5.1 py310_cu118 pytorch
torchdata 0.10.1 pypi_0 pypi
torchtriton 3.1.0 py310 pytorch
torchvision 0.20.1 py310_cu118 pytorch
tqdm 4.67.1 pypi_0 pypi
transformers 4.52.4 pypi_0 pypi
typing-inspection 0.4.1 pypi_0 pypi
typing_extensions 4.14.0 pyhe01879c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
tzdata 2025.2 pypi_0 pypi
urllib3 2.5.0 pyhd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
wandb 0.20.1 pypi_0 pypi
wayland 1.23.1 h3e06ad9_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
wayland-protocols 1.45 hd8ed1ab_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
werkzeug 3.1.3 pypi_0 pypi
wheel 0.45.1 pyhd8ed1ab_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
x264 1!164.3095 h166bdaf_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
x265 3.5 h924138e_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xorg-libx11 1.8.12 h4f16b4b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xorg-libxau 1.0.12 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xorg-libxdmcp 1.1.5 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xorg-libxext 1.3.6 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xorg-libxfixes 6.0.1 hb9d3cd8_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
xxhash 3.5.0 pypi_0 pypi
yaml 0.2.5 h7f98852_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
yarl 1.20.1 pypi_0 pypi
zipp 3.23.0 pypi_0 pypi
zstandard 0.23.0 py310ha75aee5_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
zstd 1.5.7 hb8e6e7a_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
```

## Data Preparation

Please refer to [DATA.md](DATA.md) for data generation scripts and
dataset preparation.

## Training

### Pre-training or Full Fine-tuning

To pre-train the full TesserAct model from CogVideoX, we provide a training script based on [Finetrainers](https://github.com/a-r-r-o-w/finetrainers). The training code supports distributed training across multiple GPUs or multiple nodes.

To pre-train our TesserAct model, run the following command:

```bash
bash train_i2v_depth_normal_sft.sh
```

To fine-tune our released TesserAct model, modify the model loading code in [tesseract/i2v_depth_normal_sft.py](tesseract/i2v_depth_normal_sft.py):

```python
transformer = CogVideoXTransformer3DModel.from_pretrained_modify(
    "anyeZHY/tesseract",
    subfolder="tesseract_v01e_rgbdn_sft",
    ...
)
```

### LoRA Fine-tuning

You can efficiently fine-tune our TesserAct model using LoRA (Low-Rank Adaptation) with your own data (~100 videos). This approach requires approximately **30GB of GPU memory** and takes about 2 days of training on a custom dataset.

To fine-tune using LoRA, run the following command:

```bash
bash train_i2v_depth_normal_lora.sh
```

> [!WARNING]
> LoRA fine-tuning is experimental and not fully tested yet.

> [!NOTE]
> We will give a detailed training guide in the future: why TesserAct has better generalization, how to set the hyperparameters, and how performance compares between training methods (SFT vs. LoRA).
>
> We don't have a clear plan for releasing the whole dataset yet, because depth data is usually stored as floats, which takes up a lot of space and makes uploading to Hugging Face very difficult. However, we will provide scripts later on to show how to prepare the data.

## Inference

TesserAct currently includes the following models. Model names follow the format `anyeZHY/tesseract/` (Hugging Face repo name) + `<model_name>_<version>_<modality>_<training_method>`. In `<version>`, the postfix `p` indicates that the model is production-ready and `e` means that the model is experimental. We will keep updating the model weights and scaling the dataset to improve the performance of the models.
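The naming scheme can be illustrated with a small parser. This is a minimal sketch, assuming the subfolder has four underscore-separated fields (model name, version, modality, training method), which is a reading of the released names such as `tesseract_v01e_rgbdn_sft`. `ModelName` and `parse_model_path` are hypothetical helpers and are not part of the TesserAct package.

```python
from dataclasses import dataclass

# Version postfixes described in the text: `p` = production-ready, `e` = experimental.
STAGES = {"p": "production-ready", "e": "experimental"}


@dataclass(frozen=True)
class ModelName:
    """Hypothetical container for the parts of a TesserAct weights path."""

    repo_id: str          # Hugging Face repo, e.g. "anyeZHY/tesseract"
    subfolder: str        # e.g. "tesseract_v01e_rgbdn_sft"
    model_name: str       # assumed field name
    version: str          # e.g. "v01e"
    stage: str            # derived from the last character of the version
    modality: str         # assumed field name, e.g. "rgbdn" or "rgb"
    training_method: str  # assumed field name, e.g. "sft" or "lora"


def parse_model_path(path: str) -> ModelName:
    """Split `<owner>/<repo>/<subfolder>` into its assumed components."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected <owner>/<repo>/<subfolder>, got {path!r}")
    owner, repo, subfolder = parts

    fields = subfolder.split("_")
    if len(fields) != 4:
        raise ValueError(f"expected four '_'-separated fields, got {subfolder!r}")
    model_name, version, modality, training_method = fields

    stage = STAGES.get(version[-1:])
    if stage is None:
        raise ValueError(f"version {version!r} does not end in 'p' or 'e'")

    return ModelName(
        repo_id=f"{owner}/{repo}",
        subfolder=subfolder,
        model_name=model_name,
        version=version,
        stage=stage,
        modality=modality,
        training_method=training_method,
    )


if __name__ == "__main__":
    spec = parse_model_path("anyeZHY/tesseract/tesseract_v01e_rgbdn_sft")
    print(spec.repo_id, spec.subfolder, spec.stage)
```

The `repo_id` and `subfolder` fields correspond to the two arguments in the fine-tuning snippet above (`"anyeZHY/tesseract"` and `subfolder="tesseract_v01e_rgbdn_sft"`).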
The released models are:

```
anyeZHY/tesseract/tesseract_v01e_rgbdn_sft
anyeZHY/tesseract/tesseract_v01e_rgb_lora
```

> [!IMPORTANT]
> It is recommended to read [USAGE.MD](doc/usage.md) for more details **before running the inference code on your own data.** We provide a guide on how to prepare inputs, such as text prompts. We also analyze the model's limitations and performance, including:
>
> - Tasks that the model can reliably accomplish.
> - Tasks that are achievable but only with a certain success rate. In the future, this may be improved by using techniques like test-time scaling.
> - Tasks that are currently beyond the model's capabilities.

You can run the inference code with the following command (optional flag: `--memory_efficient`).

```bash
python inference/inference_rgbdn_sft.py \
  --weights_path anyeZHY/tesseract/tesseract_v01e_rgbdn_sft \
  --image_path asset/images/fruit_vangogh.png \
  --prompt "pick up the apple google robot"
```

This inference code will generate a video of the Google robot picking up the apple in the Van Gogh painting. Try other prompts like `pick up the pear Franka Emika Panda`, or use `asset/images/majo.jpg` with the prompt `Move the cup near bottle Franka Emika Panda`!

For RGB-only generation using the LoRA model, you can use:

```bash
python inference/inference_rgb_lora.py \
  --weights_path anyeZHY/tesseract/tesseract_v01e_rgb_lora \
  --image_path asset/images/fruit_vangogh.png \
  --prompt "pick up the apple google robot"
```

The RGB LoRA model offers the best generalization quality for RGB video generation, making it ideal for diverse robotic manipulation tasks.

For RGB+Depth+Normal generation using the LoRA model, you can use:

```bash
python inference/inference_rgbdn_lora.py \
  --base_weights_path anyeZHY/tesseract/tesseract_v01e_rgbdn_sft \
  --lora_weights_path ./your_local_lora_weights \
  --image_path asset/images/fruit_vangogh.png \
  --prompt "pick up the apple google robot"
```

You can find the output videos in the `results` folder.
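To try several image and prompt pairs in one go, the commands above can be assembled programmatically. The following is a minimal sketch, assuming that `--memory_efficient` is a boolean flag that takes no value; `build_inference_command` and `JOBS` are hypothetical names used only for illustration. The script prints the commands as a dry run and does not launch inference itself.

```python
import shlex
import sys
from typing import List

# Script and weights taken from the RGB+Depth+Normal SFT command above.
DEFAULT_SCRIPT = "inference/inference_rgbdn_sft.py"
DEFAULT_WEIGHTS = "anyeZHY/tesseract/tesseract_v01e_rgbdn_sft"

# Image/prompt pairs mentioned in this README.
JOBS = [
    ("asset/images/fruit_vangogh.png", "pick up the apple google robot"),
    ("asset/images/fruit_vangogh.png", "pick up the pear Franka Emika Panda"),
    ("asset/images/majo.jpg", "Move the cup near bottle Franka Emika Panda"),
]


def build_inference_command(
    image_path: str,
    prompt: str,
    weights_path: str = DEFAULT_WEIGHTS,
    script: str = DEFAULT_SCRIPT,
    memory_efficient: bool = False,
) -> List[str]:
    """Return the argument list for one inference run (hypothetical helper)."""
    cmd = [
        sys.executable, script,
        "--weights_path", weights_path,
        "--image_path", image_path,
        "--prompt", prompt,
    ]
    if memory_efficient:
        # Assumption: the flag is a plain switch with no argument.
        cmd.append("--memory_efficient")
    return cmd


if __name__ == "__main__":
    for image_path, prompt in JOBS:
        cmd = build_inference_command(image_path, prompt, memory_efficient=True)
        # Dry run: print a shell-safe version of each command.
        print(shlex.join(cmd))
```

Passing each returned list to `subprocess.run(cmd, check=True)` from the repository root would execute the runs one after another. Keeping the prompt as a single list element avoids shell quoting issues with multi-word prompts.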
Note: When we tested the model on another server, the results were exactly the same as those we uploaded to GitHub. If your results differ or look unexpected (for example, noisy videos), please check your environment and the versions of the packages you are using.

> [!WARNING]
> Because the RT1 and Bridge normal data are generated by [Temporal Marigold](https://huggingface.co/docs/diffusers/en/using-diffusers/marigold_usage#frame-by-frame-video-processing-with-temporal-consistency), the normal outputs are sometimes imperfect. We are working on improving the data using [NormalCrafter](https://github.com/Binyr/NormalCrafter).

Below is a list of TODOs for the inference part.

- [x] LoRA inference code
- [ ] Blender rendering code (check package [PyBlend](https://github.com/anyeZHY/PyBlend)!)
- [ ] Normal Integration

## Citation

If you find our work useful, please consider citing:

```bibtex
@article{zhen2025tesseract,
  title={TesserAct: Learning 4D Embodied World Models},
  author={Haoyu Zhen and Qiao Sun and Hongxin Zhang and Junyan Li and Siyuan Zhou and Yilun Du and Chuang Gan},
  year={2025},
  eprint={2504.20995},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2504.20995},
}
```

## Acknowledgements

We would like to thank the following works for their code and models:

- Training: [CogVideo](https://github.com/THUDM/CogVideo), [Finetrainers](https://github.com/a-r-r-o-w/finetrainers) and [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun)
- Data generation: [RollingDepth](https://github.com/prs-eth/rollingdepth), [Marigold](https://github.com/prs-eth/Marigold) and [DSINE](https://github.com/baegwangbin/DSINE)
- Datasets: [OpenX](https://robotics-transformer-x.github.io/), [RLBench](https://github.com/stepjam/RLBench), [Hiveformer](https://github.com/vlc-robot/hiveformer) and [Colosseum](https://github.com/robot-colosseum/robot-colosseum)
- Why normals: [BiNI](https://github.com/xucao-42/NormalIntegration), [ICON](https://github.com/YuliangXiu/ICON), [StableNormal](https://github.com/Stable-X/StableNormal) and [NormalCrafter](https://github.com/Binyr/NormalCrafter)

We are extremely grateful to Pengxiao Han for assistance with the baseline code, and to Yuncong Yang, Sunli Chen, Jiaben Chen, Zeyuan Yang, Zixin Wang, Lixing Fang, and many other friends in our [Embodied AGI Lab](https://embodied-agi.cs.umass.edu/) for their helpful feedback and insightful discussions.