
# Pippo: High-Resolution Multi-View Humans from a Single Image

Project Page · Paper PDF · Spaces · Visuals (Drive)

CVPR, 2025 (Highlight)


Yash Kant1,2,3 · Ethan Weber1,4 · Jin Kyu Kim1 · Rawal Khirodkar1 · Su Zhaoen1 · Julieta Martinez1
Igor Gilitschenski*2,3 · Shunsuke Saito*1 · Timur Bagautdinov*1

* Joint Advising

1 Meta Reality Labs · 2 University of Toronto · 3 Vector Institute · 4 UC Berkeley

We present Pippo, a generative model capable of producing 1K-resolution dense turnaround videos of a person from a single casually clicked photo. Pippo is a multi-view diffusion transformer and does not require any additional inputs, such as a fitted parametric model or camera parameters for the input image.

#### This is a code-only release without pre-trained weights. We provide models, configs, inference, and sample training code on Ava-256.

## Setup

Clone the repository and add it to your path:

```
git clone git@github.com:facebookresearch/pippo.git
cd pippo
export PATH=$PATH:$PWD
```

## Prerequisites and Dependencies

```
conda create -n pippo python=3.10.1 -c conda-forge
conda activate pippo

# adjust as required (we tested on the configuration below)
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.0 -c pytorch -c nvidia
pip install -r requirements.txt
```

## Download and Sample Training

You can launch a sample training run on a few samples of the [Ava-256 dataset](https://github.com/facebookresearch/ava-256). We provide pre-packaged samples for this training, stored as npy files, [here](https://huggingface.co/datasets/yashkant/pippo/tree/main). Ensure you are authenticated to Hugging Face with a login token before downloading the samples.

```
# download packaged Ava-256 samples
python scripts/pippo/download_samples.py
```

We provide the exact model configs for training Pippo models at resolutions of 128, 512, and 1024 in the `config/full/` directory.
```
# launch training (tested on a single 80GB A100 GPU): full-sized model
python train.py config/full/128_4v.yml
```

Additionally, we provide a tiny model config for training on a smaller GPU:

```
# launch training (tested on a single 16GB T4 GPU): tiny model
python train.py config/tiny/128_4v_tiny.yml
```

## Training on a Custom Dataset

(See https://github.com/facebookresearch/pippo/issues/9.) You will have to prepare your custom dataset in the same format as the provided [Ava-256 samples stored in numpy files](https://huggingface.co/datasets/yashkant/pippo/tree/main/ava_samples). The tricky parts are creating the Plücker ray and spatial anchor images; we provide our implementations of those methods (using Ava-256 and Goliath data) [in this gist](https://gist.github.com/yashkant/971e205d85b15e17d20d33edd29d6016). You can refer to these methods to create these fields for your own custom dataset.

## Re-projection Error

To compute the re-projection error between generated images and ground-truth images, run the following command:

```
python scripts/pippo/reprojection_error.py
```

## Useful Pointers

Here is a list of useful things to borrow from this codebase:

- ControlMLP, to inject spatial control into diffusion transformers: [see here](https://github.com/facebookresearch/pippo/blob/main/latent_diffusion/models/control_mlp.py#L161)
- Attention biasing, to run inference on 5x longer sequences: [see here](https://github.com/facebookresearch/pippo/blob/main/latent_diffusion/models/dit.py#L165)
- Re-projection error metric: [see here](https://github.com/facebookresearch/pippo/blob/main/scripts/pippo/reprojection_error.py#L150)

## Todos

We plan to add and update the following in the future:

- Clean up fluff in `pippo.py` and `dit.py`
- Inference script for pretrained models

## License

See the LICENSE file for details.
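As a reference for the custom-dataset preparation described above: per-pixel Plücker ray images can be derived from camera intrinsics and extrinsics alone. Below is a minimal numpy sketch; the function name, the camera convention (`x_cam = R @ x_world + t`), and the 6-channel `[direction, moment]` layout are assumptions for illustration, not the repo's exact implementation (see the linked gist for the authors' version):

```python
import numpy as np

def plucker_rays(K, R, t, H, W):
    """Per-pixel Plucker coordinates for a pinhole camera (illustrative sketch).

    K: 3x3 intrinsics; R, t: world-to-camera extrinsics (x_cam = R @ x_world + t).
    Returns an (H, W, 6) image of [unit ray direction, moment] per pixel.
    """
    # Camera center in world coordinates
    C = -R.T @ t
    # Grid of pixel centers in homogeneous coordinates, shape (H, W, 3)
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    # Back-project pixels to camera-space directions, rotate into world space
    dirs = pix @ np.linalg.inv(K).T @ R
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Plucker moment of the ray through C with direction d: m = C x d
    moment = np.cross(C, dirs)
    return np.concatenate([dirs, moment], axis=-1)
```

Any valid output satisfies the Plücker constraint `d · m = 0` per pixel, which is a convenient sanity check when porting this to your own camera convention.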
## Citation

If you benefit from this codebase, consider citing our work:

```
@article{Kant2024Pippo,
  title={Pippo: High-Resolution Multi-View Humans from a Single Image},
  author={Yash Kant and Ethan Weber and Jin Kyu Kim and Rawal Khirodkar and Su Zhaoen and Julieta Martinez and Igor Gilitschenski and Shunsuke Saito and Timur Bagautdinov},
  year={2025},
}
```