# mf-rae **Repository Path**: mirrors_sony/mf-rae ## Basic Information - **Project Name**: mf-rae - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-11-18 - **Last Updated**: 2026-05-17 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README ## MeanFlow Transformers with Representation Autoencoders (RAE)
Official PyTorch Implementation ### [Paper](https://www.arxiv.org/abs/2511.13019) This repository is based on: - [**Diffusion Transformers with Representation Autoencoders**](https://arxiv.org/abs/2510.11690). - and our previous work, CMT: [**Consistency Mid-Training**](https://github.com/sony/cmt). ## Environment ### Dependency Setup 1. Create environment and install via `uv`: ```bash conda create -n rae python=3.10 -y conda activate rae pip install uv # Install PyTorch 2.2.0 with CUDA 12.1 uv pip install torch==2.2.0 torchvision==0.17.0 torchaudio --index-url https://download.pytorch.org/whl/cu121 # Install other dependencies uv pip install timm==0.9.16 accelerate==0.23.0 torchdiffeq==0.2.5 wandb uv pip install "numpy<2" transformers einops omegaconf ``` ## Data Preparation 1. Download ImageNet-1k **raw data without preprocessing**. 2. Point Stage 1 and Stage 2 scripts to the training split via `--data-path`. ## Pre-Training: Flow Matching The RAE authors release flow-matching pre-traine models: RAE decoders, DiTDH diffusion transformers and stats for latent normalization. To download all models at once: ```bash cd RAE pip install huggingface_hub hf download nyu-visionx/RAE-collections \ --local-dir models ``` To download specific models, run: ```bash hf download nyu-visionx/RAE-collections \ \ --local-dir models ``` ## Consistent Mid-Training ```bash bash CMT_256.sh ``` ```bash bash CMT_512.sh ``` ## MFT and MFD Post-Training Make sure to input the CMT checkpoint path obtained from the previous stage. For instance, on ImageNet 512, they are ```bash bash MFT_512.sh ``` ```bash bash MFD_512.sh ``` ## Distributed sampling for evaluation Make sure to input the MeanFlow-RAE checkpoint path after training to the config file. We provide our trained MF-RAE on Google Drive: https://drive.google.com/drive/folders/1EYVyIDKRZeHn6NO7uF5aJ1ycR3lvfnJu?usp=drive_link ```bash bash Sample_256.sh ``` ```bash bash Sample_512.sh ``` ## Evaluation ### ADM Suite FID setup Use the ADM evaluation suite to score generated samples: 1. Clone the repo: ```bash git clone https://github.com/openai/guided-diffusion.git cd guided-diffusion/evaluation ``` 2. Create an environment and install dependencies: ```bash conda create -n adm-fid python=3.10 conda activate adm-fid pip install 'tensorflow[and-cuda]'==2.19 scipy requests tqdm ``` 3. Download ImageNet statistics (256×256 shown here): ```bash wget https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/imagenet/256/VIRTUAL_imagenet256_labeled.npz ``` 4. Evaluate: ```bash python evaluator.py VIRTUAL_imagenet256_labeled.npz /path/to/samples.npz ``` ## Acknowledgement This code is built upon the following repositories: * [SiT](https://github.com/willisma/sit) - for diffusion implementation and training codebase. * [DDT](https://github.com/MCG-NJU/DDT) - for some of the DiTDH implementation. * [LightningDiT](https://github.com/hustvl/LightningDiT/) - for the PyTorch Lightning based DiT implementation. * [MAE](https://github.com/facebookresearch/mae) - for the ViT decoder architecture. * [RAE](https://github.com/bytetriper/RAE) - for the RAE model and checkpoints.