# 1d-tokenizer

**Repository Path**: ByteDance/ByteDance_1d-tokenizer

## Basic Information

- **Project Name**: 1d-tokenizer
- **Description**: This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation"
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None

## README

# 1D Visual Tokenization and Generation

This repo hosts the code and models for the following projects:

- FlowTok: [FlowTok: Flowing Seamlessly Across Text and Image Tokens](https://tacju.github.io/projects/flowtok.html)
- TA-TiTok & MaskGen: [Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens](https://tacju.github.io/projects/maskgen.html)
- RAR: [Randomized Autoregressive Visual Generation](https://yucornetto.github.io/projects/rar.html)
- TiTok: [An Image is Worth 32 Tokens for Reconstruction and Generation](https://yucornetto.github.io/projects/titok.html)

## Updates

- 03/16/2025: The [tech report](https://arxiv.org/abs/2503.10772) of FlowTok is available. FlowTok is a minimal yet powerful framework that flows seamlessly between text and images by encoding images into a compact 1D token representation. Code will be released soon.
- 02/24/2025: We release the training code, inference code, and model weights of MaskGen.
- 01/17/2025: We release the training code, inference code, and model weights of TA-TiTok.
- 01/14/2025: The [tech report](https://arxiv.org/abs/2501.07730) of TA-TiTok and MaskGen is available. TA-TiTok is an innovative text-aware transformer-based 1-dimensional tokenizer designed to handle both discrete and continuous tokens. MaskGen is a powerful and efficient text-to-image masked generative model trained exclusively on open data. For more details, refer to the [README_MaskGen](README_MaskGen.md).
- 11/04/2024: We release the [tech report](https://arxiv.org/abs/2411.00776) and code for RAR models.
- 10/16/2024: We release a set of TiTok tokenizer weights trained with an updated single-stage recipe, which makes training easier and improves performance. Weights are available in multiple model sizes for both the VQ and VAE variants of TiTok, which we hope will facilitate research in this area. More details are available in the [tech report](https://arxiv.org/abs/2501.07730) of TA-TiTok.
- 09/25/2024: TiTok is accepted by NeurIPS 2024.
- 09/11/2024: We release the training code of the generator based on TiTok.
- 08/28/2024: We release the training code of TiTok.
- 08/09/2024: Better support for loading pretrained weights from Hugging Face models, thanks to the help from [@NielsRogge](https://github.com/NielsRogge)!
- 07/03/2024: Evaluation scripts for reproducing the results reported in the paper, along with checkpoints of TiTok-B64 and TiTok-S128, are available.
- 06/21/2024: Demo code and TiTok-L-32 checkpoints released.
- 06/11/2024: The [tech report](https://arxiv.org/abs/2406.07550) of TiTok is available.

## Short Intro on [Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens](https://arxiv.org/abs/2501.07730) ([README](README_MaskGen.md))

We introduce TA-TiTok, a novel text-aware transformer-based 1D tokenizer designed to handle both discrete and continuous tokens while effectively aligning reconstructions with textual descriptions.
Building on TA-TiTok, we present MaskGen, a versatile text-to-image masked generative model framework. Trained exclusively on open data, MaskGen demonstrates outstanding performance: with 32 continuous tokens, it achieves a FID score of 6.53 on MJHQ-30K, and with 128 discrete tokens, it attains an overall score of 0.57 on GenEval.
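
As a quick illustration of the 1D tokenization pipeline, the sketch below loads a pretrained TiTok tokenizer from the Hugging Face Hub and round-trips an image through its 32 discrete tokens. The import path (`modeling.titok`), class name (`TiTok`), weight ID (`yucornetto/tokenizer_titok_l32_imagenet`), and the `encode`/`decode_tokens` return formats follow the demo conventions of this repo, but are assumptions here and should be verified against the released code.

```python
import numpy as np
import torch
from PIL import Image

# Assumed import path for the TiTok tokenizer class in this repo.
from modeling.titok import TiTok

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed Hub weight ID for the TiTok-L-32 VQ tokenizer, which encodes
# a 256x256 image into 32 discrete tokens.
tokenizer = TiTok.from_pretrained("yucornetto/tokenizer_titok_l32_imagenet")
tokenizer.eval().requires_grad_(False)
tokenizer.to(device)

# Load and preprocess an image into a [1, 3, 256, 256] float tensor in [0, 1].
image = Image.open("example.png").convert("RGB").resize((256, 256))
pixels = torch.from_numpy(np.array(image)).permute(2, 0, 1).unsqueeze(0) / 255.0

# Tokenize: the VQ variant is assumed to return discrete codebook indices
# under "min_encoding_indices" in the second element of the output.
tokens = tokenizer.encode(pixels.to(device))[1]["min_encoding_indices"]
print(tokens.shape)  # expected on the order of 32 tokens per image

# De-tokenize back to pixel space.
reconstructed = tokenizer.decode_tokens(tokens)
```

For the VAE variants released alongside the VQ ones, `encode` would presumably yield continuous latents rather than codebook indices; check the repo's demo scripts for the exact interface of each variant.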