# UltraPixel

**Repository Path**: lql0716/UltraPixel

## Basic Information

- **Project Name**: UltraPixel
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: AGPL-3.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-12-05
- **Last Updated**: 2024-12-05

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks (NeurIPS 2024)

[![arXiv](https://img.shields.io/badge/arXiv-paper-red)](https://arxiv.org/abs/2407.02158) [![Full Paper](https://img.shields.io/badge/Full_Paper-PDF-blue)](https://drive.google.com/file/d/1X18HH9kj7ltAnZorrkD84RJEdsJu4gDF/view?usp=sharing) [![Project Homepage](https://img.shields.io/badge/Project-Homepage-brightgreen)](https://jingjingrenabc.github.io/ultrapixel/) [![Hugging Face Demo](https://img.shields.io/badge/Hugging_Face-Demo-yellow)](https://huggingface.co/spaces/roubaofeipi/UltraPixel-demo)

UltraPixel is designed to create exceptionally high-quality, detail-rich images at various resolutions, pushing the boundaries of ultra-high-resolution image synthesis. For more details and to see more stunning images, please visit the [Project Page](https://jingjingrenabc.github.io/ultrapixel/). The [arXiv version](https://arxiv.org/abs/2407.02158) of the paper contains compressed images, while the [full paper](https://drive.google.com/file/d/1X18HH9kj7ltAnZorrkD84RJEdsJu4gDF/view?usp=sharing) features uncompressed, high-quality images.

## 🔥 **Updates:**

- **`2024/09/26`**: 🎉 UltraPixel has been accepted to NeurIPS 2024!
- **`2024/09/19`**: 🤗 We released the [HuggingFace Space](https://huggingface.co/spaces/roubaofeipi/UltraPixel-demo), thanks to the HF team and [Gradio](https://github.com/gradio-app/gradio)! A Gradio interface for text-to-image inference is also provided; please see the Inference section!
- **`2024/09/19`**: We have updated the versions of PyTorch and Torchvision in our environment. On an RTX 4090 GPU, generating a 2560×5120 image (without stage_a_tiled) now takes approximately 60 seconds, compared to about three minutes in the previous setup.

![teaser](figures/teaser.jpg)

## Getting Started

**1.** Install dependencies by running:
```
pip install -r requirements.txt
```
**2.** Download pre-trained models following the [StableCascade model downloading instructions](https://github.com/Stability-AI/StableCascade/tree/master/models). The small-big setting is used (the small model for Stage B and the big model for Stage C, in bfloat16 format). The big-big setting is also supported, but small-big is more efficient.

**3.** Download the newly added UltraPixel parameters from [here](https://huggingface.co/roubaofeipi/UltraPixel).

**Note**: All model download URLs are provided [here](./models/models_checklist.txt). The downloaded files should be placed in the [models](./models) directory.

## Inference

### Text-guided Image Generation

We provide a Gradio interface for inference. Launch it with:
```
CUDA_VISIBLE_DEVICES=0 python app.py
```
Or generate an image by running:
```
CUDA_VISIBLE_DEVICES=0 python inference/test_t2i.py
```
**Tips**: To generate aesthetic images, use detailed prompts with specific descriptions. It's recommended to include elements such as the subject, background, colors, lighting, and mood, and to enhance your prompts with high-quality modifiers like "high quality", "rich detail", "8k", "photo-realistic", "cinematic", and "perfection". For example, use "A breathtaking sunset over a serene mountain range, with vibrant orange and purple hues in the sky, high quality, rich detail, 8k, photo-realistic, cinematic lighting, perfection". Be concise but detailed, specific and clear, and experiment with different word combinations for the best results. Several example prompts are provided [here](./prompt_list.txt).

It is recommended to add "--stage_a_tiled" for decoding in Stage A to save memory; a conceptual sketch of tiled decoding follows the tables below. The tables below show memory requirements and running times on different GPUs. For the A100 with 80GB memory, tiled decoding is not necessary.

**On 80G A100:**

| Resolution | Stage C | Stage B | Stage A |
|------------|---------|---------|---------|
| 2048×2048 | 15.9G / 12s | 14.5G / 4s | **w/o tiled**: 11.2G / 1s |
| 4096×4096 | 18.7G / 52s | 19.7G / 26s | **w/o tiled**: 45.3G / 2s, **tiled**: 9.3G / 128s |

**On 32G V100** (only works using float32 on Stages C and B):

| Resolution | Stage C | Stage B | Stage A |
|------------|---------|---------|---------|
| 2048×2048 | 16.7G / 83s | 11.7G / 22s | **w/o tiled**: 10.1G / 2s |
| 4096×4096 | 18.0G / 287s | 22.7G / 172s | **w/o tiled**: OOM, **tiled**: 9.0G / 305s |

**On 24G RTX4090:**

| Resolution | Stage C | Stage B | Stage A |
|------------|---------|---------|---------|
| 2048×2048 | 15.5G / 83s | 13.2G / 22s | **w/o tiled**: 11.3G / 1s |
| 4096×4096 | 19.9G / 153s | 23.4G / 44s | **w/o tiled**: OOM, **tiled**: 11.3G / 114s |
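Conceptually, tiled decoding trades one large Stage A decode for several small ones: the latent is split into tiles, each tile is decoded independently, and the results are stitched back together, so peak activation memory scales with the tile size rather than the full resolution. Below is a minimal, self-contained sketch of this idea; the function names, the toy 4× decoder, and the tile size are illustrative stand-ins, not the repository's actual API, and a production implementation would typically overlap and blend tiles to avoid visible seams.

```python
import torch
import torch.nn.functional as F

def decode_tiled(latent, decode_fn, tile=512, scale=4):
    """Decode a latent in non-overlapping tiles to bound peak memory.

    latent:    (B, C, H, W) latent tensor
    decode_fn: maps a latent tile to an image tile, upscaling H and W by `scale`
    tile:      latent-space tile size (hypothetical default)
    scale:     spatial upscaling factor of the decoder
    """
    b, _, h, w = latent.shape
    out = None
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            piece = decode_fn(latent[:, :, y:y + tile, x:x + tile])
            if out is None:  # allocate the canvas once the channel count is known
                out = torch.zeros(b, piece.shape[1], h * scale, w * scale)
            out[:, :, y * scale:y * scale + piece.shape[2],
                      x * scale:x * scale + piece.shape[3]] = piece
    return out

# Toy stand-in for the Stage A decoder: nearest-neighbour 4x upsampling.
fake_decoder = lambda z: F.interpolate(z, scale_factor=4)
image = decode_tiled(torch.randn(1, 4, 1024, 1024), fake_decoder)
print(image.shape)  # torch.Size([1, 4, 4096, 4096])
```

Only one tile's activations are live at a time, which is how the tiled Stage A decode in the tables above stays under roughly 12 GB even at 4096×4096, at the cost of extra running time.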
### Personalized Image Generation

The repo provides a personalized model of a cat. Download the personalized model [here](https://huggingface.co/roubaofeipi/UltraPixel/blob/main/lora_cat.safetensors) and run the following command to generate personalized results. Note that the text prompt must use the identifier "cat [roubaobao]" to refer to the cat.
```
CUDA_VISIBLE_DEVICES=0 python inference/test_personalized.py
```

### ControlNet Image Generation

Download the Canny [ControlNet](https://huggingface.co/stabilityai/stable-cascade/resolve/main/controlnet/canny.safetensors) provided by StableCascade and run:
```
CUDA_VISIBLE_DEVICES=0 python inference/test_controlnet.py
```
Note that the ControlNet is used without further fine-tuning, so the highest supported resolution is 4K, e.g., 3840×2160 or 2048×2048.
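The Canny ControlNet conditions generation on an edge map extracted from a reference image; `test_controlnet.py` takes care of this preprocessing internally. Purely as an illustration of what that conditioning input looks like (the file path, thresholds, and tensor layout below are assumptions, not the script's actual interface), a Canny edge map can be computed with OpenCV like this:

```python
import cv2
import numpy as np
import torch

# Hypothetical reference image path; substitute your own.
image = cv2.imread("reference.jpg")

# Standard Canny pipeline: grayscale, then edge detection.
# The thresholds (100, 200) are common defaults, not values from this repo.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)

# Replicate the single-channel edge map to 3 channels and pack it as a
# (1, 3, H, W) float tensor in [0, 1], a typical conditioning-image layout.
edges_rgb = np.repeat(edges[:, :, None], 3, axis=2)
cond = torch.from_numpy(edges_rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
print(cond.shape)
```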
## T2I Training

Put all your images and captions into a folder. An example training dataset is provided [here](./figures/example_dataset) for reference. Start training by running:
```
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train/train_t2i.py configs/training/t2i.yaml
```

## Personalized Training

Put all your images into a folder. An example training dataset is provided [here](./figures/example_dataset). The training prompt can be described as: a photo of a cat [roubaobao]. Start training by running:
```
CUDA_VISIBLE_DEVICES=0,1 python train/train_personalized.py \
configs/training/lora_personalization.yaml
```

## Citation

```bibtex
@article{ren2024ultrapixel,
  title={UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks},
  author={Ren, Jingjing and Li, Wenbo and Chen, Haoyu and Pei, Renjing and Shao, Bin and Guo, Yong and Peng, Long and Song, Fenglong and Zhu, Lei},
  journal={arXiv preprint arXiv:2407.02158},
  year={2024}
}
```

## Contact Information

To reach out to the paper's authors, please refer to the contact information provided on the [project page](https://jingjingrenabc.github.io/ultrapixel/).

## Acknowledgements

This project is built upon [StableCascade](https://github.com/Stability-AI/StableCascade) and [Trans-inr](https://github.com/yinboc/trans-inr). Thanks for their code sharing :)