November 30, 2023
November 28, 2023
We are releasing SDXL-Turbo, a lightning fast text-to image model. Alongside the model, we release a technical report
pip install streamlit-keyup
.checkpoints/
directory.streamlit run scripts/demo/turbo.py
.November 21, 2023
We are releasing Stable Video Diffusion, an image-to-video model, for research purposes:
deflickering decoder
.SVD
but finetuned
for 25 frame generation.scripts/demo/video_sampling.py
and a standalone python script scripts/sampling/simple_video_sample.py
for inference of both models.July 26, 2023
CreativeML Open RAIL++-M
license (see Inference for file
hashes):
SDXL-base-0.9
.SDXL-refiner-0.9
.July 4, 2023
June 22, 2023
SDXL-base-0.9
: The base model was trained on a variety of aspect ratios on images with resolution 1024^2. The
base model uses OpenCLIP-ViT/G
and CLIP-ViT/L for text encoding whereas the refiner model only uses
the OpenCLIP model.SDXL-refiner-0.9
: The refiner has been trained to denoise small noise levels of high quality data and as such is
not expected to work as a text-to-image model; instead, it should only be used as an image-to-image model.If you would like to access these models for your research, please apply using one of the following links: SDXL-0.9-Base model, and SDXL-0.9-Refiner. This means that you can apply for any of the two links - and if you are granted - you can access both. Please log in to your Hugging Face Account with your organization email to request access. We plan to do a full release soon (July).
Modularity is king. This repo implements a config-driven approach where we build and combine submodules by
calling instantiate_from_config()
on objects defined in yaml configs. See configs/
for many examples.
ldm
codebaseFor training, we use PyTorch Lightning, but it should be easy to use other
training wrappers around the base modules. The core diffusion model class (formerly LatentDiffusion
,
now DiffusionEngine
) has been cleaned up:
GeneralConditioner
,
see sgm/modules/encoders/modules.py
.sgm/modules/diffusionmodules/guiders.py
) from the
samplers (sgm/modules/diffusionmodules/sampling.py
), and the samplers are independent of the model.sgm/modules/diffusionmodules/denoiser.py
.sgm/modules/diffusionmodules/denoiser_weighting.py
), preconditioning of the
network (sgm/modules/diffusionmodules/denoiser_scaling.py
), and sampling of noise levels during
training (sgm/modules/diffusionmodules/sigma_sampling.py
).git clone https://github.com/Stability-AI/generative-models.git
cd generative-models
This is assuming you have navigated to the generative-models
root after cloning it.
NOTE: This is tested under python3.10
. For other python versions, you might encounter version conflicts.
PyTorch 2.0
# install required packages from pypi
python3 -m venv .pt2
source .pt2/bin/activate
pip3 install -r requirements/pt2.txt
sgm
pip3 install .
sdata
for trainingpip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata
This repository uses PEP 517 compliant packaging using Hatch.
To build a distributable wheel, install hatch
and run hatch build
(specifying -t wheel
will skip building a sdist, which is not necessary).
pip install hatch
hatch build -t wheel
You will find the built package in dist/
. You can install the wheel with pip install dist/*.whl
.
Note that the package does not currently specify dependencies; you will need to install the required packages, depending on your use case and PyTorch version, manually.
We provide a streamlit demo for text-to-image and image-to-image sampling
in scripts/demo/sampling.py
.
We provide file hashes for the complete file as well as for only the saved tensors in the file (
see Model Spec for a script to evaluate that).
The following models are currently supported:
File Hash (sha256): 31e35c80fc4829d14f90153f4c74cd59c90b779f6afe05a74cd6120b893f7e5b
Tensordata Hash (sha256): 0xd7a9105a900fd52748f20725fe52fe52b507fd36bee4fc107b1550a26e6ee1d7
File Hash (sha256): 7440042bbdc8a24813002c09b6b69b64dc90fded4472613437b7f55f9b7d9c5f
Tensordata Hash (sha256): 0x1a77d21bebc4b4de78c474a90cb74dc0d2217caf4061971dbfa75ad406b75d81
Weights for SDXL:
SDXL-1.0:
The weights of SDXL-1.0 are available (subject to
a CreativeML Open RAIL++-M
license) here:
SDXL-0.9: The weights of SDXL-0.9 are available and subject to a research license. If you would like to access these models for your research, please apply using one of the following links: SDXL-base-0.9 model, and SDXL-refiner-0.9. This means that you can apply for any of the two links - and if you are granted - you can access both. Please log in to your Hugging Face Account with your organization email to request access.
After obtaining the weights, place them into checkpoints/
.
Next, start the demo using
streamlit run scripts/demo/sampling.py --server.port <your_port>
Images generated with our code use the invisible-watermark library to embed an invisible watermark into the model output. We also provide a script to easily detect that watermark. Please note that this watermark is not the same as in previous Stable Diffusion 1.x/2.x versions.
To run the script you need to either have a working installation as above or try an experimental import using only a minimal amount of packages:
python -m venv .detect
source .detect/bin/activate
pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"
pip install --no-deps invisible-watermark
To run the script you need to have a working installation as above. The script
is then useable in the following ways (don't forget to activate your
virtual environment beforehand, e.g. source .pt1/bin/activate
):
# test a single file
python scripts/demo/detect.py <your filename here>
# test multiple files at once
python scripts/demo/detect.py <filename 1> <filename 2> ... <filename n>
# test all files in a specific folder
python scripts/demo/detect.py <your folder name here>/*
We are providing example training configs in configs/example_training
. To launch a training, run
python main.py --base configs/<config1.yaml> configs/<config2.yaml>
where configs are merged from left to right (later configs overwrite the same values). This can be used to combine model, training and data configs. However, all of them can also be defined in a single config. For example, to run a class-conditional pixel-based diffusion model training on MNIST, run
python main.py --base configs/example_training/toy/mnist_cond.yaml
NOTE 1: Using the non-toy-dataset
configs configs/example_training/imagenet-f8_cond.yaml
, configs/example_training/txt2img-clipl.yaml
and configs/example_training/txt2img-clipl-legacy-ucg-training.yaml
for training will require edits depending on the
used dataset (which is expected to stored in tar-file in
the webdataset-format). To find the parts which have to be adapted, search
for comments containing USER:
in the respective config.
NOTE 2: This repository supports both pytorch1.13
and pytorch2
for training generative models. However for
autoencoder training as e.g. in configs/example_training/autoencoder/kl-f4/imagenet-attnfree-logvar.yaml
,
only pytorch1.13
is supported.
NOTE 3: Training latent generative models (as e.g. in configs/example_training/imagenet-f8_cond.yaml
) requires
retrieving the checkpoint from Hugging Face and replacing
the CKPT_PATH
placeholder in this line. The same is to be done
for the provided text-to-image configs.
The GeneralConditioner
is configured through the conditioner_config
. Its only attribute is emb_models
, a list of
different embedders (all inherited from AbstractEmbModel
) that are used to condition the generative model.
All embedders should define whether or not they are trainable (is_trainable
, default False
), a classifier-free
guidance dropout rate is used (ucg_rate
, default 0
), and an input key (input_key
), for example, txt
for
text-conditioning or cls
for class-conditioning.
When computing conditionings, the embedder will get batch[input_key]
as input.
We currently support two to four dimensional conditionings and conditionings of different embedders are concatenated
appropriately.
Note that the order of the embedders in the conditioner_config
is important.
The neural network is set through the network_config
. This used to be called unet_config
, which is not general
enough as we plan to experiment with transformer-based diffusion backbones.
The loss is configured through loss_config
. For standard diffusion model training, you will have to
set sigma_sampler_config
.
As discussed above, the sampler is independent of the model. In the sampler_config
, we set the type of numerical
solver, number of steps, type of discretization, as well as, for example, guidance wrappers for classifier-free
guidance.
For large scale training we recommend using the data pipelines from our data pipelines project. The project is contained in the requirement and automatically included when following the steps from the Installation section. Small map-style datasets should be defined here in the repository (e.g., MNIST, CIFAR-10, ...), and return a dict of data keys/values, e.g.,
example = {"jpg": x, # this is a tensor -1...1 chw
"txt": "a beautiful image"}
where we expect images in -1...1, channel-first format.
此处可能存在不合适展示的内容,页面不予展示。您可通过相关编辑功能自查并修改。
如您确认内容无涉及 不当用语 / 纯广告导流 / 暴力 / 低俗色情 / 侵权 / 盗版 / 虚假 / 无价值内容或违法国家有关法律法规的内容,可点击提交进行申诉,我们将尽快为您处理。