# images_mixing
**Repository Path**: xiaoa2020/images_mixing
## Basic Information
- **Project Name**: images_mixing
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-06-28
- **Last Updated**: 2024-06-28
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# CLIP Guided Images Mixing With Stable Diffusion
This image mixing pipeline is now also available in the official diffusers repository.
This approach combines two images using standard diffusion models `without any prior models`.
It modifies and extends the existing CLIP guided stable diffusion algorithm so that it works with images.
```WARNING: It's hard to get a good result of image mixing the first time.```
## Code examples description
All examples are in the `./jupyters` folder:
| File Name | Description |
|---|---|
| example-no-CoCa.ipynb | Short minimal example of image mixing. The weakness of this approach is that you have to write a prompt for each image. |
| example-stable-diffusion-2-base.ipynb | Example with stable-diffusion-2-base. CoCa is used for prompt generation. |
| example-load-by-parts.ipynb | Example where each diffusers module is loaded separately. |
| example-find-best-mix-result.ipynb | Step-by-step explanation of how to find the parameters for mixing. (By complete enumeration of each parameter. xD) |
| example-as-augmentation.ipynb.ipynb | Using image mixing for image augmentation. Summer to winter example. |
## Short Method Description
The algorithm is based on the idea of CLIP guided stable diffusion img2img, with some modifications:
- Two images and (optionally) two prompts (a description of each image) are expected.
- An interpolated (content-style) CLIP image embedding is used (the original uses a CLIP text embedding).
- An interpolated (content-style) text embedding is used for guidance (the original uses a single text embedding).
- (Optionally) the CoCa model is used to generate the image descriptions.
### By varying the coefficients you can select the type of mixing: from style to content or from content to style. See the parameter descriptions below.
`Style to prompt` and `Prompt to style` give different results.
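The interpolation behind the `slerp_` coefficients can be illustrated with a small, dependency-free sketch. This is not the repository's implementation (which operates on torch tensors); the function name `slerp`, the `dot_threshold` fallback, and the toy 2-D vectors are assumptions chosen purely for illustration.

```python
import math
from typing import List, Sequence


def slerp(t: float, v0: Sequence[float], v1: Sequence[float],
          dot_threshold: float = 0.9995) -> List[float]:
    """Spherical linear interpolation: t=0 returns v0, t=1 returns v1."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    if abs(cos_theta) > dot_threshold:
        # Vectors are nearly parallel: fall back to plain linear interpolation.
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / sin_theta  # weight of the content vector
    w1 = math.sin(t * theta) / sin_theta          # weight of the style vector
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Toy stand-ins for a content embedding and a style embedding.
content = [1.0, 0.0]
style = [0.0, 1.0]

# A style strength of 0.2 keeps the result close to the content vector.
mixed = slerp(0.2, content, style)
print(mixed)
```

With a style strength near 0 the result stays close to the content vector, and near 1 it approaches the style vector, which is the sense in which each coefficient trades content for style.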
## Getting Started
```
git clone https://github.com/TheDenk/images_mixing.git
cd images_mixing
pip install -r requirements.txt
```
## Short Example
```python
import torch
from PIL import Image
from diffusers import DiffusionPipeline
from transformers import CLIPFeatureExtractor, CLIPModel
# Load the additional CLIP models used for guidance
feature_extractor = CLIPFeatureExtractor.from_pretrained(
    "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
)
clip_model = CLIPModel.from_pretrained(
    "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", torch_dtype=torch.float16
)

# Create the pipeline from the custom pipeline file in this repository
mixing_pipeline = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="./images_mixing.py",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
    torch_dtype=torch.float16,
)
mixing_pipeline.enable_attention_slicing()
mixing_pipeline = mixing_pipeline.to("cuda")

# Run the pipeline with a fixed seed for reproducibility
generator = torch.Generator(device="cuda").manual_seed(117)

content_image = Image.open('./images/boromir.jpg').convert("RGB")
style_image = Image.open('./images/gigachad.jpg').convert("RGB")

pipe_images = mixing_pipeline(
    content_prompt='boromir',
    style_prompt='gigachad',
    num_inference_steps=50,
    content_image=content_image,
    style_image=style_image,
    noise_strength=0.6,
    slerp_latent_style_strength=0.8,
    slerp_prompt_style_strength=0.2,
    slerp_clip_image_style_strength=0.2,
    guidance_scale=9.0,
    batch_size=1,
    clip_guidance_scale=100,
    generator=generator,
).images

# In a notebook, this displays the first mixed image
pipe_images[0]
```
## Using as augmentation
Combined with Segment Anything, image mixing can be used to augment a dataset of images effectively (see the Jupyter notebook example).
## Short Parameters Description
#### Each `slerp_` parameter has an impact on both images, style and content (more style means less content, and vice versa)
$\text{content strength} = 1.0 - \text{style strength}$
| Parameter Name | Description |
|---|---|
| **slerp_latent_style_strength** | Affects the initial noised latent. It is computed by spherical interpolation between the latents of the style image and the content image. |
| **slerp_prompt_style_strength** | Affects every diffusion iteration, both as the usual prompt and in the CLIP guided algorithm. It is computed with the CLIP text model by spherical interpolation between the CLIP text embeddings of the style prompt and the content prompt. |
| **slerp_clip_image_style_strength** | Affects every diffusion iteration in the CLIP guided algorithm. It is computed with the CLIP image model by spherical interpolation between the CLIP image embeddings of the style image and the content image. |
| **noise_strength** | Plain noise coefficient. A lower value keeps more of the original information from the initial latent. Recommended minimum value is 0.5, maximum is 0.7. |
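To make the effect of `noise_strength` concrete, the sketch below assumes the pipeline follows the usual diffusers img2img convention, where the strength decides how many of the scheduled steps are actually run. That convention is an assumption here, and the helper name `denoising_schedule` is hypothetical, not part of this repository.

```python
from typing import Tuple


def denoising_schedule(num_inference_steps: int, noise_strength: float) -> Tuple[int, int]:
    """Return (first_step_index, steps_run) under the assumed img2img convention."""
    # The strength selects how far into the noise schedule the start latent is pushed.
    steps_run = min(int(num_inference_steps * noise_strength), num_inference_steps)
    # Earlier steps are skipped, so less noise means fewer denoising steps.
    first_step_index = max(num_inference_steps - steps_run, 0)
    return first_step_index, steps_run


for strength in (0.5, 0.6, 0.7):
    start, steps = denoising_schedule(50, strength)
    print(f"noise_strength={strength}: skip {start} steps, run {steps} steps")
```

Under this assumption, a lower `noise_strength` both adds less noise to the starting latent and runs fewer denoising steps, which is why more of the original information survives.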
### Recommended starting parameters, from style to content:
```
noise_strength=0.5
slerp_latent_style_strength=0.8
slerp_prompt_style_strength=0.2
slerp_clip_image_style_strength=0.2
```
### Recommended starting parameters, from content to style:
```
noise_strength=0.5
slerp_latent_style_strength=0.2
slerp_prompt_style_strength=0.8
slerp_clip_image_style_strength=0.8
```
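Since good results rarely come from the first attempt, one way to explore settings around these presets is to enumerate a small grid of parameter combinations. The sketch below only builds the keyword arguments; the grid values and the helper name `enumerate_settings` are illustrative assumptions, not recommendations from the repository.

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical search grid around the recommended starting values.
search_space: Dict[str, List[float]] = {
    "noise_strength": [0.5, 0.6, 0.7],
    "slerp_latent_style_strength": [0.2, 0.5, 0.8],
    "slerp_prompt_style_strength": [0.2, 0.5, 0.8],
    "slerp_clip_image_style_strength": [0.2, 0.5, 0.8],
}


def enumerate_settings(space: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield one keyword-argument dict per combination of parameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


settings = list(enumerate_settings(search_space))
print(f"{len(settings)} combinations to try")
```

Each dict could then be unpacked into the pipeline call from the short example (for instance `mixing_pipeline(..., **setting)`), keeping the seed fixed so that differences between results come only from the parameters.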
## Contacts
Issues should be raised directly in the repository. For professional support and recommendations, please contact welcomedenk@gmail.com.