# PaddleMIX
**Repository Path**: Covirtue/PaddleMIX
## Basic Information
- **Project Name**: PaddleMIX
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: develop
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 1
- **Created**: 2024-05-28
- **Last Updated**: 2024-05-28
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
Chinese Documentation
## Introduction
PaddleMIX is a large multi-modal development kit based on PaddlePaddle. It brings together image, text, and video capabilities and covers a wide range of multi-modal tasks such as vision-language pre-training, text-to-image, and text-to-video. It provides an out-of-the-box development experience while supporting flexible customization and the exploration of general artificial intelligence.
## Updates
**2024.04.17**
* [PPDiffusers](./ppdiffusers/README.md) released version 0.24.0, adding support for DiT and other Sora-related technologies, as well as SVD and other video generation models.
**2023.10.7**
* Released PaddleMIX version 1.0
* Added distributed training capability for image-text pre-training models; BLIP-2 now supports training at the hundred-billion-parameter scale.
* Added the cross-modal application pipeline [AppFlow](./applications/README.md), which supports one-click use of 11 cross-modal applications, including automatic annotation, image editing, and audio-to-image.
* [PPDiffusers](./ppdiffusers/README.md) has released version 0.19.3, introducing SDXL and related tasks.
**2023.7.31**
* Released PaddleMIX version 0.1
* First release of the PaddleMIX large multi-modal model development toolkit, integrating the PPDiffusers multi-modal diffusion model toolbox and broadly supporting PaddleNLP large language models.
* Added 12 large multi-modal models, including EVA-CLIP, BLIP-2, miniGPT-4, Stable Diffusion, and ControlNet.
## Main Features
- **Rich Multi-Modal Functionality:** Covers image-text pre-training, text-to-image generation, and multi-modal visual tasks, enabling functions such as image editing, image captioning, and data annotation.
- **Simplified Development Experience:** A unified model development interface enables efficient development of custom models and features.
- **Efficient Training and Inference Workflow:** A streamlined end-to-end training and inference process, with industry-leading performance for key models such as BLIP-2 and Stable Diffusion.
- **Support for Ultra-Large-Scale Training:** Supports image-text pre-training at the hundred-billion-parameter scale and text-to-image base models at the ten-billion-parameter scale.
## Demo
- Video Demo
https://github.com/PaddlePaddle/PaddleMIX/assets/29787866/8d32722a-e307-46cb-a8c0-be8acd93d2c8
## Installation
1. Environment Dependencies
```
pip install -r requirements.txt
```
Detailed PaddlePaddle [installation](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html) tutorial.
> Note: Some models in ppdiffusers require CUDA 11.2 or higher. If your local machine does not meet the requirements, it is recommended to use [AI Studio](https://aistudio.baidu.com/index) for model training and inference tasks.
> If you wish to train and infer using **bf16**, please use a GPU that supports **bf16**, such as the A100.
2. Manual Installation
```
git clone https://github.com/PaddlePaddle/PaddleMIX
cd PaddleMIX
pip install -e .
# install ppdiffusers
cd ppdiffusers
pip install -e .
```
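After installation, a quick sanity check can confirm that everything imports correctly. The sketch below assumes the editable installs above succeeded; `paddle.utils.run_check()` is PaddlePaddle's built-in self-test.
```
# Minimal post-install sanity check (a sketch, assuming a standard install).
import paddle
import paddlemix      # installed from the repository root
import ppdiffusers    # installed from the ppdiffusers subdirectory

paddle.utils.run_check()                       # PaddlePaddle's built-in installation self-test
print("paddle:", paddle.__version__)
print("ppdiffusers:", ppdiffusers.__version__)
```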
## Tutorial
- [Quick Start](applications/README_en.md/#quick-start) (a minimal AppFlow sketch follows this list)
- [Fine-Tuning](paddlemix/tools/README_en.md)
- [Inference Deployment](deploy/README_en.md)
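As referenced in the Quick Start item above, here is a minimal sketch of the AppFlow interface for one-click cross-modal prediction. The `app` name, model identifiers, and call signature follow the AppFlow examples but should be treated as assumptions; check the Quick Start guide for the exact values supported by your version.
```
# Sketch of a one-click AppFlow pipeline (identifiers are illustrative; verify against the Quick Start guide).
from paddlemix.appflow import Appflow
from ppdiffusers.utils import load_image

# Open-set detection + segmentation application; model names follow the AppFlow examples.
task = Appflow(
    app="openset_det_sam",
    models=["GroundingDino/groundingdino-swint-ogc", "Sam/SamVitH-1024"],
)
image = load_image("path/to/your_image.jpg")   # placeholder path
result = task(image=image, prompt="dog")       # returns detection boxes and segmentation masks
```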
## Specialized Applications
1. Artistic Style QR Code Model
Try it out: https://aistudio.baidu.com/community/app/1339
2. Image Mixing
Try it out: https://aistudio.baidu.com/community/app/1340
## Models
| Multi-modal Pre-training | Diffusion-based Models |
| --- | --- |
| Image-Text Pre-training | Text-to-Image |
| Open World Vision Models | Text-to-Video |
| More Multi-Modal Pre-trained Models | Audio Generation |
For more information on additional model capabilities, please refer to the [Model Capability Matrix](./paddlemix/examples/README.md).
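To illustrate the diffusion side of the table, below is a minimal text-to-image sketch using PPDiffusers, which follows a Diffusers-style API. The checkpoint name is an assumption; any Stable Diffusion weights supported by PPDiffusers can be substituted.
```
# Minimal text-to-image sketch with PPDiffusers (Diffusers-style API).
# The checkpoint name is an assumption; substitute any supported Stable Diffusion weights.
from ppdiffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipe("A photo of an astronaut riding a horse on Mars").images[0]
image.save("astronaut.png")
```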
## LICENSE
This repository is licensed under the [Apache 2.0 license](LICENSE).