# PaddleMIX



Chinese Documentation (中文文档)

## Introduction

PaddleMIX is a multi-modal large model development kit based on PaddlePaddle. It brings together multiple modalities such as images, text, and video, and covers a wide range of multi-modal tasks including vision-language pre-training, text-to-image, and text-to-video. It provides an out-of-the-box development experience while supporting flexible customization, helping developers explore general artificial intelligence.

## Updates

**2024.04.17**

* [PPDiffusers](./ppdiffusers/README.md) released version 0.24.0, supporting DiT and other Sora-related technologies, as well as SVD and other video generation models.

**2023.10.7**

* Released PaddleMIX version 1.0.
* Added distributed training capability for image-text pre-training models; BLIP-2 now supports training at scales of up to one hundred billion parameters.
* Added the cross-modal application pipeline [AppFlow](./applications/README.md), which supports automatic annotation, image editing, audio-to-image, and 11 other cross-modal applications with a single click.
* [PPDiffusers](./ppdiffusers/README.md) released version 0.19.3, introducing SDXL and related tasks.

**2023.7.31**

* Released PaddleMIX version 0.1.
* First release of the PaddleMIX multi-modal large model development toolkit, integrating the PPDiffusers multi-modal diffusion model toolbox and broadly supporting PaddleNLP large language models.
* Added 12 multi-modal large models, including EVA-CLIP, BLIP-2, miniGPT-4, Stable Diffusion, and ControlNet.

## Main Features

- **Rich Multi-Modal Functionality:** Covers image-text pre-training, text-to-image, and multi-modal visual tasks, enabling diverse functions such as image editing, image captioning, and data annotation.
- **Simplified Development Experience:** A unified model development interface supports efficient custom model development and feature implementation.
- **Efficient Training and Inference Workflow:** A streamlined end-to-end training and inference workflow, with industry-leading performance for key models such as BLIP-2 and Stable Diffusion.
- **Support for Ultra-Large-Scale Training:** Supports image-text pre-training at the hundred-billion-parameter scale and text-to-image base models at the ten-billion-parameter scale.

## Demo

- Video demo:

https://github.com/PaddlePaddle/PaddleMIX/assets/29787866/8d32722a-e307-46cb-a8c0-be8acd93d2c8

## Installation

1. Environment Dependencies

```
pip install -r requirements.txt
```

For detailed PaddlePaddle installation instructions, see the [installation tutorial](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html).

> Note: Some models in ppdiffusers require CUDA 11.2 or higher. If your local machine does not meet this requirement, it is recommended to run model training and inference on [AI Studio](https://aistudio.baidu.com/index).

> If you wish to train and infer with **bf16**, please use a GPU that supports **bf16**, such as the A100.

2. Manual Installation

```
git clone https://github.com/PaddlePaddle/PaddleMIX
cd PaddleMIX
pip install -e .

# install ppdiffusers
cd ppdiffusers
pip install -e .
```

## Tutorial

- [Quick Start](applications/README_en.md/#quick-start)
- [Fine-Tuning](paddlemix/tools/README_en.md)
- [Inference Deployment](deploy/README_en.md)
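The Quick Start in the tutorial is built around the AppFlow cross-modal pipeline. As a rough sketch of how that interface is typically invoked (the app identifier, model names, and input file below are illustrative assumptions; see [AppFlow](./applications/README.md) for the actual list of supported applications):

```python
from PIL import Image

from paddlemix.appflow import Appflow

# Build an AppFlow task. The app name and model identifiers are illustrative;
# consult applications/README.md for the values supported by your version.
task = Appflow(
    app="openset_det_sam",
    models=["GroundingDino/groundingdino-swint-ogc", "Sam/SamVitH-1024"],
)

# Run open-set detection and segmentation on a local image with a text prompt.
image = Image.open("example.jpg").convert("RGB")  # hypothetical input file
result = task(image=image, prompt="dog")
print(result.keys())
```

Other applications (automatic annotation, image editing, audio-to-image, and so on) are selected the same way, by changing the `app` and `models` arguments.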

## Specialized Applications

1. Artistic Style QR Code Model

   Try it out: https://aistudio.baidu.com/community/app/1339

2. Image Mixing

   Try it out: https://aistudio.baidu.com/community/app/1340
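Specialized applications like these build on the diffusion models available through PPDiffusers. For local experimentation, a minimal text-to-image sketch with PPDiffusers might look like the following (the checkpoint name is an assumption; any Stable Diffusion checkpoint supported by PPDiffusers can be substituted):

```python
from ppdiffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint; the model name below is illustrative.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

prompt = "an astronaut riding a horse, oil painting"
image = pipe(prompt).images[0]  # the pipeline returns PIL images
image.save("astronaut.png")
```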

## Models

**Multi-modal Pre-training**

- Image-Text Pre-training
- Open World Vision Models
- More Multi-Modal Pre-trained Models

**Diffusion-based Models**

- Text-to-Image
- Text-to-Video
- Audio Generation

For more information on additional model capabilities, please refer to the [Model Capability Matrix](./paddlemix/examples/README.md).

## LICENSE

This repository is licensed under the [Apache 2.0 license](LICENSE).