# mindone
**Repository Path**: mindspore-lab/mindone
## Basic Information
- **Project Name**: mindone
- **Description**: This repository contains SoTA algorithms, models, and interesting projects in the area of multimodal understanding and content generation
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 11
- **Forks**: 10
- **Created**: 2023-05-26
- **Last Updated**: 2025-12-25
## Categories & Tags
**Categories**: ai
**Tags**: None
## README
# MindSpore ONE
This repository contains SoTA algorithms, models, and interesting projects in the area of multimodal understanding and content generation.
ONE is short for "ONE for all".
## News
- [2025.12.24] We release [v0.5.0](https://github.com/mindspore-lab/mindone/releases/tag/v0.5.0), bringing compatibility with 🤗 Transformers v4.57.1 ([70+ new models](./mindone/transformers/SUPPORT_LIST.md)) and 🤗 Diffusers v0.35.2, plus previews of v0.36 pipelines such as Flux2, QwenImageEditPlus, Lucy, and Kandinsky5. It also introduces initial ComfyUI integration. Happy exploring!
- [2025.11.02] [v0.4.0](https://github.com/mindspore-lab/mindone/releases/tag/v0.4.0) is released, with 280+ transformers models and 70+ diffusers pipelines supported. See the [changelog](https://github.com/mindspore-lab/mindone/blob/refs/tags/v0.4.0/CHANGELOG.md).
- [2025.04.10] We release [v0.3.0](https://github.com/mindspore-lab/mindone/releases/tag/v0.3.0), adding more than 15 SoTA generative models, including Flux, CogView4, OpenSora 2.0, Movie Gen 30B, and CogVideoX 5B~30B. Have fun!
- [2025.02.21] We support DeepSeek [Janus-Pro](https://huggingface.co/deepseek-ai/Janus-Pro-7B), a SoTA multimodal understanding and generation model. See [here](examples/janus).
- [2024.11.06] [v0.2.0](https://github.com/mindspore-lab/mindone/releases/tag/v0.2.0) is released.
## Quick tour
To install v0.5.0, please install [MindSpore 2.6.0 - 2.7.1](https://www.mindspore.cn/install) and then run `pip install mindone`.
Alternatively, to install the latest version from the `master` branch, please run:
```shell
git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
```
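To verify the installation, a quick import check should succeed (a minimal sanity test; nothing mindone-specific beyond the import itself):
```py
import mindspore
import mindone

# If both imports succeed, the packages are installed;
# printing the MindSpore version confirms which backend is active.
print(mindspore.__version__)
```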
We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium) as an example.
**Hello MindSpore** from **Stable Diffusion 3**!
```py
import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

# Download the SD3 weights from the Hugging Face Hub and load them in half precision.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
# The pipeline returns a tuple; the first element is the list of generated images.
image = pipe(prompt)[0][0]
image.save("sd3.png")
```
### Run HF Diffusers on MindSpore
- `mindone.diffusers` is under active development; most tasks have been tested with MindSpore 2.6.0-2.7.1 on Ascend Atlas 800T A2 machines.
- Compatible with 🤗 Diffusers v0.35.2, with preview support for SoTA v0.36 pipelines; see the [support list](./mindone/diffusers/SUPPORT_LIST.md).
- 18+ [training examples](./examples/diffusers): ControlNet, DreamBooth, LoRA, and more. Because the API mirrors 🤗 Diffusers, switching pipelines usually means changing only the pipeline class and checkpoint, as sketched below.
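For instance, a FLUX.1 text-to-image run looks almost identical to the SD3 example above (a minimal sketch; the checkpoint, dtype, and step count here are illustrative, and the FLUX.1-dev weights are gated on the Hub):
```py
import mindspore
from mindone.diffusers import FluxPipeline

# Same from_pretrained interface as above; only the class and checkpoint differ.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    mindspore_dtype=mindspore.bfloat16,
)
# Pipelines return tuples, so the first generated image is at [0][0].
image = pipe(
    "A cat holding a sign that says 'Hello MindSpore'",
    num_inference_steps=28,
)[0][0]
image.save("flux.png")
```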
### Run HF Transformers on MindSpore
- `mindone.transformers` is under active development; most tasks have been tested with MindSpore 2.6.0-2.7.1 on Ascend Atlas 800T A2 machines.
- Compatible with 🤗 Transformers v4.57.1.
- Provides 350+ state-of-the-art models across text, computer vision, audio, video, and multimodal tasks for inference; see the [support list](./mindone/transformers/SUPPORT_LIST.md) and the sketch below.
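The usual pattern keeps the tokenizer from 🤗 transformers and loads the model through `mindone.transformers` (a minimal sketch; the checkpoint and generation arguments are illustrative, and input handling can vary by model, so the per-model examples are authoritative):
```py
import mindspore as ms
from transformers import AutoTokenizer  # tokenizer comes from HF transformers
from mindone.transformers import AutoModelForCausalLM  # model runs on MindSpore

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative choice of a supported checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, mindspore_dtype=ms.float16)

# Tokenize to NumPy, then hand MindSpore tensors to generate().
inputs = tokenizer("Give me a short introduction to MindSpore.", return_tensors="np")
output_ids = model.generate(ms.Tensor(inputs.input_ids), max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```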
### Supported models under mindone/examples
| task | model | inference | finetune | pretrain | institute |
| :--- | :--- | :---: | :---: | :---: | :-- |
| Text/Image-to-Video | [wan2.1](https://github.com/mindspore-lab/mindone/blob/master/examples/wan2_1) 🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text/Image-to-Video | [wan2.2](https://github.com/mindspore-lab/mindone/blob/master/examples/wan2_2) 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Audio/Image-Text-to-Text | [qwen2_5_omni](https://github.com/mindspore-lab/mindone/blob/master/examples/transformers/qwen2_5_omni) 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Image/Video-Text-to-Text | [qwen2_5_vl](https://github.com/mindspore-lab/mindone/tree/master/examples/transformers/qwen2_5_vl) 🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Any-to-Any | [qwen3_omni_moe](https://github.com/mindspore-lab/mindone/tree/master/examples/transformers/qwen3_omni_moe) 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Image-Text-to-Text | [qwen3_vl/qwen3_vl_moe](https://github.com/mindspore-lab/mindone/tree/master/examples/transformers/qwen3_vl) 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text-to-Image | [qwen_image](https://github.com/mindspore-lab/mindone/tree/master/examples/diffusers/qwenimage) 🔥🔥🔥 | ✅ | ✅ | ✖️ | Alibaba |
| Text-to-Text | [minicpm](https://github.com/mindspore-lab/mindone/tree/master/examples/transformers/minicpm) 🔥🔥 | ✅ | ✖️ | ✖️ | OpenBMB |
| Any-to-Any | [janus](https://github.com/mindspore-lab/mindone/blob/master/examples/janus) | ✅ | ✅ | ✅ | DeepSeek |
| Any-to-Any | [emu3](https://github.com/mindspore-lab/mindone/blob/master/examples/emu3) | ✅ | ✅ | ✅ | BAAI |
| Class-to-Image | [var](https://github.com/mindspore-lab/mindone/blob/master/examples/var) | ✅ | ✅ | ✅ | ByteDance |
| Text-to-Image | [omnigen2](https://github.com/mindspore-lab/mindone/blob/master/examples/omnigen2) 🔥 | ✅ | ✅ | ✖️ | VectorSpaceLab |
| Text/Image-to-Video | [hpcai open sora 1.2/2.0](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_hpcai) | ✅ | ✅ | ✅ | HPC-AI Tech |
| Text/Image-to-Video | [cogvideox 1.5 5B~30B](https://github.com/mindspore-lab/mindone/blob/master/examples/diffusers/cogvideox_factory) | ✅ | ✅ | ✅ | Zhipu |
| Image/Text-to-Text | [glm4v](https://github.com/mindspore-lab/mindone/tree/master/examples/transformers/glm4v) 🔥 | ✅ | ✖️ | ✖️ | Zhipu |
| Text-to-Video | [open sora plan 1.3](https://github.com/mindspore-lab/mindone/blob/master/examples/opensora_pku) | ✅ | ✅ | ✅ | PKU |
| Text-to-Video | [hunyuanvideo](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuanvideo) | ✅ | ✅ | ✅ | Tencent |
| Image-to-Video | [hunyuanvideo-i2v](https://github.com/mindspore-lab/mindone/blob/master/examples/hunyuanvideo-i2v) 🔥 | ✅ | ✖️ | ✖️ | Tencent |
| Text-to-Video | [movie gen 30B](https://github.com/mindspore-lab/mindone/blob/master/examples/moviegen) | ✅ | ✅ | ✅ | Meta |
| Segmentation | [lang_sam](https://github.com/mindspore-lab/mindone/tree/master/examples/lang_sam) 🔥 | ✅ | ✖️ | ✖️ | Meta |
| Segmentation | [sam2](https://github.com/mindspore-lab/mindone/tree/master/examples/sam2) | ✅ | ✖️ | ✖️ | Meta |
| Text-to-Video | [step_video_t2v](https://github.com/mindspore-lab/mindone/blob/master/examples/step_video_t2v) | ✅ | ✖️ | ✖️ | StepFun |
| Text-to-Speech | [sparktts](https://github.com/mindspore-lab/mindone/tree/master/examples/sparktts) | ✅ | ✖️ | ✖️ | Spark Audio |
| Text-to-Image | [flux](https://github.com/mindspore-lab/mindone/blob/master/examples/diffusers/dreambooth/README_flux.md) | ✅ | ✅ | ✖️ | Black Forest Lab |
| Text-to-Image | [stable diffusion 3](https://github.com/mindspore-lab/mindone/blob/master/examples/diffusers/dreambooth/README_sd3.md) | ✅ | ✅ | ✖️ | Stability AI |
### Supported captioners
| task | model | inference | finetune | pretrain | features |
| :--- | :--- | :---: | :---: | :---: | :-- |
| Image-Text-to-Text | [pllava](https://github.com/mindspore-lab/mindone/tree/master/tools/captioners/PLLaVA) | ✅ | ✖️ | ✖️ | supports video and image captioning |
### Training-free acceleration
We introduce [DiT inference acceleration](https://github.com/mindspore-lab/mindone/blob/master/examples/accelerated_dit_pipelines/README.md): DiTCache, PromptGate, and FBCache with TaylorSeer, tested on SD3 and FLUX.1. A conceptual sketch of the underlying caching idea follows.
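These methods share one trick: reuse intermediate DiT features across adjacent denoising steps when they change little. The toy wrapper below illustrates that idea only; it is not the repository's API, and the class name and threshold are made up for illustration:
```py
import numpy as np

class CachedBlock:
    """Toy FBCache-style wrapper (illustrative, not mindone's API): skip a heavy
    block when its input barely changed since the previous denoising step."""

    def __init__(self, block, rel_threshold=0.05):
        self.block = block                  # any callable, e.g. a transformer block
        self.rel_threshold = rel_threshold  # max relative input change for a cache hit
        self.prev_input = None
        self.cached_output = None

    def __call__(self, x):
        if self.cached_output is not None:
            # Relative change of the input between consecutive steps.
            delta = np.linalg.norm(x - self.prev_input) / (np.linalg.norm(self.prev_input) + 1e-8)
            if delta < self.rel_threshold:
                return self.cached_output   # cache hit: reuse previous features
        self.prev_input = x.copy()
        self.cached_output = self.block(x)  # cache miss: recompute and refresh cache
        return self.cached_output

# Usage sketch: wrap a block and call it once per denoising step.
cached = CachedBlock(lambda x: x * 2.0)
for step in range(4):
    y = cached(np.ones(8) + 1e-4 * step)  # tiny drift between steps -> mostly cache hits
```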