diff --git a/mshub_res/assets/mindspore/2.5/animatediff.md b/mshub_res/assets/mindspore/2.5/animatediff.md
index 1076ec43cead320624370238014f34032930a1eb..44ef4a7f6f1c5a069762e064fb6763f7474f27b1 100644
--- a/mshub_res/assets/mindspore/2.5/animatediff.md
+++ b/mshub_res/assets/mindspore/2.5/animatediff.md
@@ -18,7 +18,7 @@ author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -38,11 +38,11 @@ This repository is the MindSpore implementation of [AnimateDiff](https://arxiv.o

 ## Features

-- [x] Text-to-video generation with AnimdateDiff v2, supporting 16 frames @512x512 resolution on Ascend Atlas 800T A2 machines
-- [x] MotionLoRA inference
-- [x] Motion Module Training
-- [x] Motion LoRA Training
-- [x] AnimateDiff v3 Inference
+- ✔ Text-to-video generation with AnimateDiff v2, supporting 16 frames @512x512 resolution on Ascend Atlas 800T A2 machines
+- ✔ MotionLoRA inference
+- ✔ Motion Module Training
+- ✔ Motion LoRA Training
+- ✔ AnimateDiff v3 Inference

 ## Requirements

diff --git a/mshub_res/assets/mindspore/2.5/autoencoders.md b/mshub_res/assets/mindspore/2.5/autoencoders.md
index 0f2554e5063193d0bc38fbcc41904b2388576d2e..84eb2f4b1b66f9a8f41a5d72e7c5b917e7ab52e1 100644
--- a/mshub_res/assets/mindspore/2.5/autoencoders.md
+++ b/mshub_res/assets/mindspore/2.5/autoencoders.md
@@ -20,7 +20,7 @@ author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -41,8 +41,8 @@ This repository contains SoTA image and video autoencoders and their training an

 ## Features

 - VAE (Image Variational AutoEncoder)
-  - [x] KL-reg with GAN loss (SD VAE)
-  - [x] VQ-reg with GAN loss (VQ-GAN)
+  - ✔ KL-reg with GAN loss (SD VAE)
+  - ✔ VQ-reg with GAN loss (VQ-GAN)

 ## Requirements

diff --git a/mshub_res/assets/mindspore/2.5/cogview.md b/mshub_res/assets/mindspore/2.5/cogview.md
index cd2371edaa85c4916c19706810d5d94b9727a20c..d25f4cc3f34def0462241bdcde902b83fe870b2f 100644
--- a/mshub_res/assets/mindspore/2.5/cogview.md
+++ b/mshub_res/assets/mindspore/2.5/cogview.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: N·A
+train-dataset: N/A

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore team

@@ -32,7 +32,7 @@ summary: CogView4 is used for text-to-image generation

 ---

-## News
+# CogView4 based on MindSpore

 - 🔥🔥 `2025/03/05`: We have reproduced the inference of the excellent work CogView4, which was open-sourced by THUDM, on MindSpore.
diff --git a/mshub_res/assets/mindspore/2.5/dit.md b/mshub_res/assets/mindspore/2.5/dit.md
index 4938b94b6fcf244028f85803a762d8a2ebc6c9bc..9303e156bc9c84eaeff756783afcffc6c3fa19de 100644
--- a/mshub_res/assets/mindspore/2.5/dit.md
+++ b/mshub_res/assets/mindspore/2.5/dit.md
@@ -18,7 +18,7 @@ author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

diff --git a/mshub_res/assets/mindspore/2.5/emu3.md b/mshub_res/assets/mindspore/2.5/emu3.md
index f6c528de74783f3eecec4caf0b229f30a2c3cbd3..a596e3b73bbb80fd97edd7f43e2100be73a4d220 100644
--- a/mshub_res/assets/mindspore/2.5/emu3.md
+++ b/mshub_res/assets/mindspore/2.5/emu3.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: N·A
+train-dataset: N/A

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

diff --git a/mshub_res/assets/mindspore/2.5/fit.md b/mshub_res/assets/mindspore/2.5/fit.md
index 099fbd1282b163202802497ae6c75a83d180b430..93933a93ed78e332af8032e0750322a28d755f65 100644
--- a/mshub_res/assets/mindspore/2.5/fit.md
+++ b/mshub_res/assets/mindspore/2.5/fit.md
@@ -18,7 +18,7 @@ author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

diff --git a/mshub_res/assets/mindspore/2.5/hunyuan_dit.md b/mshub_res/assets/mindspore/2.5/hunyuan_dit.md
index 5382cc998e9e4c4fed86a184f4f90c077d02de8d..5e1f913013fa28fa8b55785506f2c9352eb08191 100644
--- a/mshub_res/assets/mindspore/2.5/hunyuan_dit.md
+++ b/mshub_res/assets/mindspore/2.5/hunyuan_dit.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: N·A
+train-dataset: N/A

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -54,9 +54,9 @@ summary: HunyuanDiT is a multi-resolution diffusion transformer with fine-graine

 ### TODO

-- [ ] EMA
-- [ ] ControlNet training
-- [ ] Enhance prompt
+- ✖ EMA
+- ✖ ControlNet training
+- ✖ Enhance prompt

 ## Dependencies and Installation

diff --git a/mshub_res/assets/mindspore/2.5/hunyuanvideo-i2v.md b/mshub_res/assets/mindspore/2.5/hunyuanvideo-i2v.md
index 44d4ee191cc5616790454f4817a1201f9ae75137..4487e31c29ea82ab73c48b194185bf70f2281fb9 100644
--- a/mshub_res/assets/mindspore/2.5/hunyuanvideo-i2v.md
+++ b/mshub_res/assets/mindspore/2.5/hunyuanvideo-i2v.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: N·A
+train-dataset: N/A

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -32,6 +32,8 @@ summary: HunyuanVideo-I2V is used for image-to-video generation

 ---

+# HunyuanVideo-I2V based on MindSpore
+
 This is a **MindSpore** implementation of [HunyuanVideo-I2V](https://github.com/Tencent/HunyuanVideo-I2V). It contains the code for **training** and **inference** of HunyuanVideo and 3D CausalVAE.

 ## 📑 Development Plan

@@ -39,13 +41,13 @@ This is a **MindSpore** implementation of [HunyuanVideo-I2V](https://github.com/

 Here is the development plan of the project:

 - CausalVAE:
-  - [x] Inference
-  - [ ] Evaluation
-  - [ ] Training
+  - ✔ Inference
+  - ✖ Evaluation
+  - ✖ Training
 - HunyuanVideo (13B):
-  - [x] Inference (w. and w.o. LoRA weight)
-  - [ ] Training
-  - [ ] LoRA finetune
+  - ✔ Inference (w. and w.o. LoRA weight)
+  - ✖ Training
+  - ✖ LoRA finetune

 ## 📦 Requirements

diff --git a/mshub_res/assets/mindspore/2.5/hunyuanvideo.md b/mshub_res/assets/mindspore/2.5/hunyuanvideo.md
index 2fb0152394187c8ed0fe2b3113d682aa3dc822dc..d29274d157d9b872870fcbabfd466bfaf598b48d 100644
--- a/mshub_res/assets/mindspore/2.5/hunyuanvideo.md
+++ b/mshub_res/assets/mindspore/2.5/hunyuanvideo.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: N·A
+train-dataset: N/A

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -41,18 +41,18 @@ This is a **MindSpore** implementation of [HunyuanVideo](https://arxiv.org/abs/2

 Here is the development plan of the project:

 - CausalVAE:
-  - [x] Inference
-  - [x] Evaluation
-  - [x] Training
+  - ✔ Inference
+  - ✔ Evaluation
+  - ✔ Training
 - HunyuanVideo (13B):
-  - [x] Inference
-  - [x] Sequence Parallel (Ulysses SP)
-  - [x] VAE latent cache
-  - [x] Training up to `544x960x129` and `720x1280x129` with SP and VAE latent cache
-  - [x] Training stage 1: T2I 256px
-  - [ ] Training stage 2: T2I 256px 512px (buckets)
-  - [ ] Training stage 3: T2I/V up to 720x1280x129 (buckets)
-  - [ ] LoRA finetune
+  - ✔ Inference
+  - ✔ Sequence Parallel (Ulysses SP)
+  - ✔ VAE latent cache
+  - ✔ Training up to `544x960x129` and `720x1280x129` with SP and VAE latent cache
+  - ✔ Training stage 1: T2I 256px
+  - ✖ Training stage 2: T2I 256px 512px (buckets)
+  - ✖ Training stage 3: T2I/V up to 720x1280x129 (buckets)
+  - ✖ LoRA finetune

 ## 📦 Requirements

@@ -137,7 +137,7 @@ If you want to run T2V inference using sequence parallel (Ulysses SP), please us

 ### Run Image-to-Video Inference

-Please find more information about HunyuanVideo Image-to-Video Inference at this [url](https://github.com/mindspore-lab/mindone/tree/master/examples/hunyuanvideo-i2v).
+Please find more information about HunyuanVideo Image-to-Video Inference at this [url](https://github.com/mindspore-lab/mindone/tree/v0.3.0/examples/hunyuanvideo-i2v).

 ## 🔑 Training

diff --git a/mshub_res/assets/mindspore/2.5/hunyun3d_1.md b/mshub_res/assets/mindspore/2.5/hunyun3d_1.md
index 76f64ef089caf0ad1275a7c551f7026d6695fa7b..0e451b16ba8be01d8eb9c3c7d9fb6cf3a169c4ac 100644
--- a/mshub_res/assets/mindspore/2.5/hunyun3d_1.md
+++ b/mshub_res/assets/mindspore/2.5/hunyun3d_1.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: N·A
+train-dataset: N/A

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

diff --git a/mshub_res/assets/mindspore/2.5/instantmesh.md b/mshub_res/assets/mindspore/2.5/instantmesh.md
index 618ababf9a3c38fa95c4fd58c52343a5bddffd43..a22ba5e7d487b3968542db4ed7171a6cf86a98cc 100644
--- a/mshub_res/assets/mindspore/2.5/instantmesh.md
+++ b/mshub_res/assets/mindspore/2.5/instantmesh.md
@@ -18,7 +18,7 @@ author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -129,8 +129,6 @@ huggingface-cli download zxhezexin/openlrm-base-obj-1.0 # do this if your proxy

 Hurray! Now `mindone.transformers` supported pretrained ckpt loading for `xx_model.bin`. You can now bypass the conversion above.

----
-
 The image features are extracted with dino-vit, which depends on HuggingFace's transformer package. We reuse [the MindSpore's implementation](https://github.com/mindspore-lab/mindone/blob/master/mindone/transformers/modeling_utils.py#L499) and the only challenge remains to be that `.bin` checkpoint of [dino-vit](https://huggingface.co/facebook/dino-vitb16/tree/main) is not supported by MindSpore off-the-shelf. The checkpoint script above serves easy conversion purposes and ensures that dino-vit is still based on `MSPreTrainedModel` safe and sound.

 ### InstantMesh Checkpoint

diff --git a/mshub_res/assets/mindspore/2.5/janus.md b/mshub_res/assets/mindspore/2.5/janus.md
index bd925ef3978a83d79f40f0c2d97f3ea636146f63..ef9d894ca086ffaf56d9e183e4e3077ac1b3af91 100644
--- a/mshub_res/assets/mindspore/2.5/janus.md
+++ b/mshub_res/assets/mindspore/2.5/janus.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: N·A
+train-dataset: N/A

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -58,7 +58,7 @@ summary: Janus is a unified multimodal understanding and generation model

 We provide an efficient MindSpore implementation of [JanusPro](https://github.com/deepseek-ai/Janus). This repository is built on the models and code released by DeepSeek. We are grateful for their exceptional work and generous contribution to open source.

-## News
+# Janus-Pro based on MindSpore

 **2025.03.12**: We have reproduced the multi-modal training pipelines referring to the JanusPro [paper](https://github.com/deepseek-ai/Janus), see [docs/training.md](docs/training.md).

diff --git a/mshub_res/assets/mindspore/2.5/kohya_sd_scripts.md b/mshub_res/assets/mindspore/2.5/kohya_sd_scripts.md
index 82bd4b590abf8c61d57690fef5129237ccca9605..3ef51d2fd3d0575155a26157ec34d7f7c3bc3f74 100644
--- a/mshub_res/assets/mindspore/2.5/kohya_sd_scripts.md
+++ b/mshub_res/assets/mindspore/2.5/kohya_sd_scripts.md
@@ -18,7 +18,7 @@ author: MindSpore team

 update-time: 2025-03-10

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -38,9 +38,9 @@ Here we provide a MindSpore implementation of [Kohya's Stable Diffusion trainers

 Currently, we support

-- [x] SDXL LoRA training
-- [x] SDXL LoRA (Dreambooth) training
-- [x] SDXL Inference
+- ✔ SDXL LoRA training
+- ✔ SDXL LoRA (Dreambooth) training
+- ✔ SDXL Inference

 > Notes: Basically, we've tried to provide a consistent implementation with the torch Kohya SD trainer, but we have limitations due to differences in the framework. Refer to the main difference between the two codebases listed [here](./Limitations.md) if needed.

diff --git a/mshub_res/assets/mindspore/2.5/mvdream.md b/mshub_res/assets/mindspore/2.5/mvdream.md
index 762a1ebf004f52aa8de1cbb9db1e61c2956d7ae3..136809b84dcf5d1a5077d1ccd3e1abcd9b260d16 100644
--- a/mshub_res/assets/mindspore/2.5/mvdream.md
+++ b/mshub_res/assets/mindspore/2.5/mvdream.md
@@ -18,7 +18,7 @@ author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -32,6 +32,8 @@ summary: MVDream is a diffusion model for multi-view consistent 3D generation

 ---

+# MVDream based on MindSpore
+
 We support the training/inference pipeline of a diffusion-prior based, neural implicit field rendered, 3D mesh generation work called MVDream here.

 ## Introduction

diff --git a/mshub_res/assets/mindspore/2.5/openlrm.md b/mshub_res/assets/mindspore/2.5/openlrm.md
index b1b3434a8ea5b4e66b7365081b5517d17c81eed6..618f672c3febaa51254a323b1d1c802b5a09dbcd 100644
--- a/mshub_res/assets/mindspore/2.5/openlrm.md
+++ b/mshub_res/assets/mindspore/2.5/openlrm.md
@@ -20,7 +20,7 @@ author: MindSpore team

 update-time: 2025-03-10

-repo-link:
+repo-link:

 user-id: MindSpore

diff --git a/mshub_res/assets/mindspore/2.5/opensora_hpcai.md b/mshub_res/assets/mindspore/2.5/opensora_hpcai.md
index 53e72ef6f453a9bf086025b689ae9761e5eeaa65..6057061611f39fcd45c1cdb357d91d762c571d87 100644
--- a/mshub_res/assets/mindspore/2.5/opensora_hpcai.md
+++ b/mshub_res/assets/mindspore/2.5/opensora_hpcai.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: UCF-101 | WebVid | MixKit
+train-dataset: UCF-101

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -32,13 +32,13 @@ summary: OpenSora-HPCAI is a large video generation model for text-to-video gene

 ---

-## Open-Sora: Democratizing Efficient Video Production for All
+# Open-Sora: Democratizing Efficient Video Production for All

 Here we provide an efficient MindSpore implementation of [OpenSora](https://github.com/hpcaitech/Open-Sora), an open-source project that aims to foster innovation, creativity, and inclusivity within the field of content creation.

 This repository is built on the models and code released by HPC-AI Tech. We are grateful for their exceptional work and generous contribution to open source.

-
-Open-Sora is still at an early stage and under active development.
-
+Open-Sora is still at an early stage and under active development.

 ## 📰 News & States

@@ -46,10 +46,10 @@ This repository is built on the models and code released by HPC-AI Tech. We are
 | -------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
 | **[2025.03.12]** 🔥 We released **Open-Sora 2.0** (11B). 🎬 11B model achieves [on-par performance](#evaluation) with 11B HunyuanVideo & 30B Step-Video on 📐VBench & 📊Human Preference. 🛠️ Fully open-source: checkpoints and training codes for training with only **$200K**. [[report]](https://arxiv.org/abs/2503.09642) | Inference |
 | **[2024.06.17]** 🔥 HPC-AI released **Open-Sora 1.2**, which includes **3D-VAE**, **rectified flow**, and **score condition**. The video quality is greatly improved. [[checkpoints]](#model-weights) [[report]](https://github.com/hpcaitech/Open-Sora/blob/main/docs/report_03.md) | Text-to-Video |
-| **[2024.04.25]** 🤗 HPC-AI Tech released the [Gradio demo for Open-Sora](https://huggingface.co/spaces/hpcai-tech/open-sora) on Hugging Face Spaces. | N.A. |
+| **[2024.04.25]** 🤗 HPC-AI Tech released the [Gradio demo for Open-Sora](https://huggingface.co/spaces/hpcai-tech/open-sora) on Hugging Face Spaces. | N/A |
 | **[2024.04.25]** 🔥 HPC-AI Tech released **Open-Sora 1.1**, which supports **2s~15s, 144p to 720p, any aspect ratio** text-to-image, **text-to-video, image-to-video, video-to-video, infinite time** generation. In addition, a full video processing pipeline is released. [[checkpoints]]() [[report]](https://github.com/hpcaitech/Open-Sora/blob/main/docs/report_02.md) | Image/Video-to-Video; Infinite time generation; Variable resolutions, aspect ratios, durations |
-| **[2024.03.18]** HPC-AI Tech released **Open-Sora 1.0**, a fully open-source project for video generation. | ✅ VAE + STDiT training and inference |
-| **[2024.03.04]** HPC-AI Tech Open-Sora provides training with 46% cost reduction | ✅ Parallel training on Ascend devices |
+| **[2024.03.18]** HPC-AI Tech released **Open-Sora 1.0**, a fully open-source project for video generation. | ✔ VAE + STDiT training and inference |
+| **[2024.03.04]** HPC-AI Tech Open-Sora provides training with 46% cost reduction | ✔ Parallel training on Ascend devices |

 ## Requirements

@@ -64,10 +64,10 @@ The following videos are generated based on MindSpore and Ascend Atlas 800T A2 m

 ### OpenSora 2.0 Demo

 | 3s 576×1024 | 5s 576×1024 |
-| -------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
|