From 6cb126b395f5d310e2b8655cf925b1dfb4dba1cd Mon Sep 17 00:00:00 2001
From: liuchuting
Date: Thu, 8 May 2025 17:02:54 +0800
Subject: [PATCH] Format the document

---
 mshub_res/assets/mindspore/2.5/animatediff.md | 12 +--
 .../assets/mindspore/2.5/autoencoders.md      |  6 +-
 mshub_res/assets/mindspore/2.5/cogview.md     |  6 +-
 mshub_res/assets/mindspore/2.5/dit.md         |  2 +-
 mshub_res/assets/mindspore/2.5/emu3.md        |  4 +-
 mshub_res/assets/mindspore/2.5/fit.md         |  2 +-
 mshub_res/assets/mindspore/2.5/hunyuan_dit.md | 10 +--
 .../assets/mindspore/2.5/hunyuanvideo-i2v.md  | 18 +++--
 .../assets/mindspore/2.5/hunyuanvideo.md      | 28 +++----
 mshub_res/assets/mindspore/2.5/hunyun3d_1.md  |  4 +-
 mshub_res/assets/mindspore/2.5/instantmesh.md |  4 +-
 mshub_res/assets/mindspore/2.5/janus.md       |  6 +-
 .../assets/mindspore/2.5/kohya_sd_scripts.md  |  8 +-
 mshub_res/assets/mindspore/2.5/mvdream.md     |  4 +-
 mshub_res/assets/mindspore/2.5/openlrm.md     |  2 +-
 .../assets/mindspore/2.5/opensora_hpcai.md    | 79 ++++++++++---------
 .../assets/mindspore/2.5/opensora_pku.md      | 38 ++++-----
 mshub_res/assets/mindspore/2.5/qwen2_vl.md    |  4 +-
 mshub_res/assets/mindspore/2.5/sharegpt_4v.md |  6 +-
 .../assets/mindspore/2.5/step_video_t2v.md    |  4 +-
 .../assets/mindspore/2.5/story_diffusion.md   | 10 +--
 mshub_res/assets/mindspore/2.5/var.md         |  4 +-
 mshub_res/assets/mindspore/2.5/venhancer.md   |  6 +-
 .../assets/mindspore/2.5/videocomposer.md     | 33 ++++----
 mshub_res/assets/mindspore/2.5/wan2_1.md      | 22 +++---
 25 files changed, 165 insertions(+), 157 deletions(-)

diff --git a/mshub_res/assets/mindspore/2.5/animatediff.md b/mshub_res/assets/mindspore/2.5/animatediff.md
index 1076ec4..44ef4a7 100644
--- a/mshub_res/assets/mindspore/2.5/animatediff.md
+++ b/mshub_res/assets/mindspore/2.5/animatediff.md
@@ -18,7 +18,7 @@ author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -38,11 +38,11 @@ This repository is the MindSpore implementation of [AnimateDiff](https://arxiv.o

 ## Features

-- [x] Text-to-video generation with AnimdateDiff v2, supporting 16 frames @512x512 resolution on Ascend Atlas 800T A2 machines
-- [x] MotionLoRA inference
-- [x] Motion Module Training
-- [x] Motion LoRA Training
-- [x] AnimateDiff v3 Inference
+- ✔ Text-to-video generation with AnimateDiff v2, supporting 16 frames @512x512 resolution on Ascend Atlas 800T A2 machines
+- ✔ MotionLoRA inference
+- ✔ Motion Module Training
+- ✔ Motion LoRA Training
+- ✔ AnimateDiff v3 Inference

 ## Requirements

diff --git a/mshub_res/assets/mindspore/2.5/autoencoders.md b/mshub_res/assets/mindspore/2.5/autoencoders.md
index 0f2554e..84eb2f4 100644
--- a/mshub_res/assets/mindspore/2.5/autoencoders.md
+++ b/mshub_res/assets/mindspore/2.5/autoencoders.md
@@ -20,7 +20,7 @@ author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -41,8 +41,8 @@ This repository contains SoTA image and video autoencoders and their training an

 ## Features

 - VAE (Image Variational AutoEncoder)
-  - [x] KL-reg with GAN loss (SD VAE)
-  - [x] VQ-reg with GAN loss (VQ-GAN)
+  - ✔ KL-reg with GAN loss (SD VAE)
+  - ✔ VQ-reg with GAN loss (VQ-GAN)

 ## Requirements

diff --git a/mshub_res/assets/mindspore/2.5/cogview.md b/mshub_res/assets/mindspore/2.5/cogview.md
index cd2371e..d25f4cc 100644
--- a/mshub_res/assets/mindspore/2.5/cogview.md
+++ b/mshub_res/assets/mindspore/2.5/cogview.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: N·A
+train-dataset: N/A

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore team

@@ -32,7 +32,7 @@ summary: CogView4 is used for text-to-image generation

 ---

-## News
+# CogView4 based on MindSpore

 - 🔥🔥 `2025/03/05`: We have reproduced the inference of the excellent work CogView4, which was open-sourced by THUDM, on MindSpore.

diff --git a/mshub_res/assets/mindspore/2.5/dit.md b/mshub_res/assets/mindspore/2.5/dit.md
index 4938b94..9303e15 100644
--- a/mshub_res/assets/mindspore/2.5/dit.md
+++ b/mshub_res/assets/mindspore/2.5/dit.md
@@ -18,7 +18,7 @@ author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

diff --git a/mshub_res/assets/mindspore/2.5/emu3.md b/mshub_res/assets/mindspore/2.5/emu3.md
index f6c528d..a596e3b 100644
--- a/mshub_res/assets/mindspore/2.5/emu3.md
+++ b/mshub_res/assets/mindspore/2.5/emu3.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: N·A
+train-dataset: N/A

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

diff --git a/mshub_res/assets/mindspore/2.5/fit.md b/mshub_res/assets/mindspore/2.5/fit.md
index 099fbd1..93933a9 100644
--- a/mshub_res/assets/mindspore/2.5/fit.md
+++ b/mshub_res/assets/mindspore/2.5/fit.md
@@ -18,7 +18,7 @@ author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

diff --git a/mshub_res/assets/mindspore/2.5/hunyuan_dit.md b/mshub_res/assets/mindspore/2.5/hunyuan_dit.md
index 5382cc9..5e1f913 100644
--- a/mshub_res/assets/mindspore/2.5/hunyuan_dit.md
+++ b/mshub_res/assets/mindspore/2.5/hunyuan_dit.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: N·A
+train-dataset: N/A

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -54,9 +54,9 @@ summary: HunyuanDiT is a multi-resolution diffusion transformer with fine-graine

 ### TODO

-- [ ] EMA
-- [ ] ControlNet training
-- [ ] Enhance prompt
+- ✖ EMA
+- ✖ ControlNet training
+- ✖ Enhance prompt

 ## Dependencies and Installation

diff --git a/mshub_res/assets/mindspore/2.5/hunyuanvideo-i2v.md b/mshub_res/assets/mindspore/2.5/hunyuanvideo-i2v.md
index 44d4ee1..4487e31 100644
--- a/mshub_res/assets/mindspore/2.5/hunyuanvideo-i2v.md
+++ b/mshub_res/assets/mindspore/2.5/hunyuanvideo-i2v.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: N·A
+train-dataset: N/A

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -32,6 +32,8 @@ summary: HunyuanVideo-I2V is used for image-to-video generation

 ---

+# HunyuanVideo-I2V based on MindSpore
+
 This is a **MindSpore** implementation of [HunyuanVideo-I2V](https://github.com/Tencent/HunyuanVideo-I2V). It contains the code for **training** and **inference** of HunyuanVideo and 3D CausalVAE.

 ## 📑 Development Plan

@@ -39,13 +41,13 @@ This is a **MindSpore** implementation of [HunyuanVideo-I2V](https://github.com/

 Here is the development plan of the project:

 - CausalVAE:
-  - [x] Inference
-  - [ ] Evaluation
-  - [ ] Training
+  - ✔ Inference
+  - ✖ Evaluation
+  - ✖ Training
 - HunyuanVideo (13B):
-  - [x] Inference (w. and w.o. LoRA weight)
-  - [ ] Training
-  - [ ] LoRA finetune
+  - ✔ Inference (w. and w.o. LoRA weight)
+  - ✖ Training
+  - ✖ LoRA finetune

 ## 📦 Requirements

diff --git a/mshub_res/assets/mindspore/2.5/hunyuanvideo.md b/mshub_res/assets/mindspore/2.5/hunyuanvideo.md
index 2fb0152..d29274d 100644
--- a/mshub_res/assets/mindspore/2.5/hunyuanvideo.md
+++ b/mshub_res/assets/mindspore/2.5/hunyuanvideo.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: N·A
+train-dataset: N/A

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -41,18 +41,18 @@ This is a **MindSpore** implementation of [HunyuanVideo](https://arxiv.org/abs/2

 Here is the development plan of the project:

 - CausalVAE:
-  - [x] Inference
-  - [x] Evaluation
-  - [x] Training
+  - ✔ Inference
+  - ✔ Evaluation
+  - ✔ Training
 - HunyuanVideo (13B):
-  - [x] Inference
-  - [x] Sequence Parallel (Ulysses SP)
-  - [x] VAE latent cache
-  - [x] Training up to `544x960x129` and `720x1280x129` with SP and VAE latent cache
-  - [x] Training stage 1: T2I 256px
-  - [ ] Training stage 2: T2I 256px 512px (buckets)
-  - [ ] Training stage 3: T2I/V up to 720x1280x129 (buckets)
-  - [ ] LoRA finetune
+  - ✔ Inference
+  - ✔ Sequence Parallel (Ulysses SP)
+  - ✔ VAE latent cache
+  - ✔ Training up to `544x960x129` and `720x1280x129` with SP and VAE latent cache
+  - ✔ Training stage 1: T2I 256px
+  - ✖ Training stage 2: T2I 256px 512px (buckets)
+  - ✖ Training stage 3: T2I/V up to 720x1280x129 (buckets)
+  - ✖ LoRA finetune

 ## 📦 Requirements

@@ -137,7 +137,7 @@ If you want to run T2V inference using sequence parallel (Ulysses SP), please us

 ### Run Image-to-Video Inference

-Please find more information about HunyuanVideo Image-to-Video Inference at this [url](https://github.com/mindspore-lab/mindone/tree/master/examples/hunyuanvideo-i2v).
+Please find more information about HunyuanVideo Image-to-Video Inference at this [url](https://github.com/mindspore-lab/mindone/tree/v0.3.0/examples/hunyuanvideo-i2v).

 ## 🔑 Training

diff --git a/mshub_res/assets/mindspore/2.5/hunyun3d_1.md b/mshub_res/assets/mindspore/2.5/hunyun3d_1.md
index 76f64ef..0e451b1 100644
--- a/mshub_res/assets/mindspore/2.5/hunyun3d_1.md
+++ b/mshub_res/assets/mindspore/2.5/hunyun3d_1.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: N·A
+train-dataset: N/A

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

diff --git a/mshub_res/assets/mindspore/2.5/instantmesh.md b/mshub_res/assets/mindspore/2.5/instantmesh.md
index 618abab..a22ba5e 100644
--- a/mshub_res/assets/mindspore/2.5/instantmesh.md
+++ b/mshub_res/assets/mindspore/2.5/instantmesh.md
@@ -18,7 +18,7 @@ author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -129,8 +129,6 @@ huggingface-cli download zxhezexin/openlrm-base-obj-1.0 # do this if your proxy

 Hurray! Now `mindone.transformers` supported pretrained ckpt loading for `xx_model.bin`. You can now bypass the conversion above.

----
-
 The image features are extracted with dino-vit, which depends on HuggingFace's transformer package. We reuse [the MindSpore's implementation](https://github.com/mindspore-lab/mindone/blob/master/mindone/transformers/modeling_utils.py#L499) and the only challenge remains to be that `.bin` checkpoint of [dino-vit](https://huggingface.co/facebook/dino-vitb16/tree/main) is not supported by MindSpore off-the-shelf. The checkpoint script above serves easy conversion purposes and ensures that dino-vit is still based on `MSPreTrainedModel` safe and sound.

 ### InstantMesh Checkpoint

diff --git a/mshub_res/assets/mindspore/2.5/janus.md b/mshub_res/assets/mindspore/2.5/janus.md
index bd925ef..ef9d894 100644
--- a/mshub_res/assets/mindspore/2.5/janus.md
+++ b/mshub_res/assets/mindspore/2.5/janus.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: N·A
+train-dataset: N/A

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -58,7 +58,7 @@ summary: Janus is a unified multimodal understanding and generation model

 We provide an efficient MindSpore implementation of [JanusPro](https://github.com/deepseek-ai/Janus). This repository is built on the models and code released by DeepSeek. We are grateful for their exceptional work and generous contribution to open source.

-## News
+# Janus-Pro based on MindSpore

 **2025.03.12**: We have reproduced the multi-modal training pipelines referring to the JanusPro [paper](https://github.com/deepseek-ai/Janus), see [docs/training.md](docs/training.md).

diff --git a/mshub_res/assets/mindspore/2.5/kohya_sd_scripts.md b/mshub_res/assets/mindspore/2.5/kohya_sd_scripts.md
index 82bd4b5..3ef51d2 100644
--- a/mshub_res/assets/mindspore/2.5/kohya_sd_scripts.md
+++ b/mshub_res/assets/mindspore/2.5/kohya_sd_scripts.md
@@ -18,7 +18,7 @@ author: MindSpore team

 update-time: 2025-03-10

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -38,9 +38,9 @@ Here we provide a MindSpore implementation of [Kohya's Stable Diffusion trainers

 Currently, we support

-- [x] SDXL LoRA training
-- [x] SDXL LoRA (Dreambooth) training
-- [x] SDXL Inference
+- ✔ SDXL LoRA training
+- ✔ SDXL LoRA (Dreambooth) training
+- ✔ SDXL Inference

 > Notes: Basically, we've tried to provide a consistent implementation with the torch Kohya SD trainer, but we have limitations due to differences in the framework. Refer to the main difference between the two codebases listed [here](./Limitations.md) if needed.

diff --git a/mshub_res/assets/mindspore/2.5/mvdream.md b/mshub_res/assets/mindspore/2.5/mvdream.md
index 762a1eb..136809b 100644
--- a/mshub_res/assets/mindspore/2.5/mvdream.md
+++ b/mshub_res/assets/mindspore/2.5/mvdream.md
@@ -18,7 +18,7 @@ author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -32,6 +32,8 @@ summary: MVDream is a diffusion model for multi-view consistent 3D generation

 ---

+# MVDream based on MindSpore
+
 We support the training/inference pipeline of a diffusion-prior based, neural implicit field rendered, 3D mesh generation work called MVDream here.

 ## Introduction

diff --git a/mshub_res/assets/mindspore/2.5/openlrm.md b/mshub_res/assets/mindspore/2.5/openlrm.md
index b1b3434..618f672 100644
--- a/mshub_res/assets/mindspore/2.5/openlrm.md
+++ b/mshub_res/assets/mindspore/2.5/openlrm.md
@@ -20,7 +20,7 @@ author: MindSpore team

 update-time: 2025-03-10

-repo-link:
+repo-link:

 user-id: MindSpore

diff --git a/mshub_res/assets/mindspore/2.5/opensora_hpcai.md b/mshub_res/assets/mindspore/2.5/opensora_hpcai.md
index 53e72ef..6057061 100644
--- a/mshub_res/assets/mindspore/2.5/opensora_hpcai.md
+++ b/mshub_res/assets/mindspore/2.5/opensora_hpcai.md
@@ -12,13 +12,13 @@ fine-tunable: True

 model-version: 2.5

-train-dataset: UCF-101 | WebVid | MixKit
+train-dataset: UCF-101

 author: MindSpore team

 update-time: 2025-04-22

-repo-link:
+repo-link:

 user-id: MindSpore

@@ -32,13 +32,13 @@ summary: OpenSora-HPCAI is a large video generation model for text-to-video gene

 ---

-## Open-Sora: Democratizing Efficient Video Production for All
+# Open-Sora: Democratizing Efficient Video Production for All

 Here we provide an efficient MindSpore implementation of [OpenSora](https://github.com/hpcaitech/Open-Sora), an open-source project that aims to foster innovation, creativity, and inclusivity within the field of content creation.

 This repository is built on the models and code released by HPC-AI Tech. We are grateful for their exceptional work and generous contribution to open source.

-Open-Sora is still at an early stage and under active development.
+Open-Sora is still at an early stage and under active development.

 ## 📰 News & States

 | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------ |
 | **[2025.03.12]** 🔥 We released **Open-Sora 2.0** (11B). 🎬 11B model achieves [on-par performance](#evaluation) with 11B HunyuanVideo & 30B Step-Video on 📐VBench & 📊Human Preference. 🛠️ Fully open-source: checkpoints and training codes for training with only **$200K**. [[report]](https://arxiv.org/abs/2503.09642) | Inference |
 | **[2024.06.17]** 🔥 HPC-AI released **Open-Sora 1.2**, which includes **3D-VAE**, **rectified flow**, and **score condition**. The video quality is greatly improved. [[checkpoints]](#model-weights) [[report]](https://github.com/hpcaitech/Open-Sora/blob/main/docs/report_03.md) | Text-to-Video |
-| **[2024.04.25]** 🤗 HPC-AI Tech released the [Gradio demo for Open-Sora](https://huggingface.co/spaces/hpcai-tech/open-sora) on Hugging Face Spaces. | N.A. |
+| **[2024.04.25]** 🤗 HPC-AI Tech released the [Gradio demo for Open-Sora](https://huggingface.co/spaces/hpcai-tech/open-sora) on Hugging Face Spaces. | N/A |
 | **[2024.04.25]** 🔥 HPC-AI Tech released **Open-Sora 1.1**, which supports **2s~15s, 144p to 720p, any aspect ratio** text-to-image, **text-to-video, image-to-video, video-to-video, infinite time** generation. In addition, a full video processing pipeline is released. [[checkpoints]]() [[report]](https://github.com/hpcaitech/Open-Sora/blob/main/docs/report_02.md) | Image/Video-to-Video; Infinite time generation; Variable resolutions, aspect ratios, durations |
-| **[2024.03.18]** HPC-AI Tech released **Open-Sora 1.0**, a fully open-source project for video generation. | ✅ VAE + STDiT training and inference |
-| **[2024.03.04]** HPC-AI Tech Open-Sora provides training with 46% cost reduction | ✅ Parallel training on Ascend devices |
+| **[2024.03.18]** HPC-AI Tech released **Open-Sora 1.0**, a fully open-source project for video generation. | ✔ VAE + STDiT training and inference |
+| **[2024.03.04]** HPC-AI Tech Open-Sora provides training with 46% cost reduction | ✔ Parallel training on Ascend devices |

 ## Requirements

@@ -64,10 +64,10 @@ The following videos are generated based on MindSpore and Ascend Atlas 800T A2 m

 ### OpenSora 2.0 Demo

 | 3s 576×1024 | 5s 576×1024 |
 | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------- |
 |