# MindSpeed-MM

**Repository Path**: mazhuang_1004/MindSpeed-MM

## Basic Information

- **Project Name**: MindSpeed-MM
- **Description**: Huawei Ascend's multimodal large-model suite for large-scale distributed training, supporting multimodal generation and multimodal understanding.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: https://gitee.com/ascend/MindSpeed-MM
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 242
- **Created**: 2025-07-11
- **Last Updated**: 2025-12-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README
Prompt: Ultra HD, 4K, cinematic composition, low contrast ratio, low saturation, cool tone; The queen wears an iron crown and rides on the dragon over the city. She holds a big flag that shows:" MindSpeed MM".

Prompt: A fluffy white rabbit with soft, velvety fur and twitching pink nose sits curiously near a rustic wooden fence, surrounded by a lush garden of vibrant wildflowers and tall grasses swaying gently in the breeze. The rabbit's large, expressive eyes scan the environment, reflecting the golden hues of the setting sun. As it nibbles on a patch of clover, its ears perk up at the distant sound of chirping birds. The fence, weathered and covered in patches of moss, adds a charming, pastoral backdrop to this serene scene, capturing the essence of a peaceful countryside moment.

Prompt: A majestic Berlin tower stands tall against the night sky, its structure bathed in a mesmerizing array of vibrant lights, casting a kaleidoscope of colors across the cityscape. The tower's intricate architectural details are highlighted by the illumination, creating a stunning contrast against the deep indigo sky. As the camera pans upward, the lights shift, revealing a dynamic play of shadows and hues that dance across the tower's surface. The surrounding city lights twinkle in harmony, enhancing the tower's grandeur and creating a breathtaking visual symphony that captures the essence of Berlin's vibrant nightlife.

Prompt for generation: A coffee shop entrance features a chalkboard sign reading "MindSpeed Coffee 😊 $2 per cup," with a neon light displaying "MindSpeed MM". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "Welcome to use MindSpeed MM". Ultra HD, 4K, cinematic composition. (Qwen-Image)

Prompt for editing: Change the decoration of the coffee shop to a modern style with white painting. (Flux.1-Kontext)
Input image for both models:

Input text for both models: Please describe the image shortly

Qwen2VL inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm waters. The dock is made of weathered wooden planks and leads to a small platform with a ladder, suggesting it is used for swimming or diving. The lake is surrounded by lush green forests and mountains in the background, creating a picturesque and tranquil setting. The sky is overcast, adding to the calm and peaceful atmosphere of the scene.

Input text for Qwen2VL: 请用中文简短描述这张照片 (Please briefly describe this photo in Chinese)

Qwen2VL inference result: 这张图片展示了一座木制码头延伸到平静的湖面上,背景是连绵的山脉和茂密的森林。天空多云,整体色调偏冷,给人一种宁静和自然的感觉。 (This image shows a wooden dock extending onto a calm lake, with rolling mountains and dense forest in the background. The sky is cloudy and the overall tone is cool, giving a sense of tranquility and nature.)
| Task category | Model | Parameters | Task | Cluster | Precision | NPU performance | Reference performance | Average sequence length | Certification |
|---|---|---|---|---|---|---|---|---|---|
| Multimodal generation | Lumina-mGPT 2.0 | 7B | Fine-tuning | 1x8 | BF16 | 8.24 (SPS) | 8.79 (SPS) | 1024 | 【Pass】 |
| Multimodal generation | OpenSoraPlan1.5 | 8.5B | Pre-training | 1x8 | BF16 | 0.83 (SPS) | / | / | 【Contributed by Peking University】 |
| Multimodal generation | Wan2.2-T2V | 5B | Pre-training | 1x4 (A3) | BF16 | 3.18 (SPS) | 2.93 (SPS) | / | 【Test】 |
| Multimodal generation | Wan2.2-T2V | A14B | Pre-training | 1x8 (A3) | BF16 | 0.710 (SPS) | 0.292 (SPS) | / | 【Test】 |
| Multimodal generation | Wan2.2-TI2V | 5B | Pre-training | 1x4 (A3) | BF16 | 3.18 (SPS) | 2.93 (SPS) | / | 【Test】 |
| Multimodal generation | Wan2.2-I2V | A14B | Pre-training | 1x8 (A3) | BF16 | 0.671 (SPS) | 0.294 (SPS) | / | 【Test】 |
| Multimodal generation | Wan2.1-T2V | 1.3B | Pre-training | 1x8 | BF16 | 0.918 (SPS) | 1.04 (SPS) | / | 【Pass】 |
| Multimodal generation | Wan2.1-T2V | 1.3B | LoRA fine-tuning | 1x8 | BF16 | 0.954 (SPS) | 1.042 (SPS) | / | 【Pass】 |
| Multimodal generation | Wan2.1-T2V | 14B | Pre-training | 1x8 | BF16 | 0.160 (SPS) | 0.160 (SPS) | / | 【Pass】 |
| Multimodal generation | Wan2.1-T2V | 14B | LoRA fine-tuning | 1x8 | BF16 | 0.179 (SPS) | 0.174 (SPS) | / | 【Pass】 |
| Multimodal generation | Wan2.1-I2V | 1.3B | Pre-training | 1x8 | BF16 | 0.76 (SPS) | / | / | 【Pass】 |
| Multimodal generation | Wan2.1-I2V | 14B | Pre-training | 1x8 | BF16 | 0.130 (SPS) | / | / | 【Pass】 |
| Multimodal generation | Wan2.1-I2V | 14B | LoRA fine-tuning | 1x8 | BF16 | 0.179 (SPS) | 0.173 (SPS) | / | 【Pass】 |
| Multimodal generation | Self-Forcing | 1.3B | DMD distillation | 1x8 | BF16 | 0.225 (FPS) | 0.282 (FPS) | / | 【Test】 |
| Multimodal generation | HunyuanVideo-T2V | 13B | Pre-training | 1x8 | BF16 | 0.171 (SPS) | 0.181 (SPS) | / | 【Pass】 |
| Multimodal generation | HunyuanVideo-I2V | 13B | Pre-training | 1x8 | BF16 | 0.164 (SPS) | 0.202 (SPS) | / | 【Pass】 |
| Multimodal generation | OpenSora 1.0 | 5.5B | Pre-training | 1x8 | BF16 | 3.18 (SPS) | 2.04 (SPS) | / | 【Pass】 |
| Multimodal generation | OpenSora 1.2 | 5.2B | Pre-training | 1x8 | BF16 | 7.31 (SPS) | 8.15 (SPS) | / | 【Test】 |
| Multimodal generation | OpenSora 2.0-T2V | 11B | Pre-training | 1x8 | BF16 | 1.33 (SPS) | 1.46 (SPS) | / | 【Pass】 |
| Multimodal generation | OpenSoraPlan 1.2 | 8.7B | Pre-training | 1x8 | BF16 | 0.42 (SPS) | 0.37 (SPS) | / | 【Pass】 |
| Multimodal generation | OpenSoraPlan 1.3-T2V | 8.6B | Pre-training | 1x8 | BF16 | 1.29 (SPS) | 1.27 (SPS) | / | 【Pass】 |
| Multimodal generation | OpenSoraPlan 1.3-I2V | 8.6B | Pre-training | 1x8 | BF16 | 1.17 (SPS) | 1.15 (SPS) | / | 【Pass】 |
| Multimodal generation | WFVAE | 0.18B | Pre-training | 1x8 | BF16 | 23.860 (SPS) | 26.091 (SPS) | / | 【Pass】 |
| Multimodal generation | CogVideoX-T2V | 5B | Pre-training | 1x8 | BF16 | 1.14 (SPS) | 1.00 (SPS) | 6976 | 【Pass】 |
| Multimodal generation | CogVideoX-I2V | 5B | Pre-training | 1x8 | BF16 | 1.13 (SPS) | 0.84 (SPS) | 6976 | 【Pass】 |
| Multimodal generation | CogVideoX 1.5-T2V | 5B | Pre-training | 1x8 | BF16 | 1.44 (SPS) | 1.75 (SPS) | 6976 | 【Pass】 |
| Multimodal generation | CogVideoX 1.5-T2V | 5B | LoRA fine-tuning | 1x8 | BF16 | 2.76 (SPS) | 2.64 (SPS) | / | 【Pass】 |
| Multimodal generation | CogVideoX 1.5-I2V | 5B | Pre-training | 1x8 | BF16 | 1.43 (SPS) | 1.44 (SPS) | 6976 | 【Pass】 |
| Multimodal generation | CogVideoX 1.5-I2V | 5B | LoRA fine-tuning | 1x8 | BF16 | 2.33 (SPS) | 2.04 (SPS) | / | 【Pass】 |
| Multimodal generation | Qihoo-T2X | 1.1B | Inference | 1x1 | BF16 | / | / | / | 【Contributed by Qihoo 360】 |
| Multimodal generation | SDXL | 3.5B | Pre-training | 1x8 | BF16 | 29.92 (FPS) | 30.65 (FPS) | / | 【Pass】 |
| Multimodal generation | SDXL | 3.5B | Pre-training | 1x8 | FP16 | 28.51 (FPS) | 30.23 (FPS) | / | 【Pass】 |
| Multimodal generation | SD3 | 2B | Full-parameter fine-tuning | 1x8 | BF16 | 16.09 (FPS) | 16.01 (FPS) | / | 【Pass】 |
| Multimodal generation | SD3.5 | 8.1B | Full-parameter fine-tuning | 1x8 | BF16 | 26.20 (FPS) | 28.33 (FPS) | / | 【Pass】 |
| Multimodal generation | SD3.5 | 8.1B | LoRA fine-tuning | 1x8 | FP16 | 47.93 (FPS) | 47.95 (FPS) | / | 【Pass】 |
| Multimodal generation | Flux | 12B | Full-parameter fine-tuning | 1x8 | BF16 | 55.23 (FPS) | 53.65 (FPS) | / | 【Pass】 |
| Multimodal generation | Flux-Kontext | 12B | Full-parameter fine-tuning | 1x8 | BF16 | 1.97 (FPS) | 2.00 (FPS) | / | 【Pass】 |
| Multimodal generation | Sana | 1.6B | LoRA fine-tuning | 1x8 | BF16 | 28.7 (FPS) | 32.8 (FPS) | / | 【Pass】 |
| Multimodal generation | HiDream | 17B | LoRA fine-tuning | 1x8 | BF16 | 18.37 (FPS) | 19.61 (FPS) | / | 【Pass】 |
| Multimodal generation | Kolors | 2.6B | Inference | 1x1 | FP16 | / | / | / | 【Test】 |
| Multimodal generation | Qwen-Image | 27B | LoRA fine-tuning | 1x8 | BF16 | 23.02 (FPS) | 21.54 (FPS) | / | 【Pass】 |
| Multimodal generation | Qwen-Image-Edit | 27B | LoRA fine-tuning | 1x8 | BF16 | 20.59 (FPS) | 17.47 (FPS) | / | 【Test】 |
| Multimodal understanding | GLM-4.1V | 9B | Fine-tuning | 1x8 | BF16 | 1074.64 (TPS) | 908.49 (TPS) | 707 | 【Pass】 |
| Multimodal understanding | LLaVA 1.5 | 7B | Full-parameter fine-tuning | 1x8 | BF16 | 3632.31 (TPS) | 3757.98 (TPS) | 602 | 【Test】 |
| Multimodal understanding | InternVL 2.0 | 2B | Fine-tuning | 1x8 | BF16 | 7653.12 (TPS) | 5089.99 (TPS) | 1813 | 【Pass】 |
| Multimodal understanding | InternVL 2.0 | 8B | Fine-tuning | 1x8 | BF16 | 2914.39 (TPS) | 2492.87 (TPS) | 1813 | 【Pass】 |
| Multimodal understanding | InternVL 2.0 | 26B | Fine-tuning | 1x8 | BF16 | 750.12 (TPS) | 738.79 (TPS) | 1813 | 【Pass】 |
| Multimodal understanding | InternVL 2.0 | 76B | Full-parameter fine-tuning | 8x16 | BF16 | 214 (TPS) | 191 (TPS) | 1813 | 【Pass】 |
| Multimodal understanding | InternVL 2.5 | 78B | Fine-tuning | 8x8 | BF16 | 228.33 | / | 1896 | 【Test】 |
| Multimodal understanding | InternVL 3.0 | 8B | Fine-tuning | 1x8 | BF16 | 2344.58 (TPS) | 2211.93 (TPS) | 2653 | 【Pass】 |
| Multimodal understanding | InternVL 3.0 | 78B | Fine-tuning | 4x8 (A3) | BF16 | 228.82 (TPS) | 283.15 (TPS) | 1932 | 【Pass】 |
| Multimodal understanding | InternVL 3.5 | 30B | Fine-tuning | 1x8 (A3) | BF16 | 52.76 (TPS) | 47.73 (TPS) | 201 | 【Test】 |
| Multimodal understanding | Qwen2-VL | 2B | Fine-tuning | 1x8 | BF16 | 2941.17 (TPS) | 3004.04 (TPS) | 689 | 【Pass】 |
| Multimodal understanding | Qwen2-VL | 7B | Fine-tuning | 1x8 | BF16 | 1143.74 (TPS) | 1004.22 (TPS) | 689 | 【Pass】 |
| Multimodal understanding | Qwen2-VL | 72B | Fine-tuning | 4x8 (A3) | BF16 | 261.25 (TPS) | 257.63 (TPS) | 689 | 【Pass】 |
| Multimodal understanding | Qwen2.5-VL | 3B | Fine-tuning | 1x8 | BF16 | 2047.19 (TPS) | 1876.66 (TPS) | 689 | 【Pass】 |
| Multimodal understanding | Qwen2.5-VL | 7B | Fine-tuning | 1x8 | BF16 | 1620.87 (TPS) | 1091.20 (TPS) | 689 | 【Pass】 |
| Multimodal understanding | Qwen2.5-VL | 32B | Fine-tuning | 2x8 | BF16 | 257.50 (TPS) | / | 689 | 【Pass】 |
| Multimodal understanding | Qwen2.5-VL | 72B | Fine-tuning | 4x8 (A3) | BF16 | 322.96 (TPS) | 256.28 (TPS) | 689 | 【Pass】 |
| Multimodal understanding | Qwen3-VL | 8B | Fine-tuning | 1x8 | BF16 | 146.54 (TPS) | 129.71 (TPS) | 179 | 【Test】 |
| Multimodal understanding | Qwen3-VL | 30B | Fine-tuning | 1x8 (A3) | BF16 | 179.57 (TPS) | / | 185 | 【Test】 |
| Multimodal understanding | Qwen3-VL | 235B | Fine-tuning | 16x8 (A3) | BF16 | 598.05 (TPS) | / | 16116 | 【Test】 |
| Multimodal understanding | Qwen2.5-Omni | 7B | Fine-tuning | 1x8 | BF16 | 575.01 (TPS) | 534.28 (TPS) | 296 | 【Pass】 |
| Multimodal understanding | Qwen3-Omni | 30B | Fine-tuning | 2x4 (A3) | BF16 | 131.3 (TPS) | 16.4 (TPS) | 288 | 【Test】 |
| Speech recognition | Whisper | 1.5B | Pre-training | 1x8 | BF16 | 93.38 (SPS) | 109.23 (SPS) | / | 【Test】 |
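
One way to read the two performance columns side by side is to divide the NPU figure by the reference figure for the same row. The sketch below is illustrative only: the ratio is not a metric defined by this README, the helper names `relative_throughput` and `summarize` are hypothetical, and the five sample rows are copied from the table (a `/` reference entry is represented as `None`).

```python
from typing import Optional

# Sample rows copied from the performance table:
# (model, parameters, task, NPU value, reference value, unit label).
# A reference of None stands for a "/" entry in the table.
ROWS = [
    ("Lumina-mGPT 2.0", "7B", "Fine-tuning", 8.24, 8.79, "SPS"),
    ("Wan2.1-T2V", "14B", "Pre-training", 0.160, 0.160, "SPS"),
    ("Flux", "12B", "Full-parameter fine-tuning", 55.23, 53.65, "FPS"),
    ("Qwen2-VL", "7B", "Fine-tuning", 1143.74, 1004.22, "TPS"),
    ("OpenSoraPlan1.5", "8.5B", "Pre-training", 0.83, None, "SPS"),
]


def relative_throughput(npu: float, reference: Optional[float]) -> Optional[float]:
    """Return npu / reference, or None when no usable reference is reported."""
    if reference is None or reference == 0:
        return None
    return npu / reference


def summarize(rows):
    """Map (model, parameters, task) to the NPU-to-reference ratio for each row."""
    return {
        (model, params, task): relative_throughput(npu, ref)
        for model, params, task, npu, ref, _unit in rows
    }


if __name__ == "__main__":
    for key, ratio in summarize(ROWS).items():
        label = " / ".join(key)
        if ratio is None:
            print(f"{label}: no reference value")
        else:
            print(f"{label}: {ratio:.3f}x of reference")
```

A ratio above 1 means the NPU figure is higher than the reference figure for that row. Ratios are only comparable within a row, because rows differ in unit label, cluster size, and sequence length.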