# MindSpeed-MM

**Repository Path**: zhangbangzheng/MindSpeed-MM

## Basic Information

- **Project Name**: MindSpeed-MM
- **Description**: Huawei Ascend's multimodal large-model suite for large-scale distributed training, supporting multimodal generation and multimodal understanding.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: https://gitee.com/ascend/MindSpeed-MM
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 243
- **Created**: 2025-06-04
- **Last Updated**: 2025-06-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

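MindSpeed-MM is an Ascend multimodal large-model suite for large-scale distributed training. It supports both multimodal generation and multimodal understanding, and aims to provide an end-to-end multimodal training solution for Huawei [Ascend chips](https://www.hiascend.com/). It includes preset mainstream industry models, data engineering, distributed training and acceleration, and pre-training, fine-tuning, and online inference tasks.

---

# 🔥🔥🔥Latest News

* [Apr. 03, 2025]: 🚀 MindSpeed-MM supports the Qwen2.5VL-32B model 【Prototype】
* [Mar. 27, 2025]: 🚀 MindSpeed-MM supports the Wan2.1-1.3B/14B models 【Prototype】
* [Mar. 26, 2025]: 🚀 MindSpeed-MM supports the Qwen2.5VL-3B/7B/72B models 【Prototype】
* [Feb. 20, 2025]: 🚀 MindSpeed-MM supports the InternVL2.5-78B model 【Prototype】
* [Feb. 18, 2025]: 🚀 MindSpeed-MM supports the HunyuanVideo model
* [Feb. 17, 2025]: 🔥 MindSpeed-MM supports MindSpeed-Core & Megatron 0.8.0
* [Feb. 15, 2025]: 🚀 MindSpeed-MM supports the Sana model
* [Jan. 24, 2025]: 🚀 MindSpeed-MM supports the CogVideoX 1.5 model
* [Dec. 30, 2024]: 🔥 MindSpeed-MM version 1.0.0 released
* [Dec. 16, 2024]: 🚀 MindSpeed-MM supports the Qihoo-T2X model
* [Dec. 03, 2024]: 🚀 MindSpeed-MM supports the SD3.5 model
* [Nov. 30, 2024]: 🎉 MindSpeed-MM supports multimodal understanding evaluation
* [Nov. 22, 2024]: 🚀 MindSpeed-MM supports the CogVideoX model
* [Nov. 06, 2024]: 🚀 MindSpeed-MM supports the FLUX model
* [Oct. 30, 2024]: 🚀 MindSpeed-MM supports the OpenSoraPlan 1.3 model
* [Oct. 21, 2024]: 🚀 MindSpeed-MM supports the InternVL2 and Qwen2VL models
* [Oct. 16, 2024]: 🌱 First MindSpeed-MM release, 1.0.RC3

> Note: **Prototype** features have not been fully validated and may be unstable or contain bugs; **beta** denotes non-commercial features.

---

# Showcase

## Image-to-Video: OpensoraPlan 1.3 I2V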

Input image:

Prompt: A rocket ascends slowly into the sky

## Image-to-Video: Wan 2.1 I2V

Input image:

Prompt: An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot.

## Text-to-Image: Flux T2I

Prompt: A cat holding a sign that says hello world

Prompt: A cat holding a sign that says MindSpeed

## Understanding Models: InternVL2 & Qwen2VL

Input image for both models:

Input text for both models: Please describe the image shortly

InternVL2 inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm water. The water reflects the surrounding landscape, which includes dense forests and a mountain range in the background. The sky is partly cloudy, adding to the tranquil atmosphere of the scene

Qwen2VL inference result: The image depicts a serene lakeside scene with a wooden dock extending into the calm waters. The dock is made of weathered wooden planks and leads to a small platform with a ladder, suggesting it is used for swimming or diving. The lake is surrounded by lush green forests and mountains in the background, creating a picturesque and tranquil setting. The sky is overcast, adding to the calm and peaceful atmosphere of the scene.

Input text for InternVL2: 请简短描述这张照片 ("Please briefly describe this photo")

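InternVL2 inference result: 这张图片展示了一个宁静的湖泊，湖面平静，反射着天空和周围景物的影像。湖的中央有一个木制码头，延伸到湖中，码头上有几根柱子支撑。湖的远端是一片茂密的森林，树木高大，覆盖着茂密的绿色植被。森林的尽头是一座高耸的山峰，山峰上覆盖着积雪，显得格外壮丽。天空中有一些云朵，但整体上是晴朗的，阳光从云层中透出，照亮了整个湖面和周围的景色。这张图片整体给人一种宁静、祥和的感觉，仿佛是一个远离尘嚣的世外桃源

(English translation: This image shows a tranquil lake whose calm surface reflects the sky and the surrounding scenery. In the middle of the lake, a wooden dock supported by several posts extends into the water. At the far end of the lake is a dense forest of tall trees covered in lush green vegetation. Beyond the forest rises a towering, snow-capped peak that looks especially magnificent. There are some clouds in the sky, but overall it is clear, with sunlight breaking through the clouds and lighting up the whole lake and its surroundings. Overall, the image conveys a sense of peace and calm, like an idyllic retreat far from the bustle of the world.)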

Input text for Qwen2VL: 请用中文简短描述这张照片 ("Please briefly describe this photo in Chinese")

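Qwen2VL inference result: 这张图片展示了一座木制码头延伸到平静的湖面上，背景是连绵的山脉和茂密的森林。天空多云，整体色调偏冷，给人一种宁静和自然的感觉。

(English translation: This image shows a wooden dock extending onto a calm lake, with rolling mountains and a dense forest in the background. The sky is cloudy and the overall color tone is cool, giving a sense of tranquility and nature.)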

---

# Environment Setup

For detailed installation steps, see the [installation guide](https://gitee.com/ascend/MindSpeed-MM/blob/master/docs/features/install_guide.md).

---

# Quick Start

For quick-start instructions, see the [quick-start guide](https://gitee.com/ascend/MindSpeed-MM/blob/master/docs/features/quick_start.md).

---

# Features and Models

## Supported Features Overview

| Model \ Feature | [TP](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/tensor-parallel.md) | [TP-SP](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/sequence-parallel.md) | [VPP](docs/features/virtual_pipeline_parallel.md) | [PP](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/pipeline-parallel.md) | CP | [Distributed Optimizer](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/distributed-optimizer.md) | [Recomputation](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/recomputation.md) | [LoRA](./docs/features/lora_finetune.md) |
|:-------------------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| Wan2.1 | | | | | CP (Ulysses) | ✔ | ✔ | Prototype |
| HunyuanVideo | ✔ | ✔ | | | CP (Ulysses) | ✔ | ✔ | |
| CogVideoX series T2V | ✔ | ✔ | | | CP (Ulysses) | ✔ | ✔ | Prototype |
| CogVideoX series I2V | ✔ | ✔ | | | CP (Ulysses) | ✔ | ✔ | Prototype |
| Opensora1.2 | | | | | DSP | ✔ | ✔ | |
| OpensoraPlan1.3-T2V | ✔ | ✔ | ✔ | ✔ | CP (Ulysses) | ✔ | ✔ | |
| OpensoraPlan1.3-I2V | ✔ | ✔ | ✔ | ✔ | CP (Ulysses) | ✔ | ✔ | |
| InternVL2-2B | | | ✔ | ✔ | | ✔ | ✔ | |
| InternVL2-8B | | | ✔ | ✔ | | ✔ | ✔ | |
| InternVL2-26B | | | ✔ | ✔ | | ✔ | ✔ | |
| InternVL2-76B | | | ✔ | ✔ | | ✔ | ✔ | |
| Qwen2VL-2B | | | | | | ✔ | ✔ | ✔ |
| Qwen2VL-7B | ✔ | | | ✔ | | ✔ | ✔ | ✔ |
| Qwen2VL-72B | ✔ | | | ✔ | | ✔ | ✔ | ✔ |
| Qwen2.5VL-3B | | | | | | ✔ | | |
| Qwen2.5VL-7B | ✔ | | | ✔ | | ✔ | | |
| Qwen2.5VL-32B | ✔ | | | ✔ | | ✔ | | |
| Qwen2.5VL-72B | ✔ | | | ✔ | | ✔ | | |

Notes:

* TP: [Tensor Parallel](https://arxiv.org/abs/1909.08053)
* TP-SP: [Tensor Parallel with Sequence Parallel](https://arxiv.org/abs/2205.05198)
* VPP: [Virtual Pipeline Parallel](https://arxiv.org/abs/2104.04473)
* PP: [Pipeline Parallel](https://arxiv.org/abs/2104.04473)
* DSP: [Dynamic Sequence Parallel](https://arxiv.org/abs/2403.10266)
* CP (Ulysses): [Context Parallel](https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/context_parallel.html) by leveraging [DeepSpeed Ulysses](https://arxiv.org/abs/2309.14509) with Sequence Parallel
* CP (Ring Attention): Context Parallel with [Ring Attention](https://arxiv.org/abs/2310.01889)
* Distributed Optimizer: [Zero Redundancy Optimizer](https://arxiv.org/abs/1910.02054) (ZeRO)
* Recomputation: Reducing Activation [Recomputation](https://arxiv.org/abs/2205.05198)
* LoRA: [Low-Rank Adaptation](https://arxiv.org/abs/2106.09685)

---

## Compatible Versions and Supported Models

Performance below was measured with the current version on Atlas 900 A2 PODc hardware.

For every model in the list below, the model's **README** provides usage instructions covering the full training, inference, and fine-tuning workflows.

* Hyperlinks in the **Model** column point to each model's folder; hyperlinks in the **Parameters** column point to the model's community resources.
* **Certification**: 【Pass】 means the model has passed testing; 【Test】 means the model is still being tested.
* Throughput units: SPS = Samples per Second; FPS = Frames per Second; TPS = Tokens per Second. SPS and FPS are reported as cluster throughput; TPS is reported as per-card throughput.
* **Average sequence length** is the average sequence length of the dataset used in performance testing, computed as a weighted average of sequence lengths, each weighted by how often it occurs.
* **Ascend-affinity scenario** means making small structural or parameter adjustments so that the model suits Ascend better and performs better.
* **A3** denotes the Atlas A3 training series hardware.
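The **average sequence length** definition is a frequency-weighted mean: each distinct length is weighted by the number of samples that have it. Below is a minimal Python sketch of that calculation. It assumes you already have per-sample token counts for your own dataset. The helper name `average_sequence_length` and the toy data are hypothetical and are not part of MindSpeed-MM.

```python
from collections import Counter


def average_sequence_length(sample_lengths: list[int]) -> float:
    """Frequency-weighted mean of sequence lengths.

    Builds a histogram (length -> number of samples with that length) and
    returns sum(length * count) / sum(count). Numerically this equals the
    plain mean over all samples; the histogram form mirrors the
    "weighted by occurrence frequency" description.
    """
    if not sample_lengths:
        raise ValueError("need at least one sample")
    histogram = Counter(sample_lengths)
    total_tokens = sum(length * count for length, count in histogram.items())
    total_samples = sum(histogram.values())
    return total_tokens / total_samples


if __name__ == "__main__":
    # Toy data only; this is not the dataset behind the published numbers.
    print(average_sequence_length([512, 512, 600, 640]))  # 566.0
```

The histogram form is useful when a dataset summary only records how many samples fall at each length, without keeping the raw per-sample list.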

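MindSpeed-MM model list

| Category | Model | Parameters | Task | Cluster | Precision | NPU Performance | Reference Performance | Avg. Sequence Length | Certification |
|---|---|---|---|---|---|---|---|---|---|
| Multimodal generation | Wan2.1-T2V | 1.3B | Pre-training | 1x8 | BF16 | 0.918 (SPS) | 1.04 (SPS) | / | 【Test】 |
| | | 1.3B | LoRA fine-tuning | 1x8 | BF16 | 0.954 (SPS) | 1.042 (SPS) | / | 【Test】 |
| | | 14B | Pre-training | 1x8 | BF16 | 0.160 (SPS) | 0.160 (SPS) | / | 【Test】 |
| | | 14B | LoRA fine-tuning | 1x8 | BF16 | 0.179 (SPS) | 0.174 (SPS) | / | 【Test】 |
| | Wan2.1-I2V | 1.3B | Pre-training | 1x8 | BF16 | 0.76 (SPS) | / | / | 【Test】 |
| | | 14B | Pre-training | 1x8 | BF16 | 0.130 (SPS) | / | / | 【Test】 |
| | | 14B | LoRA fine-tuning | 1x8 | BF16 | 0.179 (SPS) | 0.173 (SPS) | / | 【Test】 |
| | HunyuanVideo | 13B | Pre-training | 1x8 | BF16 | 0.171 (SPS) | 0.181 (SPS) | / | 【Test】 |
| | OpenSora 1.0 | 5.5B | Pre-training | 1x8 | BF16 | 3.18 (SPS) | 2.04 (SPS) | / | 【Pass】 |
| | OpenSora 1.2 | 5.2B | Pre-training | 1x8 | BF16 | 7.31 (SPS) | 8.15 (SPS) | / | 【Test】 |
| | OpenSoraPlan 1.2 | 8.7B | Pre-training | 1x8 | BF16 | 0.42 (SPS) | 0.37 (SPS) | / | 【Pass】 |
| | OpenSoraPlan 1.3-T2V | 8.6B | Pre-training | 1x8 | BF16 | 1.29 (SPS) | 1.27 (SPS) | / | 【Pass】 |
| | OpenSoraPlan 1.3-I2V | 8.6B | Pre-training | 1x8 | BF16 | 1.17 (SPS) | 1.15 (SPS) | / | 【Pass】 |
| | WFVAE | 0.18B | Pre-training | 1x8 | BF16 | 23.860 (SPS) | 26.091 (SPS) | / | 【Pass】 |
| | CogVideoX-T2V | 5B | Pre-training | 1x8 | BF16 | 0.37 (SPS) | 0.46 (SPS) | / | 【Pass】 |
| | CogVideoX-I2V | 5B | Pre-training | 1x8 | BF16 | 0.37 (SPS) | 0.46 (SPS) | / | 【Pass】 |
| | CogVideoX 1.5-T2V | 5B | Pre-training | 1x8 | BF16 | 1.88 (SPS) | 2.09 (SPS) | / | 【Pass】 |
| | CogVideoX 1.5-T2V | 5B | LoRA fine-tuning | 1x8 | BF16 | 2.89 (SPS) | 3.03 (SPS) | / | 【Test】 |
| | CogVideoX 1.5-I2V | 5B | Pre-training | 1x8 | BF16 | 1.81 (SPS) | 2.01 (SPS) | / | 【Pass】 |
| | CogVideoX 1.5-I2V | 5B | LoRA fine-tuning | 1x8 | BF16 | 3.44 (SPS) | 3.92 (SPS) | / | 【Test】 |
| | Qihoo-T2X | 1.1B | Inference | 1x1 | BF16 | / | / | / | 【Contributed by Qihoo 360】 |
| | SDXL | 3.5B | Pre-training | 1x8 | BF16 | 29.92 (FPS) | 30.65 (FPS) | / | 【Pass】 |
| | SDXL | 3.5B | Pre-training | 1x8 | FP16 | 28.51 (FPS) | 30.23 (FPS) | / | 【Pass】 |
| | SD3 | 2B | Full-parameter fine-tuning | 1x8 | BF16 | 16.09 (FPS) | 16.01 (FPS) | / | 【Pass】 |
| | SD3.5 | 8.1B | Full-parameter fine-tuning | 1x8 | BF16 | 26.20 (FPS) | 28.33 (FPS) | / | 【Pass】 |
| | SD3.5 | 8.1B | LoRA fine-tuning | 1x8 | FP16 | 47.93 (FPS) | 47.95 (FPS) | / | 【Pass】 |
| | Flux | 12B | Full-parameter fine-tuning | 1x8 | BF16 | 55.23 (FPS) | 53.65 (FPS) | / | 【Pass】 |
| | Sana | 1.6B | LoRA fine-tuning | 1x8 | BF16 | 28.7 (FPS) | 32.8 (FPS) | / | 【Pass】 |
| | Kolors | 2.6B | Inference | 1x1 | FP16 | / | / | / | 【Test】 |
| Multimodal understanding | LLaVA 1.5 | 7B | Full-parameter fine-tuning | 1x8 | BF16 | 48.27 (SPS) | 49.94 (SPS) | / | 【Test】 |
| | InternVL 2.0 | 2B | Fine-tuning | 1x8 | BF16 | 33.77 (SPS) | 22.46 (SPS) | / | 【Pass】 |
| | | 8B | Fine-tuning | 1x8 | BF16 | 12.86 (SPS) | 11.00 (SPS) | / | 【Pass】 |
| | | 26B | Fine-tuning | 1x8 | BF16 | 3.31 (SPS) | 3.26 (SPS) | / | 【Pass】 |
| | | 76B | Full-parameter fine-tuning | 8x16 | BF16 | 214 (TPS) | 191 (TPS) | / | 【Test】 |
| | InternVL 2.5 | 78B | Fine-tuning | 8x8 | BF16 | / | / | / | 【Test】 |
| | Qwen2-VL | 2B | Fine-tuning | 1x8 | BF16 | 34.15 (SPS) | 34.88 (SPS) | 563 | 【Pass】 |
| | | 7B | Fine-tuning | 1x8 | BF16 | 13.28 (SPS) | 11.66 (SPS) | 563 | 【Pass】 |
| | | 72B | Fine-tuning | 4x8 (A3) | BF16 | 261.25 (TPS) | 257.63 (TPS) | 563 | 【Pass】 |
| | Qwen2.5-VL | 3B | Fine-tuning | 1x8 | BF16 | 23.77 (SPS) | 21.79 (SPS) | 563 | 【Test】 |
| | | 7B | Fine-tuning | 1x8 | BF16 | 14.20 (SPS) | 12.67 (SPS) | 563 | 【Test】 |
| | | 32B | Fine-tuning | 2x8 | BF16 | 249.94 (TPS) | / | 563 | 【Test】 |
| | | 72B | Fine-tuning | 8x8 | BF16 | / | / | 563 | 【Test】 |
| Speech recognition | Whisper | 1.5B | Pre-training | 1x8 | BF16 | 93.38 (SPS) | 109.23 (SPS) | / | 【Test】 |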
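SPS and FPS in this table are cluster throughput, while TPS is per-card. To put rows with different cluster sizes side by side, you can divide a cluster figure by the total card count. The sketch below assumes that the two numbers in the **Cluster** column multiply to the total NPU count (for example, `1x8` gives 8). The product is the same whichever factor means nodes, so the sketch does not need to know the order. It also shows the Megatron-style convention for the data-parallel degree, world size / (TP × PP × CP). The helper names are hypothetical, and this README does not list the TP/PP/CP settings behind the published numbers.

```python
def total_npus(cluster: str) -> int:
    """Total card count for a cluster string such as "1x8" or "4x8 (A3)".

    Assumption: the two factors multiply to the number of NPUs.
    """
    spec = cluster.split("(")[0].strip().lower()  # drop annotations like "(A3)"
    first, second = (int(part) for part in spec.split("x"))
    return first * second


def per_card_throughput(cluster_throughput: float, cluster: str) -> float:
    """Approximate per-card figure from a cluster-level SPS/FPS value."""
    return cluster_throughput / total_npus(cluster)


def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Megatron-style DP degree: world_size / (TP * PP * CP), which must divide evenly."""
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by TP*PP*CP = {model_parallel}"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    # Qwen2-VL 2B row: 34.15 SPS measured on a 1x8 cluster.
    print(per_card_throughput(34.15, "1x8"))
    # Hypothetical layout: 32 NPUs with TP=4 and PP=2 leaves DP=4.
    print(total_npus("4x8 (A3)"), data_parallel_size(32, tp=4, pp=2))
```

Per-card numbers derived this way are only a rough comparison aid. Scaling across nodes is rarely perfectly linear, so compare rows measured on the same cluster size where possible.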

---

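Other multimodal large models adapted to Ascend

| Model | Parameters | Task | Cluster | Precision | NPU Performance | Reference Performance | Certification |
|---|---|---|---|---|---|---|---|
| CogVLM-2 | 8B | Fine-tuning | 1x8 | BF16 | 3.9 (s/it) | 3.3 (s/it) | 【Pass】 |
| PLLaVA | 7B | Pre-training | 1x8 | BF16 | 0.841 (s/step) | 0.935 (s/step) | 【Pass】 |
| PLLaVA | 7B | Pre-training | 1x8 | FP32 | 0.935 (s/step) | 1.08 (s/step) | 【Pass】 |
| miniCPM-V 2.5 | 8B | Full-parameter fine-tuning | 1x8 | BF16 | 1046 (s) / 50-200 steps | 847 (s) / 50-200 steps | 【Pass】 |
| miniCPM-V 2.5 | 8B | LoRA fine-tuning | 1x8 | BF16 | 603 (s) / 50-200 steps | 490 (s) / 50-200 steps | 【Pass】 |
| HunYuanDiT | 1.5B | Pre-training | 1x8 | BF16 | 1099.5 (ms/step) | 1059.3 (ms/step) | 【Pass】 |
| InternVL 1.5 | 26B | Fine-tuning | 1x8 | BF16 | 4.952 (FPS) | 5.151 (FPS) | 【Pass】 |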
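Unlike the main model list, several rows in this table report time per iteration or step (s/it, s/step, ms/step), where lower is better. The miniCPM-V 2.5 rows report total seconds over a range of steps. The sketch below converts these figures to a common steps-per-second rate. It makes two assumptions. First, it reads `1046 (s) / 50-200 steps` as 1046 seconds spent on the 150 steps from step 50 to step 200. Second, to get samples per second, you must supply a global batch size, which this README does not state. All helper names are hypothetical.

```python
def steps_per_second(seconds_per_step: float) -> float:
    """Invert a time-per-step figure (s/step or s/it) into a step rate."""
    if seconds_per_step <= 0:
        raise ValueError("seconds_per_step must be positive")
    return 1.0 / seconds_per_step


def ms_to_s(milliseconds: float) -> float:
    """Convert a ms/step figure to s/step."""
    return milliseconds / 1000.0


def seconds_per_step_from_window(total_seconds: float, first_step: int, last_step: int) -> float:
    """Average time per step over a window such as steps 50-200.

    Assumption: the window covers last_step - first_step steps (150 for 50-200).
    """
    num_steps = last_step - first_step
    if num_steps <= 0:
        raise ValueError("last_step must be greater than first_step")
    return total_seconds / num_steps


def samples_per_second(seconds_per_step: float, global_batch_size: int) -> float:
    """Samples/s for a given global batch size (supply your own; not listed here)."""
    return global_batch_size * steps_per_second(seconds_per_step)


if __name__ == "__main__":
    # HunYuanDiT NPU figure: 1099.5 ms/step.
    print(steps_per_second(ms_to_s(1099.5)))
    # miniCPM-V 2.5 full-parameter fine-tuning NPU figure: 1046 s over steps 50-200.
    print(seconds_per_step_from_window(1046, 50, 200))
```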

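---

# Feature Roadmap

* 【New model】 JanusPro
* 【Model feature】 CogVideoX: PP
* 【Model feature】 OpensoraPlan1.3: CP (Ring Attention)
* 【Model feature】 Qwen2VL: VPP, CP (Ulysses & Ring Attention)
* 【Model feature】 InternVL2: TP, CP (Ulysses & Ring Attention)
* 【Basic feature】 Distrain

---

# Tools

## Ascend Profiling Collection Tool

MindSpeed-MM integrates the Ascend profiling collection tool for analyzing how a model runs. Driven by its configuration, the tool collects key information such as operator and memory usage. It supports both dynamic and static collection modes, so you can choose the one that fits your scenario when locating model bottlenecks. For details, see the profiling section of the [README](./mindspeed_mm/tools/README.md).

## MindStudio Insight Performance Analysis Tool

For performance tuning in large-model cluster scenarios, we recommend the visual tuning tool MindStudio Insight. It provides visualizations including a Timeline view, communication analysis, and compute-time breakdowns. These help users identify potential performance bottlenecks and decide how to eliminate or reduce them. For usage, see the [MindStudio Insight User Guide](https://www.hiascend.com/document/detail/zh/mindstudio/70RC3/msinsightug/msascendinsightug/Insight_userguide_0002.html). To download the tool, see [MindStudio Insight](https://support.huawei.com/enterprise/zh/ascend-computing/mindstudio-pid-251913966/software/262029358?idAbsPath=fixnode01%7C23710424%7C251366513%7C22892968%7C251913966).

---

# Version Maintenance

MindSpeed-MM versions go through the following five maintenance stages:

| **Status** | **Duration** | **Description** |
| ------------------- | -------- |----------------------------------------------------------------------|
| Planning | 1-3 months | Features are planned. |
| Development | 3 months | Features are developed. |
| Maintenance | 6-12 months | All resolved issues are merged and versions are released. Different MindSpeed-MM versions follow different maintenance policies: regular releases are maintained for 6 months and long-term support releases for 12 months. |
| Unmaintained | 0-3 months | All resolved issues are merged, but there are no dedicated maintainers and no version releases. |
| End of Life (EOL) | N/A | The branch no longer accepts any changes. |

Maintenance policy for released MindSpeed-MM versions:

| **MindSpeed-MM Version** | **Maintenance Policy** | **Current Status** | **Release Date** | **Next Status** | **EOL Date** |
|-----------------|-----------|--------|------------|-----------------------|-----------|
| 2.0.0 | Regular release | Maintenance | 2025/03/30 | Expected to be unmaintained from 2025/09/30 | |
| 1.0.0 | Regular release | Maintenance | 2024/12/30 | Expected to be unmaintained from 2025/06/30 | |
| 1.0.RC3 | Regular release | Maintenance | 2024/09/30 | Expected to be unmaintained from 2025/03/30 | |

---

# FAQ

For frequently asked questions, see the [FAQ](./docs/FAQ.md).

---

# Related Resources

1. [A multimodal suite for large-scale distributed training](https://mp.weixin.qq.com/s/Qiw_qThKA72T0lLOSpjkKw)
2. [Powered by Ascend computing, Open-Sora Plan achieves cinematic-quality video generation](https://mp.weixin.qq.com/s/KY2tLthhre-SRbuWka3c2w)
3. [MindSpeed-MM supports mainstream multimodal understanding models with major performance gains!](https://mp.weixin.qq.com/s/3pZRy24ITyKl3nGc33Sq7w)
4. [Trained natively on Ascend! Sun Yat-sen University and 360 jointly build Qihoo-T2X, a new paradigm for multimodal tasks](https://mp.weixin.qq.com/s/zQAy_hbL9cR3c8-NO6lKnA)
5. [Getting hands-on with Wan2.1, the SOTA video generation model, on Ascend MindSpeed MM](https://mp.weixin.qq.com/s/g2ShV2F6YpoVAniw6CBN_w)
6. [SOTA multimodal understanding models out of the box: MindSpeed MM best practices for Qwen2.5-VL](https://mp.weixin.qq.com/s/ac7RUWw79stunwQIyC-ykQ)

---

# Security Statement

[MindSpeed MM Security Statement](https://gitee.com/ascend/MindSpeed-MM/blob/master/docs/SECURITYNOTE.md)

---

# Disclaimer

## To MindSpeed-MM Users

1. The models provided by MindSpeed-MM are for non-commercial use only.
2. For each model, the MindSpeed-MM platform only suggests datasets that can be used for training; Huawei does not provide any datasets. If you train with these datasets, make sure you comply with each dataset's License. Huawei assumes no liability for any infringement disputes arising from your use of the datasets.
3. If you find any problems while using MindSpeed-MM models (including but not limited to functional or compliance issues), please submit an issue on Gitee and we will review and address it promptly.

## To Dataset Owners

If you do not want your dataset to be mentioned by the models in MindSpeed-MM, or want to update the description of your dataset in MindSpeed-MM, please submit an issue on Gitee. We will remove or update your dataset description as requested in your issue. Thank you sincerely for your understanding of and contribution to MindSpeed-MM.

## License Statement

For models provided by Ascend MindSpeed-MM, if a License file exists in the model directory, that License prevails. Otherwise, the model is licensed under the Apache 2.0 license, whose text is in the root directory of Ascend MindSpeed-MM.

---

# Acknowledgements

MindSpeed-MM is jointly contributed by the following Huawei departments and Ascend ecosystem partners:

Huawei:

* Computing Product Line
* Common Development Department
* 2012 Laboratories
* Huawei Cloud

Ecosystem partners:

* 360 AI Research
* Peking University OpenSoraPlan team
* WeChat Technical Architecture Department, Infrastructure Center

Thanks to everyone in the community for every PR. Contributions to MindSpeed-MM are welcome!

---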