From 483910d08b7b4640a9637a6df076bc70054a86da Mon Sep 17 00:00:00 2001
From: hao9656
Date: Mon, 8 Sep 2025 16:02:50 +0800
Subject: [PATCH 1/2] [Docs]modify qwen2.5omni heterogeneous parallel training description

---
 examples/qwen2.5omni/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/qwen2.5omni/README.md b/examples/qwen2.5omni/README.md
index 64ea5d14..7a1026d0 100644
--- a/examples/qwen2.5omni/README.md
+++ b/examples/qwen2.5omni/README.md
@@ -401,7 +401,7 @@ bash examples/qwen2.5omni/finetune_qwen2_5_omni_7b.sh
 
 Before configuring the script, complete the prerequisites: **environment installation**, **weight download and conversion**, and **dataset preparation and processing**. See the corresponding sections for details.
 
-The "weight conversion" step must be adjusted to match the chosen heterogeneous parallel configuration. For example, when the ViT and Audio modules are not split and the LLM module is split with TP4, the weight conversion command is as follows:
+The "weight conversion" step must be adjusted to match the chosen heterogeneous parallel configuration (currently only heterogeneous parallelism over DP and TP is supported). For example, when the ViT and Audio modules are not split and the LLM module is split with TP4, the weight conversion command is as follows:
 
 ```bash
 # 7b
@@ -445,7 +445,7 @@ GPT_ARGS="
 
 ```json
 {
-  "image_encoder": {
+  "image_encoder": {
     "vision_encoder": {},
     "vision_projector": {},
     "tp":1,
--
Gitee

From a27ddcb3f172b94b7fcacc607352ec4978a88a58 Mon Sep 17 00:00:00 2001
From: hao9656
Date: Thu, 11 Sep 2025 11:12:06 +0800
Subject: [PATCH 2/2] [Docs]modify qwen2.5omni heterogeneous parallel training description

---
 checkpoint/vlm_model/mm_to_hf.py | 2 +-
 examples/qwen2.5omni/README.md   | 9 ++++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/checkpoint/vlm_model/mm_to_hf.py b/checkpoint/vlm_model/mm_to_hf.py
index 5b870b93..0f66909b 100644
--- a/checkpoint/vlm_model/mm_to_hf.py
+++ b/checkpoint/vlm_model/mm_to_hf.py
@@ -103,7 +103,7 @@ def merge_by_tp(tp_state_dicts: List[STATE_DICT_T], patterns: TP_PATTERN_T, tp_s
             if key.startswith(prefix):
                 if size <= 0:
                     merged_dict[key] = merger.merge(tp_values)
-                if size == 1:
+                elif size == 1:
                     merged_dict[key] = tp_values[0]
                 else:
                     merged_dict[key] = merger.merge(tp_values[:size])
diff --git a/examples/qwen2.5omni/README.md b/examples/qwen2.5omni/README.md
index 7a1026d0..a7c1509c 100644
--- a/examples/qwen2.5omni/README.md
+++ b/examples/qwen2.5omni/README.md
@@ -428,9 +428,16 @@ mm-convert Qwen2_5_OmniConverter hf_to_mm \
 
 #### 2. Configure parameters
 
-Refer to the **Fine-tuning** section to configure the data directory, model saving and loading, and so on. Note that when configuring `examples/qwen2.5omni/finetune_qwen2_5_omni_7b.sh`, you must add `--hetero-parallel` to enable heterogeneous parallel training
+Refer to the **Fine-tuning** section to configure the data directory, model saving and loading, and so on. Note that when configuring `examples/qwen2.5omni/finetune_qwen2_5_omni_7b.sh`, you must add `--hetero-parallel` to enable heterogeneous parallel training.
+
+Note that the LLM's parallel configuration is defined in finetune_qwen2_5_omni_7b.sh, while the ViT and Audio parallel configurations are defined in model_7b.json. The ViT, Audio, and LLM modules all share the same GBS, so pay attention to the LLM's MBS setting.
+For example, when the ViT and Audio modules are not split and the LLM uses TP4, the LLM's MBS must be an integer multiple of its own TP, so that the ViT and Audio DP domains each receive an evenly divided, integer MBS.
 
 ```shell
+TP=4
+PP=1
+CP=1
+MBS=4
 ...
 GPT_ARGS="
 ...
--
Gitee
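
Reviewer note on the MBS rule added to the README: the divisibility requirement can be sketched as a small check. This is only an illustration of one reading of the rule, not code from the repository. It assumes the encoders (ViT/Audio) and the LLM run on the same devices with the same GBS, so an encoder with a smaller TP has proportionally more DP ranks and the LLM micro batch is divided among them. The function name `encoder_micro_batch_size` and its parameters are hypothetical.

```python
def encoder_micro_batch_size(llm_mbs: int, llm_tp: int, encoder_tp: int = 1) -> int:
    """Return the micro batch size each encoder DP rank would receive.

    Assumption (not taken from the repository): the encoder has
    llm_tp // encoder_tp times as many DP ranks as the LLM, and the LLM
    micro batch is split evenly across those ranks.
    """
    if llm_tp % encoder_tp != 0:
        raise ValueError("llm_tp must be an integer multiple of encoder_tp")
    dp_ratio = llm_tp // encoder_tp
    if llm_mbs % dp_ratio != 0:
        raise ValueError(
            f"LLM MBS={llm_mbs} is not a multiple of {dp_ratio}; "
            "the encoder DP ranks would get a fractional micro batch"
        )
    return llm_mbs // dp_ratio


if __name__ == "__main__":
    # Values from the README example: LLM with TP=4 and MBS=4, encoders unsplit.
    print(encoder_micro_batch_size(llm_mbs=4, llm_tp=4, encoder_tp=1))
```

With the README values (TP=4, MBS=4, encoders with tp 1), each encoder DP rank gets a micro batch of 1. Under the same assumptions, an LLM MBS of 2 would be rejected because it cannot be divided evenly across four encoder DP ranks.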