diff --git a/debug/accuracy_tools/msprobe/docs/21.visualization_PyTorch.md b/debug/accuracy_tools/msprobe/docs/21.visualization_PyTorch.md
index 7bc34a88d656840b27ace049d9d59ca1fc32ae1b..1ca95315d8c546a688c555f4ca9cb03f4200793d 100644
--- a/debug/accuracy_tools/msprobe/docs/21.visualization_PyTorch.md
+++ b/debug/accuracy_tools/msprobe/docs/21.visualization_PyTorch.md
@@ -45,15 +45,15 @@
 msprobe -f pytorch graph -i ./compare.json -o ./output
 ```
 **Command-line parameters**:
-| Parameter | Description | Required |
-|------------------------|----------|------|
-| -f or --framework | Specifies the training framework: pytorch. | Yes |
-| -i or --input_path | Specifies the comparison file; see [Comparison File Description](#313-比对文件说明). | Yes |
-| -o or --output_path | Directory where the comparison result file is saved, str type. The file name is generated automatically from a timestamp, in the format `compare_{timestamp}.vis` or `build_{timestamp}.vis`. | Yes |
-| -lm or --layer_mapping | Cross-suite comparison, e.g. when the same model is implemented with both the DeepSpeed and Megatron suites. Setting this parameter enables cross-suite comparison at the Layer level: once the Layer levels in the model code are specified, the corresponding modules or APIs in the dump data can be identified. A custom mapping file *.yaml must be provided; for its format see [Custom Mapping File (Layer)](#71-自定义映射文件layer), and for how to write one see [How to configure a layer mapping file for hierarchical model visualization](./visualization/layer_mapping_example.md). With this parameter set, nodes are matched by name only, ignoring node type and shape. If the debug side and the benchmark side contain nodes with different names, a custom mapping file is required and its path is passed to -lm; if the node names are identical on both sides, specifying -lm alone is sufficient. | No |
-| -oc or --overflow_check | Whether to enable overflow detection. When enabled, every overflowing node in the output vis file (`compare_{timestamp}.vis` or `build_{timestamp}.vis`) is tagged with an overflow level; see [Overflow Level Description](#312-溢出等级说明). | No |
-| -f or --fuzzy_match | Whether to enable fuzzy matching, bool type. See [Matching Description](#311-匹配说明). | No |
-| -cs or --complete_stack | Whether to use full call-stack information, bool type. By default the condensed stack is used; the smaller data volume keeps the UI responsive. For full vs. condensed stack information, see [Stack Information Description](#72-堆栈信息说明). | No |
+| Parameter | Description | Required |
+|------------------------|--------------|------|
+| -f or --framework | Specifies the training framework: pytorch. | Yes |
+| -i or --input_path | Specifies the comparison file; see [Comparison File Description](#313-比对文件说明). | Yes |
+| -o or --output_path | Directory where the comparison result file is saved, str type. The file name is generated automatically from a timestamp, in the format `compare_{timestamp}.vis` or `build_{timestamp}.vis`. | Yes |
+| -lm or --layer_mapping | Cross-suite comparison, e.g. when the same model is implemented with both the DeepSpeed and Megatron suites. Setting this parameter enables cross-suite comparison at the Layer level: once the Layer levels in the model code are specified, the corresponding modules or APIs in the dump data can be identified. A custom mapping file *.yaml must be provided; for its format see [Custom Mapping File (Layer)](#71-自定义映射文件layer), and for how to write one see [How to configure a layer mapping file for hierarchical model visualization](./visualization/layer_mapping_example.md). With this parameter set, nodes are matched by name only, ignoring node type and shape. If the debug side and the benchmark side contain nodes with different names, a custom mapping file is required and its path is passed to -lm; if the node names are identical on both sides, specifying -lm alone is sufficient.<br><br>For a worked example, see [MindSpeed & LLamaFactory data collection and automatic comparison](./visualization/mindspeed_llamafactory_mapping.md). | No |
+| -oc or --overflow_check | Whether to enable overflow detection. When enabled, every overflowing node in the output vis file (`compare_{timestamp}.vis` or `build_{timestamp}.vis`) is tagged with an overflow level; see [Overflow Level Description](#312-溢出等级说明). | No |
+| -f or --fuzzy_match | Whether to enable fuzzy matching, bool type. See [Matching Description](#311-匹配说明). | No |
+| -cs or --complete_stack | Whether to use full call-stack information, bool type. By default the condensed stack is used; the smaller data volume keeps the UI responsive. For full vs. condensed stack information, see [Stack Information Description](#72-堆栈信息说明). | No |
 
 #### 3.1.1 Matching Description
diff --git a/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/1.png b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/1.png
new file mode 100644
index 0000000000000000000000000000000000000000..791befb7d42725d2ab8fe24377f223893080e36b
Binary files /dev/null and b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/1.png differ
diff --git a/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/2.png b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/2.png
new file mode 100644
index 0000000000000000000000000000000000000000..f8ac5b391d462d2b2e720fc1189762821a9c03eb
Binary files /dev/null and b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/2.png differ
diff --git a/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/3.png b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/3.png
new file mode 100644
index 0000000000000000000000000000000000000000..7e876f81083f368dd20d66c3a91e37943344904a
Binary files /dev/null and b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/3.png differ
diff --git a/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/4.png b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/4.png
new file mode 100644
index 0000000000000000000000000000000000000000..e4798076a65454a1507e923ebe8111fdaa4926bb
Binary files /dev/null and b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/4.png differ
diff --git a/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/5.png b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/5.png
new file mode 100644
index 0000000000000000000000000000000000000000..9a7b68f54ec6ba1e43528fdae20ae495d50a0705
Binary files /dev/null and b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/5.png differ
diff --git a/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/6.png b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/6.png
new file mode 100644
index 0000000000000000000000000000000000000000..9bc7c4621f884229c8c99a41c53499cad151dbe8
Binary files /dev/null and b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/6.png differ
diff --git a/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/7.png b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/7.png
new file mode 100644
index 0000000000000000000000000000000000000000..1c8731a470c00bd108358407b2a13a9c613d417c
Binary files /dev/null and b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/7.png differ
diff --git a/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/llamafactory-qwen25vl.txt b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/llamafactory-qwen25vl.txt
new file mode 100644
index 0000000000000000000000000000000000000000..b0322dc208b0aa0490e4adb18bcd8b6ec7b1b557
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/llamafactory-qwen25vl.txt
@@ -0,0 +1,59 @@
+DeepSpeedEngine(
+  (module): Qwen2_5_VLForConditionalGeneration(
+    (visual): Qwen2_5_VisionTransformerPretrainedModel(
+      (patch_embed): Qwen2_5_VisionPatchEmbed(
+        (proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False)
+      )
+      (rotary_pos_emb): Qwen2_5_VisionRotaryEmbedding()
+      (blocks): ModuleList(
+        (0-15): 16 x Qwen2_5_VLVisionBlock(
+          (norm1): Qwen2RMSNorm((0,), eps=1e-06)
+          (norm2): Qwen2RMSNorm((0,), eps=1e-06)
+          (attn): Qwen2_5_VLVisionSdpaAttention(
+            (qkv): Linear(in_features=1280, out_features=3840, bias=True)
+            (proj): Linear(in_features=1280, out_features=1280, bias=True)
+          )
+          (mlp): Qwen2_5_VLMLP(
+            (gate_proj): Linear(in_features=1280, out_features=3420, bias=True)
+            (up_proj): Linear(in_features=1280, out_features=3420, bias=True)
+            (down_proj): Linear(in_features=3420, out_features=1280, bias=True)
+            (act_fn): SiLU()
+          )
+        )
+      )
+      (merger): Qwen2_5_VLPatchMerger(
+        (ln_q): Qwen2RMSNorm((0,), eps=1e-06)
+        (mlp): Sequential(
+          (0): Linear(in_features=5120, out_features=5120, bias=True)
+          (1): GELU(approximate='none')
+          (2): Linear(in_features=5120, out_features=2048, bias=True)
+        )
+      )
+    )
+    (model): Qwen2_5_VLModel(
+      (embed_tokens): Embedding(151936, 2048)
+      (layers): ModuleList(
+        (0-7): 8 x Qwen2_5_VLDecoderLayer(
+          (self_attn): Qwen2_5_VLSdpaAttention(
+            (q_proj): Linear(in_features=2048, out_features=2048, bias=True)
+            (k_proj): Linear(in_features=2048, out_features=256, bias=True)
+            (v_proj): Linear(in_features=2048, out_features=256, bias=True)
+            (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
+            (rotary_emb): Qwen2_5_VLRotaryEmbedding()
+          )
+          (mlp): Qwen2MLP(
+            (gate_proj): Linear(in_features=2048, out_features=11008, bias=False)
+            (up_proj): Linear(in_features=2048, out_features=11008, bias=False)
+            (down_proj): Linear(in_features=11008, out_features=2048, bias=False)
+            (act_fn): SiLU()
+          )
+          (input_layernorm): Qwen2RMSNorm((0,), eps=1e-06)
+          (post_attention_layernorm): Qwen2RMSNorm((0,), eps=1e-06)
+        )
+      )
+      (norm): Qwen2RMSNorm((0,), eps=1e-06)
+      (rotary_emb): Qwen2_5_VLRotaryEmbedding()
+    )
+    (lm_head): Linear(in_features=2048, out_features=151936, bias=False)
+  )
+)
diff --git a/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/llamafactory1.png b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/llamafactory1.png
new file mode 100644
index 0000000000000000000000000000000000000000..4be2f18a16cc28b740f5c1e7f181d18e9b1463c5
Binary files /dev/null and b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/llamafactory1.png differ
diff --git a/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/llamafactory2.png b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/llamafactory2.png
new file mode 100644
index 0000000000000000000000000000000000000000..21bd1b887c7cddc3f9fad2473a4746d7f4098587
Binary files /dev/null and b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/llamafactory2.png differ
diff --git a/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/mindspeed-mm-qwen25vl.txt b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/mindspeed-mm-qwen25vl.txt
new file mode 100644
index 0000000000000000000000000000000000000000..9f4ba4fd044482fddbc9d24d88d5b29980ebcc7e
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/mindspeed-mm-qwen25vl.txt
@@ -0,0 +1,80 @@
+[DistributedDataParallel(
+  (module): Float16Module(
+    (module): VLMModel(
+      (image_encoder): VisionModel(
+        (encoder): Qwen2VLViT(
+          (patch_embed): PatchEmbed(
+            (proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False)
+          )
+          (rotary_pos_emb): VisionRotaryEmbedding()
+          (blocks): Qwen2VLVisionTransformerBlock(
+            (layers): ModuleList(
+              (0-15): 16 x TransformerLayer(
+                (input_layernorm): RMSNorm()
+                (self_attention): Qwen2vlVitSelfAttention(
+                  (core_attention): DotProductAttention(
+                    (scale_mask_softmax): FusedScaleMaskSoftmax()
+                    (attention_dropout): Dropout(p=0.0, inplace=False)
+                  )
+                  (linear_proj): RowParallelLinear()
+                  (linear_qkv): ColumnParallelLinear()
+                  (q_layernorm): IdentityOp()
+                  (k_layernorm): IdentityOp()
+                )
+                (pre_cross_attn_layernorm): IdentityOp()
+                (cross_attention): IdentityOp()
+                (cross_attn_bda): IdentityFuncOp()
+                (pre_mlp_layernorm): RMSNorm()
+                (mlp): MLP(
+                  (linear_fc1): ColumnParallelLinear()
+                  (linear_fc2): RowParallelLinear()
+                )
+              )
+            )
+          )
+        )
+        (projector): MultimodalProjector(
+          (layernorm): RMSNorm()
+          (encoder): MLP(
+            (linear_fc1): ColumnParallelLinear()
+            (linear_fc2): RowParallelLinear()
+          )
+        )
+      )
+      (text_decoder): MMGPTModel(
+        (embedding): LanguageModelEmbedding(
+          (word_embeddings): VocabParallelEmbedding()
+          (embedding_dropout): Dropout(p=0.0, inplace=False)
+        )
+        (rotary_pos_emb): Qwen2VLRotaryEmbedding_llm()
+        (decoder): TransformerBlock(
+          (layers): ModuleList(
+            (0-7): 8 x TransformerLayer(
+              (input_layernorm): RMSNorm()
+              (self_attention): Qwen2vlSelfAttention(
+                (core_attention): DotProductAttention(
+                  (scale_mask_softmax): FusedScaleMaskSoftmax()
+                  (attention_dropout): Dropout(p=0.0, inplace=False)
+                )
+                (linear_proj): RowParallelLinear()
+                (linear_qkv): ColumnParallelLinear()
+                (q_layernorm): IdentityOp()
+                (k_layernorm): IdentityOp()
+              )
+              (pre_cross_attn_layernorm): IdentityOp()
+              (cross_attention): IdentityOp()
+              (cross_attn_bda): IdentityFuncOp()
+              (pre_mlp_layernorm): RMSNorm()
+              (mlp): MLP(
+                (linear_fc1): ColumnParallelLinear()
+                (linear_fc2): RowParallelLinear()
+              )
+            )
+          )
+          (final_layernorm): RMSNorm()
+        )
+        (output_layer): ColumnParallelLinear()
+      )
+    )
+  )
+)]
diff --git a/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/mindspeed1.png b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/mindspeed1.png
new file mode 100644
index 0000000000000000000000000000000000000000..7346684b53a59f1a67a894fcdae143b6a0b142c2
Binary files /dev/null and b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/mindspeed1.png differ
diff --git a/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/mindspeed2.png b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/mindspeed2.png
new file mode 100644
index 0000000000000000000000000000000000000000..c3d485a5a9dbcf58e2737f3d49f830bfc9d1b54d
Binary files /dev/null and b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactoary_img/mindspeed2.png differ
diff --git a/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactory_mapping.md b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactory_mapping.md
new file mode 100644
index 0000000000000000000000000000000000000000..c9d93e532a6cf3f34931e547aa4003e28feaedf4
--- /dev/null
+++ b/debug/accuracy_tools/msprobe/docs/visualization/mindspeed_llamafactory_mapping.md
@@ -0,0 +1,330 @@
+# MindSpeed & LLamaFactory Data Collection and Automatic Comparison
+
+## 0. Use Case
+The same model, implemented on both the MindSpeed and LLamaFactory frameworks, shows accuracy differences during training even though the hyperparameters, environment variables, initial weights, and training data are identical. A **whole-network comparison** is needed to locate the divergence points.
+
+This guide uses the Qwen2.5vl and Qwen2.5 models to show how to collect data from MindSpeed and LLamaFactory and compare the two automatically.
+
+## 1. Data Collection
+
+### 1.1 Prepare the data collection configuration file
+
+Before collecting data, prepare a JSON file - named config.json in this example - that holds all collection settings.
+
+The configuration used here is shown below. For more examples, see [config.json configuration examples](../03.config_examples.md); each field is described in the [configuration introduction](../02.config_introduction.md).
+
+```json
+{
+    "task": "statistics",
+    "dump_path": "/home/data_dump",
+    "rank": [],
+    "step": [0],
+    "level": "mix",
+    "async_dump": false,
+
+    "statistics": {
+        "scope": [],
+        "list": [],
+        "tensor_list": [],
+        "data_mode": ["all"],
+        "summary_mode": "statistics"
+    }
+}
+```
+Note that a hierarchical model visualization comparison will be run after collection, so `level` in the configuration must be `L0` (module data) or `mix` (module + API data).
+
+### 1.2 Add the msprobe collection interfaces
+
+The collection interface configuration used in this example is shown below; for more options and interface details, see [Accuracy data collection for PyTorch](../05.data_dump_PyTorch.md).
+
+#### 1.2.1 LLamaFactory data collection
+
+LLamaFactory builds on the underlying capabilities of Transformers, so the msprobe collection hooks are added inside Transformers.
+
+Taking Transformers 4.49.0 as an example, run `pip3 show Transformers` to get the `Location` path, then open the file `Location/transformers/trainer.py`.
+
+1. In trainer.py, add the tool interfaces that initialize the collection configuration and fix the random seeds:
+
+   ![llamafactory1.png](./mindspeed_llamafactoary_img/llamafactory1.png)
+
+2. At the **training-loop logic** in trainer.py, add the tool interfaces that start and stop collection and advance the step counter:
+
+   ![llamafactory2.png](./mindspeed_llamafactoary_img/llamafactory2.png)
+
+3. Setup is complete; launch the model training script and the data is collected automatically. For the on-disk data format, see [Accuracy data collection for PyTorch - dump result files](../05.data_dump_PyTorch.md#3-dump-结果文件介绍).
+
+#### 1.2.2 MindSpeed data collection
+
+Open training.py: the MindSpeed-MM path is `mindspeed_mm/training.py`, and the MindSpeed-LLM path is `mindspeed_llm/training/training.py`.
+
+1. In training.py, add the tool interfaces that initialize the collection configuration and fix the random seeds:
+
+   ![mindspeed1.png](./mindspeed_llamafactoary_img/mindspeed1.png)
+
+2. At the **training-loop logic** in training.py, add the tool interfaces that start and stop collection and advance the step counter:
+
+   ![mindspeed2.png](./mindspeed_llamafactoary_img/mindspeed2.png)
+
+3. Setup is complete; launch the model training script and the data is collected automatically. For the on-disk data format, see [Accuracy data collection for PyTorch - dump result files](../05.data_dump_PyTorch.md#3-dump-结果文件介绍). A code sketch of these insertion points, common to both frameworks, follows.
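+
+For orientation, here is a minimal sketch of where these calls sit in a generic training loop. It assumes the `PrecisionDebugger` and `seed_all` interfaces documented in [Accuracy data collection for PyTorch](../05.data_dump_PyTorch.md); `model`, `dataloader`, and `train_one_step` are hypothetical stand-ins for the framework's own objects, so rely on the screenshots above for the exact insertion points in trainer.py/training.py.
+
+```python
+from msprobe.pytorch import PrecisionDebugger, seed_all
+
+seed_all()  # fix random seeds so the two frameworks produce comparable runs
+debugger = PrecisionDebugger(config_path="./config.json")  # the config.json from section 1.1
+
+for batch in dataloader:              # the framework's existing training loop
+    debugger.start(model=model)       # begin collection for this training step
+    train_one_step(model, batch)      # hypothetical forward/backward/optimizer step
+    debugger.stop()                   # end collection for this step
+    debugger.step()                   # advance msprobe's step counter
+```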
+
+## 2. Automatic Comparison
+
+### 2.1 Hierarchical model visualization comparison
+
+This feature parses the accuracy data dumped by msprobe, reconstructs the model graph structure, and compares accuracy data at every level of the model, helping users understand the model structure and analyze accuracy issues.
+
+We run the hierarchical visualization comparison with the following command (a sketch of the compare.json passed to -i appears at the end of this section):
+
+```
+msprobe -f pytorch graph -i ./compare.json -o ./output -lm ./layer_mapping.yaml
+```
+For the parameter details, see [Hierarchical visualization comparison - command-line description](../21.visualization_PyTorch.md#31-构图命令行说明).
+
+When comparing models built on the MindSpeed and LLamaFactory frameworks, **the -lm parameter is required**; how to write the layer_mapping.yaml it takes is described in the sections below.
+
+After the comparison completes, you can start a tensorboard port (the [tb_graph_ascend plugin](../21.visualization_PyTorch.md#1依赖安装) must be installed) and inspect the model structure and comparison results in a browser; see [Hierarchical visualization comparison - launching tensorboard](../21.visualization_PyTorch.md#4启动tensorboard) and [Hierarchical visualization comparison - viewing in a browser](../21.visualization_PyTorch.md#5浏览器查看).
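+
+For reference, the comparison file passed via `-i` might look like the minimal sketch below. It assumes the `npu_path`/`bench_path` layout described in the [comparison file description](../21.visualization_PyTorch.md#313-比对文件说明); the two step-directory paths are hypothetical placeholders for the MindSpeed and LLamaFactory dump directories produced in section 1.
+
+```json
+{
+    "npu_path": "/home/data_dump_mindspeed/step0",
+    "bench_path": "/home/data_dump_llamafactory/step0",
+    "is_print_compare_log": true
+}
+```
+Adjust the two paths to wherever the `dump_path` configured in section 1.1 actually landed for each framework.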
+
+### 2.2 layer_mapping file configuration
+msprobe's comparison matches dump data whose names are identical on both sides. Because the MindSpeed and LLamaFactory code bases implement the model differently, some model levels and level names do not line up and cannot be matched directly; the layer names must be mapped before these parts can be compared.
+
+#### 2.2.1 layer_mapping file templates
+
+Ready-to-use layer_mapping templates for the Qwen2.5vl and Qwen2.5 models are provided below. **If you use a different model, or have customized the MindSpeed or LLamaFactory source code, these templates may no longer apply; adapt them by following the steps in the next section.**
+
+Each model has two templates - one for NPU side MindSpeed with benchmark side LLamaFactory, and one for NPU side LLamaFactory with benchmark side MindSpeed - and their mappings differ.
+
+File name format: \*.yaml, where * is a name of your choice. This guide uses layer_mapping.yaml.
+
+**Qwen2.5vl**
+
+```yaml
+# NPU side: MindSpeed-MM, benchmark side: LLamaFactory
+TopLayer:
+  0.module: module
+
+Float16Module:
+  module.image_encoder: visual
+  module.text_decoder: model
+
+VisionModel:
+  encoder.patch_embed: patch_embed
+  encoder.rotary_pos_emb: rotary_pos_emb
+  encoder.blocks.layers: blocks
+  projector: merger
+
+TransformerLayer:
+  input_layernorm: norm1
+  self_attention: attn
+  pre_mlp_layernorm: norm2
+
+Qwen2vlVitSelfAttention:
+  linear_qkv: qkv
+  linear_proj: proj
+
+MLP:
+  linear_fc1: up_proj
+  linear_fc2: down_proj
+
+MultimodalProjector:
+  layernorm: ln_q
+  encoder: mlp
+  encoder.linear_fc1: mlp.0
+  encoder.linear_fc2: mlp.2
+
+MMGPTModel:
+  embedding.word_embeddings: embed_tokens
+  rotary_pos_emb: rotary_emb
+  decoder.layers: layers
+  decoder.final_layernorm: norm
+  output_layer: lm_head
+```
+```yaml
+# NPU side: LLamaFactory, benchmark side: MindSpeed-MM
+TopLayer:
+  module: 0.module
+
+Qwen2_5_VLForConditionalGeneration:
+  visual: module.image_encoder
+  model: module.text_decoder
+  lm_head: module.text_decoder.output_layer
+
+Qwen2_5_VisionTransformerPretrainedModel:
+  patch_embed: encoder.patch_embed
+  rotary_pos_emb: encoder.rotary_pos_emb
+  blocks: encoder.blocks.layers
+  merger: projector
+
+Qwen2_5_VLVisionBlock:
+  norm1: input_layernorm
+  attn: self_attention
+  norm2: pre_mlp_layernorm
+
+Qwen2_5_VLVisionSdpaAttention:
+  qkv: linear_qkv
+  proj: linear_proj
+
+Qwen2_5_VLMLP:
+  up_proj: linear_fc1
+  down_proj: linear_fc2
+
+Qwen2_5_VLPatchMerger:
+  ln_q: layernorm
+  mlp: encoder
+  mlp.0: encoder.linear_fc1
+  mlp.2: encoder.linear_fc2
+
+Qwen2_5_VLModel:
+  embed_tokens: embedding.word_embeddings
+  rotary_emb: rotary_pos_emb
+  layers: decoder.layers
+  norm: decoder.final_layernorm
+
+Qwen2_5_VLDecoderLayer:
+  self_attn: self_attention
+  self_attn.o_proj: self_attention.linear_proj
+  post_attention_layernorm: pre_mlp_layernorm
+```
+
+**Qwen2.5**
+
+```yaml
+# NPU side: MindSpeed-LLM, benchmark side: LLamaFactory
+TopLayer:
+  0.module: module
+
+Float16Module:
+  module: model
+  module.output_layer: lm_head
+
+GPTModel:
+  embedding.word_embeddings: embed_tokens
+  decoder.layers: layers
+  decoder.final_layernorm: norm
+
+TransformerLayer:
+  self_attention: self_attn
+  pre_mlp_layernorm: post_attention_layernorm
+
+SelfAttention:
+  linear_proj: o_proj
+
+MLP:
+  linear_fc1: up_proj
+  linear_fc2: down_proj
+```
+```yaml
+# NPU side: LLamaFactory, benchmark side: MindSpeed-LLM
+TopLayer:
+  module: 0.module
+
+Qwen2ForCausalLM:
+  model: module
+  lm_head: module.output_layer
+
+Qwen2Model:
+  embed_tokens: embedding.word_embeddings
+  layers: decoder.layers
+  norm: decoder.final_layernorm
+
+Qwen2DecoderLayer:
+  self_attn: self_attention
+  post_attention_layernorm: pre_mlp_layernorm
+
+Qwen2Attention:
+  o_proj: linear_proj
+
+Qwen2MLP:
+  up_proj: linear_fc1
+  down_proj: linear_fc2
+```
+
+#### 2.2.2 Building a layer_mapping file
+The walkthrough below uses the Qwen2.5vl model, with MindSpeed on the NPU side and LLamaFactory on the benchmark side.
+
+1. Print the model structure
+
+   As described in [Add the msprobe collection interfaces](#12-添加msprobe工具采集接口), `debugger.start(model=model)` is added during setup; calling `print(model)` on the `model` passed to the `start` interface prints the model structure.
+
+   Printed model structures: [mindspeed-mm-qwen25vl.txt](./mindspeed_llamafactoary_img/mindspeed-mm-qwen25vl.txt), [llamafactory-qwen25vl.txt](./mindspeed_llamafactoary_img/llamafactory-qwen25vl.txt)
+
+2. Configure the layer mapping from the outermost structure inward
+
+- Structure 1
+
+  ![1.png](./mindspeed_llamafactoary_img/1.png)
+
+  ```yaml
+  TopLayer: # the top level of the model
+    0.module: module # MindSpeed's model is a list; during collection msprobe prefixes each entry with its index in the list, hence the 0.module -> module mapping
+
+  Float16Module: # MindSpeed's Float16Module is at the same level as LLamaFactory's Qwen2_5_VLForConditionalGeneration; map their sublayers
+    module.image_encoder: visual # Float16Module has an extra sublayer `module`; levels are joined with ".", giving module.image_encoder
+    module.text_decoder: model
+  ```
+- Structure 2
+
+  ![2.png](./mindspeed_llamafactoary_img/2.png)
+
+  ```yaml
+  VisionModel: # MindSpeed's VisionModel is at the same level as LLamaFactory's Qwen2_5_VisionTransformerPretrainedModel; map their sublayers
+    encoder.patch_embed: patch_embed
+    encoder.rotary_pos_emb: rotary_pos_emb
+    encoder.blocks.layers: blocks
+    projector: merger
+  ```
+- Structure 3
+
+  ![3.png](./mindspeed_llamafactoary_img/3.png)
+
+  ```yaml
+  TransformerLayer: # MindSpeed's TransformerLayer is at the same level as LLamaFactory's Qwen2_5_VLVisionBlock; map their sublayers
+    input_layernorm: norm1
+    self_attention: attn
+    pre_mlp_layernorm: norm2
+  ```
+- Structure 4
+
+  ![4.png](./mindspeed_llamafactoary_img/4.png)
+
+  ```yaml
+  Qwen2vlVitSelfAttention: # MindSpeed's Qwen2vlVitSelfAttention is at the same level as LLamaFactory's Qwen2_5_VLVisionSdpaAttention; map their sublayers
+    linear_qkv: qkv
+    linear_proj: proj
+
+  MLP: # MindSpeed's MLP is at the same level as LLamaFactory's Qwen2_5_VLMLP; map their sublayers
+    linear_fc1: up_proj
+    linear_fc2: down_proj
+  ```
+- Structure 5
+
+  ![5.png](./mindspeed_llamafactoary_img/5.png)
+
+  ```yaml
+  MultimodalProjector: # MindSpeed's MultimodalProjector is at the same level as LLamaFactory's Qwen2_5_VLPatchMerger; map their sublayers
+    layernorm: ln_q
+    encoder: mlp
+    encoder.linear_fc1: mlp.0
+    encoder.linear_fc2: mlp.2
+  ```
+- Structure 6
+
+  ![6.png](./mindspeed_llamafactoary_img/6.png)
+
+  ```yaml
+  MMGPTModel: # MindSpeed's MMGPTModel is at the same level as LLamaFactory's Qwen2_5_VLModel; map their sublayers
+    embedding.word_embeddings: embed_tokens
+    rotary_pos_emb: rotary_emb
+    decoder.layers: layers
+    decoder.final_layernorm: norm
+    output_layer: lm_head
+  ```
+- Structure 7
+
+  ![7.png](./mindspeed_llamafactoary_img/7.png)
+
+  TransformerLayer and MLP have already been configured above and a mapping key cannot be repeated, so map these nodes through [manual node matching](#23-手动选择节点匹配).
+
+### 2.3 Manual node matching
+If some nodes are still unmatched after the layer_mapping configuration is applied, you can match them in the browser by selecting the two gray unmatched nodes with the mouse.
+
+See [Hierarchical visualization comparison - manual node matching](../21.visualization_PyTorch.md#56-手动选择节点匹配).
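+
+To reach that browser page - and to view the comparison results in general - serve the output directory with tensorboard. A sketch, assuming the tb_graph_ascend plugin from the dependency installation step is present; the directory and port are examples:
+
+```
+tensorboard --logdir ./output --port 6006
+```
+Then open the printed URL in a browser to explore the reconstructed graphs and manually match any remaining gray nodes.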