From c85008243551e9fcac6335262dd54fdf0c190388 Mon Sep 17 00:00:00 2001 From: SaiYao Date: Thu, 9 Oct 2025 14:50:43 +0800 Subject: [PATCH] =?UTF-8?q?=E3=80=90=E8=AE=AD=E8=BD=AC=E6=8E=A8=E3=80=91?= =?UTF-8?q?=E6=9B=B4=E6=96=B0=E8=AE=AD=E8=BD=AC=E6=8E=A8=E4=BD=BF=E7=94=A8?= =?UTF-8?q?=E6=96=87=E6=A1=A3?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../docs/source_en/feature/safetensors.md | 2 +- .../docs/source_en/guide/evaluation.md | 56 +++++++++-------- .../docs/source_zh_cn/feature/safetensors.md | 2 +- .../docs/source_zh_cn/guide/evaluation.md | 62 ++++++++++--------- 4 files changed, 65 insertions(+), 57 deletions(-) diff --git a/docs/mindformers/docs/source_en/feature/safetensors.md b/docs/mindformers/docs/source_en/feature/safetensors.md index d16885f3c0..4353963d4e 100644 --- a/docs/mindformers/docs/source_en/feature/safetensors.md +++ b/docs/mindformers/docs/source_en/feature/safetensors.md @@ -478,7 +478,7 @@ python toolkit/safetensors/unified_safetensors.py \ **Note**: If `merged_ckpt_strategy.ckpt` already exists in the strategy folder and the folder path is still passed in, the script will first delete the old `merged_ckpt_strategy.ckpt` and merge it to create a new `merged_ckpt_strategy.ckpt` for weight conversion. Therefore, make sure that the folder has sufficient write permissions, otherwise the operation will report an error. - **mindspore_ckpt_dir**: Distributed weights path, please fill in the path of the folder where the source weights are located, the source weights should be stored in `model_dir/rank_x/xxx.safetensors` format, and fill in the folder path as `model_dir`. -- **output_dir**: The path where the target weights will be saved. The default value is `"/new_llm_data/******/ckpt/nbg3_31b/tmp"`, i.e., the target weights will be placed in the `/new_llm_data/******/ckpt/nbg3_31b/tmp` directory. +- **output_dir**: The path where the target weights will be saved. The default value is `"/path/output_dir"`. If this parameter is not configured, the target weights will be placed in the `/path/output_dir` directory by default. - **file_suffix**: The naming suffix of the target weights file. The default value is `"1_1"`, i.e. the target weights will be merged by searching for matching weight files in the `*1_1.safetensors` format. - **has_redundancy**: Whether the merged source weights are redundant weights. The default value is `True`, which means that the original weights used for merging are redundant. If the original weights are saved as de-redundant weights, it needs to be set to `False`. - **filter_out_param_prefix**: You can customize the parameters to be filtered out when merging weights, and the filtering rules are based on prefix name matching. For example, optimizer parameter `"adam_"`. diff --git a/docs/mindformers/docs/source_en/guide/evaluation.md b/docs/mindformers/docs/source_en/guide/evaluation.md index 5dd22cafd3..f4e9666741 100644 --- a/docs/mindformers/docs/source_en/guide/evaluation.md +++ b/docs/mindformers/docs/source_en/guide/evaluation.md @@ -467,34 +467,35 @@ After training, the model generally uses the trained model weights to run evalua ### Distributed Weight Merging -If the weights generated after training are distributed, the existing distributed weights need to be merged into complete weights first, and then the weights can be loaded through online slicing to complete the inference task. 
Using the [safetensors weight merging script](https://gitee.com/mindspore/mindformers/blob/master/toolkit/safetensors/unified_safetensors.py) provided by MindSpore Transformers, the merged weights are in the format of complete weights.
+If the weights generated after training are distributed, the existing distributed weights need to be merged into complete weights first, and then the weights can be loaded through online slicing to complete the inference task.
 
-Parameters can be filled in as follows:
+MindSpore Transformers provides a [safetensors weight merging script](https://gitee.com/mindspore/mindformers/blob/master/toolkit/safetensors/unified_safetensors.py) that merges the multiple safetensors weight files produced by distributed training into complete weights.
+
+The merge command is as follows (this example merges the training weights saved at step 1000, filters out the Adam optimizer parameters, and assumes the training weights were saved with de-redundancy enabled):
 
 ```shell
 python toolkit/safetensors/unified_safetensors.py \
-  --src_strategy_dirs src_strategy_path_or_dir \
-  --mindspore_ckpt_dir mindspore_ckpt_dir\
-  --output_dir output_dir \
-  --file_suffix "1_1" \
-  --filter_out_param_prefix "adam_"
+  --src_strategy_dirs output/strategy \
+  --mindspore_ckpt_dir output/checkpoint \
+  --output_dir /path/to/unified_train_ckpt \
+  --file_suffix "1000_1" \
+  --filter_out_param_prefix "adam_" \
+  --has_redundancy False
 ```
 
 Script parameter description:
 
-- src_strategy_dirs: The path to the distributed strategy file corresponding to the source weight, usually saved in the output/strategy/ directory by default after starting the training task. Distributed weights need to be filled in according to the following situations:
-
-    1. Source weights enable pipeline parallelism: Weight conversion is based on the merged strategy file, fill in the path of the distributed strategy folder. The script will automatically merge all ckpt_strategy_rank_x.ckpt files in the folder and generate merged_ckpt_strategy.ckpt in the folder. If merged_ckpt_strategy.ckpt already exists, you can directly fill in the path of this file.
-    2. Source weights do not enable pipeline parallelism: Weight conversion can be based on any strategy file, just fill in the path of any ckpt_strategy_rank_x.ckpt file.
+- **src_strategy_dirs**: The path to the distributed strategy file corresponding to the source weights, usually saved by default in the `output/strategy/` directory after starting the training task. Distributed weights need to be filled in according to the following:
 
-    Note: If merged_ckpt_strategy.ckpt already exists in the strategy folder and the folder path is still passed in, the script will first delete the old merged_ckpt_strategy.ckpt and then merge to generate a new merged_ckpt_strategy.ckpt for weight conversion. Therefore, please ensure that the folder has sufficient write permissions, otherwise the operation will report an error.
+    - **Source weights turn on pipeline parallelism**: Weight conversion is based on the merged strategy file; fill in the path to the distributed strategy folder. The script will automatically merge all `ckpt_strategy_rank_x.ckpt` files in the folder and generate `merged_ckpt_strategy.ckpt` in the folder. If `merged_ckpt_strategy.ckpt` already exists, you can just fill in the path to that file. 
+ - **Source weights turn off pipeline parallelism**: The weight conversion can be based on any of the strategy files, just fill in the path to any of the `ckpt_strategy_rank_x.ckpt` files. -- mindspore_ckpt_dir: Path to distributed weights, please fill in the path of the folder where the source weights are located. The source weights should be stored in the format model_dir/rank_x/xxx.safetensors, and fill in the folder path as model_dir. -- output_dir: Save path of target weights, the default value is `/new_llm_data/******/ckpt/nbg3_31b/tmp`, that is, the target weights will be placed in the `/new_llm_data/******/ckpt/nbg3_31b/tmp` directory. -- file_suffix: Naming suffix of target weight files, the default value is "1_1", that is, the target weights will be searched in the format *1_1.safetensors. -- has_redundancy: Whether the merged source weights are redundant weights, the default is True. -- filter_out_param_prefix: When merging weights, you can customize to filter out some parameters, and the filtering rules match by prefix name, such as optimizer parameters "adam_". -- max_process_num: Maximum number of processes for merging. Default value: 64. + **Note**: If `merged_ckpt_strategy.ckpt` already exists in the strategy folder and the folder path is still passed in, the script will first delete the old `merged_ckpt_strategy.ckpt` and merge it to create a new `merged_ckpt_strategy.ckpt` for weight conversion. Therefore, make sure that the folder has sufficient write permissions, otherwise the operation will report an error. +- **mindspore_ckpt_dir**: Distributed weights path, please fill in the path of the folder where the source weights are located, the source weights should be stored in `model_dir/rank_x/xxx.safetensors` format, and fill in the folder path as `model_dir`. +- **output_dir**: The path where the target weights will be saved. The default value is `"/path/output_dir"`. If this parameter is not configured, the target weights will be placed in the `/path/output_dir` directory by default. +- **file_suffix**: The naming suffix of the target weights file. The default value is `"1_1"`, i.e. the target weights will be merged by searching for matching weight files in the `*1_1.safetensors` format. +- **filter_out_param_prefix**: You can customize the parameters to be filtered out when merging weights, and the filtering rules are based on prefix name matching. For example, optimizer parameter `"adam_"`. +- **has_redundancy**: Whether the merged source weights are redundant weights. The default value is `True`, which means that the original weights used for merging are redundant. If the original weights are saved as de-redundant weights, it needs to be set to `False`. ### Inference Configuration Development @@ -504,17 +505,20 @@ Taking Qwen3 as an example, modify the [Qwen3 training configuration](https://gi Main modification points of Qwen3 training configuration include: -- Modify the value of run_mode to "predict". -- Add pretrained_model_dir: Hugging Face or ModelScope model directory path, place model configuration, Tokenizer and other files. -- In parallel_config, only keep data_parallel and model_parallel. -- In model_config, only keep compute_dtype, layernorm_compute_dtype, softmax_compute_dtype, rotary_dtype, params_dtype, and keep the precision consistent with the inference configuration. -- In the parallel module, only keep parallel_mode and enable_alltoall, and modify the value of parallel_mode to "MANUAL_PARALLEL". +- Modify the value of `run_mode` to `"predict"`. 
+- Add the `pretrained_model_dir` parameter, set to the Hugging Face or ModelScope model directory path that holds the model configuration, Tokenizer, and other files. If the trained weights are placed in this directory, `load_checkpoint` can be omitted in the YAML file.
+- In `parallel_config`, only keep `data_parallel` and `model_parallel`.
+- In `model_config`, only keep `compute_dtype`, `layernorm_compute_dtype`, `softmax_compute_dtype`, `rotary_dtype`, `params_dtype`, and keep the precision consistent with the inference configuration.
+- In the `parallel` module, only keep `parallel_mode` and `enable_alltoall`, and modify the value of `parallel_mode` to `"MANUAL_PARALLEL"`.
+
+> If the model's parameters were customized during training or differ from the open-source configuration, you must also modify the model configuration file config.json in the `pretrained_model_dir` directory before running inference. Alternatively, configure the modified parameters in `model_config`: when passed to inference, settings of the same name in `model_config` override the corresponding values in config.json.
+> 
To verify that the passed configuration is correct, look for `The converted TransformerConfig is: ...` or `The converted MLATransformerConfig is: ...` in the logs. ### Inference Function Verification -After the weights and configuration files are ready, use a single data input for inference to check whether the output content meets the expected logic. Refer to the [inference document](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/guide/inference.md) to start the inference task. +After the weights and configuration files are ready, use a single data input for inference to check whether the output content meets the expected logic. Refer to the [inference document](../guide/inference.md) to start the inference task. -For example: +For example, taking Qwen3 single-card inference as an example, the command to start the inference task is: ```shell python run_mindformer.py \ @@ -540,4 +544,4 @@ If the output content appears garbled or does not meet expectations, you need to ### Evaluation using AISBench -Refer to the AISBench evaluation section and use the AISBench tool for evaluation to verify model precision. \ No newline at end of file +Refer to the [AISBench evaluation section](#aisbench-benchmarking) and use the AISBench tool for evaluation to verify model precision. \ No newline at end of file diff --git a/docs/mindformers/docs/source_zh_cn/feature/safetensors.md b/docs/mindformers/docs/source_zh_cn/feature/safetensors.md index eda51771a8..07a94af031 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/safetensors.md +++ b/docs/mindformers/docs/source_zh_cn/feature/safetensors.md @@ -478,7 +478,7 @@ python toolkit/safetensors/unified_safetensors.py \ **注意**:如果策略文件夹下已存在 `merged_ckpt_strategy.ckpt` 且仍传入文件夹路径,脚本会首先删除旧的 `merged_ckpt_strategy.ckpt`,再合并生成新的 `merged_ckpt_strategy.ckpt` 以用于权重转换。因此,请确保该文件夹具有足够的写入权限,否则操作将报错。 - **mindspore_ckpt_dir**:分布式权重路径,请填写源权重所在文件夹的路径,源权重应按 `model_dir/rank_x/xxx.safetensors` 格式存放,并将文件夹路径填写为 `model_dir`。 -- **output_dir**:目标权重的保存路径,默认值为 `"/new_llm_data/******/ckpt/nbg3_31b/tmp"`,即目标权重将放置在 `/new_llm_data/******/ckpt/nbg3_31b/tmp` 目录下。 +- **output_dir**:目标权重的保存路径,默认值为 `"/path/output_dir"`,如若未配置该参数,目标权重将默认放置在 `/path/output_dir` 目录下。 - **file_suffix**:目标权重文件的命名后缀,默认值为 `"1_1"`,即目标权重将按照 `*1_1.safetensors` 格式查找匹配的权重文件进行合并。 - **has_redundancy**:合并的源权重是否是冗余的权重,默认为 `True`,表示用于合并的原始权重有冗余;若原始权重保存时为去冗余权重,则需设置为 `False`。 - **filter_out_param_prefix**:合并权重时可自定义过滤掉部分参数,过滤规则以前缀名匹配。如优化器参数 `"adam_"`。 diff --git a/docs/mindformers/docs/source_zh_cn/guide/evaluation.md b/docs/mindformers/docs/source_zh_cn/guide/evaluation.md index 35559e0e2b..1039e2bd00 100644 --- a/docs/mindformers/docs/source_zh_cn/guide/evaluation.md +++ b/docs/mindformers/docs/source_zh_cn/guide/evaluation.md @@ -471,54 +471,58 @@ Harness评测支持单机单卡、单机多卡、多机多卡场景,每种场 ### 分布式权重合并 -训练后产生的权重如果是分布式的,需要先将已有的分布式权重合并成完整权重后,再通过在线切分的方式进行权重加载完成推理任务。使用MindSpore Transformers提供的[safetensors权重合并脚本](https://gitee.com/mindspore/mindformers/blob/master/toolkit/safetensors/unified_safetensors.py),合并后的权重格式为完整权重。 +训练后产生的权重如果是分布式的,需要先将已有的分布式权重合并成完整权重后,再通过在线切分的方式进行权重加载完成推理任务。 -可以按照以下方式填写参数: +MindSpore Transformers 提供了一份 [safetensors 权重合并脚本](https://gitee.com/mindspore/mindformers/blob/master/toolkit/safetensors/unified_safetensors.py),使用该脚本,可以将分布式训练得到的多个 safetensors 权重进行合并,得到完整权重。 + +合并指令参考如下(对第 1000 步训练权重进行去 adam 优化器参数合并,且训练权重在保存时开启了去冗余功能): ```shell python toolkit/safetensors/unified_safetensors.py \ - --src_strategy_dirs src_strategy_path_or_dir \ - --mindspore_ckpt_dir mindspore_ckpt_dir\ - --output_dir 
output_dir \ - --file_suffix "1_1" \ - --filter_out_param_prefix "adam_" + --src_strategy_dirs output/strategy \ + --mindspore_ckpt_dir output/checkpoint \ + --output_dir /path/to/unified_train_ckpt \ + --file_suffix "1000_1" \ + --filter_out_param_prefix "adam_" \ + --has_redundancy False ``` 脚本参数说明: -- src_strategy_dirs:源权重对应的分布式策略文件路径,通常在启动训练任务后默认保存在 output/strategy/ 目录下。分布式权重需根据以下情况填写: - - 1. 源权重开启了流水线并行:权重转换基于合并的策略文件,填写分布式策略文件夹路径。脚本会自动将文件夹内的所有 ckpt_strategy_rank_x.ckpt 文件合并,并在文件夹下生成 merged_ckpt_strategy.ckpt。如果已经存在 merged_ckpt_strategy.ckpt,可以直接填写该文件的路径。 - 2. 源权重未开启流水线并行:权重转换可基于任一策略文件,填写任意一个 ckpt_strategy_rank_x.ckpt 文件的路径即可。 +- **src_strategy_dirs**:源权重对应的分布式策略文件路径,通常在启动训练任务后默认保存在 `output/strategy/` 目录下。分布式权重需根据以下情况填写: - 注意:如果策略文件夹下已存在 merged_ckpt_strategy.ckpt 且仍传入文件夹路径,脚本会首先删除旧的 merged_ckpt_strategy.ckpt,再合并生成新的 merged_ckpt_strategy.ckpt 以用于权重转换。因此,请确保该文件夹具有足够的写入权限,否则操作将报错。 + - **源权重开启了流水线并行**:权重转换基于合并的策略文件,填写分布式策略文件夹路径。脚本会自动将文件夹内的所有 `ckpt_strategy_rank_x.ckpt` 文件合并,并在文件夹下生成 `merged_ckpt_strategy.ckpt`。如果已经存在 `merged_ckpt_strategy.ckpt`,可以直接填写该文件的路径。 + - **源权重未开启流水线并行**:权重转换可基于任一策略文件,填写任意一个 `ckpt_strategy_rank_x.ckpt` 文件的路径即可。 -- mindspore_ckpt_dir:分布式权重路径,请填写源权重所在文件夹的路径,源权重应按 model_dir/rank_x/xxx.safetensors 格式存放,并将文件夹路径填写为 model_dir。 -- output_dir:目标权重的保存路径,默认值为 `/new_llm_data/******/ckpt/nbg3_31b/tmp`,即目标权重将放置在 `/new_llm_data/******/ckpt/nbg3_31b/tmp` 目录下。 -- file_suffix:目标权重文件的命名后缀,默认值为 "1_1",即目标权重将按照 *1_1.safetensors 格式查找。 -- has_redundancy:合并的源权重是否是冗余的权重,默认为 True。 -- filter_out_param_prefix:合并权重时可自定义过滤掉部分参数,过滤规则以前缀名匹配。如优化器参数"adam_"。 -- max_process_num:合并最大进程数。默认值:64。 + **注意**:如果策略文件夹下已存在 `merged_ckpt_strategy.ckpt` 且仍传入文件夹路径,脚本会首先删除旧的 `merged_ckpt_strategy.ckpt`,再合并生成新的 `merged_ckpt_strategy.ckpt` 以用于权重转换。因此,请确保该文件夹具有足够的写入权限,否则操作将报错。 +- **mindspore_ckpt_dir**:分布式权重路径,请填写源权重所在文件夹的路径,源权重应按 `model_dir/rank_x/xxx.safetensors` 格式存放,并将文件夹路径填写为 `model_dir`。 +- **output_dir**:目标权重的保存路径,默认值为 `"/path/output_dir"`,如若未配置该参数,目标权重将默认放置在 `/path/output_dir` 目录下。 +- **file_suffix**:目标权重文件的命名后缀,默认值为 `"1_1"`,即目标权重将按照 `*1_1.safetensors` 格式查找匹配的权重文件进行合并。 +- **filter_out_param_prefix**:合并权重时可自定义过滤掉部分参数,过滤规则以前缀名匹配。如优化器参数 `"adam_"`。 +- **has_redundancy**:合并的源权重是否是冗余的权重,默认为 `True`,表示用于合并的原始权重有冗余;若原始权重保存时为去冗余权重,则需设置为 `False`。 ### 推理配置开发 在完成权重文件的合并后,需依据训练配置文件开发对应的推理配置文件。 -以Qwen3为例,基于[Qwen3推理配置](https://gitee.com/mindspore/mindformers/blob/master/configs/qwen3/predict_qwen3.yaml)修改[Qwen3训练配置](https://gitee.com/mindspore/mindformers/blob/master/configs/qwen3/finetune_qwen3.yaml): +以 Qwen3 为例,基于 [Qwen3 推理配置](https://gitee.com/mindspore/mindformers/blob/master/configs/qwen3/predict_qwen3.yaml)修改 [Qwen3 训练配置](https://gitee.com/mindspore/mindformers/blob/master/configs/qwen3/finetune_qwen3.yaml): + +Qwen3 训练配置主要修改点包括: -Qwen3训练配置主要修改点包括: +- `run_mode` 的值修改为 `"predict"`。 +- 添加 `pretrained_model_dir` 参数,配置为 Hugging Face 或 ModelScope 的模型目录路径,放置模型配置、Tokenizer 等文件。如果将训练得到的完整权重放置在此目录底下,则 yaml 中可以不配置 `load_checkpoint`。 +- `parallel_config` 只保留 `data_parallel` 和 `model_parallel`。 +- `model_config` 中只保留 `compute_dtype`、`layernorm_compute_dtype`、`softmax_compute_dtype`、`rotary_dtype`、`params_dtype`,和推理配置保持精度一致。 +- `parallel` 模块中,只保留 `parallel_mode` 和 `enable_alltoall`,`parallel_mode` 的值修改为 `"MANUAL_PARALLEL"`。 -- run_mode的值修改为"predict"。 -- 添加pretrained_model_dir:Hugging Face或ModelScope的模型目录路径,放置模型配置、Tokenizer等文件。 -- parallel_config只保留data_parallel和model_parallel。 -- model_config中只保留compute_dtype、layernorm_compute_dtype、softmax_compute_dtype、rotary_dtype、params_dtype,和推理配置保持精度一致。 -- 
parallel模块中,只保留parallel_mode和enable_alltoall,parallel_mode的值修改为"MANUAL_PARALLEL"。 +> 如果模型的参数量在训练时进行了自定义,或与开源配置不同,进行推理时需要同步修改 `pretrained_model_dir` 对应路径下的模型配置 config.json。也可以在 `model_config` 中配置对应修改后的参数,传入推理时,`model_config` 中的同名配置会覆盖 config.json 中对应配置的值。 +>
如需检查传入的配置项是否正确,可以通过查找日志中的 `The converted TransformerConfig is: ...` 或 `The converted MLATransformerConfig is: ...` 内容,查找对应的配置项。 ### 推理功能验证 -在权重和配置文件都准备好的情况下,使用单条数据输入进行推理,检查输出内容是否符合预期逻辑,参考[推理文档](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/guide/inference.md),拉起推理任务。 +在权重和配置文件都准备好的情况下,使用单条数据输入进行推理,检查输出内容是否符合预期逻辑,参考[推理文档](../guide/inference.md),拉起推理任务。 -例如: +如,以 Qwen3 单卡推理为例,拉起推理任务的指令为: ```shell python run_mindformer.py \ @@ -542,6 +546,6 @@ python run_mindformer.py \ 若模型配置与权重加载均无误,但推理结果仍不符合预期,需进行精度比对分析,参考推理精度比对文档,逐层比对训练与推理的输出差异,排查潜在的数据预处理、计算精度或算子问题。 -### 使用AISBench进行评测 +### 使用 AISBench 进行评测 -参考AISBench评测章节,使用AISBench工具进行评测,验证模型精度。 \ No newline at end of file +参考 [AISBench 评测章节](#aisbench评测),使用 AISBench 工具进行评测,验证模型精度。 \ No newline at end of file -- Gitee
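For a quick sanity check after running the merge command documented above, the merged output can be inspected with the `safetensors` Python package to confirm that the `adam_`-prefixed optimizer states were filtered out. This is a minimal illustrative sketch, not part of the MindSpore Transformers tooling; it assumes the example output directory `/path/to/unified_train_ckpt` and file suffix `1000_1` used in the command above:

```python
import glob

from safetensors import safe_open  # pip install safetensors

# Merged files produced by unified_safetensors.py; the path and suffix follow
# the illustrative values used in the documentation example above.
merged_files = sorted(glob.glob("/path/to/unified_train_ckpt/*1000_1.safetensors"))

for path in merged_files:
    with safe_open(path, framework="np") as f:
        keys = list(f.keys())
        # --filter_out_param_prefix "adam_" should have dropped optimizer states.
        leftover = [k for k in keys if k.startswith("adam_")]
        print(f"{path}: {len(keys)} tensors, {len(leftover)} leftover 'adam_' tensors")
```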