From 75c74cfbb96de6b16c015df9fc9c1399d3c57aba Mon Sep 17 00:00:00 2001
From: SaiYao
Date: Fri, 17 Oct 2025 10:28:20 +0800
Subject: [PATCH] [r2.7.1][Training-to-Inference] Update the
 training-to-inference usage documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../docs/source_en/feature/safetensors.md     | 26 +++----
 .../docs/source_en/guide/evaluation.md        | 72 +++++++++--------
 .../docs/source_zh_cn/feature/safetensors.md  | 24 +++---
 .../docs/source_zh_cn/guide/evaluation.md     | 78 ++++++++++---------
 4 files changed, 104 insertions(+), 96 deletions(-)

diff --git a/docs/mindformers/docs/source_en/feature/safetensors.md b/docs/mindformers/docs/source_en/feature/safetensors.md
index 60f12faddb..9ad5aaf7ff 100644
--- a/docs/mindformers/docs/source_en/feature/safetensors.md
+++ b/docs/mindformers/docs/source_en/feature/safetensors.md
@@ -1,6 +1,6 @@
 # Safetensors Weights
 
-[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.7.1/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/r2.7.1/docs/mindformers/docs/source_en/feature/safetensors.md)
+[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/r1.7.0/docs/mindformers/docs/source_en/feature/safetensors.md)
 
 ## Overview
 
@@ -15,7 +15,7 @@ There are two main types of Safetensors files: complete weights files and distri
 Safetensors complete weights can be obtained in two ways:
 
 1. Download directly from Huggingface.
-2. After MindSpore Transformers distributed training, the weights are generated by [merge script](https://www.mindspore.cn/mindformers/docs/en/r1.7.0/feature/ckpt.html#distributed-weight-slicing-and-merging).
+2. After MindSpore Transformers distributed training, the weights are generated by [merge script](https://www.mindspore.cn/mindformers/docs/en/master/feature/ckpt.html#distributed-weight-slicing-and-merging).
 
 Huggingface Safetensors example catalog structure is as follows:
 
@@ -47,7 +47,7 @@ qwen2_7b
 Safetensors distributed weights can be obtained in two ways:
 
 1. Generated by distributed training with MindSpore Transformers.
-2. Using [format conversion script](https://www.mindspore.cn/docs/en/r2.7.1/api_python/mindspore/mindspore.ckpt_to_safetensors.html), the original distributed ckpt weights are changed to the Safetensors format.
+2. Using [format conversion script](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.ckpt_to_safetensors.html), the original distributed ckpt weights are changed to the Safetensors format.
 
 Distributed Safetensors example catalog structure is as follows:
 
@@ -69,7 +69,7 @@ qwen2_7b
 In the training process of deep learning models, saving the model weights is a crucial step. The weight saving function allows us to store the model parameters at any stage of training, so that users can restore, continue training, evaluate or deploy after training is interrupted or completed. At the same time, by saving weights, experimental results can be reproduced in different environments.
 
-Currently, MindSpore Transformers supports reading and saving weight files in the [safetensors](https://www.mindspore.cn/mindformers/docs/en/r1.7.0/feature/safetensors.html) format.
+Currently, MindSpore Transformers supports reading and saving weight files in the [safetensors](https://www.mindspore.cn/mindformers/docs/en/master/feature/safetensors.html) format. ### Directory Structure @@ -119,7 +119,7 @@ Users can control the weight saving behavior by modifying the configuration file Users can modify the fields under `CheckpointMonitor` in the `yaml` configuration file to control the weight saving behavior. -Taking [`DeepSeek-V3` pre-training yaml](https://gitee.com/mindspore/docs/blob/r2.7.1/docs/mindformers/docs/source_zh_cn/example/deepseek3/pretrain_deepseek3_671b.yaml) as an example, the following configuration can be made: +Taking [`DeepSeek-V3` pre-training yaml](https://gitee.com/mindspore/docs/blob/r1.7.0/docs/mindformers/docs/source_zh_cn/example/deepseek3/pretrain_deepseek3_671b.yaml) as an example, the following configuration can be made: ```yaml # callbacks @@ -152,7 +152,7 @@ The main parameters concerning the preservation of the weight configuration are | remove_redundancy | Whether redundancy is removed when saving model weights. | (bool, optional) - Default: `False` . | | save_network_params | Whether to additionally save only network parameters. | (bool, optional) - Whether to additionally save only network parameters. Default: `False` . | -If you want to know more about CheckpointMonitor, you can refer to [CheckpointMonitor API documentation](https://www.mindspore.cn/mindformers/docs/en/r1.7.0/core/mindformers.core.CheckpointMonitor.html). +If you want to know more about CheckpointMonitor, you can refer to [CheckpointMonitor API documentation](https://www.mindspore.cn/mindformers/docs/en/master/core/mindformers.core.CheckpointMonitor.html). ## Weight Loading @@ -324,7 +324,7 @@ output **2. Merging Distributed Strategy** -Call the [strategy merge interface](https://www.mindspore.cn/docs/en/r2.7.1/api_python/parallel/mindspore.parallel.merge_pipeline_strategys.html) to merge all strategy files after centralization into one file for subsequent weight slicing. +Call the [strategy merge interface](https://www.mindspore.cn/docs/en/master/api_python/parallel/mindspore.parallel.merge_pipeline_strategys.html) to merge all strategy files after centralization into one file for subsequent weight slicing. ```python import mindspore as ms @@ -478,7 +478,7 @@ python toolkit/safetensors/unified_safetensors.py \ **Note**: If `merged_ckpt_strategy.ckpt` already exists in the strategy folder and the folder path is still passed in, the script will first delete the old `merged_ckpt_strategy.ckpt` and merge it to create a new `merged_ckpt_strategy.ckpt` for weight conversion. Therefore, make sure that the folder has sufficient write permissions, otherwise the operation will report an error. - **mindspore_ckpt_dir**: Distributed weights path, please fill in the path of the folder where the source weights are located, the source weights should be stored in `model_dir/rank_x/xxx.safetensors` format, and fill in the folder path as `model_dir`. -- **output_dir**: The path where the target weights will be saved. The default value is `"/new_llm_data/******/ckpt/nbg3_31b/tmp"`, i.e., the target weights will be placed in the `/new_llm_data/******/ckpt/nbg3_31b/tmp` directory. +- **output_dir**: The path where the target weights will be saved. The default value is `"/path/output_dir"`. If this parameter is not configured, the target weights will be placed in the `/path/output_dir` directory by default. - **file_suffix**: The naming suffix of the target weights file. 
The default value is `"1_1"`, i.e. the target weights will be merged by searching for matching weight files in the `*1_1.safetensors` format. - **has_redundancy**: Whether the merged source weights are redundant weights. The default value is `True`, which means that the original weights used for merging are redundant. If the original weights are saved as de-redundant weights, it needs to be set to `False`. - **filter_out_param_prefix**: You can customize the parameters to be filtered out when merging weights, and the filtering rules are based on prefix name matching. For example, optimizer parameter `"adam_"`. @@ -516,7 +516,7 @@ python toolkit/safetensors/unified_safetensors.py \ #### Usage Directions -Use [strategy merging interface](https://www.mindspore.cn/docs/en/r2.7.1/api_python/parallel/mindspore.parallel.merge_pipeline_strategys.html) and [slicing saving interface](https://www.mindspore.cn/docs/en/r2.7.1/api_python/parallel/mindspore.parallel.load_distributed_checkpoint.html) provided by MindSpore. The safetensors weights are sliced and saved offline as follows. The format of the sliced weights is [distributed weights](#distributed-weights). +Use [strategy merging interface](https://www.mindspore.cn/docs/en/master/api_python/parallel/mindspore.parallel.merge_pipeline_strategys.html) and [slicing saving interface](https://www.mindspore.cn/docs/en/master/api_python/parallel/mindspore.parallel.load_distributed_checkpoint.html) provided by MindSpore. The safetensors weights are sliced and saved offline as follows. The format of the sliced weights is [distributed weights](#distributed-weights). ```python import mindspore as ms @@ -556,7 +556,7 @@ MindSpore Transformers stock weights file is in ckpt format, which can be format #### Interface Calling -Call [Mindspore format conversion interface](https://www.mindspore.cn/docs/en/r2.7.1/api_python/mindspore/mindspore.ckpt_to_safetensors.html) to implement. +Call [Mindspore format conversion interface](https://www.mindspore.cn/docs/en/master/api_python/mindspore/mindspore.ckpt_to_safetensors.html) to implement. ```python import mindspore as ms @@ -626,7 +626,7 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \ After the task is executed, a checkpoint folder is generated in the mindformers/output directory, while the model files are saved in that folder. -For more details, please refer to [Introduction to SFT fine-tuning](https://www.mindspore.cn/mindformers/docs/en/r1.7.0/guide/supervised_fine_tuning.html) and [Introduction to Pre-training](https://www.mindspore.cn/mindformers/docs/en/r1.7.0/guide/pre_training.html). +For more details, please refer to [Introduction to SFT fine-tuning](https://www.mindspore.cn/mindformers/docs/en/master/guide/supervised_fine_tuning.html) and [Introduction to Pre-training](https://www.mindspore.cn/mindformers/docs/en/master/guide/pre_training.html). ### Example of an Inference Task @@ -673,7 +673,7 @@ The results of executing the above single-card inference and multi-card inferenc 'text_generation_text': [I love Beijing, because it is a city with a long history and culture.......] 
``` -For more details, please refer to: [Introduction to Inference](https://www.mindspore.cn/mindformers/docs/en/r1.7.0/guide/inference.html) +For more details, please refer to: [Introduction to Inference](https://www.mindspore.cn/mindformers/docs/en/master/guide/inference.html) ### Examples of Resumable Training after Breakpoint Tasks @@ -709,4 +709,4 @@ callbacks: checkpoint_format: safetensors # Save weights file format ``` -For more details, please refer to: [Introduction to Breakpoints](https://www.mindspore.cn/mindformers/docs/en/r1.7.0/feature/resume_training.html). +For more details, please refer to: [Introduction to Breakpoints](https://www.mindspore.cn/mindformers/docs/en/master/feature/resume_training.html). diff --git a/docs/mindformers/docs/source_en/guide/evaluation.md b/docs/mindformers/docs/source_en/guide/evaluation.md index f97ae8df65..c0378681f5 100644 --- a/docs/mindformers/docs/source_en/guide/evaluation.md +++ b/docs/mindformers/docs/source_en/guide/evaluation.md @@ -1,6 +1,6 @@ # Evaluation -[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.7.1/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/r2.7.1/docs/mindformers/docs/source_en/guide/evaluation.md) +[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/r1.7.0/docs/mindformers/docs/source_en/guide/evaluation.md) ## Overview @@ -10,7 +10,7 @@ In previous versions, MindSpore Transformers adapted the Harness evaluation fram ## AISBench Benchmarking -For service-oriented evaluation of MindSpore Transformers, the AISBench Benchmark suite is recommended. AISBench Benchmark is a model evaluation tool built on OpenCompass, compatible with OpenCompass's configuration system, dataset structure, and model backend implementation, while extending support for service-oriented models. It supports 30+ open-source datasets: [Evaluation datasets supported by AISBench](https://gitee.com/aisbench/benchmark/blob/master/doc/users_guide/datasets.md#%E5%BC%80%E6%BA%90%E6%95%B0%E6%8D%AE%E9%9B%86). +For service-oriented evaluation of MindSpore Transformers, the AISBench Benchmark suite is recommended. AISBench Benchmark is a model evaluation tool built on OpenCompass, compatible with OpenCompass's configuration system, dataset structure, and model backend implementation, while extending support for service-oriented models. It supports 30+ open-source datasets: [Evaluation datasets supported by AISBench](https://gitee.com/aisbench/benchmark/blob/r1.7.0/doc/users_guide/datasets.md#%E5%BC%80%E6%BA%90%E6%95%B0%E6%8D%AE%E9%9B%86). Currently, AISBench supports two major categories of inference task evaluation scenarios: @@ -45,7 +45,7 @@ pip3 install -e ./ --use-pep517 **Step 2 Dataset Download** -The official documentation provides download links for each dataset. Taking CEVAL as an example, you can find the download link in the [CEVAL documentation,](https://gitee.com/aisbench/benchmark/blob/master/ais_bench/benchmark/configs/datasets/ceval/README.md), and execute the following commands to download and extract the dataset to the specified path: +The official documentation provides download links for each dataset. 
Taking CEVAL as an example, you can find the download link in the [CEVAL documentation,](https://gitee.com/aisbench/benchmark/blob/r1.7.0/ais_bench/benchmark/configs/datasets/ceval/README.md), and execute the following commands to download and extract the dataset to the specified path: ```bash cd ais_bench/datasets @@ -113,7 +113,7 @@ Parameter Description: - `--models`: Specifies the model task interface, i.e., vllm_api_general, corresponding to the file name changed in the previous step. There is also vllm_api_general_chat - `--datasets`: Specifies the dataset task, i.e., the ceval_gen_4_shot_str dataset task, where 4_shot means the question will be input repeatedly four times, and str means non-chat output -For more parameter configuration descriptions, see [Configuration Description](https://gitee.com/aisbench/benchmark/blob/master/doc/users_guide/models.md#%E6%9C%8D%E5%8A%A1%E5%8C%96%E6%8E%A8%E7%90%86%E5%90%8E%E7%AB%AF). +For more parameter configuration descriptions, see [Configuration Description](https://gitee.com/aisbench/benchmark/blob/r1.7.0/doc/users_guide/models.md#%E6%9C%8D%E5%8A%A1%E5%8C%96%E6%8E%A8%E7%90%86%E5%90%8E%E7%AB%AF). After the evaluation is completed, statistical results will be displayed on the screen. The specific execution results and logs will be saved in the outputs folder under the current path. In case of execution exceptions, problems can be located based on the logs. @@ -168,7 +168,7 @@ Parameter Description: - `--summarizer`: Specifies task statistical data - `--mode`: Specifies the task execution mode -For more parameter configuration descriptions, see [Configuration Description](https://gitee.com/aisbench/benchmark/blob/master/doc/users_guide/models.md#%E6%9C%8D%E5%8A%A1%E5%8C%96%E6%8E%A8%E7%90%86%E5%90%8E%E7%AB%AF). +For more parameter configuration descriptions, see [Configuration Description](https://gitee.com/aisbench/benchmark/blob/r1.7.0/doc/users_guide/models.md#%E6%9C%8D%E5%8A%A1%E5%8C%96%E6%8E%A8%E7%90%86%E5%90%8E%E7%AB%AF). #### Evaluation Results Description @@ -188,7 +188,7 @@ After the evaluation is completed, performance evaluation results will be output - For more evaluation tasks, such as synthetic random dataset evaluation and performance stress testing, see the following documentation: [AISBench Official Documentation](https://gitee.com/aisbench/benchmark/tree/master/doc/users_guide). - For more tips on optimizing inference performance, see the following documentation: [Inference Performance Optimization](https://docs.qq.com/doc/DZGhMSWFCenpQZWJR). -- For more parameter descriptions, see the following documentation: [Performance Evaluation Results Description](https://gitee.com/aisbench/benchmark/blob/master/doc/users_guide/performance_metric.md). +- For more parameter descriptions, see the following documentation: [Performance Evaluation Results Description](https://gitee.com/aisbench/benchmark/blob/r1.7.0/doc/users_guide/performance_metric.md). ### Appendix @@ -319,7 +319,7 @@ pip install -e . 1. Create a new directory with e.g. the name `model_dir` for storing the model yaml files. 2. Place the model inference yaml configuration file (predict_xxx_.yaml) in the directory created in the previous step. The directory location of the reasoning yaml configuration file for different models refers to [model library](../introduction/models.md). -3. Configure the yaml file. 
If the model class, model Config class, and model Tokenizer class in yaml use cheat code, that is, the code files are in [research](https://gitee.com/mindspore/mindformers/tree/r1.7.0/research) directory or other external directories, it is necessary to modify the yaml file: under the corresponding class `type` field, add the `auto_register` field in the format of `module.class`. (`module` is the file name of the script where the class is located, and `class` is the class name. If it already exists, there is no need to modify it.).
+3. Configure the yaml file. If the model class, model Config class, or model Tokenizer class in the yaml uses external (plug-in) code, that is, the code files are in the [research](https://gitee.com/mindspore/mindformers/tree/master/research) directory or another external directory, the yaml file needs to be modified: under the corresponding class `type` field, add the `auto_register` field in the format `module.class` (`module` is the file name of the script where the class is located, and `class` is the class name; if the field already exists, no modification is needed).
 
 Using the [predict_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/r1.7.0/research/llama3_1/llama3_1_8b/predict_llama3_1_8b.yaml) configuration as an example, modify some of the configuration items as follows:
 
@@ -352,7 +352,7 @@ The following table lists the parameters of the script of `run_harness.sh`:
 
 | Parameter | Type | Description | Required |
 |-------------------|------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------|
-| `--register_path` | str | The absolute path of the directory where the cheat code is located. For example, the model directory under the [research](https://gitee.com/mindspore/mindformers/tree/r1.7.0/research) directory. | No(The cheat code is required) |
+| `--register_path` | str | The absolute path of the directory where the external code is located, for example, a model directory under the [research](https://gitee.com/mindspore/mindformers/tree/master/research) directory. | No (required for external code) |
 | `--model` | str | The value must be `mf`, indicating the MindSpore Transformers evaluation policy. | Yes |
 | `--model_args` | str | Model and evaluation parameters. For details, see MindSpore Transformers model parameters. | Yes |
 | `--tasks` | str | Dataset name. Multiple datasets can be specified and separated by commas (,). | Yes |
@@ -467,34 +467,35 @@ After training, the model generally uses the trained model weights to run evalua
 
 ### Distributed Weight Merging
 
-If the weights generated after training are distributed, the existing distributed weights need to be merged into complete weights first, and then the weights can be loaded through online slicing to complete the inference task. Using the [safetensors weight merging script](https://gitee.com/mindspore/mindformers/blob/r1.7.0/toolkit/safetensors/unified_safetensors.py) provided by MindSpore Transformers, the merged weights are in the format of complete weights.
+If the weights generated after training are distributed, the existing distributed weights need to be merged into complete weights first, and then the weights can be loaded through online slicing to complete the inference task.
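+
+Before running the merge script introduced below, you can check that its expected inputs are present. A minimal sketch, assuming the default `output/` directory of a training task and the step-1000 file suffix used in the merge command that follows (actual file names depend on your task configuration):
+
+```shell
+# One sliced strategy file per rank, saved under output/strategy/ by the training task
+ls output/strategy/ckpt_strategy_rank_*.ckpt
+# Per-rank distributed weights for step 1000, matching the "1000_1" file suffix
+ls output/checkpoint/rank_*/*1000_1.safetensors
+```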
-Parameters can be filled in as follows: +MindSpore Transformers provides a [safetensors weight merging script](https://gitee.com/mindspore/mindformers/blob/r1.7.0/toolkit/safetensors/unified_safetensors.py) that can be used to merge multiple safetensors weights obtained from distributed training to obtain the complete weights. + +The merging instruction is as follows (the Adam optimizer parameters are merged for the training weights in step 1000, and the redundancy removal function is enabled when saving the training weights): ```shell python toolkit/safetensors/unified_safetensors.py \ - --src_strategy_dirs src_strategy_path_or_dir \ - --mindspore_ckpt_dir mindspore_ckpt_dir\ - --output_dir output_dir \ - --file_suffix "1_1" \ - --filter_out_param_prefix "adam_" + --src_strategy_dirs output/strategy \ + --mindspore_ckpt_dir output/checkpoint \ + --output_dir /path/to/unified_train_ckpt \ + --file_suffix "1000_1" \ + --filter_out_param_prefix "adam_" \ + --has_redundancy False ``` Script parameter description: -- src_strategy_dirs: The path to the distributed strategy file corresponding to the source weight, usually saved in the output/strategy/ directory by default after starting the training task. Distributed weights need to be filled in according to the following situations: - - 1. Source weights enable pipeline parallelism: Weight conversion is based on the merged strategy file, fill in the path of the distributed strategy folder. The script will automatically merge all ckpt_strategy_rank_x.ckpt files in the folder and generate merged_ckpt_strategy.ckpt in the folder. If merged_ckpt_strategy.ckpt already exists, you can directly fill in the path of this file. - 2. Source weights do not enable pipeline parallelism: Weight conversion can be based on any strategy file, just fill in the path of any ckpt_strategy_rank_x.ckpt file. +- **src_strategy_dirs**: The path to the distributed strategy file corresponding to the source weights, usually saved by default in the `output/strategy/` directory after starting the training task. Distributed weights need to be filled in according to the following: - Note: If merged_ckpt_strategy.ckpt already exists in the strategy folder and the folder path is still passed in, the script will first delete the old merged_ckpt_strategy.ckpt and then merge to generate a new merged_ckpt_strategy.ckpt for weight conversion. Therefore, please ensure that the folder has sufficient write permissions, otherwise the operation will report an error. + - **Source weights turn on pipeline parallelism**: The weight conversion is based on the merged strategy files, fills in the path to the distributed strategies folder. The script will automatically merge all `ckpt_strategy_rank_x.ckpt` files in the folder and generate `merged_ckpt_strategy.ckpt` in the folder. If `merged_ckpt_strategy.ckpt` already exists, you can just fill in the path to that file. + - **Source weights turn off pipeline parallelism**: The weight conversion can be based on any of the strategy files, just fill in the path to any of the `ckpt_strategy_rank_x.ckpt` files. -- mindspore_ckpt_dir: Path to distributed weights, please fill in the path of the folder where the source weights are located. The source weights should be stored in the format model_dir/rank_x/xxx.safetensors, and fill in the folder path as model_dir. 
-- output_dir: Save path of target weights, the default value is `/new_llm_data/******/ckpt/nbg3_31b/tmp`, that is, the target weights will be placed in the `/new_llm_data/******/ckpt/nbg3_31b/tmp` directory. -- file_suffix: Naming suffix of target weight files, the default value is "1_1", that is, the target weights will be searched in the format *1_1.safetensors. -- has_redundancy: Whether the merged source weights are redundant weights, the default is True. -- filter_out_param_prefix: When merging weights, you can customize to filter out some parameters, and the filtering rules match by prefix name, such as optimizer parameters "adam_". -- max_process_num: Maximum number of processes for merging. Default value: 64. + **Note**: If `merged_ckpt_strategy.ckpt` already exists in the strategy folder and the folder path is still passed in, the script will first delete the old `merged_ckpt_strategy.ckpt` and merge it to create a new `merged_ckpt_strategy.ckpt` for weight conversion. Therefore, make sure that the folder has sufficient write permissions, otherwise the operation will report an error. +- **mindspore_ckpt_dir**: Distributed weights path, please fill in the path of the folder where the source weights are located, the source weights should be stored in `model_dir/rank_x/xxx.safetensors` format, and fill in the folder path as `model_dir`. +- **output_dir**: The path where the target weights will be saved. The default value is `"/path/output_dir"`. If this parameter is not configured, the target weights will be placed in the `/path/output_dir` directory by default. +- **file_suffix**: The naming suffix of the target weights file. The default value is `"1_1"`, i.e. the target weights will be merged by searching for matching weight files in the `*1_1.safetensors` format. +- **filter_out_param_prefix**: You can customize the parameters to be filtered out when merging weights, and the filtering rules are based on prefix name matching. For example, optimizer parameter `"adam_"`. +- **has_redundancy**: Whether the merged source weights are redundant weights. The default value is `True`, which means that the original weights used for merging are redundant. If the original weights are saved as de-redundant weights, it needs to be set to `False`. ### Inference Configuration Development @@ -504,17 +505,20 @@ Taking Qwen3 as an example, modify the [Qwen3 training configuration](https://gi Main modification points of Qwen3 training configuration include: -- Modify the value of run_mode to "predict". -- Add pretrained_model_dir: Hugging Face or ModelScope model directory path, place model configuration, Tokenizer and other files. -- In parallel_config, only keep data_parallel and model_parallel. -- In model_config, only keep compute_dtype, layernorm_compute_dtype, softmax_compute_dtype, rotary_dtype, params_dtype, and keep the precision consistent with the inference configuration. -- In the parallel module, only keep parallel_mode and enable_alltoall, and modify the value of parallel_mode to "MANUAL_PARALLEL". +- Modify the value of `run_mode` to `"predict"`. +- Add the `pretrained_model_dir` parameter, set to the Hugging Face or ModelScope model directory path, to place model configuration, Tokenizer, and other files. If the trained weights are placed in this directory, `load_checkpoint` can be omitted in the YAML file. +- In `parallel_config`, only keep `data_parallel` and `model_parallel`. 
+- In `model_config`, only keep `compute_dtype`, `layernorm_compute_dtype`, `softmax_compute_dtype`, `rotary_dtype`, and `params_dtype`, keeping the precision consistent with the inference configuration.
+- In the `parallel` module, only keep `parallel_mode` and `enable_alltoall`, and modify the value of `parallel_mode` to `"MANUAL_PARALLEL"`.
+
+> If the model's parameters were customized during training, or differ from the open-source configuration, you must modify the model configuration file config.json in the `pretrained_model_dir` directory accordingly when performing inference. Alternatively, you can set the modified parameters in `model_config`: when they are passed in at inference time, same-named entries in `model_config` override the corresponding values in config.json.
+>
To verify that the passed configuration is correct, look for `The converted TransformerConfig is: ...` or `The converted MLATransformerConfig is: ...` in the logs. ### Inference Function Verification -After the weights and configuration files are ready, use a single data input for inference to check whether the output content meets the expected logic. Refer to the [inference document](https://gitee.com/mindspore/docs/blob/r2.7.1/docs/mindformers/docs/source_en/guide/inference.md) to start the inference task. +After the weights and configuration files are ready, use a single data input for inference to check whether the output content meets the expected logic. Refer to the [inference document](../guide/inference.md) to start the inference task. -For example: +For example, taking Qwen3 single-card inference as an example, the command to start the inference task is: ```shell python run_mindformer.py \ @@ -540,4 +544,4 @@ If the output content appears garbled or does not meet expectations, you need to ### Evaluation using AISBench -Refer to the AISBench evaluation section and use the AISBench tool for evaluation to verify model precision. \ No newline at end of file +Refer to the [AISBench evaluation section](#aisbench-benchmarking) and use the AISBench tool for evaluation to verify model precision. \ No newline at end of file diff --git a/docs/mindformers/docs/source_zh_cn/feature/safetensors.md b/docs/mindformers/docs/source_zh_cn/feature/safetensors.md index 57ea94f791..64dfc18fca 100644 --- a/docs/mindformers/docs/source_zh_cn/feature/safetensors.md +++ b/docs/mindformers/docs/source_zh_cn/feature/safetensors.md @@ -1,6 +1,6 @@ # Safetensors权重 -[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.7.1/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/r2.7.1/docs/mindformers/docs/source_zh_cn/feature/safetensors.md) +[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/r1.7.0/docs/mindformers/docs/source_zh_cn/feature/safetensors.md) ## 概述 @@ -48,7 +48,7 @@ qwen2_7b Safetensors分布式权重可通过以下两种方式获取: 1. 通过MindSpore Transformers分布式训练生成。 -2. 通过[格式转换脚本](https://www.mindspore.cn/docs/zh-CN/r2.7.1/api_python/mindspore/mindspore.ckpt_to_safetensors.html),将原有分布式ckpt权重转换为Safetensors格式。 +2. 
通过[格式转换脚本](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.ckpt_to_safetensors.html),将原有分布式ckpt权重转换为Safetensors格式。 分布式Safetensors示例目录结构: @@ -70,7 +70,7 @@ qwen2_7b 在深度学习模型的训练过程中,保存模型的权重是至关重要的一步。权重保存功能使得我们能够在训练的任意阶段存储模型的参数,以便用户在训练中断或完成后进行恢复、继续训练、评估或部署。同时,还可以通过保存权重的方式,在不同环境下复现实验结果。 -目前,MindSpore Transformers 支持 [safetensors](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.7.0/feature/safetensors.html) 格式的权重文件读取和保存。 +目前,MindSpore Transformers 支持 [safetensors](https://www.mindspore.cn/mindformers/docs/zh-CN/master/feature/safetensors.html) 格式的权重文件读取和保存。 ### 目录结构 @@ -120,7 +120,7 @@ output 用户可修改 `yaml` 配置文件中 `CheckpointMonitor` 下的字段来控制权重保存行为。 -以 [DeepSeek-V3 预训练 yaml](https://gitee.com/mindspore/docs/blob/r2.7.1/docs/mindformers/docs/source_zh_cn/example/deepseek3/pretrain_deepseek3_671b.yaml) 为例,可做如下配置: +以 [DeepSeek-V3 预训练 yaml](https://gitee.com/mindspore/docs/blob/r1.7.0/docs/mindformers/docs/source_zh_cn/example/deepseek3/pretrain_deepseek3_671b.yaml) 为例,可做如下配置: ```yaml # callbacks @@ -152,7 +152,7 @@ callbacks: | remove_redundancy | 保存模型权重时是否去除冗余。 | (bool, 可选) - 默认值: `False` 。 | | save_network_params | 是否仅额外保存网络参数。 | (bool, 可选) - 是否仅额外保存网络参数。默认值: `False` 。 | -如果您想了解更多有关 CheckpointMonitor 的知识,可以参考 [CheckpointMonitor API 文档](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.7.0/core/mindformers.core.CheckpointMonitor.html)。 +如果您想了解更多有关 CheckpointMonitor 的知识,可以参考 [CheckpointMonitor API 文档](https://www.mindspore.cn/mindformers/docs/zh-CN/master/core/mindformers.core.CheckpointMonitor.html)。 ## 权重加载 @@ -324,7 +324,7 @@ output **2.合并分布式策略** -调用MindSpore提供的[策略合并接口](https://www.mindspore.cn/docs/zh-CN/r2.7.1/api_python/parallel/mindspore.parallel.merge_pipeline_strategys.html)将集中后的所有策略文件合并成一个文件,用于后续权重切分。 +调用MindSpore提供的[策略合并接口](https://www.mindspore.cn/docs/zh-CN/master/api_python/parallel/mindspore.parallel.merge_pipeline_strategys.html)将集中后的所有策略文件合并成一个文件,用于后续权重切分。 ```python import mindspore as ms @@ -478,7 +478,7 @@ python toolkit/safetensors/unified_safetensors.py \ **注意**:如果策略文件夹下已存在 `merged_ckpt_strategy.ckpt` 且仍传入文件夹路径,脚本会首先删除旧的 `merged_ckpt_strategy.ckpt`,再合并生成新的 `merged_ckpt_strategy.ckpt` 以用于权重转换。因此,请确保该文件夹具有足够的写入权限,否则操作将报错。 - **mindspore_ckpt_dir**:分布式权重路径,请填写源权重所在文件夹的路径,源权重应按 `model_dir/rank_x/xxx.safetensors` 格式存放,并将文件夹路径填写为 `model_dir`。 -- **output_dir**:目标权重的保存路径,默认值为 `"/new_llm_data/******/ckpt/nbg3_31b/tmp"`,即目标权重将放置在 `/new_llm_data/******/ckpt/nbg3_31b/tmp` 目录下。 +- **output_dir**:目标权重的保存路径,默认值为 `"/path/output_dir"`,如若未配置该参数,目标权重将默认放置在 `/path/output_dir` 目录下。 - **file_suffix**:目标权重文件的命名后缀,默认值为 `"1_1"`,即目标权重将按照 `*1_1.safetensors` 格式查找匹配的权重文件进行合并。 - **has_redundancy**:合并的源权重是否是冗余的权重,默认为 `True`,表示用于合并的原始权重有冗余;若原始权重保存时为去冗余权重,则需设置为 `False`。 - **filter_out_param_prefix**:合并权重时可自定义过滤掉部分参数,过滤规则以前缀名匹配。如优化器参数 `"adam_"`。 @@ -516,7 +516,7 @@ python toolkit/safetensors/unified_safetensors.py \ #### 使用说明 -使用MindSpore提供的[策略合并接口](https://www.mindspore.cn/docs/zh-CN/r2.7.1/api_python/parallel/mindspore.parallel.merge_pipeline_strategys.html)和[切分保存接口](https://www.mindspore.cn/docs/zh-CN/r2.7.1/api_python/parallel/mindspore.parallel.load_distributed_checkpoint.html),按照如下方式进行safetensors权重离线切分保存。切分后的权重格式为[分布式权重](#分布式权重)。 +使用MindSpore提供的[策略合并接口](https://www.mindspore.cn/docs/zh-CN/master/api_python/parallel/mindspore.parallel.merge_pipeline_strategys.html)和[切分保存接口](https://www.mindspore.cn/docs/zh-CN/master/api_python/parallel/mindspore.parallel.load_distributed_checkpoint.html),按照如下方式进行safetensors权重离线切分保存。切分后的权重格式为[分布式权重](#分布式权重)。 ```python import mindspore as ms @@ -556,7 
+556,7 @@ MindSpore Transformers存量权重文件为ckpt格式,可以通过以下两种 #### 接口调用 -直接调用[Mindspore格式转换接口](https://www.mindspore.cn/docs/zh-CN/r2.7.1/api_python/mindspore/mindspore.ckpt_to_safetensors.html)实现。 +直接调用[Mindspore格式转换接口](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore/mindspore.ckpt_to_safetensors.html)实现。 ```python import mindspore as ms @@ -626,7 +626,7 @@ bash scripts/msrun_launcher.sh "run_mindformer.py \ 任务执行完成后,在mindformers/output目录下,会生成checkpoint文件夹,同时模型文件会保存在该文件夹下。 -更多详情请参考:[SFT微调介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.7.0/guide/supervised_fine_tuning.html)、[预训练介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.7.0/guide/pre_training.html) +更多详情请参考:[SFT微调介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/master/guide/supervised_fine_tuning.html)、[预训练介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/master/guide/pre_training.html) ### 推理任务示例 @@ -673,7 +673,7 @@ bash scripts/msrun_launcher.sh "python run_mindformer.py \ 'text_generation_text': [I love Beijing, because it is a city with a long history and culture.......] ``` -更多详情请参考:[推理介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.7.0/guide/inference.html) +更多详情请参考:[推理介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/master/guide/inference.html) ### 断点续训任务示例 @@ -709,4 +709,4 @@ callbacks: checkpoint_format: safetensors # 保存权重文件格式 ``` -更多详情请参考:[断点续训介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/r1.7.0/feature/resume_training.html)。 +更多详情请参考:[断点续训介绍](https://www.mindspore.cn/mindformers/docs/zh-CN/master/feature/resume_training.html)。 diff --git a/docs/mindformers/docs/source_zh_cn/guide/evaluation.md b/docs/mindformers/docs/source_zh_cn/guide/evaluation.md index 28019dfbf2..2509576f77 100644 --- a/docs/mindformers/docs/source_zh_cn/guide/evaluation.md +++ b/docs/mindformers/docs/source_zh_cn/guide/evaluation.md @@ -1,6 +1,6 @@ # 评测指南 -[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.7.1/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/r2.7.1/docs/mindformers/docs/source_zh_cn/guide/evaluation.md) +[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/r1.7.0/docs/mindformers/docs/source_zh_cn/guide/evaluation.md) ## 概览 @@ -12,7 +12,7 @@ MindSpore Transformers在之前版本,对于部分Legacy架构的模型,适 ## AISBench评测 -MindSpore Transformers的服务化评测推荐AISBench Benchmark套件。AISBench Benchmark是基于OpenCompass构建的模型评测工具,兼容OpenCompass的配置体系、数据集结构与模型后端实现,并在此基础上扩展了对服务化模型的支持能力。同时支持30+开源数据集:[AISBench支持的评测数据集](https://gitee.com/aisbench/benchmark/blob/master/doc/users_guide/datasets.md#%E5%BC%80%E6%BA%90%E6%95%B0%E6%8D%AE%E9%9B%86)。 +MindSpore Transformers的服务化评测推荐AISBench Benchmark套件。AISBench Benchmark是基于OpenCompass构建的模型评测工具,兼容OpenCompass的配置体系、数据集结构与模型后端实现,并在此基础上扩展了对服务化模型的支持能力。同时支持30+开源数据集:[AISBench支持的评测数据集](https://gitee.com/aisbench/benchmark/blob/r1.7.0/doc/users_guide/datasets.md#%E5%BC%80%E6%BA%90%E6%95%B0%E6%8D%AE%E9%9B%86)。 当前,AISBench支持两大类推理任务的评测场景: @@ -47,7 +47,7 @@ pip3 install -e ./ --use-pep517 #### Step2 数据集下载 -官方文档提供各个数据集下载链接,以ceval为例可在[ceval文档](https://gitee.com/aisbench/benchmark/blob/master/ais_bench/benchmark/configs/datasets/ceval/README.md)中找到下载链接,执行以下命令下载解压数据集到指定路径: +官方文档提供各个数据集下载链接,以ceval为例可在[ceval文档](https://gitee.com/aisbench/benchmark/blob/r1.7.0/ais_bench/benchmark/configs/datasets/ceval/README.md)中找到下载链接,执行以下命令下载解压数据集到指定路径: ```bash cd ais_bench/datasets @@ -115,7 +115,7 @@ ais_bench --models vllm_api_general --datasets 
ceval_gen_5_shot_str --debug - `--models`:指定了模型任务接口,即vllm_api_general,对应上一步更改的文件名。此外还有vllm_api_general_chat。 - `--datasets`:指定了数据集任务,即ceval_gen_5_shot_str数据集任务,其中的5_shot指问题会重复四次输入,str是指非chat输出。 -其它更多的参数配置说明,见[配置说明](https://gitee.com/aisbench/benchmark/blob/master/doc/users_guide/models.md#%E6%9C%8D%E5%8A%A1%E5%8C%96%E6%8E%A8%E7%90%86%E5%90%8E%E7%AB%AF)。 +其它更多的参数配置说明,见[配置说明](https://gitee.com/aisbench/benchmark/blob/r1.7.0/doc/users_guide/models.md#%E6%9C%8D%E5%8A%A1%E5%8C%96%E6%8E%A8%E7%90%86%E5%90%8E%E7%AB%AF)。 评测结束后统计结果会打屏,具体执行结果和日志都会保存在当前路径下的outputs文件夹下,执行异常情况下可以根据日志定位问题。 @@ -170,7 +170,7 @@ ais_bench --models vllm_api_stream_chat --datasets gsm8k_gen_0_shot_cot_str_perf - `--summarizer`:指定了任务统计数据。 - `--mode`:指定了任务执行模式。 -其它更多的参数配置说明,见[配置说明](https://gitee.com/aisbench/benchmark/blob/master/doc/users_guide/models.md#%E6%9C%8D%E5%8A%A1%E5%8C%96%E6%8E%A8%E7%90%86%E5%90%8E%E7%AB%AF)。 +其它更多的参数配置说明,见[配置说明](https://gitee.com/aisbench/benchmark/blob/r1.7.0/doc/users_guide/models.md#%E6%9C%8D%E5%8A%A1%E5%8C%96%E6%8E%A8%E7%90%86%E5%90%8E%E7%AB%AF)。 #### 评测结果说明 @@ -190,7 +190,7 @@ ais_bench --models vllm_api_stream_chat --datasets gsm8k_gen_0_shot_cot_str_perf - 更多评测任务,如合成随机数据集评测、性能压测,可查看以下文档:[AISBench官方文档](https://gitee.com/aisbench/benchmark/tree/master/doc/users_guide)。 - 更多调优推理性能技巧,可查看以下文档:[推理性能调优](https://docs.qq.com/doc/DZGhMSWFCenpQZWJR)。 -- 更多参数说明请看以下文档:[性能测评结果说明](https://gitee.com/aisbench/benchmark/blob/master/doc/users_guide/performance_metric.md)。 +- 更多参数说明请看以下文档:[性能测评结果说明](https://gitee.com/aisbench/benchmark/blob/r1.7.0/doc/users_guide/performance_metric.md)。 ### 附录 @@ -323,7 +323,7 @@ pip install -e . 1. 创建一个新目录,例如名称为`model_dir`,用于存储模型yaml文件。 2. 在上个步骤创建的目录中,放置模型推理yaml配置文件(predict_xxx_.yaml)。不同模型的推理yaml配置文件所在目录位置,请参考[模型库](../introduction/models.md)。 -3. 配置yaml文件。如果yaml中模型类、模型Config类、模型Tokenizer类使用了外挂代码,即代码文件在[research](https://gitee.com/mindspore/mindformers/tree/r1.7.0/research)目录或其他外部目录下,需要修改yaml文件:在相应类的`type`字段下,添加`auto_register`字段,格式为“module.class”(其中“module”为类所在脚本的文件名,“class”为类名。如果已存在,则不需要修改)。 +3. 
配置yaml文件。如果yaml中模型类、模型Config类、模型Tokenizer类使用了外挂代码,即代码文件在[research](https://gitee.com/mindspore/mindformers/tree/master/research)目录或其他外部目录下,需要修改yaml文件:在相应类的`type`字段下,添加`auto_register`字段,格式为“module.class”(其中“module”为类所在脚本的文件名,“class”为类名。如果已存在,则不需要修改)。 以[predict_llama3_1_8b.yaml](https://gitee.com/mindspore/mindformers/blob/r1.7.0/research/llama3_1/llama3_1_8b/predict_llama3_1_8b.yaml)配置为例,对其中的部分配置项进行如下修改: @@ -356,7 +356,7 @@ run_harness.sh脚本参数配置如下表: | 参数 | 类型 | 参数介绍 | 是否必须 | |-------------------|-----|--------------------------------------------------------------------------------------------------|-----------| -| `--register_path` | str | 外挂代码所在目录的绝对路径。比如[research](https://gitee.com/mindspore/mindformers/tree/r1.7.0/research)目录下的模型目录 | 否(外挂代码必填) | +| `--register_path` | str | 外挂代码所在目录的绝对路径。比如[research](https://gitee.com/mindspore/mindformers/tree/master/research)目录下的模型目录 | 否(外挂代码必填) | | `--model` | str | 需设置为 `mf` ,对应为MindSpore Transformers评估策略 | 是 | | `--model_args` | str | 模型及评估相关参数,见下方模型参数介绍 | 是 | | `--tasks` | str | 数据集名称。可传入多个数据集,使用逗号(,)分隔 | 是 | @@ -471,54 +471,58 @@ Harness评测支持单机单卡、单机多卡、多机多卡场景,每种场 ### 分布式权重合并 -训练后产生的权重如果是分布式的,需要先将已有的分布式权重合并成完整权重后,再通过在线切分的方式进行权重加载完成推理任务。使用MindSpore Transformers提供的[safetensors权重合并脚本](https://gitee.com/mindspore/mindformers/blob/r1.7.0/toolkit/safetensors/unified_safetensors.py),合并后的权重格式为完整权重。 +训练后产生的权重如果是分布式的,需要先将已有的分布式权重合并成完整权重后,再通过在线切分的方式进行权重加载完成推理任务。 -可以按照以下方式填写参数: +MindSpore Transformers 提供了一份 [safetensors 权重合并脚本](https://gitee.com/mindspore/mindformers/blob/r1.7.0/toolkit/safetensors/unified_safetensors.py),使用该脚本,可以将分布式训练得到的多个 safetensors 权重进行合并,得到完整权重。 + +合并指令参考如下(对第 1000 步训练权重进行去 adam 优化器参数合并,且训练权重在保存时开启了去冗余功能): ```shell python toolkit/safetensors/unified_safetensors.py \ - --src_strategy_dirs src_strategy_path_or_dir \ - --mindspore_ckpt_dir mindspore_ckpt_dir\ - --output_dir output_dir \ - --file_suffix "1_1" \ - --filter_out_param_prefix "adam_" + --src_strategy_dirs output/strategy \ + --mindspore_ckpt_dir output/checkpoint \ + --output_dir /path/to/unified_train_ckpt \ + --file_suffix "1000_1" \ + --filter_out_param_prefix "adam_" \ + --has_redundancy False ``` 脚本参数说明: -- src_strategy_dirs:源权重对应的分布式策略文件路径,通常在启动训练任务后默认保存在 output/strategy/ 目录下。分布式权重需根据以下情况填写: - - 1. 源权重开启了流水线并行:权重转换基于合并的策略文件,填写分布式策略文件夹路径。脚本会自动将文件夹内的所有 ckpt_strategy_rank_x.ckpt 文件合并,并在文件夹下生成 merged_ckpt_strategy.ckpt。如果已经存在 merged_ckpt_strategy.ckpt,可以直接填写该文件的路径。 - 2. 
源权重未开启流水线并行:权重转换可基于任一策略文件,填写任意一个 ckpt_strategy_rank_x.ckpt 文件的路径即可。 +- **src_strategy_dirs**:源权重对应的分布式策略文件路径,通常在启动训练任务后默认保存在 `output/strategy/` 目录下。分布式权重需根据以下情况填写: - 注意:如果策略文件夹下已存在 merged_ckpt_strategy.ckpt 且仍传入文件夹路径,脚本会首先删除旧的 merged_ckpt_strategy.ckpt,再合并生成新的 merged_ckpt_strategy.ckpt 以用于权重转换。因此,请确保该文件夹具有足够的写入权限,否则操作将报错。 + - **源权重开启了流水线并行**:权重转换基于合并的策略文件,填写分布式策略文件夹路径。脚本会自动将文件夹内的所有 `ckpt_strategy_rank_x.ckpt` 文件合并,并在文件夹下生成 `merged_ckpt_strategy.ckpt`。如果已经存在 `merged_ckpt_strategy.ckpt`,可以直接填写该文件的路径。 + - **源权重未开启流水线并行**:权重转换可基于任一策略文件,填写任意一个 `ckpt_strategy_rank_x.ckpt` 文件的路径即可。 -- mindspore_ckpt_dir:分布式权重路径,请填写源权重所在文件夹的路径,源权重应按 model_dir/rank_x/xxx.safetensors 格式存放,并将文件夹路径填写为 model_dir。 -- output_dir:目标权重的保存路径,默认值为 `/new_llm_data/******/ckpt/nbg3_31b/tmp`,即目标权重将放置在 `/new_llm_data/******/ckpt/nbg3_31b/tmp` 目录下。 -- file_suffix:目标权重文件的命名后缀,默认值为 "1_1",即目标权重将按照 *1_1.safetensors 格式查找。 -- has_redundancy:合并的源权重是否是冗余的权重,默认为 True。 -- filter_out_param_prefix:合并权重时可自定义过滤掉部分参数,过滤规则以前缀名匹配。如优化器参数"adam_"。 -- max_process_num:合并最大进程数。默认值:64。 + **注意**:如果策略文件夹下已存在 `merged_ckpt_strategy.ckpt` 且仍传入文件夹路径,脚本会首先删除旧的 `merged_ckpt_strategy.ckpt`,再合并生成新的 `merged_ckpt_strategy.ckpt` 以用于权重转换。因此,请确保该文件夹具有足够的写入权限,否则操作将报错。 +- **mindspore_ckpt_dir**:分布式权重路径,请填写源权重所在文件夹的路径,源权重应按 `model_dir/rank_x/xxx.safetensors` 格式存放,并将文件夹路径填写为 `model_dir`。 +- **output_dir**:目标权重的保存路径,默认值为 `"/path/output_dir"`,如若未配置该参数,目标权重将默认放置在 `/path/output_dir` 目录下。 +- **file_suffix**:目标权重文件的命名后缀,默认值为 `"1_1"`,即目标权重将按照 `*1_1.safetensors` 格式查找匹配的权重文件进行合并。 +- **filter_out_param_prefix**:合并权重时可自定义过滤掉部分参数,过滤规则以前缀名匹配。如优化器参数 `"adam_"`。 +- **has_redundancy**:合并的源权重是否是冗余的权重,默认为 `True`,表示用于合并的原始权重有冗余;若原始权重保存时为去冗余权重,则需设置为 `False`。 ### 推理配置开发 在完成权重文件的合并后,需依据训练配置文件开发对应的推理配置文件。 -以Qwen3为例,基于[Qwen3推理配置](https://gitee.com/mindspore/mindformers/blob/r1.7.0/configs/qwen3/predict_qwen3.yaml)修改[Qwen3训练配置](https://gitee.com/mindspore/mindformers/blob/r1.7.0/configs/qwen3/finetune_qwen3.yaml): +以 Qwen3 为例,基于 [Qwen3 推理配置](https://gitee.com/mindspore/mindformers/blob/r1.7.0/configs/qwen3/predict_qwen3.yaml)修改 [Qwen3 训练配置](https://gitee.com/mindspore/mindformers/blob/r1.7.0/configs/qwen3/finetune_qwen3.yaml): + +Qwen3 训练配置主要修改点包括: -Qwen3训练配置主要修改点包括: +- `run_mode` 的值修改为 `"predict"`。 +- 添加 `pretrained_model_dir` 参数,配置为 Hugging Face 或 ModelScope 的模型目录路径,放置模型配置、Tokenizer 等文件。如果将训练得到的完整权重放置在此目录底下,则 yaml 中可以不配置 `load_checkpoint`。 +- `parallel_config` 只保留 `data_parallel` 和 `model_parallel`。 +- `model_config` 中只保留 `compute_dtype`、`layernorm_compute_dtype`、`softmax_compute_dtype`、`rotary_dtype`、`params_dtype`,和推理配置保持精度一致。 +- `parallel` 模块中,只保留 `parallel_mode` 和 `enable_alltoall`,`parallel_mode` 的值修改为 `"MANUAL_PARALLEL"`。 -- run_mode的值修改为"predict"。 -- 添加pretrained_model_dir:Hugging Face或ModelScope的模型目录路径,放置模型配置、Tokenizer等文件。 -- parallel_config只保留data_parallel和model_parallel。 -- model_config中只保留compute_dtype、layernorm_compute_dtype、softmax_compute_dtype、rotary_dtype、params_dtype,和推理配置保持精度一致。 -- parallel模块中,只保留parallel_mode和enable_alltoall,parallel_mode的值修改为"MANUAL_PARALLEL"。 +> 如果模型的参数量在训练时进行了自定义,或与开源配置不同,进行推理时需要同步修改 `pretrained_model_dir` 对应路径下的模型配置 config.json。也可以在 `model_config` 中配置对应修改后的参数,传入推理时,`model_config` 中的同名配置会覆盖 config.json 中对应配置的值。 +>
如需检查传入的配置项是否正确,可以通过查找日志中的 `The converted TransformerConfig is: ...` 或 `The converted MLATransformerConfig is: ...` 内容,查找对应的配置项。 ### 推理功能验证 -在权重和配置文件都准备好的情况下,使用单条数据输入进行推理,检查输出内容是否符合预期逻辑,参考[推理文档](https://gitee.com/mindspore/docs/blob/r2.7.1/docs/mindformers/docs/source_zh_cn/guide/inference.md),拉起推理任务。 +在权重和配置文件都准备好的情况下,使用单条数据输入进行推理,检查输出内容是否符合预期逻辑,参考[推理文档](../guide/inference.md),拉起推理任务。 -例如: +如,以 Qwen3 单卡推理为例,拉起推理任务的指令为: ```shell python run_mindformer.py \ @@ -542,6 +546,6 @@ python run_mindformer.py \ 若模型配置与权重加载均无误,但推理结果仍不符合预期,需进行精度比对分析,参考推理精度比对文档,逐层比对训练与推理的输出差异,排查潜在的数据预处理、计算精度或算子问题。 -### 使用AISBench进行评测 +### 使用 AISBench 进行评测 -参考AISBench评测章节,使用AISBench工具进行评测,验证模型精度。 \ No newline at end of file +参考 [AISBench 评测章节](#aisbench评测),使用 AISBench 工具进行评测,验证模型精度。 \ No newline at end of file -- Gitee