diff --git a/tutorials/source_en/parallel/msrun_launcher.md b/tutorials/source_en/parallel/msrun_launcher.md
index dc305e8889d83e1ba0d8d9392e78cefa62a82715..e4ab661485dcde5e1fa05d1d72143297cd26f028 100644
--- a/tutorials/source_en/parallel/msrun_launcher.md
+++ b/tutorials/source_en/parallel/msrun_launcher.md
@@ -80,7 +80,7 @@ A parameters list of command line:
Enable processes binding CPU cores. |
Bool/Dict |
True/False or a device-to-CPU-range dict. Default: False. |
- If set to True, msrun will automatically allocates CPU ranges based on device affinity; when manually passing a dict, e.g., {"device0":["0-10"],"device1":["11-20"]}, it assigns CPU range 0-10 to process 0 (device0) and 11-20 to process 1 (device1). |
+ If set to True, msrun automatically allocates CPU ranges based on device affinity; if a dictionary is passed, processes are bound according to the CPU ranges it specifies. For configuration details, see the **Process-Level Core Binding** section. |
| --sim_level |
@@ -494,4 +494,69 @@ msrun --worker_num=8 --local_worker_num=8 --master_port=8118 --log_dir=msrun_log
- `p` (print): Prints the value of a variable. For example, `p variable` displays the current value of the variable `variable`.
- `l` (list): Display the context of the current code.
- `b` (break): Set a breakpoint, either by specifying a line number or a function name.
-- `h` (help): Display a help message listing all available commands.
\ No newline at end of file
+- `h` (help): Display a help message listing all available commands.
+
+## Process-Level Core Binding
+
+`msrun` can set the CPU affinity of each process at startup through the `--bind_core` parameter. Internally, `msrun` launches the Python file with a command of the form `taskset -c CPUA-CPUB python XXX.py`, which binds the process to the CPU cores from `CPUA` to `CPUB`. The binding strategy can either be derived automatically from the current environment or be customized by the user.
+
+### 1. Automatic Core Binding (`--bind_core=True`)
+
+- **Function**: Automatically allocate CPU core ranges based on current environment information (CPU resources, NUMA nodes, device affinity) without manually specifying specific core numbers.
+- **Automated allocation logic**:
+
+ - Priority is given to using CPU cores within the affinity pool; if there are insufficient CPU cores in the affinity pool, CPU cores outside the affinity pool will be used.
+ - The automatic core binding function relies on system commands (such as `lscpu`, `npu-smi`) to obtain hardware information; if the command execution fails, the allocation strategy will be generated only based on available CPU resources.
+ - The affinity relationship between CPUs and NPUs is obtained in the same way as in the MindSpore interface `mindspore.runtime.set_cpu_affinity`; for details, see [mindspore.runtime.set_cpu_affinity](https://www.mindspore.cn/docs/en/master/api_python/runtime/mindspore.runtime.set_cpu_affinity.html).
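+
+A minimal launch sketch with automatic binding, reusing the launch flags shown earlier in this tutorial. The script name `train.py` is a placeholder and is not defined anywhere in this tutorial:
+
+```bash
+# Let msrun choose the CPU ranges for all 8 local workers automatically.
+# train.py is a hypothetical training script.
+msrun --worker_num=8 --local_worker_num=8 --master_port=8118 --log_dir=msrun_log --bind_core=True train.py
+```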
+
+### 2. Custom Core Binding
+
+- **Function**: Customize the core binding strategy based on user input parameters.
+- **Format Requirement**: Pass a dictionary in JSON format. In a shell, wrap the whole `{}` dictionary in single quotes (`''`) so that the inner double quotes are preserved.
+- **Parameter Description**:
+
+ - Each `key` is either `scheduler` (the scheduling process) or `deviceX` (a device process, where `X` is the device number).
+ - Each `value` is a list of CPU core ranges (e.g., `["0-9", "20-29"]`).
+
+- **Example**:
+
+ ```bash
+ --bind_core='{"scheduler":["0-9"], "device0":["10-19"], "device1":["20-29", "40-49"]}'
+ ```
+
+ - Allocate CPU cores 0-9 to the `scheduler` process.
+ - Allocate CPU cores 10-19 to worker process 0 (corresponding to `device0`).
+ - Allocate CPU cores 20-29 and 40-49 to worker process 1 (corresponding to `device1`).
+
+- **Notes**:
+
+ 1. The device number in each key must be the actual device number of the corresponding process. For example, if `ASCEND_RT_VISIBLE_DEVICES=6,7` is set so that process 0 corresponds to `device6` and process 1 corresponds to `device7`, the keys must be `device6` and `device7`; otherwise, core binding does not take effect for those processes:
+
+ ```bash
+ --bind_core='{"scheduler":["0-9"], "device6":["10-19"], "device7":["20-29", "40-49"]}'
+ ```
+
+ The `scheduler` process does not occupy device resources, so it does not participate in device sorting. The order of the keys has no effect (for example, `scheduler` and `device6` in the example above can be swapped).
+ 2. If the list of CPU range segments is empty, the affinity setting for that process is skipped. For example:
+
+ ```bash
+ --bind_core='{"scheduler":[], "device0":[], "device1":["20-29", "40-49"]}'
+ ```
+
+ Here, core binding is skipped for the `scheduler` process and worker process 0; only worker process 1 (`device1`) is assigned CPU cores.
+ 3. It is recommended that the number of key-value pairs in `--bind_core` be consistent with the number of worker processes. For example, in a single-machine, two-device task where only worker process 1 needs core binding, every process (including those that do not need core binding) should still be configured explicitly:
+
+ ```bash
+ # correct example
+ --bind_core='{"scheduler":[], "device0":[], "device1":["20-29", "40-49"]}'
+
+ # wrong example
+ --bind_core='{"device1":["20-29", "40-49"]}'
+ ```
+
+ In the wrong example, worker process 0 may be mistakenly matched to `device1`, so its core binding is skipped. The `scheduler` process and worker process 1 are also skipped because they are not included in the configuration.
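+
+One way to check the resulting binding, sketched here under assumptions: `train.py` is a placeholder script name, and `pgrep` and `taskset` are assumed to be available on the host. After launching a two-device job with a custom dictionary, query the affinity list of each matching process:
+
+```bash
+# Launch a single-machine, two-device job with a custom binding.
+# train.py is a hypothetical training script.
+msrun --worker_num=2 --local_worker_num=2 --master_port=8118 --log_dir=msrun_log \
+    --bind_core='{"scheduler":[], "device0":["10-19"], "device1":["20-29", "40-49"]}' train.py
+
+# In a second shell, while the job is running, print the CPU affinity list of
+# every process whose command line mentions train.py. This pattern also
+# matches the msrun launcher itself, not only the worker processes.
+for pid in $(pgrep -f train.py); do
+    taskset -cp "$pid"
+done
+```
+
+If the binding took effect, the affinity list reported for each worker should correspond to the ranges configured for its `deviceX` key.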
+
+### 3. Disabling Core Binding (`--bind_core=False`)
+
+- **Function**: Do not enable the process-level core binding function.
+- **Default Value**: The default value of the `msrun --bind_core` parameter is `False`.
diff --git a/tutorials/source_zh_cn/parallel/msrun_launcher.md b/tutorials/source_zh_cn/parallel/msrun_launcher.md
index 56a8e1fa7a169417307035d7883905e6e554dbc7..198f21d554ea5dba4c719daaf079b36f73ca5cc3 100644
--- a/tutorials/source_zh_cn/parallel/msrun_launcher.md
+++ b/tutorials/source_zh_cn/parallel/msrun_launcher.md
@@ -80,7 +80,7 @@
开启进程绑核。 |
Bool/Dict |
True、False或者给指定设备分配CPU范围段的字典。默认为False。 |
- 若设置为True,则会基于环境信息按照设备亲和去自动分配CPU范围段;若手动传入一个字典,如{"device0":["0-10"],"device1":["11-20"]},则会给0号进程(对应device0)分配CPU范围段0-10,给1号进程(对应device1)分配CPU范围段11-20。 |
+ 若设置为True,则会基于环境信息按照设备亲和去自动分配CPU范围段;若手动传入一个字典,则根据该字典分配的CPU范围段去绑核。具体配置可参考**进程级绑核**章节。 |
| --sim_level |
@@ -495,3 +495,70 @@ msrun --worker_num=8 --local_worker_num=8 --master_port=8118 --log_dir=msrun_log
- `l` (list):显示当前代码的上下文。
- `b` (break):设置断点,可以指定行号或函数名。
- `h` (help):显示帮助信息,列出所有可用命令。
+
+## 进程级绑核
+
+`msrun` 支持通过 `--bind_core` 参数在进程启动时设置进程的 CPU 亲和性,其核心实现是在 `msrun` 内部调用 `taskset -c CPUA-CPUB python XXX.py` 命令,在启动 Python 文件的同时,为进程绑定 `CPUA` 到 `CPUB` 范围的 CPU 核。进程级绑核支持基于当前环境信息去自动获取绑核策略,也支持用户自定义绑核策略。
+
+### 1. 自动绑核(--bind_core=True)
+
+- **功能**:基于当前环境信息(CPU 资源、NUMA 节点、设备亲和性)自动分配 CPU 核范围,无需手动指定具体核编号。
+- **自动分配逻辑**:
+
+ - 优先使用亲和池内的 CPU 核;若亲和池内 CPU 核不足,则使用非亲和池内的 CPU 核。
+ - 自动绑核功能依赖系统命令(如 `lscpu`、`npu-smi`)获取硬件信息;若命令执行失败,将仅根据可用 CPU 资源生成分配策略。
+ - CPU 与 NPU 间亲和关系的获取方式,与 MindSpore 接口 `mindspore.runtime.set_cpu_affinity` 一致,可参考 [mindspore.runtime.set_cpu_affinity](https://www.mindspore.cn/docs/zh-CN/master/api_python/runtime/mindspore.runtime.set_cpu_affinity.html)。
+
+### 2. 自定义绑核
+
+- **功能**:依据用户传参,定制绑核策略。
+- **格式要求**:传入 JSON 格式的字典,在 shell 环境中需用 `''` 包裹 `{}`。
+- **参数说明**:
+
+ - 字典的 `key` 支持 `scheduler`(调度进程)或 `deviceX`(设备进程,`X` 为设备编号)。
+ - 字典的 `value` 为 CPU 核范围段列表(如 `["0-9", "20-29"]`)。
+
+- **示例**:
+
+ ```bash
+ --bind_core='{"scheduler":["0-9"], "device0":["10-19"], "device1":["20-29", "40-49"]}'
+ ```
+
+ 表示:
+
+ - 为`scheduler`进程分配 CPU 核 0-9;
+ - 为 0 号 worker 进程(对应`device0`)分配 CPU 核 10-19;
+ - 为 1 号 worker 进程(对应`device1`)分配 CPU 核 20-29 和 40-49。
+
+- **注意事项**:
+
+ 1. 进程编号需与设备编号匹配。例如,若通过`ASCEND_RT_VISIBLE_DEVICES=6,7`配置,使 0 号进程对应`device6`、1 号进程对应`device7`,则需按如下方式配置,否则无法为对应进程绑核:
+
+ ```bash
+ --bind_core='{"scheduler":["0-9"], "device6":["10-19"], "device7":["20-29", "40-49"]}'
+ ```
+
+ scheduler 进程不占用设备资源,因此不参与设备排序,键的顺序不影响生效(如上述示例中`scheduler`与`device6`顺序可互换)。
+ 2. 若 CPU 范围段列表为空,则跳过对该进程的亲和性设置。例如:
+
+ ```bash
+ --bind_core='{"scheduler":[], "device0":[], "device1":["20-29", "40-49"]}'
+ ```
+
+ 表示:跳过`scheduler`进程和 0 号 worker 进程的绑核,仅为 1 号 worker 进程(`device1`)分配 CPU 核。
+ 3. 建议 worker 进程数量与`--bind_core`字典的键值对数量一致。例如,单机两卡任务中,若仅需为 1 号 worker 进程绑核,需显式配置所有进程(包括不绑核的进程):
+
+ ```bash
+ # 正确示例
+ --bind_core='{"scheduler":[], "device0":[], "device1":["20-29", "40-49"]}'
+
+ # 错误示例
+ --bind_core='{"device1":["20-29", "40-49"]}'
+ ```
+
+ 错误示例中,0 号 worker 进程可能被误判为对应`device1`而跳过绑核,`scheduler`和 1 号 worker 进程因未在配置中也会被跳过。
+
+### 3. 关闭绑核(--bind_core=False)
+
+- **功能**:不启用进程级绑核功能。
+- **默认值**:`msrun --bind_core` 参数默认值为`False`。