diff --git a/docs/mindformers/docs/source_en/feature/configuration.md b/docs/mindformers/docs/source_en/feature/configuration.md
index 1aacfe4cd386eb67493a60bd51fef6456cf8cc27..6ba5b648a30eec8c78de6846d493918ad01114ea 100644
--- a/docs/mindformers/docs/source_en/feature/configuration.md
+++ b/docs/mindformers/docs/source_en/feature/configuration.md
@@ -136,6 +136,7 @@ Because different model configurations may vary, here are some common model conf
 | model.model_config.softmax_compute_dtype | string | Required | 'float32' | The dtype used to compute the softmax during attention computation. |
 | model.model_config.rotary_dtype | string | Required | 'float32' | Computed dtype for custom rotated position embeddings. |
 | model.model_config.init_method_std | float | Required | 0.02 | The standard deviation of the zero-mean normal for the default initialization method, corresponding to `initializer_range` in HuggingFace. If `init_method` and `output_layer_init_method` are provided, this method is not used. |
+| model.model_config.param_init_std_rules | list[dict] | Optional | None | Custom rules for parameter initialization standard deviation. Each rule contains `target` (regex pattern for parameter name) and `init_method_std` (std value, ≥0), for example: `[{"target": ".*weight", "init_method_std": 0.02}]` |
 | model.model_config.moe_grouped_gemm | bool | Required | False | When there are multiple experts per level, compress multiple local (potentially small) GEMMs in a single kernel launch to leverage grouped GEMM capabilities for improved utilization and performance. |
 | model.model_config.num_moe_experts | int | Optional | None | The number of experts to use for the MoE layer, corresponding to `n_routed_experts` in HuggingFace. When set, the MLP is replaced by the MoE layer. Setting this to None disables the MoE. |
 | model.model_config.num_experts_per_tok | int | Required | 2 | The number of experts to route each token to. |
diff --git a/docs/mindformers/docs/source_zh_cn/feature/configuration.md b/docs/mindformers/docs/source_zh_cn/feature/configuration.md
index 30f0530152c171a1c03376f94b58e44ffe5f5a5e..f70a33629409a6b1aa40cdcd65307eb4354eebca 100644
--- a/docs/mindformers/docs/source_zh_cn/feature/configuration.md
+++ b/docs/mindformers/docs/source_zh_cn/feature/configuration.md
@@ -136,6 +136,7 @@ The Context configuration is mainly used to specify [mindspore.set_context](https://www.mindspore.cn/
 | model.model_config.softmax_compute_dtype | string | Optional | 'float32' | The dtype used to compute the softmax during attention computation. Can be set to `'float32'`, `'float16'`, or `'bfloat16'`. |
 | model.model_config.rotary_dtype | string | Optional | 'float32' | The computation dtype for custom rotary position embeddings. Can be set to `'float32'`, `'float16'`, or `'bfloat16'`. |
 | model.model_config.init_method_std | float | Optional | 0.02 | The standard deviation of the zero-mean normal for the default initialization method, corresponding to `initializer_range` in HuggingFace. If `init_method` and `output_layer_init_method` are provided, this method is not used. |
+| model.model_config.param_init_std_rules | list[dict] | Optional | None | A list of custom rules for the parameter initialization standard deviation. Each rule contains `target` (a regex on the parameter name) and `init_method_std` (the std value, ≥0). Example: `[{"target": ".*weight", "init_method_std": 0.02}]` |
 | model.model_config.moe_grouped_gemm | bool | Optional | False | When there are multiple experts per rank, compress multiple local (potentially small) GEMMs into a single kernel launch to leverage grouped GEMM capabilities for improved utilization and performance. |
 | model.model_config.num_moe_experts | int | Optional | None | The number of experts to use for the MoE layer, corresponding to `n_routed_experts` in HuggingFace. When set, the MLP is replaced by the MoE layer. Setting this to None disables the MoE. |
 | model.model_config.num_experts_per_tok | int | Optional | 2 | The number of experts each token is routed to. |
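
Below is a minimal sketch of how the new `param_init_std_rules` field might appear in a training YAML, based only on the table entries added above. The surrounding key layout follows the `model.model_config` path used in this document; the regex patterns, the second rule, and the fallback behavior for unmatched parameters are assumptions for illustration, not taken from an actual MindFormers configuration.

```yaml
# Illustrative excerpt only: parameter-name regexes and std values are placeholders.
model:
  model_config:
    init_method_std: 0.02            # default std (assumed fallback for parameters no rule matches)
    param_init_std_rules:            # each rule: `target` is a regex on the parameter name,
      - target: '.*weight'           #            `init_method_std` is the std to apply (>= 0)
        init_method_std: 0.02
      - target: '.*output_layer.*'   # hypothetical pattern for an output projection
        init_method_std: 0.006
```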