# MindSpeed v1.0.RC2

Released 2024-08-22 by 郭鹏 · tag `v1.0.RC2` · tag commit `6ef21aa` ("!551 适配新版ROPE融合算子" — adapt to the new ROPE fused operator, merged from 王智伟/1.1 on 2024-07-16)

## Megatron-LM Support
- Integrates with [NVIDIA Megatron Core 0.6.0](https://github.com/NVIDIA/Megatron-LM/tree/core_v0.6.0)
- Supports Megatron-LM native parallelism strategies, e.g. EP + TP + DP + SP + PP
- Supports the Megatron-LM distributed optimizer

## Algorithms
- MoE
    - Supports the Megatron-LM token-dropless MoE scheme with Top-K routing
    - Supports the DeepSpeed Capacity MoE scheme with Top-1/Top-2 expert routing

## Parallelism
- Long sequences
    - Load-balanced RingAttention sequence parallelism
    - Ulysses sequence parallelism
    - Hybrid (RingAttention + Ulysses) sequence parallelism

## Memory Optimizations
- BF16 parameter-copy reuse
- atten_mask compression and normalization in Flash Attention
- Activation-function recomputation (Prototype)

## Communication Optimizations
- Communication overlapping for the token-dropless MoE AllgatherDispatcher/AlltoAllDispatcher
- Multi-stream overlapping of send/recv in RingAttention sequence parallelism
- Multi-stream overlapping of send/recv in pipeline parallelism (PP)
- MC2 fused TP communication and computation (Prototype) ( **requires CANN 8.0.RC2 and Ascend HDK 24.1.RC2 or later** )

## Compute Optimizations
- Token-permutation performance optimization for the Capacity MoE scheme
- Fused rms_norm / swiglu / rotary_embedding operators
- Ascend-affinity optimization of token permute/unpermute for token-dropless MoE

## Fused Operators
- New npu_rotary_position_embedding fused operator
- New ffn fused operator (Prototype)
- New fusion_attention fused operator (Prototype)
- New npu_mm_all_reduce_add_rms_norm fused operator (Prototype)
- New npu_mm_all_reduce_add_rms_norm_ fused operator (Prototype)
- New npu_grouped_mat_mul fused operator (Prototype)
- New npu_grouped_mat_mul_all_reduce fused operator (Prototype)
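The Top-K expert routing named in the MoE section can be sketched as follows. This is a minimal conceptual numpy illustration of Top-K selection with renormalized gate weights, not MindSpeed's or Megatron-LM's actual router; `topk_route` is a hypothetical name introduced here.

```python
import numpy as np

def topk_route(logits, k=2):
    """Conceptual Top-K MoE router (illustrative sketch only).

    logits: (tokens, experts) router scores.
    Returns expert indices (tokens, k) and gate weights (tokens, k)
    renormalized over the selected experts.
    """
    # Pick the k highest-scoring experts per token.
    idx = np.argsort(logits, axis=-1)[:, ::-1][:, :k]
    top = np.take_along_axis(logits, idx, axis=-1)
    # Softmax over only the selected experts so gates sum to 1 per token.
    e = np.exp(top - top.max(axis=-1, keepdims=True))
    gates = e / e.sum(axis=-1, keepdims=True)
    return idx, gates

logits = np.array([[0.1, 2.0, 0.5, 1.0]])
idx, gates = topk_route(logits, k=2)  # experts 1 and 3 win for this token
```

In a token-dropless scheme every routed token is dispatched to its chosen experts (no capacity cutoff), which is why the dispatcher communication overlapping listed under Communication Optimizations matters.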
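The math behind the npu_rotary_position_embedding fused operator is ordinary rotary position embedding (RoPE); a fused kernel computes it in one pass instead of separate cos/sin/multiply ops. Below is a plain numpy sketch of that math under the common half-split pairing convention, which is an assumption here, not a statement about the operator's exact layout or signature.

```python
import numpy as np

def rotary_embedding(x, base=10000.0):
    """Apply rotary position embedding to a (seq, dim) array (conceptual sketch).

    Each pair (x1_i, x2_i) is rotated by a position- and frequency-dependent
    angle; the rotation is orthogonal, so per-position norms are preserved.
    """
    seq, dim = x.shape
    half = dim // 2
    inv_freq = 1.0 / base ** (np.arange(half) / half)
    angles = np.outer(np.arange(seq), inv_freq)          # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Position 0 gets a zero angle and passes through unchanged, which is a quick sanity check for any RoPE implementation.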
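The rms_norm and swiglu fusions listed under Compute Optimizations fuse small elementwise/reduction patterns into single kernels. The reference math is below as a numpy sketch; the half-split input convention for swiglu is an assumption for illustration, not a description of MindSpeed's operator interface.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale x by the inverse root-mean-square of its last axis."""
    rms = np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu(x):
    """SwiGLU gating: split the last axis in half, return silu(a) * b."""
    a, b = np.split(x, 2, axis=-1)
    return a / (1.0 + np.exp(-a)) * b   # silu(a) = a * sigmoid(a)
```

A fused kernel avoids materializing the intermediate `rms` and `silu(a)` tensors, which is where the memory-bandwidth savings on Ascend hardware would come from.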