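NPU Affinity Scheduling Algorithm: Design Description and Development Guide

Introduction to Ascend-volcano-plugin

Built on the plugin mechanism of the open-source Volcano scheduler, this plugin adds Ascend processor features such as affinity scheduling and virtual device scheduling, to make full use of the computing performance of Ascend processors.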

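Affinity Policy Description

Ascend 910 AI Processor Affinity Rules

The Ascend 910 AI Processor is a high-performance AI processor developed by Huawei. The processors in a server are interconnected through HCCS. Each physical server has 8 processors divided into two HCCS rings, with 4 processors in each ring. Processors in the same HCCS ring can exchange data, but processors in different HCCS rings cannot communicate. Therefore, if a Pod is allocated 4 or fewer Ascend 910 AI Processors, they must all be in the same HCCS ring; otherwise, the job fails to run. Figure 1 shows the interconnection topology of the Ascend 910 AI Processors.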

Figure 1 Ascend 910 AI Processor interconnection topology

Note: A0 to A7 in the figure are Ascend 910 AI Processors.
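The ring constraint can be written as a small check. The following Go sketch is illustrative only. `ringOf` and `sameRing` are hypothetical helpers, not functions from this repository. The mapping of A0 to A3 to one ring and A4 to A7 to the other is an assumption read from Figure 1.

```go
package main

import "fmt"

// chipsPerRing: each HCCS ring holds 4 Ascend 910 processors, and an
// 8-processor server has two rings.
const chipsPerRing = 4

// ringOf maps a physical chip ID to its HCCS ring, assuming A0-A3 form
// ring 0 and A4-A7 form ring 1 (as drawn in Figure 1).
func ringOf(chipID int) int {
	return chipID / chipsPerRing
}

// sameRing reports whether every chip in ids is in the same HCCS ring,
// which is required when a Pod receives 4 or fewer processors.
func sameRing(ids []int) bool {
	for _, id := range ids {
		if ringOf(id) != ringOf(ids[0]) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(sameRing([]int{0, 1, 3})) // true: all in ring 0
	fmt.Println(sameRing([]int{2, 5}))    // false: spans both rings
}
```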

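Affinity Policies

Based on the characteristics of the Ascend 910 AI Processor and the rules for resource utilization, the affinity policies for the Ascend 910 AI Processor are defined in Table 1.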

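Table 1 Ascend 910 AI Processor affinity policies

Priority 1: HCCS affinity scheduling principle

  • If 1 Ascend 910 AI Processor is requested, select an HCCS ring in which the number of remaining available processors is 1 (best), then 3, then 2, and finally 4.
  • If 2 Ascend 910 AI Processors are requested, select an HCCS ring with 2 remaining available processors (best), then 4, and finally 3.
  • If 4 Ascend 910 AI Processors are requested, an HCCS ring with exactly 4 remaining available processors must be selected.
  • If 8 Ascend 910 AI Processors are requested, all 8 processors of the node are allocated.

Priority 2: Fill-first scheduling principle

Prefer AI servers on which Ascend 910 AI Processors have already been allocated, to reduce fragmentation.

  • If 1 Ascend 910 AI Processor is requested, prefer nodes with a capacity of 8 that have an HCCS ring with 1 remaining available processor, then nodes with 3, 2, and 4 remaining.
  • If 2 Ascend 910 AI Processors are requested, prefer nodes with a capacity of 8 that have an HCCS ring with 2 remaining available processors, then nodes with 4 and 3 remaining.
  • If 4 Ascend 910 AI Processors are requested, prefer nodes with a capacity of 8 that have 4 remaining available processors.
  • If the number of Ascend 910 AI Processors requested is a positive integer multiple of 8, select nodes with a capacity of 8 and 0 processors in use.

Priority 3: Even-remainder-first principle

Prefer HCCS rings that satisfy conditions 1 to 3, then HCCS rings whose number of remaining processors is even.

No priority (--): Multi-node support principle

A training job can only be allocated processors in units of 8*N.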
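The HCCS affinity principle (priority 1) amounts to a preference order over the number of free processors in a ring, defined for each request size. Below is a minimal Go sketch. It assumes a node is described only by the free-processor count of each of its two rings. `ringPreference`, `ringRank`, and `bestRing` are hypothetical names. 8-processor requests, which take the whole node, are left out.

```go
package main

import "fmt"

// ringPreference lists, per request size, the preferred number of free
// processors in the HCCS ring that will serve the task, best first
// (transcribed from the HCCS affinity principle in Table 1).
var ringPreference = map[int][]int{
	1: {1, 3, 2, 4},
	2: {2, 4, 3},
	4: {4},
}

// ringRank returns 0 for the best ring, 1 for the next, and so on,
// or -1 if a ring with `free` processors cannot serve `req` processors.
func ringRank(req, free int) int {
	for i, n := range ringPreference[req] {
		if n == free {
			return i
		}
	}
	return -1
}

// bestRing returns the index of the preferred ring in freeByRing, or -1.
func bestRing(req int, freeByRing []int) int {
	best, bestRank := -1, -1
	for i, free := range freeByRing {
		r := ringRank(req, free)
		if r >= 0 && (bestRank < 0 || r < bestRank) {
			best, bestRank = i, r
		}
	}
	return best
}

func main() {
	// Ring 0 has 4 free processors and ring 1 has 1: a single-processor
	// request goes to ring 1, leaving ring 0 intact for larger requests.
	fmt.Println(bestRing(1, []int{4, 1})) // 1
}
```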

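Resource Request Constraints

Based on the service model, training jobs must meet the following requirements:

  1. When a training job requests no more than 4 Ascend 910 AI Processors, all of them must be scheduled within the same HCCS ring.
  2. When a training job requests 8 Ascend 910 AI Processors, all the Ascend 910 AI Processors of one node must be allocated to the job.
  3. When more than 8 NPUs are requested, the number must be 8*N (N>=1).
  4. When a training job requests no more than 8 Ascend 910 AI Processors, it can have only one Pod. When it requests more than 8, each Pod gets 8 Ascend 910 AI Processors.
  5. When a training job requests virtual NPUs (vNPUs), the number requested must be 1.
  6. All other constraints of open-source Volcano also apply.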
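The constraints above can be checked up front, before any node is considered. The Go sketch below is hypothetical: `podCount` is not a function in this repository. It assumes that only 1, 2, 4, 8, and 8*N processors are valid, because these are the sizes listed in the policy and scenario tables. Constraint 1 on its own would also allow 3.

```go
package main

import (
	"errors"
	"fmt"
)

const npusPerNode = 8 // Ascend 910 processors per server

// podCount validates a request against the constraints above and returns
// the number of Pods the job should have. vnpu marks a vNPU request.
func podCount(npuNum int, vnpu bool) (int, error) {
	switch {
	case vnpu && npuNum != 1:
		return 0, errors.New("a vNPU request must ask for exactly 1 device")
	case vnpu:
		return 1, nil
	case npuNum == 1 || npuNum == 2 || npuNum == 4 || npuNum == npusPerNode:
		return 1, nil // 8 or fewer: a single Pod
	case npuNum > npusPerNode && npuNum%npusPerNode == 0:
		return npuNum / npusPerNode, nil // 8*N: one 8-processor Pod per node
	default:
		return 0, fmt.Errorf("invalid Ascend 910 request: %d processors", npuNum)
	}
}

func main() {
	fmt.Println(podCount(16, false)) // 2 <nil>
	fmt.Println(podCount(12, false)) // 0 invalid Ascend 910 request: 12 processors
}
```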

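Scheduling Algorithm Design

Scenario Classification

Based on the affinity policies and the service model, the scenarios are summarized in Table 1.

Note:

  • Columns A to D are four groups. Each represents one of the four HCCS situations in which processor selection can be satisfied. Priority decreases from A to D: B, C, and D are considered only when A cannot be satisfied.
  • Each entry describes a node that satisfies the group's HCCS requirement. The number to the left of "~" is the number of remaining processors in the HCCS ring that satisfies the requirement. The list to the right gives the possible numbers of remaining processors in the other HCCS ring. For example, in group A for a 1-processor request, the other HCCS ring may have 0, 1, 2, 3, or 4 processors remaining, and node priority decreases in that order.
  • Requests for 8 or more processors are handled in the same way as requests for 4 or fewer: all of them are placed in group A, and all processors of the node must be occupied.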

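Table 1 Affinity policy scenarios

Scenario | Processors requested | A (remaining processors on the node) | B | C | D | Remarks
1 | 1 | 1~[0, 1, 2, 3, 4] | 3~[0, 2, 3, 4] | 2~[0, 2, 4] | 4~[0, 4] | Then select nodes with a capacity of 7 (a faulty processor is treated as used) and repeat A to D
2 | 2 | 2~[0, 1, 2, 3, 4] | 4~[0, 1, 3, 4] | 3~[0, 1] | - | -
3 | 4 | 4~[0, 1, 2, 3, 4] | - | - | - | -
4 | 8 | 8 | - | - | - | -
5 | 8*N | 0 (all 8 processors occupied) | - | - | - | -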
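The scenario table can be read as a lookup. A node's group is the first of A to D that one of its rings satisfies. Its position within the group is the index of the other ring's remaining count in the list after "~". Below is a Go sketch under these assumptions. The names are hypothetical, and the capacity-7 fallback in the Remarks column is not modeled.

```go
package main

import "fmt"

// groupRule is one column (A to D) of the scenario table: the free count
// of the ring that serves the task, and the allowed free counts of the
// other ring in descending priority.
type groupRule struct {
	selected int
	others   []int
}

// scenarios transcribes scenarios 1 to 3 of the table, keyed by request size.
var scenarios = map[int][]groupRule{
	1: {{1, []int{0, 1, 2, 3, 4}}, {3, []int{0, 2, 3, 4}}, {2, []int{0, 2, 4}}, {4, []int{0, 4}}},
	2: {{2, []int{0, 1, 2, 3, 4}}, {4, []int{0, 1, 3, 4}}, {3, []int{0, 1}}},
	4: {{4, []int{0, 1, 2, 3, 4}}},
}

// nodeRank returns the group (0 = A) and the position within that group
// for a node whose rings have free0 and free1 processors; lower is better.
// ok is false when the table lists no matching entry.
func nodeRank(req, free0, free1 int) (group, pos int, ok bool) {
	for g, rule := range scenarios[req] {
		// Either ring may be the one that serves the task.
		for _, rings := range [][2]int{{free0, free1}, {free1, free0}} {
			if rings[0] != rule.selected {
				continue
			}
			for p, other := range rule.others {
				if other == rings[1] {
					return g, p, true
				}
			}
		}
	}
	return 0, 0, false
}

func main() {
	fmt.Println(nodeRank(1, 4, 3)) // 1 3 true
}
```

For example, `nodeRank(1, 4, 3)` returns group B (index 1) at position 3. The ring with 3 free processors serves the task, and the other ring has 4. Combinations that the table does not list return `ok == false` in this sketch. One example is a 2-processor request on a node whose rings both have 3 free processors.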

Algorithm Design

Figure 1 Affinity algorithm design process

The key steps in the figure are as follows:

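  1. Obtain the number of Ascend 910 AI Processors requested by the task.

  2. Based on the requested number of Ascend 910 AI Processors, select the optimal node according to the resource request constraints.

  3. From the selected node, select the Ascend 910 AI Processors that meet the requirements.

  4. Save the selection result.

  5. Weight the selected nodes.

    Note: Steps 1 to 5 are all implemented in batchNodeOrderFn, a function registered with Volcano.

  6. Manage resource allocation on the selected nodes.

    Note: This step is implemented in Volcano's AddEventHandler function, which includes the allocate function for pre-allocating node resources.

  7. After these allocation operations are complete, the Volcano framework commits this round's allocation results to Kubernetes: the Pods are bound to the selected nodes, and the kubelet then starts them. This allocation round ends.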

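Multi-Node Processing Principle

This feature works with Volcano's cluster design. The framework weights every node during node selection, so the plugin only needs to take the best-weighted node.

Selection has two stages. Candidate nodes are first sorted into four priority groups, and then a node is selected within a group. This way, all nodes are considered together.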
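The two-stage selection can be sketched as bucketing followed by picking the minimum within a bucket. This Go example assumes each candidate node already carries two values: its group (0 to 3 for A to D) and its position within the group, for instance as derived from the scenario table. `candidate` and `pickNode` are hypothetical names.

```go
package main

import "fmt"

// candidate is a node whose group (0-3 for A-D) and position within the
// group have already been computed; lower values are better.
type candidate struct {
	name  string
	group int
	pos   int
}

// pickNode buckets candidates by group (stage 1) and then returns the
// best-positioned node of the first non-empty group (stage 2).
func pickNode(cands []candidate) (string, bool) {
	var buckets [4][]candidate
	for _, c := range cands {
		if c.group >= 0 && c.group < len(buckets) {
			buckets[c.group] = append(buckets[c.group], c)
		}
	}
	for _, bucket := range buckets {
		if len(bucket) == 0 {
			continue
		}
		best := bucket[0]
		for _, c := range bucket[1:] {
			if c.pos < best.pos {
				best = c
			}
		}
		return best.name, true
	}
	return "", false
}

func main() {
	nodes := []candidate{{"node-1", 1, 0}, {"node-2", 0, 3}, {"node-3", 0, 1}}
	fmt.Println(pickNode(nodes)) // node-3 true
}
```

Positions never exceed 4 in the scenario table. The same order can therefore also be expressed as a single weight, such as (4 - group) * 100 - pos. That form suits a framework that ranks nodes by score.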

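Concurrency Handling Principle

Volcano does not provide a callback in the add method of the allocate stage. For this reason, processors are selected during the node selection stage. As a consequence, when tasks are scheduled concurrently, the same chip may be allocated more than once.

Duplicate allocation may occur in the following two scenarios:

  • Between different tasks in the same session. This happens when multiple tasks need allocation at the same time and the same node can be assigned to more than one of them. Native Volcano allocates only quantities, not processor IDs. The total processor count may therefore be fully allocated while one particular processor has been assigned several times.

    This plugin solves the problem with the AddEventHandler function provided by the Volcano framework. Its allocate method tracks which processors on each node have been allocated, which prevents duplicate allocation.

  • Between different sessions. Processors are allocated in the weighting stage. If a resource is waiting to be released, it temporarily cannot be allocated, and the allocation in this session fails. However, Volcano is not aware of this failure in the current session. In the next session, the processor becomes allocatable and is assigned to another task. As a result, two tasks are assigned the same processor, and one of them fails.

    One way to solve this problem: when allocating processors in the weighting stage, check whether each resource is pending release. If it is, do not allocate it in this round.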
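Both remedies come down to keeping per-chip state and reserving chips atomically. The Go sketch below is hypothetical and does not use Volcano's event-handler types. A mutex-protected map records each chip as free, allocated, or pending release. `allocate` reserves either all requested chips or none of them, and refuses chips that are already allocated or still pending release.

```go
package main

import (
	"fmt"
	"sync"
)

type chipState int

const (
	chipFree      chipState = iota
	chipAllocated           // reserved for a task in this or an earlier session
	chipReleasing           // still held by a finished task, not yet released
)

// nodeChips tracks the state of every chip on one node.
type nodeChips struct {
	mu    sync.Mutex
	state map[int]chipState
}

// allocate reserves all chips in ids atomically, or none of them.
func (n *nodeChips) allocate(ids []int) error {
	n.mu.Lock()
	defer n.mu.Unlock()
	for _, id := range ids {
		s, known := n.state[id]
		switch {
		case !known:
			return fmt.Errorf("chip %d does not exist on this node", id)
		case s == chipAllocated:
			return fmt.Errorf("chip %d is already allocated", id)
		case s == chipReleasing:
			return fmt.Errorf("chip %d is pending release", id)
		}
	}
	for _, id := range ids {
		n.state[id] = chipAllocated
	}
	return nil
}

func main() {
	node := &nodeChips{state: map[int]chipState{0: chipFree, 1: chipFree, 2: chipReleasing}}

	// Four tasks race for chips 0 and 1; only one can win.
	var wg sync.WaitGroup
	var mu sync.Mutex
	wins := 0
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if node.allocate([]int{0, 1}) == nil {
				mu.Lock()
				wins++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("successful allocations:", wins) // 1
	fmt.Println(node.allocate([]int{2}))          // chip 2 is pending release
}
```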

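Scheduling Algorithm Implementation

Program Flow Design

Figure 1 Affinity program process (Volcano part)

Affinity scheduling for Huawei Ascend processors is built on the plugin mechanism of open-source Volcano, which simplifies plugin development. The implementation mainly provides several plugin functions of the volcano-scheduler framework. Each time a Volcano session runs, these functions execute according to their rules, which achieves processor affinity scheduling. The main functions implemented by the affinity scheduling plugin are: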

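  • validJobFn:

    Intercepts jobs that request NPU resources and checks that the requested quantity satisfies the affinity policy. For details, see Affinity Policy Description.

  • AddPredicateFn:

    Filters out nodes that do not meet the affinity requirements. For example, suppose a task requests 2 processors and a node's two HCCS rings each have 1 free processor. The node meets the quantity requirement but not the affinity requirement, so it must be excluded.

  • AddBatchNodeOrderFn:

    Selects the nodes, and the processors within them, that satisfy the affinity conditions, and writes the result into the Pod.

  • AddEventHandler:

    Manages the available Ascend 910 AI Processors of each node centrally, preventing incorrect allocation under concurrency.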
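The predicate example above can be made concrete with a small check: a 2-processor request on a node whose two rings each have 1 free processor. This Go sketch is hypothetical. `nodeFits` describes a node only by the free counts of its two HCCS rings, and it assumes the job has already passed request validation.

```go
package main

import "fmt"

// nodeFits reports whether a node whose two HCCS rings have free0 and
// free1 available processors can serve a request of req processors.
func nodeFits(req, free0, free1 int) bool {
	switch {
	case req == 1 || req == 2 || req == 4:
		// Up to 4 processors must come from a single HCCS ring.
		return free0 >= req || free1 >= req
	case req > 0 && req%8 == 0:
		// 8 or 8*N: each Pod takes a whole, completely idle node.
		return free0 == 4 && free1 == 4
	default:
		return false
	}
}

func main() {
	// 2 free processors in total, but split across rings: filtered out.
	fmt.Println(nodeFits(2, 1, 1)) // false
	fmt.Println(nodeFits(2, 0, 3)) // true
}
```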

Directory Structure

├── build                           # CI build scripts
│  ├── build.sh                     # CI script for building binaries
│  ├── testBuild.sh                 # LLT test startup script
│  ├── volcano-v1.4.0.yaml
│  └── volcano-v1.7.0.yaml
├── config
│  └── config.go
├── doc                             # Documentation
│  └── figures
│      ├── Affinity-algorithm-design-process-ch.png
│      ├── Affinity-algorithm-design-process-en.png
│      ├── Affinity-program-process-(Volcano-part)-ch.png
│      ├── Affinity-program-process-(Volcano-part)-en.png
│      ├── Ascend-910-AI-Processor-interconnection-topology.png
│      ├── icon-caution.gif
│      ├── icon-danger.gif
│      ├── icon-note.gif
│      ├── icon-notice.gif
│      ├── icon-tip.gif
│      └── icon-warning.gif
├── huawei_npu.go                   # Entry code of the ascend-volcano-plugin component
├── huawei_npu_test.go
├── internal
│  ├── ascend310			
│  │  ├── card310x4                 # Scheduling policy code for 310 cards
│  │  │  ├── frame.go
│  │  │  ├── frame_test.go
│  │  │  ├── task.go
│  │  │  └── type.go
│  │  ├── chip310x4                 # Scheduling policy code for 310 chips
│  │  │  ├── frame.go
│  │  │  ├── frame_test.go
│  │  │  ├── node.go
│  │  │  └── type.go
│  │  ├── frame.go
│  │  ├── frame_test.go
│  │  └── type.go
│  ├── ascend310p                   # Common scheduling code for 310P cards
│  │  ├── frame.go
│  │  ├── frame_test.go
│  │  ├── rescheduling.go
│  │  ├── type.go
│  │  ├── vnpu.go
│  │  └── vnpu_test.go
│  ├── ascend910
│  │  ├── ascend910b
│  │  │  ├── base.go
│  │  │  ├── card910bx2             # Scheduling policy code for A300T A2
│  │  │  │  ├── frame.go
│  │  │  │  └── type.go
│  │  │  ├── card910bx2infer        # Scheduling policy code for A300I A2
│  │  │  │  ├── frame.go
│  │  │  │  └── type.go
│  │  │  ├── job.go
│  │  │  ├── module910bx16          # Scheduling policy code for A200T A2 Box16
│  │  │  │  ├── frame.go
│  │  │  │  ├── node.go
│  │  │  │  └── type.go
│  │  │  ├── module910bx8           # Scheduling policy code for A800T A2
│  │  │  │  ├── frame.go
│  │  │  │  ├── job.go
│  │  │  │  ├── node.go
│  │  │  │  └── type.go
│  │  │  ├── node.go
│  │  │  └── type.go
│  │  ├── asend910old
│  │  │  ├── card910x2              # Scheduling policy code for A300T
│  │  │  │  ├── frame.go
│  │  │  │  ├── frame_test.go
│  │  │  │  ├── job.go
│  │  │  │  └── type.go
│  │  │  ├── half910x4              # Scheduling policy code for 800/9000 with 4 cards
│  │  │  │  ├── frame.go
│  │  │  │  ├── frame_test.go
│  │  │  │  ├── job.go
│  │  │  │  ├── node.go
│  │  │  │  └── type.go
│  │  │  └── module910x8            # Scheduling policy code for 800/9000
│  │  │      ├── frame.go
│  │  │      ├── frame_reschedule_test.go
│  │  │      ├── frame_test.go
│  │  │      ├── job.go
│  │  │      ├── node.go
│  │  │      ├── task.go
│  │  │      └── type.go
│  │  ├── frame.go
│  │  ├── frame_test.go
│  │  └── type.go
│  ├── base                         # Basic scheduling policy code
│  │  ├── frame.go
│  │  ├── frame_test.go
│  │  ├── node.go
│  │  ├── task.go
│  │  └── type.go
│  ├── rescheduling                 # Fault rescheduling policy code
│  │  ├── cache.go
│  │  ├── cache_test.go
│  │  ├── configmap.go
│  │  ├── configmap_test.go
│  │  ├── frame_reschedule_test.go
│  │  ├── job.go
│  │  ├── job_test.go
│  │  ├── node.go
│  │  ├── node_test.go
│  │  ├── reschedule.go
│  │  ├── reschedule_test.go
│  │  ├── task.go
│  │  └── type.go
│  ├── test                         # Common LLT code
│  │  ├── job.go
│  │  ├── reschedule.go
│  │  └── type.go
│  └── vnpu                         # Common vNPU scheduling code
│      ├── frame.go
│      ├── frame_test.go
│      ├── node.go
│      ├── pod.go
│      ├── type.go
│      ├── vdynamic.go
│      └── vstatic.go
├── LICENSE
├── output                          # CI build output directory
│  ├── Dockerfile-controller
│  └── Dockerfile-scheduler
├── OWNERS
├── plugin                          # Plugin adaptation code
│  ├── const.go
│  ├── device_info.go
│  ├── device_info_test.go
│  ├── factory.go
│  ├── factory_test.go
│  ├── job.go
│  ├── job_test.go
│  ├── node.go
│  ├── node_test.go
│  ├── plugin.go
│  ├── plugin_test.go
│  ├── task.go
│  ├── task_test.go
│  ├── tor.go
│  ├── type.go
│  └── vnode.go
├── README.md
├── test                            # Common LLT base code
│  ├── frame.go
│  ├── job.go
│  ├── node.go
│  ├── pod.go
│  ├── reschedule.go
│  └── type.go
├── type.go
└── util                            # Common code for scheduling policies
    ├── configmap.go
    ├── configmap_test.go
    ├── job.go
    ├── job_test.go
    ├── task.go
    ├── task_test.go
    ├── type.go
    └── util.go

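Build Instructions

Preparations

  • Ensure that the PC is connected to the Internet and that Git and Docker are installed. See the Git installation and Docker-ce installation guides.

  • Go is installed (a version later than 1.13; the latest bugfix release is recommended). See https://golang.org/.

  • musl is installed (version 1.2.0 or later). See http://musl.libc.org/.

  • Configure the Go proxy address for your network environment. In China, you can use Goproxy China, for example: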

    go env -w GOPROXY=https://goproxy.cn,direct

Building Volcano

  1. Run the following commands to pull the official open-source code of Volcano v1.4.0 (or v1.7.0) into the "$GOPATH/src/volcano.sh/" directory.

    cd $GOPATH/src/volcano.sh/
    git clone -b release-1.4 https://github.com/volcano-sh/volcano.git

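  2. Rename the code directory "ascend-for-volcano" to "ascend-volcano-plugin" and copy it to the plugin path of the official Volcano source code ("$GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/").

  3. Run the following commands to build the Volcano binaries and the .so file. Pass build.sh the argument that matches the source code version, for example v1.4.0.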

    cd $GOPATH/src/volcano.sh/volcano/pkg/scheduler/plugins/ascend-volcano-plugin/build

    chmod +x build.sh

    ./build.sh v1.4.0

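    The compiled binaries and the dynamic link library are in the "ascend-volcano-plugin/output" directory, as listed in Table 1.

    Table 1 Files in the output directory

    File name | Description
    volcano-npu-{version}.so | Dynamic link library of the Volcano Huawei NPU scheduling plugin
    Dockerfile-scheduler | Dockerfile for building the Volcano scheduler image
    Dockerfile-controller | Dockerfile for building the Volcano controller image
    volcano-{version}.yaml | Volcano startup configuration file
    vc-scheduler | Binary of the Volcano scheduler component
    vc-controller-manager | Binary of the Volcano controller component

    Note: {version} indicates the version number.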

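Installation Preparations and Installing Volcano

See "Cluster Scheduling User Guide > Installation and Deployment > Installing Cluster Scheduling Components > Typical Installation Scenarios > Cluster Scheduling Scenario" in the MindX DL User Guide.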

Version History

Version | Release date | Description
v3.0.0 | 2023-01-18 | First open-source release. For details, see the MindX DL User Guide.

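Introduction

Enhances Huawei NPU affinity scheduling based on open-source Volcano. To be eligible for commercial support, use the source code of our official releases (tags whose notes mention a companion xx version or xx patch version). When integrating, we also recommend sending integration information to kangfuan2@huawei.com, including at least the integrated content, the version, and your contact information. We will strictly protect your personal information.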