# ascend-device-plugin

**Repository Path**: frankang/ascend-device-plugin

## Basic Information

- **Project Name**: ascend-device-plugin
- **Description**: Huawei NPU device plugin that provides device reporting, device monitoring, and device allocation for Kubernetes
- **Primary Language**: Go
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 52
- **Created**: 2023-02-15
- **Last Updated**: 2024-04-22

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Ascend Device Plugin

- [Ascend Device Plugin](#ascend-device-plugin.md)
    - [Description](#description.md)
    - [Environment Dependencies](#environment-dependencies.md)
    - [Building Ascend Device Plugin](#building-ascend-device-plugin.md)
    - [Creating DaemonSet](#creating-daemonset.md)
    - [Creating a Service Container](#creating-a-service-container.md)
- [Directory Structure](#directory-structure.md)
- [Version Updates](#version-updates.md)

## Ascend Device Plugin

- **[Description](#description.md)**
- **[Environment Dependencies](#environment-dependencies.md)**
- **[Building Ascend Device Plugin](#building-ascend-device-plugin.md)**
- **[Creating DaemonSet](#creating-daemonset.md)**
- **[Creating a Service Container](#creating-a-service-container.md)**

## Description

The device management plugin provides the following functions:

- Device discovery: The plugin obtains the number of devices from the Ascend device driver and reports it to Kubernetes. It can also detect virtual devices created by splitting physical devices; the splitting must be done in advance.
- Health check: The plugin detects the health status of Ascend devices. When a device becomes unhealthy, the plugin reports it to Kubernetes and the device is removed from the allocatable resources. The health status of a virtual device is determined by the physical device from which it is split.
- Device allocation: The plugin allocates Ascend devices to workloads in Kubernetes.
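The rule that a virtual device inherits the health of the physical device it was split from can be illustrated with a short Go sketch. This is not the plugin's source: the `Device` type, the `deriveHealth` helper, and the device IDs are hypothetical, and the `Healthy`/`Unhealthy` strings only mirror the health values used by the Kubernetes device plugin API. Treating a virtual device whose parent is unknown as unhealthy is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// Health values mirroring the strings used by the Kubernetes device plugin API.
const (
	Healthy   = "Healthy"
	Unhealthy = "Unhealthy"
)

// Device is a hypothetical, simplified view of a device reported to Kubernetes.
type Device struct {
	ID     string // device ID reported to the kubelet (hypothetical naming)
	Parent string // physical device ID for a virtual device; empty for a physical device
	Health string
}

// deriveHealth resolves the health of every device: a physical device uses its
// own driver-reported state, and a virtual device inherits the state of the
// physical device it was split from. Unknown parents are treated as unhealthy.
func deriveHealth(physical map[string]bool, virtualParents map[string]string) []Device {
	devices := make([]Device, 0, len(physical)+len(virtualParents))
	for id, ok := range physical {
		devices = append(devices, Device{ID: id, Health: toHealth(ok)})
	}
	for id, parent := range virtualParents {
		ok, found := physical[parent]
		devices = append(devices, Device{ID: id, Parent: parent, Health: toHealth(found && ok)})
	}
	// Sort by ID so the output order is deterministic.
	sort.Slice(devices, func(i, j int) bool { return devices[i].ID < devices[j].ID })
	return devices
}

func toHealth(ok bool) string {
	if ok {
		return Healthy
	}
	return Unhealthy
}

func main() {
	// Hypothetical state: npu-1 is unhealthy, so both virtual devices split from it are unhealthy too.
	physical := map[string]bool{"npu-0": true, "npu-1": false}
	virtual := map[string]string{"npu-0-vir-0": "npu-0", "npu-1-vir-0": "npu-1", "npu-1-vir-1": "npu-1"}
	for _, d := range deriveHealth(physical, virtual) {
		fmt.Printf("%s %s\n", d.ID, d.Health)
	}
}
```

Deriving virtual-device health from the parent, rather than tracking it separately, keeps the two views consistent: one faulty chip takes all of its virtual devices out of scheduling at once.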

## Environment Dependencies

**Table 1** Environment Dependencies

| Check Item | Requirement |
| --- | --- |
| dos2unix | Run the **dos2unix --version** command to check that the software has been installed. Any version is acceptable. |
| Driver version of the RUN package | Go to the driver directory (for example, /usr/local/Ascend/driver) and run the **cat version.info** command to confirm that the driver version is 1.73 or later. |
| Go language environment | Run the **go version** command to confirm that the version is 1.14.3 or later. |
| GCC version | Run the **gcc --version** command to confirm that the version is 7.3.0 or later. |
| Kubernetes version | 1.17.x. Select the latest bugfix version. Run the **kubectl version** command to view the version. |
| Docker environment | Run the **docker info** command to confirm that Docker has been installed. |
| root user permission | Check that root user permission is available on the BMS. |
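Several checks in Table 1 require a version that is "X or later". Such comparisons must be numeric and field by field rather than lexical: 1.14.10 is later than 1.14.3 even though it sorts earlier as a string. The Go sketch below illustrates this. `atLeast` is a hypothetical helper that only handles dot-separated numeric versions; extracting the version string from each command's output is left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeast reports whether the dot-separated numeric version v is greater than
// or equal to min. Missing fields count as 0, so "1.73" equals "1.73.0".
// A non-numeric field makes the check fail with an error.
func atLeast(v, min string) (bool, error) {
	a, err := parse(v)
	if err != nil {
		return false, err
	}
	b, err := parse(min)
	if err != nil {
		return false, err
	}
	for i := 0; i < len(a) || i < len(b); i++ {
		var x, y int
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		if x != y {
			return x > y, nil
		}
	}
	return true, nil // all fields are equal
}

// parse splits a version such as "7.3.0" into its numeric fields.
func parse(v string) ([]int, error) {
	fields := strings.Split(strings.TrimSpace(v), ".")
	nums := make([]int, len(fields))
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return nil, fmt.Errorf("invalid version %q: %w", v, err)
		}
		nums[i] = n
	}
	return nums, nil
}

func main() {
	// Minimum versions come from Table 1; the "installed" values are examples.
	checks := []struct{ name, installed, min string }{
		{"Go", "1.14.10", "1.14.3"},
		{"GCC", "7.3.0", "7.3.0"},
		{"driver", "1.72", "1.73"},
	}
	for _, c := range checks {
		ok, err := atLeast(c.installed, c.min)
		fmt.Printf("%s %s >= %s: %v (err=%v)\n", c.name, c.installed, c.min, ok, err)
	}
}
```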

## Building Ascend Device Plugin

## Procedure

1. Run the following commands to configure the environment variables:

    **export GO111MODULE=on**

    **export GOPROXY=**_Proxy address_

    **export GONOSUMDB=\\\***

    >![](figures/icon-note.gif) **NOTE:** 
    >Use the actual GOPROXY proxy address. To check it, run the **go mod download** command in the **ascend-device-plugin** directory. If no error information is displayed, the proxy is set successfully.

2. \(Optional\) Go to the **ascend-device-plugin** directory and run the following command to modify the YAML file as required:
    - Common YAML file

        **vim ascendplugin-910.yaml**

    - YAML file of MindX DL

        **vim ascendplugin-volcano.yaml**

    ```
    ......
    containers:
    - image: ascend-k8sdeviceplugin:v2.0.2          # Image name and version
      name: device-plugin-01
      resources:
        requests:
          memory: 500Mi
          cpu: 500m
        limits:
          memory: 500Mi
          cpu: 500m
      command: [ "/bin/bash", "-c", "--"]
      args: [ "ascendplugin -useAscendDocker=true -volcanoType=true -logFile=/var/log/mindx-dl/devicePlugin/devicePlugin.log -logLevel=0" ]
    ......
    ```

3. Set the **useAscendDocker** parameter.
    - If Ascend Docker Runtime has been installed, set **useAscendDocker** to **true**. This is the default and recommended setting.
    - If Ascend Docker Runtime has not been installed, set **useAscendDocker** to **false**.
    - If CPU binding has been enabled, set **useAscendDocker** to **false** regardless of whether Ascend Docker Runtime has been installed.

4. Run the following commands to go to the build directory and execute the build script. The device-plugin binary, the YAML files, and the Dockerfile are generated in the **output** directory.

    **cd** _/home/test/_**ascend-device-plugin/build/**

    **chmod +x build.sh**

    **./build.sh**

5. Run the following command to view the generated software package:

    **ll** _/home/test/_**ascend-device-plugin/output**

    ```
    drwxr-xr-x  2 root root     4096 Jun  8 18:42 ./
    drwxr-xr-x  9 root root     4096 Jun  8 17:12 ../
    -r-x------. 1 root root 31927176 Jul 26 14:12 device-plugin
    -r--------. 1 root root     2081 Jul 26 14:12 device-plugin-310-v2.0.2.yaml
    -r--------. 1 root root     2202 Jul 26 14:12 device-plugin-710-v2.0.2.yaml
    -r--------. 1 root root     1935 Jul 26 14:12 device-plugin-910-v2.0.2.yaml
    -r--------. 1 root root     3070 Jul 26 14:12 device-plugin-volcano-v2.0.2.yaml
    -r--------. 1 root root      469 Jul 26 14:12 Dockerfile
    ```

    >![](figures/icon-note.gif) **NOTE:** 
    >The **ascendplugin-910.yaml** file in the **ascend-device-plugin** directory corresponds to the **device-plugin-910-v2.0.2.yaml** file in the **ascend-device-plugin/output/** directory; the output file carries the updated version number.

6. Run the following command to view the Dockerfile. You can modify the Dockerfile as required.

    **vi** _/home/test/_**ascend-device-plugin/Dockerfile**

    ```
    # Select the base image as required. You can run the docker images command to query the base image.
    FROM ubuntu:18.04 as build

    RUN useradd -d /home/HwHiAiUser -u 1000 -m -s /usr/sbin/nologin HwHiAiUser

    # Specify whether to use Ascend Docker. The default value is true.
    # Set it to false if Ascend Docker Runtime is not installed or CPU binding is enabled.
    ENV USE_ASCEND_DOCKER true

    ENV LD_LIBRARY_PATH /usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/driver/lib64/common
    ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/usr/local/Ascend/driver/lib64/

    # Ensure that the device-plugin binary file exists for copying.
    COPY ./output/device-plugin /usr/local/bin/

    RUN chmod 550 /usr/local/bin/device-plugin &&\
        echo 'umask 027' >> /etc/profile &&\
        echo 'source /etc/profile' >> ~/.bashrc
    ```
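The rule in step 3 reduces to a single boolean expression. The Go sketch below derives the `useAscendDocker` value from two facts about the node and renders an `args` string in the same shape as the YAML excerpt in step 2. The flag names, binary name, and log path come from that excerpt; the `useAscendDocker` and `pluginArgs` helpers are hypothetical and not part of the build scripts.

```go
package main

import "fmt"

// useAscendDocker applies the rule from step 3: use Ascend Docker Runtime only
// when it is installed and CPU binding is not enabled.
func useAscendDocker(runtimeInstalled, cpuBindingEnabled bool) bool {
	return runtimeInstalled && !cpuBindingEnabled
}

// pluginArgs renders a container args string shaped like the YAML excerpt in
// step 2 (hypothetical helper).
func pluginArgs(runtimeInstalled, cpuBindingEnabled, volcano bool, logFile string, logLevel int) string {
	return fmt.Sprintf("ascendplugin -useAscendDocker=%t -volcanoType=%t -logFile=%s -logLevel=%d",
		useAscendDocker(runtimeInstalled, cpuBindingEnabled), volcano, logFile, logLevel)
}

func main() {
	// Runtime installed but CPU binding enabled, so useAscendDocker must be false.
	fmt.Println(pluginArgs(true, true, true, "/var/log/mindx-dl/devicePlugin/devicePlugin.log", 0))
}
```

CPU binding overrides the runtime check, which is why the sketch combines the two conditions with `&&` and a negation instead of testing the runtime alone.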

## Creating DaemonSet

## Procedure

>![](figures/icon-note.gif) **NOTE:** 
>The following operations use the ARM platform as an example to describe how to build and distribute images.

1. Obtain the Ascend Device Plugin software package. If you compiled the software package yourself, skip this step.
    1. Log in to [MindX DL](https://support.huawei.com/enterprise/zh/ascend-computing/mindx-pid-252501207/software).
    2. Select the desired version and click **Download** next to the software package to obtain the software package and its digital signature file.
    3. After the download is complete, verify the integrity of the software package.

2. Upload the software package to any directory \(for example, **/home/ascend-device-plugin**\) on the server and decompress it.

3. Go to the directory where the package is decompressed and run the following command to build the Ascend Device Plugin image:

    ```
    docker build -t ascend-k8sdeviceplugin:v2.0.2 .
    ```

    If "Successfully built xxx" is displayed, the image has been created. Do not omit the **.** at the end of the command; the dot \(.\) indicates the current directory.

    Run the **docker images** command. The image named **ascend-k8sdeviceplugin** with the tag **v2.0.2** should be listed.

4. Run the following command to package and compress the built image for transfer between servers:

    >![](figures/icon-note.gif) **NOTE:** 
    >_\{arch\}_ indicates the system architecture. Images of different architectures cannot be shared.

    ```
    docker save ascend-k8sdeviceplugin:v2.0.2 | gzip > Ascend-K8sDevicePlugin-v2.0.2-{arch}-Docker.tar.gz
    ```

    Alternatively, run the following command to package the image without compressing it:

    ```
    docker save -o Ascend-K8sDevicePlugin-v2.0.2-{arch}-Docker.tar ascend-k8sdeviceplugin:v2.0.2
    ```

5. \(Optional\) In cluster scenarios, distribute the packaged image \(for example, under **/home/ascend-device-plugin/**\) to the compute nodes with Ascend AI Processors.

    ```
    cd /home/ascend-device-plugin
    scp Ascend-K8sDevicePlugin-v2.0.2-{arch}-Docker.tar.gz root@{node IP address}:/home/ascend-device-plugin
    ```

6. Run the following command to load the image. The compressed image file is used as an example; if the image file is not compressed, change the file name accordingly.
    - ARM

        ```
        docker load -i Ascend-K8sDevicePlugin-v2.0.2-arm64-Docker.tar.gz
        ```

    - x86

        ```
        docker load -i Ascend-K8sDevicePlugin-v2.0.2-amd64-Docker.tar.gz
        ```

7. Run the following command to label the node with Ascend 910, Ascend 310, or Ascend 710:

    ```
    kubectl label nodes localhost.localdomain accelerator=huawei-Ascend910
    ```

    **localhost.localdomain** is the name of the node with Ascend 910, Ascend 310, or Ascend 710. You can run the **kubectl get node** command to view the node name. The label must match the **nodeSelector** label in the YAML file in the software package.

8. Go to the **ascend-device-plugin/output** directory and run the following command to deploy DaemonSet:
    - On an Ascend 310 AI Processor node:

        ```
        kubectl apply -f device-plugin-310-v2.0.2.yaml
        ```

    - On an Ascend 910 AI Processor node, working with Volcano:

        ```
        kubectl apply -f device-plugin-volcano-v2.0.2.yaml
        ```

    - On an Ascend 710 AI Processor node:

        ```
        kubectl apply -f device-plugin-710-v2.0.2.yaml
        ```

    - On an Ascend 910 AI Processor node where Ascend Device Plugin works independently, without Volcano:

        ```
        kubectl apply -f device-plugin-910-v2.0.2.yaml
        ```

    After the deployment is complete, wait several minutes before viewing the node deployment information. For details about the parameters in the YAML file, see [Table 1](#table1286935610129).

**Table 1** Ascend Device Plugin startup parameters

Parameter

Type

Default Value

Description

-mode

string

None

Running mode of Ascend Device Plugin. If this parameter is not specified, the running mode is automatically specified based on the NPU type.

  • ascend310: running in Ascend 310 AI Processor mode
  • ascend710: running in Ascend 710 AI Processor mode
  • ascend910: running in Ascend 910 AI Processor mode

-fdFlag

bool

false

Edge scenario flag, indicating whether to manage devices with FusionDirector

-useAscendDocker

bool

true

Whether to use Ascend Docker Runtime

NOTE:

If CPU binding has been enabled, set useAscendDocker to false no matter whether Ascend Docker Runtime is used.

-volcanoType

bool

false

Whether to use Volcano for scheduling

-version

bool

false

Version of Ascend Device Plugin

-edgeLogFile

string

/var/alog/AtlasEdge_log/devicePlugin.log

Log path in the edge scenario

-logLevel

int

0

Level of a log

  • -1: debug
  • 0: info
  • 1: warning
  • 2: error
  • 3: dpanic
  • 4: panic
  • 5: fatal

-maxAge

int

7

Log backup time limit. The minimum value is 7, in days.

-isCompress

bool

false

Whether to automatically compress backup log files

-logFile

string

/var/log/mindx-dl/devicePlugin/devicePlugin.log

Log file

NOTE:

If the size of a log file exceeds 20 MB, automatic dump is triggered. The maximum size of a log file cannot be changed.

-maxBackups

int

30

Maximum number of dumped log files that can be retained. The value range is (0, 30].

9. Run the following command to view the device deployment information of the node:

    ```
    kubectl describe node
    ```

    If the node label and the number of devices are correct, the deployment is successful, as shown in the following output.

    ```
    Capacity:
      cpu:                   128
      ephemeral-storage:     3842380928Ki
      huawei.com/Ascend910:  8
      hugepages-2Mi:         0
      memory:                263865068Ki
      pods:                  110
    Allocatable:
      cpu:                   128
      ephemeral-storage:     3541138257382
      huawei.com/Ascend910:  8
      hugepages-2Mi:         0
      memory:                263762668Ki
      pods:                  110
    ```
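The startup parameters in Table 1 map naturally onto Go's standard `flag` package. The sketch below declares the same names, types, and defaults, then checks the documented ranges: `-maxAge` at least 7 and `-maxBackups` in (0, 30]. Rejecting `-logLevel` values outside -1 to 5 and `-mode` values other than the three listed ones is an assumption of this sketch. It is an illustration, not the plugin's actual command-line code; the `options` struct and `parseOptions` function are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options mirrors the startup parameters in Table 1 (hypothetical struct).
type options struct {
	mode            string
	fdFlag          bool
	useAscendDocker bool
	volcanoType     bool
	version         bool
	edgeLogFile     string
	logLevel        int
	maxAge          int
	isCompress      bool
	logFile         string
	maxBackups      int
}

// parseOptions parses args with the names and defaults from Table 1 and
// validates the value ranges.
func parseOptions(args []string) (*options, error) {
	fs := flag.NewFlagSet("ascendplugin", flag.ContinueOnError)
	o := &options{}
	fs.StringVar(&o.mode, "mode", "", "ascend310, ascend710, or ascend910; empty means detect from the NPU type")
	fs.BoolVar(&o.fdFlag, "fdFlag", false, "edge scenario: manage devices with FusionDirector")
	fs.BoolVar(&o.useAscendDocker, "useAscendDocker", true, "use Ascend Docker Runtime")
	fs.BoolVar(&o.volcanoType, "volcanoType", false, "use Volcano for scheduling")
	fs.BoolVar(&o.version, "version", false, "query the version")
	fs.StringVar(&o.edgeLogFile, "edgeLogFile", "/var/alog/AtlasEdge_log/devicePlugin.log", "log path in the edge scenario")
	fs.IntVar(&o.logLevel, "logLevel", 0, "-1 debug, 0 info, 1 warning, 2 error, 3 dpanic, 4 panic, 5 fatal")
	fs.IntVar(&o.maxAge, "maxAge", 7, "backup log retention in days (minimum 7)")
	fs.BoolVar(&o.isCompress, "isCompress", false, "compress backup log files")
	fs.StringVar(&o.logFile, "logFile", "/var/log/mindx-dl/devicePlugin/devicePlugin.log", "log file")
	fs.IntVar(&o.maxBackups, "maxBackups", 30, "number of dumped log files to keep, in (0, 30]")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	switch o.mode {
	case "", "ascend310", "ascend710", "ascend910":
		// Empty means the mode is detected from the NPU type.
	default:
		return nil, fmt.Errorf("unsupported -mode %q", o.mode)
	}
	if o.logLevel < -1 || o.logLevel > 5 {
		return nil, fmt.Errorf("-logLevel %d out of range [-1, 5]", o.logLevel)
	}
	if o.maxAge < 7 {
		return nil, fmt.Errorf("-maxAge %d is below the minimum of 7", o.maxAge)
	}
	if o.maxBackups <= 0 || o.maxBackups > 30 {
		return nil, fmt.Errorf("-maxBackups %d out of range (0, 30]", o.maxBackups)
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", *o)
}
```

For example, passing the arguments from the YAML excerpt in the build section, `-useAscendDocker=true -volcanoType=true -logLevel=0`, parses cleanly, while `-maxBackups=31` is rejected.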

## Creating a Service Container

## Procedure

1. Go to the **ascend-device-plugin** directory and run the following commands to edit the pod configuration file:

    ```
    cd /home/ascend-device-plugin
    vi ascend.yaml
    ```

    ```
    apiVersion: v1                     # API version. The value must be one of those listed by the kubectl api-versions command.
    kind: Pod                          # Type of the resource to be created
    metadata:
      name: rest502                    # Pod name, which must be unique within the namespace
    spec:                              # Detailed definition of the containers in the pod
      containers:                      # Containers in the pod
      - name: rest502                  # Container name in the pod
        image: centos_arm64_resnet50:7.8   # Inference or training service image used by the container
        imagePullPolicy: Never
        resources:
          limits:                      # Resource limits
            huawei.com/Ascend310: 2    # Change the resource type as required. For supported resource types, see the list below.
        volumeMounts:
        - name: joblog
          mountPath: /home/log/        # Log path inside the container. Change it as required.
        - name: model
          mountPath: /home/app/model   # Model path inside the container. Change it as required.
        - name: slog-path
          mountPath: /var/log/npu/conf/slog/slog.conf
        - name: ascend-driver-path
          mountPath: /usr/local/Ascend/driver   # Change it to the actual driver path.
      volumes:
      - name: joblog
        hostPath:
          path: /home/test/docker_log      # Host log path mounted into the container. Change it as required.
      - name: model
        hostPath:
          path: /home/test/docker_model/   # Host model path mounted into the container. Change it as required.
      - name: slog-path
        hostPath:
          path: /var/log/npu/conf/slog/slog.conf
      - name: ascend-driver-path
        hostPath:
          path: /usr/local/Ascend/driver   # Change it to the actual driver path.
    ```

    Supported resource types:
    - **huawei.com/Ascend310: 2**: two Ascend 310 AI Processors are allocated.
    - **huawei.com/Ascend710: 1**: one Ascend 710 AI Processor is allocated.
    - **huawei.com/Ascend910: 4**: four Ascend 910 AI Processors are allocated.
    - **huawei.com/Ascend910-16c: 1**: one virtual device with 16 AI cores is allocated. Virtual devices support only single-card single-container tasks, so the value must be **1**. Virtual devices with **2c**, **4c**, **8c**, or **16c** AI cores can be scheduled.

2. Run the following command to create a pod:

    ```
    kubectl apply -f ascend.yaml
    ```

    >![](figures/icon-note.gif) **NOTE:** 
    >To delete the pod, run the **kubectl delete -f ascend.yaml** command.

3. Run the following commands to access the pod and view the allocation information:

    ```
    kubectl exec -it <pod name> -- bash
    ```

    The pod name is the one configured in step [1](#en-us_topic_0269670251_en-us_topic_0249483204_li104071617503).

    ```
    ls /dev/
    ```

    In output similar to the following, **davinci3** and **davinci4** are the allocated devices.

    ```
    core  davinci3  davinci4  davinci_manager  devmm_svm  fd  full  hisi_hdc  mqueue  null  ptmx
    ```
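The resource-type rules listed in step 1 can be expressed as a small validation function. The Go sketch below is hypothetical: `checkRequest` is not part of the plugin, and it encodes only what that list states. Ascend 310, 710, and 910 are requested as whole processors, and Ascend 910 virtual devices have 2, 4, 8, or 16 AI cores and must be requested with a count of exactly 1. Requiring every count to be at least 1 is an added assumption.

```go
package main

import (
	"fmt"
	"strings"
)

const prefix = "huawei.com/"

// wholeChip lists the physical-processor resource names from the list above.
var wholeChip = map[string]bool{"Ascend310": true, "Ascend710": true, "Ascend910": true}

// virtualCores lists the AI core counts that Ascend 910 virtual devices may have.
var virtualCores = map[string]bool{"2c": true, "4c": true, "8c": true, "16c": true}

// checkRequest validates one resource limit such as huawei.com/Ascend310: 2.
func checkRequest(name string, count int) error {
	if !strings.HasPrefix(name, prefix) {
		return fmt.Errorf("%s: not a huawei.com resource", name)
	}
	if count < 1 {
		return fmt.Errorf("%s: count must be at least 1, got %d", name, count)
	}
	res := strings.TrimPrefix(name, prefix)
	if wholeChip[res] {
		return nil
	}
	// Virtual devices are named Ascend910-<cores>, for example Ascend910-16c.
	if cores := strings.TrimPrefix(res, "Ascend910-"); cores != res && virtualCores[cores] {
		if count != 1 {
			return fmt.Errorf("%s: virtual devices support single-card single-container tasks only, so the count must be 1", name)
		}
		return nil
	}
	return fmt.Errorf("%s: unsupported resource type", name)
}

func main() {
	for _, r := range []struct {
		name  string
		count int
	}{
		{"huawei.com/Ascend310", 2},
		{"huawei.com/Ascend910-16c", 1},
		{"huawei.com/Ascend910-16c", 2},
		{"huawei.com/Ascend910-32c", 1},
	} {
		fmt.Printf("%s: %d -> %v\n", r.name, r.count, checkRequest(r.name, r.count))
	}
}
```

Running such a check before `kubectl apply` can catch a mistyped resource name or a virtual-device count other than 1, which would otherwise leave the pod unschedulable.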

## Directory Structure

```
├── build                                  # Compilation scripts
│   └── build.sh
├── output                                 # Compilation result directory
├── src                                    # Source code directory
│   └── plugin
│       ├── cmd/ascendplugin
│       │   └── ascend_plugin.go
│       └── pkg/npu/huawei
├── test                                   # Test directory
├── Dockerfile                             # Image file
├── LICENSE
├── Open Source Software Notice.md
├── README.ZH.md
├── README.EN.md
├── ascend.yaml                            # YAML file of the sample running task
├── ascendplugin-310.yaml                  # YAML file for deploying the plugin on the inference card with Ascend 310
├── ascendplugin-710.yaml                  # YAML file for deploying the plugin on the inference card with Ascend 710
├── ascendplugin-volcano.yaml              # YAML file for affinity scheduling and deployment with Ascend 910 and Volcano
├── ascendplugin-910.yaml                  # YAML file for deploying the plugin with Ascend 910 but without Volcano
├── go.mod
└── go.sum
```

## Version Updates

| Version | Date | Description |
| --- | --- | --- |
| v2.0.3 | 2021-10-15 | None |
| v2.0.2 | 2021-07-15 | Added computing power splitting support for Ascend 910 AI Processors. |
| v2.0.1 | 2021-04-20 | • Adapted to Ascend 710 AI Processors.<br>• Changed the processor ID in the reported information from the logical ID to the physical ID.<br>• Changed the policy so that processors are not isolated because of minor alarms. |
| v20.2.0 | 2021-01-08 | Optimized the description in "Creating DaemonSet." |
| v20.2.0 | 2020-11-18 | This is the first official release. |