# ix-Volcano-Plugin

**Repository Path**: deep-spark/ix-volcano-plugin

## Basic Information

- **Project Name**: ix-Volcano-Plugin
- **Description**: Iluvatar Corex Volcano Plugin is a scheduling extension built on the Volcano scheduler and designed specifically for Iluvatar GPU clusters. It implements optimal topology scheduling strategies for multi-GPU tasks in multi-node environments.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-04
- **Last Updated**: 2025-09-09

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# IluvatarCorex Volcano Plugin

## Introduction

IluvatarCorex Volcano Plugin is a scheduling plugin based on the Volcano scheduler and designed specifically for Iluvatar GPU clusters. It achieves optimal topology scheduling strategies for multi-GPU tasks in multi-node scenarios.

## Prerequisites

- Iluvatar driver and software stack >= v1.1.0
- Volcano version 1.7.0, 1.9.0, or 1.11.0

## Deploy IX-Device-Plugin

```shell
git clone http://bitbucket.iluvatar.ai:7990/scm/infra/ix-device-plugin.git -b feature/use-volcano
docker pull 10.150.9.98:80/infra/ix-device-plugin:4.2.0-volcano
docker tag 10.150.9.98:80/infra/ix-device-plugin:4.2.0-volcano ix-device-plugin:4.2.0
kubectl apply -f ix-device-plugin-volcano.yaml
```

## Build Scheduler Image

For example, to build the plugin for Volcano 1.7.0:

```shell
git clone https://github.com/volcano-sh/volcano.git -b release-1.7
cd volcano/pkg/scheduler/plugins/
git clone http://bitbucket.iluvatar.ai:7990/scm/infra/ix-volcano-plugin.git
cd ix-volcano-plugin/output
../build/build.sh v1.7.0
sudo docker build --no-cache -t volcanosh/vc-scheduler:v1.7.0 ./ -f ./Dockerfile-scheduler
```

## Deploy Volcano

```shell
mkdir -p /var/log/iluvatarcorex/volcano/
kubectl apply -f volcano-v1.7.0.yaml
```

## Verify Deployment

### Verify IX-Device-Plugin

```shell
# Check the GPU info of a node
kubectl get configmap -n kube-system ix-device-info-cm- -o yaml
```

`Example Output`
```yaml
apiVersion: v1
data:
  DeviceInfoCfg: '{"DeviceInfo":{"GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578":{"Name":"Iluvatar BI-V150S","UUID":"GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578","Links":{"GPU-7454be0a-588c-45b6-a667-11c5810d920d":[{"TypeName":"P2PLinkCrossCPU","TypeIndex":1}],"GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec":[{"TypeName":"P2PLinkSameBoard","TypeIndex":6}]}},"GPU-7454be0a-588c-45b6-a667-11c5810d920d":{"Name":"Iluvatar MR-V50","UUID":"GPU-7454be0a-588c-45b6-a667-11c5810d920d","Links":{"GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578":[{"TypeName":"P2PLinkCrossCPU","TypeIndex":1}],"GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec":[{"TypeName":"P2PLinkCrossCPU","TypeIndex":1}]}},"GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec":{"Name":"Iluvatar BI-V150S","UUID":"GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec","Links":{"GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578":[{"TypeName":"P2PLinkSameBoard","TypeIndex":6}],"GPU-7454be0a-588c-45b6-a667-11c5810d920d":[{"TypeName":"P2PLinkCrossCPU","TypeIndex":1}]}}},"UpdateTime":1740968562}'
  DeviceListCfg: '{"DeviceList":["GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec","GPU-7454be0a-588c-45b6-a667-11c5810d920d","GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"],"UpdateTime":1740968562}'
kind: ConfigMap
metadata:
  creationTimestamp: "2025-03-03T02:15:42Z"
  name: ix-device-info-cm-volcano-worker
  namespace: kube-system
  resourceVersion: "483076"
  uid: ded2087e-1a67-4b29-b989-421faed24a35
```

### Test Volcano Job

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: test-job
spec:
  minAvailable: 2
  schedulerName: volcano
  # policies:
  #   - event: PodEvicted
  #     action: RestartJob
  plugins:
    ssh: []
    env: []
    svc: []
  maxRetry: 5
  queue: default
  tasks:
    - replicas: 5
      name: "default-nginx1"
      template:
        metadata:
          name: web
        spec:
          volumes:
            - name: corex-sdk
              hostPath:
                path: /usr/local/corex
          containers:
            - image: nginx
              imagePullPolicy: IfNotPresent
              name: nginx
              env:
                - name: PATH
                  value: "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/corex/bin"
                - name: LD_LIBRARY_PATH
                  value: "/usr/local/corex/lib64"
              resources:
                requests:
                  cpu: "1"
                  iluvatar.com/gpu: "2"
                limits:
                  iluvatar.com/gpu: "2"
              volumeMounts:
                - mountPath: /usr/local/corex
                  name: corex-sdk
          restartPolicy: OnFailure
```

```shell
kubectl apply -f job.yaml
```

`Example Output`

```shell
NAMESPACE   NAME                        READY   STATUS    RESTARTS   AGE
default     test-job-default-nginx1-0   1/1     Running   0          16s
default     test-job-default-nginx1-1   1/1     Running   0          16s
default     test-job-default-nginx1-2   1/1     Running   0          16s
default     test-job-default-nginx1-3   1/1     Running   0          16s
default     test-job-default-nginx1-4   0/1     Pending   0          16s
```

```shell
kubectl describe pod test-job-default-nginx1-0 | grep -A 5 Annotations
```

```shell
Annotations:  iluvatar.com/DevKubelet: GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578,GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec
              iluvatar.com/DevRealAlloc: GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578,GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec
              iluvatar.com/DevVolcano: GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578,GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec
              predicate-time: 18446744073709551615
              scheduling.k8s.io/group-name: test-job-eb6730f3-cf8c-49c2-8525-019ba5822b35
              volcano.sh/job-name: test-job
```

The Pod's annotations reflect how the Volcano scheduler scheduled the Volcano Job: they record both the GPU resources the scheduler selected and the devices that were actually allocated.
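The annotation check above can be scripted. The following is a minimal sketch, not part of the plugin: the helper names `normalize_devices`, `same_device_set`, `pod_annotation`, and `check_pod` are hypothetical. It assumes that `iluvatar.com/DevVolcano` holds the scheduler's choice and `iluvatar.com/DevRealAlloc` the devices actually allocated, and that the order of UUIDs within an annotation is not significant.

```shell
#!/usr/bin/env bash
# Sketch: compare the GPU lists recorded in a Pod's iluvatar.com/* annotations.
# All function names here are illustrative, not provided by the plugin.

# Turn a comma-separated UUID list into a sorted, comma-separated list,
# so that two lists can be compared regardless of ordering.
normalize_devices() {
  printf '%s' "$1" | tr ',' '\n' | LC_ALL=C sort | paste -sd ',' -
}

# Succeed (exit status 0) when both lists contain the same UUIDs.
same_device_set() {
  [ "$(normalize_devices "$1")" = "$(normalize_devices "$2")" ]
}

# Read one iluvatar.com/<key> annotation from a Pod.
# Requires kubectl and access to the cluster; dots in the key are escaped
# for kubectl's JSONPath syntax.
pod_annotation() {
  kubectl get pod "$1" -o jsonpath="{.metadata.annotations.iluvatar\.com/$2}"
}

# Report whether the scheduled and allocated GPUs of a Pod agree.
# Assumption: DevVolcano = scheduler decision, DevRealAlloc = actual allocation.
check_pod() {
  local pod="$1" volcano real
  volcano="$(pod_annotation "$pod" DevVolcano)"
  real="$(pod_annotation "$pod" DevRealAlloc)"
  if same_device_set "$volcano" "$real"; then
    echo "$pod: scheduled and allocated GPUs match"
  else
    echo "$pod: mismatch (DevVolcano=$volcano, DevRealAlloc=$real)"
    return 1
  fi
}
```

After sourcing the script, `check_pod test-job-default-nginx1-0` would run the comparison for the first Pod of the test job. Comparing sorted lists rather than raw strings is a design choice that tolerates annotations listing the same GPUs in a different order.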
### Test Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: corex-example
spec:
  replicas: 3
  selector:
    matchLabels:
      app: corex-example
  template:
    metadata:
      labels:
        app: corex-example
    spec:
      schedulerName: volcano
      containers:
        - name: corex-example
          image: nginx
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - mountPath: /usr/local/corex
              name: corex-sdk
          command: ["tail", "-f", "/dev/null"]
          resources:
            requests:
              cpu: 100m
              memory: 100M
              iluvatar.com/gpu: 2
            limits:
              cpu: 100m
              memory: 100M
              iluvatar.com/gpu: 2
      volumes:
        - name: corex-sdk
          hostPath:
            path: /usr/local/corex
```

```shell
kubectl apply -f deployment.yaml
```

`Example Output`

```shell
NAMESPACE   NAME                             READY   STATUS    RESTARTS   AGE
default     corex-example-6f4f78bbcb-4554b   1/1     Running   0          9s
default     corex-example-6f4f78bbcb-b76xk   1/1     Running   0          9s
default     corex-example-6f4f78bbcb-jxn7b   1/1     Running   0          9s
```

```shell
kubectl describe pod corex-example-6f4f78bbcb-4554b | grep -A 5 Annotations
```

```shell
Annotations:  iluvatar.com/DevKubelet: GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578,GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec
              iluvatar.com/DevRealAlloc: GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec,GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578
              iluvatar.com/DevVolcano: GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec,GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578
              predicate-time: 18446744073709551615
              scheduling.k8s.io/group-name: podgroup-249f7d1b-df62-475e-affb-a57574dfecc2
```

The Pod's annotations reflect how the Volcano scheduler scheduled the Deployment: they record both the GPU resources the scheduler selected and the devices that were actually allocated.

## License

Copyright (c) 2025 Iluvatar CoreX. All rights reserved. This project is licensed under Apache-2.0, as found in the [LICENSE](LICENSE) file.