npu-exporter cannot find hccn_tool at startup
DONE
#IAJPTN
Requirement
Charts created on 2024-08-13 15:57
My YAML:

```
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: npu-exporter
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: npu-exporter
  template:
    metadata:
      annotations:
        seccomp.security.alpha.kubernetes.io/pod: runtime/default
      labels:
        app: npu-exporter
    spec:
      nodeSelector:
        node-role.kubernetes.io/productName: huawei.com
        node-role.kubernetes.io/deviceType: Ascend310P
      containers:
      - name: npu-exporter
        image: 10.102.17.2/sys/ascend-npu-exporter:v5.0.RC3
        resources:
          requests:
            memory: 1000Mi
            cpu: 1000m
          limits:
            memory: 1000Mi
            cpu: 1000m
        imagePullPolicy: Always
        command: [ "/bin/bash", "-c", "--"] # pair firstly
        args: [ "umask 027;npu-exporter -port=8082 -ip=0.0.0.0 -updateTime=5 -logFile=/var/log/mindx-dl/npu-exporter/npu-exporter.log -logLevel=0 -containerMode=docker" ]
        securityContext:
          privileged: true
          readOnlyRootFilesystem: true
          runAsUser: 0
          runAsGroup: 0
        ports:
        - name: http
          containerPort: 8082
          protocol: TCP
        volumeMounts:
        - name: log-npu-exporter
          mountPath: /var/log/mindx-dl/npu-exporter
        - name: localtime
          mountPath: /etc/localtime
          readOnly: true
        - name: ascend-driver
          mountPath: /usr/local/Ascend/driver
          readOnly: true
        - name: ascend-dcmi
          mountPath: /usr/local/dcmi
          readOnly: true
        - name: sys
          mountPath: /sys
          readOnly: true
        - name: docker-shim # delete when only use containerd or isula
          mountPath: /var/run/dockershim.sock
          readOnly: true
        - name: docker # delete when only use containerd or isula
          mountPath: /var/run/docker
          readOnly: true
        - name: cri-dockerd # reserve when k8s version is 1.24+ and the container runtime is docker
          mountPath: /var/run/cri-dockerd.sock
          readOnly: true
        - name: containerd # delete when only use isula
          mountPath: /run/containerd
          readOnly: true
        - name: isulad # delete when use containerd or docker
          mountPath: /run/isulad.sock
          readOnly: true
        - name: tmp
          mountPath: /tmp
      volumes:
      - name: log-npu-exporter
        hostPath:
          path: /var/log/mindx-dl/npu-exporter
          type: Directory
      - name: localtime
        hostPath:
          path: /etc/localtime
      - name: ascend-driver
        hostPath:
          path: /usr/local/Ascend/driver
      - name: ascend-dcmi
        hostPath:
          path: /usr/local/dcmi
      - name: sys
        hostPath:
          path: /sys
      - name: docker-shim # delete when only use containerd or isula
        hostPath:
          path: /var/run/dockershim.sock
      - name: docker # delete when only use containerd or isula
        hostPath:
          path: /var/run/docker
      - name: cri-dockerd # reserve when k8s version is 1.24+ and the container runtime is docker
        hostPath:
          path: /var/run/cri-dockerd.sock
      - name: containerd # delete when only use isula
        hostPath:
          path: /run/containerd
      - name: isulad # delete when use containerd or docker
        hostPath:
          path: /run/isulad.sock
      - name: tmp
        hostPath:
          path: /tmp
```

Error log:

```
kubectl logs npu-exporter-8t85m -n kube-system
[INFO] 2024/08/13 15:53:02.898325 1 hwlog/api.go:108 npu-exporter.log's logger init success
[INFO] 2024/08/13 15:53:02.898565 1 npu-exporter/main.go:205 listen on: 0.0.0.0
[INFO] 2024/08/13 15:53:02.898616 1 npu-exporter/main.go:325 npu exporter starting and the version is v5.0.RC3_linux-x86_64
[WARN] 2024/08/13 15:53:03.218254 1 npu-exporter/main.go:339 enable unsafe http server
[WARN] 2024/08/13 15:53:08.219019 10 container/runtime_ops.go:150 failed to get OCI connection
[WARN] 2024/08/13 15:53:08.219101 10 container/runtime_ops.go:152 use backup address to try again
[INFO] 2024/08/13 15:53:08.219662 10 collector/npu_collector.go:240 Starting update cache every 5 seconds
[INFO] 2024/08/13 15:53:08.251820 38 collector/npu_collector.go:308 update cache,key is npu-exporter-containers-devices
[ERROR] 2024/08/13 15:53:08.252079 37 hccn/hccn_tool.go:221 get npu interface status failed, fork/exec /usr/local/Ascend/driver/tools/hccn_tool: no such file or directory
[ERROR] 2024/08/13 15:53:08.259782 37 hccn/hccn_tool.go:221 get npu interface status failed, fork/exec /usr/local/Ascend/driver/tools/hccn_tool: no such file or directory
[ERROR] 2024/08/13 15:53:08.267577 37 hccn/hccn_tool.go:221 get npu interface status failed, fork/exec /usr/local/Ascend/driver/tools/hccn_tool: no such file or directory
[ERROR] 2024/08/13 15:53:08.275601 37 hccn/hccn_tool.go:221 get npu interface status failed, fork/exec /usr/local/Ascend/driver/tools/hccn_tool: no such file or directory
[INFO] 2024/08/13 15:53:08.275630 37 collector/npu_collector.go:285 update cache,key is npu-exporter-network-info
[INFO] 2024/08/13 15:53:08.573334 36 collector/npu_collector.go:264 update cache,key is npu-exporter-npu-list
[INFO] 2024/08/13 15:53:13.240922 38 collector/npu_collector.go:308 update cache,key is npu-exporter-containers-devices
[ERROR] 2024/08/13 15:53:13.249472 37 hccn/hccn_tool.go:221 get npu interface status failed, fork/exec /usr/local/Ascend/driver/tools/hccn_tool: no such file or directory
[ERROR] 2024/08/13 15:53:13.257303 37 hccn/hccn_tool.go:221 get npu interface status failed, fork/exec /usr/local/Ascend/driver/tools/hccn_tool: no such file or directory
[ERROR] 2024/08/13 15:53:13.265322 37 hccn/hccn_tool.go:221 get npu interface status failed, fork/exec /usr/local/Ascend/driver/tools/hccn_tool: no such file or directory
[ERROR] 2024/08/13 15:53:13.272898 37 hccn/hccn_tool.go:221 get npu interface status failed, fork/exec /usr/local/Ascend/driver/tools/hccn_tool: no such file or directory
[INFO] 2024/08/13 15:53:13.272928 37 collector/npu_collector.go:285 update cache,key is npu-exporter-network-info
[INFO] 2024/08/13 15:53:13.569896 36 collector/npu_collector.go:264 update cache,key is npu-exporter-npu-list
[INFO] 2024/08/13 15:53:18.233317 38 collector/npu_collector.go:308 update cache,key is npu-exporter-containers-devices
[ERROR] 2024/08/13 15:53:18.246591 37 hccn/hccn_tool.go:221 get npu interface status failed, fork/exec /usr/local/Ascend/driver/tools/hccn_tool: no such file or directory
[ERROR] 2024/08/13 15:53:18.254970 37 hccn/hccn_tool.go:221 get npu interface status failed, fork/exec /usr/local/Ascend/driver/tools/hccn_tool: no such file or directory
[ERROR] 2024/08/13 15:53:18.263827 37 hccn/hccn_tool.go:221 get npu interface status failed, fork/exec /usr/local/Ascend/driver/tools/hccn_tool: no such file or directory
[ERROR] 2024/08/13 15:53:18.271911 37 hccn/hccn_tool.go:221 get npu interface status failed, fork/exec /usr/local/Ascend/driver/tools/hccn_tool: no such file or directory
[INFO] 2024/08/13 15:53:18.271938 37 collector/npu_collector.go:285 update cache,key is npu-exporter-network-info
[INFO] 2024/08/13 15:53:18.569092 36 collector/npu_collector.go:264 update cache,key is npu-exporter-npu-list
```
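One possible way to narrow down the `fork/exec ... no such file or directory` error is to check whether the path named in the log exists and is executable, first on the node and then inside the pod (the DaemonSet mounts the host's `/usr/local/Ascend/driver` read-only at the same path). The sketch below is only a diagnostic aid under those assumptions: `check_tool` is a hypothetical helper name, not part of npu-exporter, and the path and pod name are copied from the log. Note that the same message can also appear when the file exists but the loader it needs does not, so a "present" result does not by itself rule out a problem.

```shell
#!/bin/sh
# Path copied from the error log above.
HCCN_TOOL=/usr/local/Ascend/driver/tools/hccn_tool

# check_tool is a hypothetical helper (not part of npu-exporter).
# It prints one of: present, not-executable, missing.
check_tool() {
  if [ -x "$1" ]; then
    echo "present"
  elif [ -e "$1" ]; then
    echo "not-executable"
  else
    echo "missing"
  fi
}

# On the node that runs the pod:
check_tool "$HCCN_TOOL"

# Inside the pod (pod name taken from the log above; adjust to yours):
#   kubectl exec -n kube-system npu-exporter-8t85m -- \
#     /bin/bash -c 'test -x /usr/local/Ascend/driver/tools/hccn_tool && echo present || echo missing'
```

If the node reports `missing`, the file is absent from the driver installation on that host rather than hidden by the volume mount; if the node reports `present` but the pod does not, the `ascend-driver` hostPath mount is the place to look.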