# Dubhe-deploy

**Repository Path**: buxiaomo/dubhe-deploy

## Basic Information

- **Project Name**: Dubhe-deploy
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 27
- **Created**: 2024-03-05
- **Last Updated**: 2025-06-17

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# 天枢 (Dubhe)

System requirements: Ubuntu 20.04; for newer releases, see the configuration notes in the [issues](https://github.com/buxiaomo/kubeasy/issues/12).

Hardware requirements: at least 16 cores / 32 GB RAM, 16 cores / 64 GB RAM recommended.

Storage requirements: 200 GB or more.

## Data disk

This is handled as part of the Kubernetes deployment; it is recorded here for reference only and can be skipped.

```
mkfs.xfs /dev/vdb
echo "/dev/vdb /data xfs defaults 0 0" >> /etc/fstab
mkdir -p /data
mount -a
```

## NFS

### Deploy NFS on the master node

```
# Install NFS
mkdir -p /data/kubernetes /data/dubhe
apt-get install nfs-kernel-server nfs-common -y
echo "/data/kubernetes *(rw,sync,insecure,no_subtree_check,no_root_squash)" >> /etc/exports
echo "/data/dubhe *(rw,sync,insecure,no_subtree_check,no_root_squash)" >> /etc/exports
systemctl restart nfs-kernel-server
systemctl enable nfs-kernel-server
echo '192.168.1.2:/data/dubhe /data/dubhe nfs defaults 0 0' >> /etc/fstab
```

### Using an existing NFS server

```
echo '192.168.100.2:/dubhe /data/dubhe nfs defaults 0 0' >> /etc/fstab
mount -a
```

## Deploy Kubernetes

If your network connection is poor, download the following files in advance and place them in `/usr/local/src/kubeasy/scripts/src` before running `make prepare`.

* https://github.com/buxiaomo/kubeasy/releases/download/v1.18.20/kubeasy-registry-v1.18.20.tar.gz
* https://github.com/buxiaomo/kubeasy/releases/download/v1.18.20/kubeasy-binary-v1.18.20.tar.gz

```
apt-get update
apt-get install wget git make -y
wget https://github.com/mikefarah/yq/releases/download/v4.44.1/yq_linux_amd64 -O /usr/local/bin/yq
chmod +x /usr/local/bin/yq
git clone -b v1.18 https://github.com/buxiaomo/kubeasy.git /usr/local/src/kubeasy
cd /usr/local/src/kubeasy
make -C group_vars/
yq -i '.docker.datadir = "/data/docker"' ./group_vars/kubernetes.yml
yq -i '.etcd.datadir = "/data/etcd"' ./group_vars/kubernetes.yml
yq -i '.docker.daemon.hosts[0] = "unix:///var/run/docker.sock"' ./group_vars/kubernetes.yml
yq -i '.docker.daemon.hosts[1] = "tcp://0.0.0.0:2375"' ./group_vars/kubernetes.yml
yq -i '.docker.daemon.insecure-registries[0] = "192.168.1.2:30002"' ./group_vars/kubernetes.yml
yq -i 'del(.docker.daemon.storage-opts)' ./group_vars/kubernetes.yml
make runtime PIP_ARGS='-i https://pypi.tuna.tsinghua.edu.cn/simple'
make prepare REGISTRY_DIR=/data
make hosts
make deploy REGISTRY_URL=http://`hostname -I | awk '{print $1}'`:5000 DOCKER_VERSION=24.0.9 KUBE_NETWORK=flannel
```

## Deploy the storage provisioner

This step can be skipped if you do not deploy Harbor or run a single-node cluster.

```
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
# Pull the chart from the repo, or download the prepackaged archive below
helm pull nfs-subdir-external-provisioner/nfs-subdir-external-provisioner --version 4.0.18
wget https://github.com/buxiaomo/dubhe-package/releases/download/v1/nfs-subdir-external-provisioner-4.0.18.tgz
helm upgrade -i nfs-subdir-external-provisioner -n kube-system \
--set image.repository=192.168.1.2:5000/sig-storage/nfs-subdir-external-provisioner \
--set image.tag=v4.0.2 \
--set nfs.path=/data/kubernetes \
--set nfs.server=192.168.1.2 \
./nfs-subdir-external-provisioner-4.0.18.tgz

# kubectl patch storageclass nfs-client -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```
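Before moving on, it can help to confirm that the provisioner pod is running and that its StorageClass exists. A minimal check, assuming the release name used above and the chart's default StorageClass name `nfs-client`:

```
# Assumes the release was installed into kube-system as above and that the
# chart created its default StorageClass named "nfs-client".
kubectl get pods -n kube-system | grep nfs-subdir-external-provisioner
kubectl get storageclass nfs-client
```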
## Deploy kube-state-metrics

Needed for monitoring; skip if you do not need it.

```
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm pull prometheus-community/kube-state-metrics --version 4.0.1
wget https://github.com/buxiaomo/dubhe-package/releases/download/v1/kube-state-metrics-4.0.1.tgz
helm upgrade -i kube-state-metrics \
--namespace kube-system \
--set image.repository=192.168.1.2:5000/kube-state-metrics/kube-state-metrics \
--set image.tag=v2.2.0 \
./kube-state-metrics-4.0.1.tgz
```

## Deploy Traefik

> Experimental

```
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm repo add traefik https://traefik.github.io/charts
helm pull traefik/traefik --version 26.1.0
wget https://github.com/buxiaomo/dubhe-package/releases/download/v1/traefik-26.1.0.tgz
helm upgrade -i traefik --version 26.1.0 \
--namespace traefik --create-namespace \
--set image.registry=docker.io \
--set image.repository=traefik \
--set image.tag=v2.11.0 \
--set logs.access.enabled=true \
--set service.type=NodePort \
--set ports.web.nodePort=30080 \
--set ports.websecure.nodePort=30443 \
--set globalArguments[0]='--entrypoints.web.forwardedheaders.insecure' \
--set globalArguments[1]='--entrypoints.websecure.forwardedHeaders.insecure' \
./traefik-26.1.0.tgz
```

### ingress

```
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: dubhe-system
  namespace: dubhe-system
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: tianqu.xiaomo.site
    http:
      paths:
      - path: /nacos/
        pathType: Prefix
        backend:
          serviceName: nacos
          servicePort: 8848
      - path: /minio
        pathType: Prefix
        backend:
          serviceName: minio
          servicePort: 9000
      - path: /ws
        pathType: Prefix
        backend:
          serviceName: backend-k8s
          servicePort: 8960
      - path: /
        pathType: Prefix
        backend:
          serviceName: web
          servicePort: 80
```
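The manifest above is not applied by any of the commands in this guide; a minimal way to apply and verify it, assuming it is saved to a file named `dubhe-ingress.yaml` (the filename is arbitrary):

```
# Save the Ingress manifest above as dubhe-ingress.yaml (any filename works),
# then apply it and confirm the rules were created.
kubectl apply -f dubhe-ingress.yaml
kubectl get ingress -n dubhe-system
kubectl describe ingress -n dubhe-system dubhe-system
```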
## Deploy Harbor

```
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm repo add harbor https://helm.goharbor.io
helm repo update
helm pull harbor/harbor --version 1.14.3
wget https://github.com/buxiaomo/dubhe-package/releases/download/v1/harbor-1.14.3.tgz
helm upgrade -i harbor --version 1.14.3 \
--create-namespace --namespace harbor-system \
--set exporter.image.repository=docker.io/goharbor/harbor-exporter \
--set exporter.image.tag=v2.12.0 \
--set redis.internal.image.repository=docker.io/goharbor/redis-photon \
--set redis.internal.image.tag=v2.12.0 \
--set database.internal.image.repository=docker.io/goharbor/harbor-db \
--set database.internal.image.tag=v2.12.0 \
--set trivy.image.repository=docker.io/goharbor/trivy-adapter-photon \
--set trivy.image.tag=v2.12.0 \
--set registry.registry.image.repository=docker.io/goharbor/registry-photon \
--set registry.registry.image.tag=v2.12.0 \
--set registry.controller.image.repository=docker.io/goharbor/harbor-registryctl \
--set registry.controller.image.tag=v2.12.0 \
--set jobservice.image.repository=docker.io/goharbor/harbor-jobservice \
--set jobservice.image.tag=v2.12.0 \
--set core.image.repository=docker.io/goharbor/harbor-core \
--set core.image.tag=v2.12.0 \
--set portal.image.repository=docker.io/goharbor/harbor-portal \
--set portal.image.tag=v2.12.0 \
--set nginx.image.repository=docker.io/goharbor/nginx-photon \
--set nginx.image.tag=v2.12.0 \
--set externalURL=http://192.168.1.2:30002/ \
--set expose.type=nodePort \
--set expose.nodePort.ports.http.nodePort=30002 \
--set expose.tls.enabled=false \
--set harborAdminPassword=Harbor12345 \
--set persistence.persistentVolumeClaim.registry.storageClass="nfs-client" \
--set persistence.persistentVolumeClaim.jobservice.jobLog.storageClass="nfs-client" \
--set persistence.persistentVolumeClaim.database.storageClass="nfs-client" \
--set persistence.persistentVolumeClaim.redis.storageClass="nfs-client" \
--set persistence.persistentVolumeClaim.trivy.storageClass="nfs-client" \
./harbor-1.14.3.tgz
```

## Deploy 天枢 (Dubhe)

```
# Proxy (optional)
export https_proxy=http://127.0.0.1:7890 http_proxy=http://127.0.0.1:7890 no_proxy="mirrors.tencentyun.com"

git clone https://gitee.com/buxiaomo/dubhe-deploy.git /usr/local/src/dubhe-deploy
cd /usr/local/src/dubhe-deploy
cp configmap.yaml.example configmap.yaml

# Update the Harbor address
sed -i "s|docker.hub|`hostname -I | awk '{print $1}'`:30002|g" configmap.yaml

# Update the backend address
sed -i "s|HOST_IP|`hostname -I | awk '{print $1}'`|g" configmap.yaml

# Update the registry address
REGISTRY_URL=`hostname -I | awk '{print $1}'`:5000
sed -i "s|127.0.0.1:5000|${REGISTRY_URL}|g" configmap.yaml

# On multi-node clusters, change 'storage.classname'
sed -i 's/storage.classname:.*/storage.classname: "nfs-client"/g' configmap.yaml

# Intranet access (use this or the public-access command below)
sed -i "s|x.x.x.x|`hostname -I | awk '{print $1}'`|g" configmap.yaml
# Public access
sed -i "s|x.x.x.x|124.156.101.227|g" configmap.yaml

# Switch to the internal image registry (skip if the images were not downloaded in advance)
wget -c -t 5 https://github.com/buxiaomo/dubhe-package/releases/download/v1/registry.tar.gz.{00..10}
cat /usr/local/src/registry.tar.gz.* | tar -zxvf - -C /data/
REGISTRY_URL=`hostname -I | awk '{print $1}'`:5000
sed -i "s|registry.cn-hangzhou.aliyuncs.com|${REGISTRY_URL}|g" configmap.yaml
sed -i "s|docker.io|${REGISTRY_URL}|g" configmap.yaml
sed -i "s|docker.elastic.co|${REGISTRY_URL}|g" configmap.yaml

# Build the images
## If the images were downloaded, switch to the `local` branch to use the local images.
./dubhectl build-image all

# Deploy
./dubhectl install

# Mount the storage on the other worker nodes
## storage.classname: "hostpath"
mount -t nfs :/data/dubhe/dubhe-storage /nfs
## storage.classname: "nfs-client"
echo "mount -t nfs 192.168.100.2:/kubernetes/dubhe-system-dubhe-storage-$(kubectl get pvc -n dubhe-system dubhe-storage -o jsonpath='{.spec.volumeName}') /nfs"
```

## Access addresses

Access each service via the intranet or public IP plus the port below.

| Service | Address | Default account |
|---------|---------|-----------------|
| web     | http://\<IP\>:30800 | admin/admin |
| minio   | http://\<IP\>:30900 | admin/abcdefg123456 |
| nacos   | http://\<IP\>:30848/nacos | nacos/nacos |
| harbor  | http://\<IP\>:30002 | admin/Harbor12345 |
| mysql   | \<IP\>:30678 | root/root |
| grafana | http://\<IP\>:30006 | admin/admin |

### Public firewall

| Source | Port | Note |
|-----------|-------|---------------|
| 0.0.0.0/0 | 30900 | minio |
| 0.0.0.0/0 | 30960 | K8S WebSocket |
| 0.0.0.0/0 | 30800 | web UI |

## Restart services

```
kubectl rollout restart deployment -n dubhe-system backend
kubectl rollout restart deployment -n dubhe-system algorithm-imgprocess \
algorithm-ofrecord algorithm-videosample backend-admin backend-algorithm \
backend-auth backend-data backend-data-dcm backend-data-task backend-dcm4chee \
backend-gateway backend-image backend-k8s backend-measure backend-model \
backend-model-converter backend-model-measure backend-notebook backend-optimize \
backend-point-cloud backend-serving backend-serving-gateway backend-tadl \
backend-terminal backend-train backend-visual
```
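If you want to wait for a restart to finish instead of firing and forgetting, `kubectl rollout status` can be used per deployment; a small sketch for the backend deployment (the timeout value is an arbitrary choice):

```
# Wait for a single deployment's rollout to complete.
kubectl rollout status deployment -n dubhe-system backend --timeout=300s
```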
## example

```
# buxiaomo/notebook:24.9.2-cuda-11.4.3-devel-ubuntu20.04
# git clone https://github.com/guanshuicheng/invoice.git .
# conda create -n py36 python=3.6 -y
# conda activate py36
# pip install -r requirements.txt
# /home/admin/.local/lib/python3.8/site-packages

# Notebook image
docker.io/buxiaomo/notebook:cuda-11.4.3-devel-ubuntu20.04
# Terminal image
docker.io/buxiaomo/sshd:v1

ts-cli system url "http://43.163.108.82:30800/api/v1"
ts-cli system auth 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3MzM5Njk5MzUsInVzZXJfbmFtZSI6ImFkbWluIiwianRpIjoiYjNjMDk4ODItNWI1ZC00NGI4LWJhMzItYWZjZmM2ZmYwMjIzIiwiY2xpZW50X2lkIjoiZHViaGUtY2xpZW50Iiwic2NvcGUiOlsiYWxsIl19.Nh3bDcSHTtKdVzA2qagSP_k0lY4NK2jrvUQQ2v-HTF8'
ts-cli dataset import --type=standard --source=/Users/peng.liu/Downloads/flower_photos/daisy --annotation_type=ImageClassify --name=demo
```
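As a quick sanity check of the example images before registering them in the platform, they can be pulled and started locally with Docker; this is only a sketch using the image tags listed above:

```
# Pull the notebook and terminal images listed above, then start a throwaway
# shell in the notebook image to confirm it runs.
docker pull docker.io/buxiaomo/notebook:cuda-11.4.3-devel-ubuntu20.04
docker pull docker.io/buxiaomo/sshd:v1
docker run --rm -it docker.io/buxiaomo/notebook:cuda-11.4.3-devel-ubuntu20.04 bash
```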