# k8s-python-api

**Repository Path**: liht29/k8s-python-api

## Basic Information

- **Project Name**: k8s-python-api
- **Description**: A high-speed HPA component for K8s
- **Primary Language**: Python
- **License**: MulanPSL-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-03-10
- **Last Updated**: 2023-05-29

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# k8s-python-api

## Component Design

Workflow: KubeEdge extends the cloud cluster onto edge nodes. Each edge node runs several applications, and each application has multiple container replicas. External requests for an application are redirected, through a port exposed by an in-cluster reverse proxy, to that application's containers on the node. When an application's request rate rises, the component picks up the change from the monitoring data, computes in real time, via a queueing-model algorithm, the number of containers needed to meet the SLA latency requirement, and signals the cluster to scale the application out. Conversely, when requests fall and the current container count exceeds what the application needs, the component signals the cluster to scale in. The cluster-communication module then invokes the corresponding Kubernetes API scaling endpoint to carry out the operation.

If requests for an application surge so sharply that the computed container count exceeds the edge node's total resources, the component switches to a weighted fair-allocation algorithm: applications whose demands stay within their resource-request limits are allocated first, while containers for applications that exceed their limits are reallocated. When the edge node runs out of capacity, well-behaved applications therefore keep their resources without being preempted, and as much capacity as possible still goes to the high-load applications, avoiding starvation.

### Cluster Communication

Edge nodes join the cluster through KubeEdge and interact with it via the Kubernetes API. This module establishes secure communication with the cluster; it can retrieve information such as edge nodes, container counts, CPU resources, and status, and it can create, delete, query, and update the containers deployed on the cluster's edge nodes.

### Request Monitoring

Monitors and aggregates statistics for each application's requests. This part is made up of the cluster's Prometheus monitoring plugin together with the Nginx reverse proxy. The monitoring plugin dynamically scrapes and aggregates cluster status metrics from the specified services in real time. The reverse proxy spreads requests evenly across the container replicas that belong to the same application on a node, and reports every request passing through it to the monitoring plugin as a status metric.

### Queue Model

From the measured request rate, the component dynamically estimates how many containers each application should be given to meet its low-latency requirement. It also checks whether that allocation would exceed the cluster's total resources; once the cluster enters an overloaded state, the component redistributes container counts according to preconfigured weights.
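The README describes this monitor-compute-scale loop only in prose, so here is a minimal sketch of a single iteration under stated assumptions: a simple utilization-based sizing rule standing in for the queueing model, a hypothetical `nginx_http_requests_total` metric exported by the proxy, and placeholder names (`php-apache`, `MU`, `TARGET_UTIL`, the Prometheus address). The Prometheus HTTP API endpoint (`/api/v1/query`) and the `kubernetes` Python client's `patch_namespaced_deployment_scale` call are real interfaces; everything else is illustrative, not the component's actual implementation.

```python
import math
import requests
from kubernetes import client, config

PROM_URL = "http://prometheus.kube-system:9090"  # assumed Prometheus address (set in config.json)
MU = 15.0          # assumed per-replica service rate (req/s); a placeholder, not a measured value
TARGET_UTIL = 0.8  # keep each replica under 80% utilization to bound queueing delay

def request_rate(app: str) -> float:
    """Fetch the app's request rate over the last minute from Prometheus."""
    # nginx_http_requests_total is a hypothetical metric name for what the Nginx proxy reports.
    query = f'sum(rate(nginx_http_requests_total{{app="{app}"}}[1m]))'
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": query})
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

def replicas_needed(rate: float) -> int:
    """Queueing-style sizing: enough replicas that each stays under TARGET_UTIL."""
    return max(1, math.ceil(rate / (MU * TARGET_UTIL)))

def scale(deployment: str, namespace: str, replicas: int) -> None:
    """Set the deployment's replica count through the Kubernetes API."""
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        deployment, namespace, body={"spec": {"replicas": replicas}}
    )

if __name__ == "__main__":
    config.load_kube_config()  # use load_incluster_config() when running as a pod
    rate = request_rate("php-apache")
    scale("php-apache", "default", replicas_needed(rate))
```

In the real component this loop would presumably run periodically and feed the overload check described under Queue Model; treat the constants here as placeholders.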
## Environment Setup

Required environment: Kubernetes and KubeEdge. Nginx and Prometheus must also be set up:

```
kubectl apply -f svc/nginx
kubectl apply -f svc/prometheus
kubectl apply -f metrics-server-component.yaml
```

## Usage

### 1. Direct deployment

On a single machine, run the main.py script with Python to deploy scale-plugin into the cluster directly. This requires cluster-access credentials in .kube/config.

### 2. Image deployment

Write a Dockerfile and build the image (or use the prebuilt image registry.cn-hangzhou.aliyuncs.com/liht29/scale-plugin:latest), then reference that image in scale-plugin.yaml.

## demo

```
# Deploy the nginx components that listen for requests
kubectl apply -f svc/nodejs-prime/
kubectl apply -f svc/nodejs-web
kubectl apply -f svc/php-apache

# Deploy the application services
kubectl apply -f deployment

# Deploy the component
kubectl apply -f scale-plugin.yaml
```

### 3. Modify configuration

The component's runtime configuration can be changed in config.json: fill in the Prometheus listen address and port, optionally adjust the weights, and set the nodes field to your node names before running.
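The config.json schema is not shown in this README, and the weighted reallocation is only described in prose under Queue Model, so the following sketch is assumption-laden: a hypothetical config shape (keys "prometheus", "weights", "nodes") and one plausible reading of "weighted fair allocation", splitting the node's remaining capacity among over-limit applications in proportion to their weights.

```python
import json
import math

def reallocate(demands, limits, weights, capacity):
    """Weighted fair allocation when total demand exceeds node capacity.

    demands:  app -> container count requested by the queue model
    limits:   app -> container count the app may claim outright
    weights:  app -> preconfigured weight from config.json
    capacity: total containers the node can host
    """
    # Apps within their limits are satisfied first and never preempted.
    alloc = {app: d for app, d in demands.items() if d <= limits[app]}
    spare = capacity - sum(alloc.values())

    # Over-limit apps split the remaining capacity in proportion to their weights.
    over = {app: d for app, d in demands.items() if d > limits[app]}
    total_w = sum(weights[app] for app in over) or 1
    for app, demand in over.items():
        alloc[app] = min(demand, math.floor(spare * weights[app] / total_w))
    return alloc

if __name__ == "__main__":
    # Hypothetical config.json shape -- the real schema may differ.
    cfg = json.loads('{"prometheus": {"host": "10.0.0.1", "port": 9090}, '
                     '"weights": {"php-apache": 2, "nodejs-web": 1}, '
                     '"nodes": ["edge-node-1"]}')
    print(reallocate(
        demands={"php-apache": 12, "nodejs-web": 4},
        limits={"php-apache": 6, "nodejs-web": 6},
        weights=cfg["weights"],
        capacity=10,
    ))  # -> {'nodejs-web': 4, 'php-apache': 6}
```

Satisfying in-limit apps first mirrors the no-preemption guarantee described above; the proportional split of the remainder is one common way to realize a weighted fair share, though not necessarily the repository's exact rule.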
## Running Tests

k6 (https://k6.io/) must be installed; the test scripts used are in the script folder.

### Measuring application request-processing speed

#### php-apache (1000m CPU)

- n=10000, c=1: u = 16
- n=10000, c=10: u = 15.67
- n=10000, c=100: u = 15.45
- n=10000, c=1000: u = 15.52
- n=10000, c=100: u = 16 (cpu = 2000m)

#### nodejs-web (100m CPU)

- n=1000, c=1000: u = 41.64
- wrk: t=12, c=400: u = 55.81
- wrk: t=20, c=400: u = 46
- wrk: t=400, c=400: u = 56
- wrk: t=4, c=20: u = 49
- wrk: t=4, c=1000: u = 588.52 (cpu = 3000m)
- wrk: t=4, c=2000: u = 600 (cpu = 300m)

## Handy K8s commands

### Pod logs

```
kubectl logs -f custom-metrics-apiserver-659dfdc465-n5x6b -n custom-metrics
```

### Delete a pod

```
kubectl delete pods pod
```

### Delete a deployment

```
kubectl -n kube-system delete deployment grafana-core
```

### Load generation

```
kubectl run -i --tty load-generators --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
```

### Check pod load

```
kubectl get hpa
```

### namespace

```
kubectl config set-context --current --namespace=
```

## Troubleshooting Notes

1. CPU resources show as unknown: https://stackoverflow.com/questions/68648198/metrics-service-in-kubernetes-not-working
2. `curl -Method 'POST' -Headers @{"Content-Type"="application/json"} http://localhost:8001/api/v1/namespaces/custom-metrics/services/custom-metrics-apiserver:http/proxy/write-metrics/namespaces/default/services/kubernetes/test-metric -Body '"300m"'`
3. Metrics Server (https://github.com/kubernetes-sigs/metrics-server#readme): metrics-server uses the Kubernetes API to track the nodes and pods in the cluster and queries each node over HTTP for metrics. It also builds an internal view of pod metadata and maintains a cache of pod health; the cached pod-health information is available through the extension API that metrics-server serves.
4. HPA: https://kubernetes.io/zh-cn/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
5. Drawback of the built-in autoscaling: a high per-pod resource threshold makes scale-out lag behind, so requests cannot be answered in time, while a low threshold wastes resources.
6. Prometheus: https://yunlzheng.gitbook.io/prometheus-book/part-iii-prometheus-shi-zhan/readmd/deploy-prometheus-in-kubernetes

## Test Report

### 3m-5m-3m

hpa

```
execution: local
    script: node-web-test.js
    output: -

scenarios: (100.00%) 1 scenario, 3000 max VUs, 31m30s max duration (incl. graceful stop):
         * default: Up to 3000 looping VUs for 31m0s over 7 stages (gracefulRampDown: 30s, gracefulStop: 30s)

  data_received..................: 24 MB  36 kB/s
  data_sent......................: 5.3 MB 8.0 kB/s
  http_req_blocked...............: avg=42.25µs min=1.05µs   med=3.47µs  max=18.82ms p(90)=10.44µs p(95)=29.3µs
  http_req_connecting............: avg=33.29µs min=0s       med=0s      max=18.75ms p(90)=0s      p(95)=0s
✗ http_req_duration..............: avg=10s     min=305.76µs med=3.48s   max=44.1s   p(90)=27.49s  p(95)=31.99s
    { expected_response:true }...: avg=10.34s  min=2.31ms   med=3.89s   max=44.1s   p(90)=27.79s  p(95)=32.1s
  http_req_failed................: 3.23%  ✓ 2197  ✗ 65806
  http_req_receiving.............: avg=53.4µs  min=13.66µs  med=41.2µs  max=6.61ms  p(90)=88.69µs p(95)=115.58µs
  http_req_sending...............: avg=51.44µs min=4.88µs   med=14.57µs max=4.21ms  p(90)=98.33µs p(95)=225.76µs
  http_req_tls_handshaking.......: avg=0s      min=0s       med=0s      max=0s      p(90)=0s      p(95)=0s
  http_req_waiting...............: avg=10s     min=266.9µs  med=3.48s   max=44.1s   p(90)=27.49s  p(95)=31.99s
  http_reqs......................: 68003  102.461064/s
  iteration_duration.............: avg=11s     min=1s       med=4.48s   max=45.1s   p(90)=28.49s  p(95)=32.99s
  iterations.....................: 67982  102.429423/s
  vus............................: 10     min=1    max=3000
  vus_max........................: 3000   min=3000 max=3000

running (11m03.7s), 0000/3000 VUs, 67982 complete and 50 interrupted iterations
```

plugin

```
  data_received..................: 50 MB  76 kB/s
  data_sent......................: 12 MB  17 kB/s
  http_req_blocked...............: avg=21.23µs min=1.02µs   med=3.53µs   max=19.24ms p(90)=8.93µs  p(95)=13.77µs
  http_req_connecting............: avg=14.5µs  min=0s       med=0s       max=19.14ms p(90)=0s      p(95)=0s
✗ http_req_duration..............: avg=4.02s   min=266.76µs med=527.98ms max=1m0s    p(90)=13.66s  p(95)=17.46s
    { expected_response:true }...: avg=4.28s   min=2.43ms   med=871.81ms max=59.96s  p(90)=14.45s  p(95)=17.6s
  http_req_failed................: 15.06% ✓ 22133 ✗ 124794
  http_req_receiving.............: avg=48.12µs min=0s       med=38.36µs  max=7.2ms   p(90)=78.41µs p(95)=109.88µs
  http_req_sending...............: avg=27.77µs min=4.76µs   med=12.04µs  max=4.8ms   p(90)=50.27µs p(95)=96.58µs
  http_req_tls_handshaking.......: avg=0s      min=0s       med=0s       max=0s      p(90)=0s      p(95)=0s
  http_req_waiting...............: avg=4.02s   min=230.21µs med=527.92ms max=1m0s    p(90)=13.66s  p(95)=17.46s
  http_reqs......................: 146927 222.287318/s
  iteration_duration.............: avg=5.01s   min=1s       med=1.52s    max=1m1s    p(90)=14.65s  p(95)=18.45s
  iterations.....................: 146907 222.25706/s
  vus............................: 14     min=1    max=3000
  vus_max........................: 3000   min=3000 max=3000

running (11m00.7s), 0000/3000 VUs, 128809 complete and 25 interrupted iterations
```

### 5m-5m-5m-5m-5m

hpa

```
```

sc

```
  http_req_blocked...............: avg=29.26µs min=1.16µs   med=3.91µs   max=31.39ms p(90)=9.65µs  p(95)=17.77µs
  http_req_connecting............: avg=21.13µs min=0s       med=0s       max=31.27ms p(90)=0s      p(95)=0s
✗ http_req_duration..............: avg=5.25s   min=326.17µs med=476.64ms max=1m0s    p(90)=15.99s  p(95)=28.08s
    { expected_response:true }...: avg=4.08s   min=2.18ms   med=480.2ms  max=59.97s  p(90)=13.24s  p(95)=20.1s
  http_req_failed................: 7.17%  ✓ 21371 ✗ 276650
  http_req_receiving.............: avg=50.72µs min=0s       med=41.2µs   max=8.42ms  p(90)=83.91µs p(95)=117.27µs
  http_req_sending...............: avg=29.34µs min=4.91µs   med=12.9µs   max=7.31ms  p(90)=49.72µs p(95)=93.06µs
  http_req_tls_handshaking.......: avg=0s      min=0s       med=0s       max=0s      p(90)=0s      p(95)=0s
  http_req_waiting...............: avg=5.25s   min=269.85µs med=476.58ms max=1m0s    p(90)=15.99s  p(95)=28.08s
  http_reqs......................: 298021 198.557027/s
  iteration_duration.............: avg=6.25s   min=1s       med=1.47s    max=1m1s    p(90)=16.98s  p(95)=29.03s
  iterations.....................: 298002 198.544368/s
  vus............................: 56     min=1    max=3000
  vus_m
```