# Fault-Diagnosis

**Repository Path**: debug-zhang/fault-diagnosis

## Basic Information

- **Project Name**: Fault-Diagnosis
- **Description**: An anomaly detection and root cause localization tool for microservice architectures
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-03-09
- **Last Updated**: 2022-06-02

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# microservice_admin

Frontend and backend of the anomaly detection and troubleshooting system.

## Installing and running the microservice_admin frontend and backend

(microservice_admin runs locally.)

Prerequisites:

- Python 3.6 or later
- Node.js
- kubectl (able to connect to the lab K8s cluster)

Change into the `microservice_admin` directory.

Install the backend dependencies:

```shell
python3 -m pip install flask
python3 -m pip install sqlitedict
python3 -m pip install flask-cors
# Pin urllib3 to 1.24.3: remove the installed version, then install the pinned one
python3 -m pip uninstall urllib3
python3 -m pip install urllib3==1.24.3
python3 -m pip install kubernetes
python3 -m pip install prometheus-api-client
```

Run the backend:

```shell
python3 server.py
```

Install the frontend dependencies:

```shell
npm install
```

Run the frontend:

```shell
npm run dev
```

The browser opens the frontend automatically.

![1609313194567](frontend/assets/1609313194567.png)

## Installing and running the topology editor

(microservice_admin is currently built with Vue, while the topology editor is built with React. The two are not yet integrated. For now the editor is embedded in microservice_admin through an iframe, so it must be installed on the same machine as microservice_admin.)

Change into the `alarm_editor` directory.

Install the dependencies:

```shell
npm install
```

Run the editor:

```shell
npm start
```

The editor embedded in the platform:

![1609244314694](frontend/assets/1609244314694.png)

## Deploying a VNC remote desktop in the K8s cluster and running Prometheus and other components in it

### Basic information

VNC remote desktop image: jisuozhao/ubuntu_vnc:microservice_admin

Components included:

- Prometheus, port 9090
- Grafana, port 3000
- codeserver (web version of VS Code), port 8080

Deployment YAML for the VNC remote desktop (already deployed in the lab K8s cluster; kept here only as a backup):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: microservice
  name: microservice
spec:
  progressDeadlineSeconds: 2147483647
  replicas: 1
  revisionHistoryLimit: 2147483647
  selector:
    matchLabels:
      app: microservice
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: microservice
    spec:
      containers:
      - image: jisuozhao/ubuntu_vnc:microservice_admin
        imagePullPolicy: IfNotPresent
        name: microservice
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status: {}
```

### Steps

Open a terminal in the VNC remote desktop (already deployed in the lab K8s cluster, reachable at http://vnc.ingress.isa.buaanlsde.cn/), then start Prometheus and the other dependencies. The result looks like this:

![1609242605228](frontend/assets/1609242605228.png)

Start Prometheus:

```shell
sudo prometheus --config.file /etc/prometheus/prometheus.yml --storage.tsdb.path /var/lib/prometheus/ --storage.tsdb.retention=720d
```

Start Grafana:

```shell
cd /root/Downloads/grafana-6.6.2
./bin/grafana-server web
```

Start codeserver (web version of VS Code):

```shell
code-server --auth none
```

Configure load balancing in Rancher so that Prometheus and the other components can be reached by URL:

![1609242761097](frontend/assets/1609242761097.png)

Open Prometheus by its URL to check that it started successfully:

![1609242800555](frontend/assets/1609242800555.png)

## Development guide

### Frameworks

Frontend:

- Frontend framework: Vue https://cn.vuejs.org/v2/guide/index.html
- Frontend template: vue-element-admin https://panjiachen.gitee.io/vue-element-admin-site/zh/guide/
- Charting component: echarts http://echarts.apache.org/zh/index.html

Backend:

- Backend framework: Flask https://dormousehole.readthedocs.io/en/latest/quickstart.html#quickstart
- Storage component: sqlitedict https://github.com/RaRe-Technologies/sqlitedict
- Kubernetes connection: Python package kubernetes https://github.com/kubernetes-client/python
- Prometheus connection: Python package prometheus-api-client https://github.com/AICoE/prometheus-api-client-python

### File structure

(See the vue-element-admin template documentation for reference: https://panjiachen.gitee.io/vue-element-admin-site/zh/guide/)

Main files and directories:

- microservice_admin/server.py: backend code
- microservice_admin/src/router/index.js: page routing
- microservice_admin/src/views/: source files for each page
- microservice_admin/src/utils/global_function.js: frontend global functions

### Features not yet completed due to time constraints

- Dynamic training of metric causal relationships (currently static data produced by a pre-trained model)
- Editing the Prometheus configuration file (frontend √, backend ×)
- Only threshold detection currently works; the other anomaly detection models are not yet connected
- Alerting: the frontend UI is complete, the backend is not yet implemented
- Global settings (frontend √, backend ×)
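
Since threshold detection is the only detector currently active, the following minimal Python sketch illustrates the general idea. It is not taken from `server.py`: the function name `detect_threshold_anomalies`, the `upper`/`lower` parameters, and the sample data are all hypothetical. The only assumption about the input is that samples arrive as `(timestamp, "value")` pairs, which is the shape Prometheus uses for range-query values.

```python
import math
from typing import List, Optional, Sequence, Tuple

# One sample: (unix timestamp, value encoded as a string),
# mirroring the value pairs in a Prometheus range-query result.
Sample = Tuple[float, str]


def detect_threshold_anomalies(
    samples: Sequence[Sample],
    upper: Optional[float] = None,
    lower: Optional[float] = None,
) -> List[Tuple[float, float]]:
    """Return (timestamp, value) pairs whose value is above `upper` or below `lower`.

    Hypothetical helper for illustration; either bound may be omitted.
    """
    anomalies = []
    for timestamp, raw_value in samples:
        value = float(raw_value)
        if math.isnan(value):
            # Skip samples that carry no usable measurement.
            continue
        above = upper is not None and value > upper
        below = lower is not None and value < lower
        if above or below:
            anomalies.append((float(timestamp), value))
    return anomalies


if __name__ == "__main__":
    # Made-up CPU usage ratios sampled every 15 seconds.
    cpu_usage = [(1609242600, "0.42"), (1609242615, "0.93"), (1609242630, "0.51")]
    print(detect_threshold_anomalies(cpu_usage, upper=0.9))
```

Keeping the check as a pure function over already-fetched samples means it can be tested without a running Prometheus instance; fetching the samples (for example through prometheus-api-client) would be a separate step.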