diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/README.md b/plugins/tensorboard-plugins/tb_graph_ascend/README.md
index 7cd935265554d493dc47472811cebbf1619c9a83..d46f0d598a37bf313d3b8da5176a78aed4710000 100644
--- a/plugins/tensorboard-plugins/tb_graph_ascend/README.md
+++ b/plugins/tensorboard-plugins/tb_graph_ascend/README.md
@@ -63,92 +63,174 @@
 ### 3. Data to Be Parsed
-Place the model structure files with the .vis suffix (the files themselves are in JSON format), collected by the graph-building feature of the [msprobe](https://gitee.com/ascend/mstt/tree/master/debug/accuracy_tools/msprobe#10-%E5%88%86%E7%BA%A7%E5%8F%AF%E8%A7%86%E5%8C%96%E6%9E%84%E5%9B%BE%E6%AF%94%E5%AF%B9) tool, in a folder; its path is referred to below as `output_path`
+Place the model structure files with the .vis.db suffix, collected by the graph-building feature of the [msprobe](https://gitee.com/ascend/mstt/tree/master/debug/accuracy_tools/msprobe#10-%E5%88%86%E7%BA%A7%E5%8F%AF%E8%A7%86%E5%8C%96%E6%9E%84%E5%9B%BE%E6%AF%94%E5%AF%B9) tool, in a folder; its path is referred to below as `output_path`
-- E.g. \
-  `---output_path` \
-  `-----output.vis` \
-  `-----output2.vis`
+Graph build:
-### 4. How to Start
+```
+├── output_path
+| ├── build_{timestamp}.vis.db
+```
-1. Start TensorBoard
+Graph comparison:
-   ```
-   tensorboard --logdir output_path
-   ```
+```
+├── output_path
+| ├── compare_{timestamp}.vis.db
+```
-   Note: make sure the default port 6006 is reachable.
+## 4. Start tensorboard
-   To use a different port, append it to the command, e.g. `--port=6007`
+### 4.1 Directly reachable server
-   ```
-   tensorboard --logdir output_path --port=6007
-   ```
+Pass **out_path**, the path where the vis files were generated, to --logdir
+
+```
+tensorboard --logdir out_path --bind_all --port [optional, port number]
+```
+
+After startup, a log is printed:
+
+![tensorboard_1](./doc/images/tensorboard_1.png)
+
+ubuntu is the machine address and 6008 is the port number.
+
+**Note: replace ubuntu with the real server address. For example, if the real server address is 10.123.456.78, enter http://10.123.456.78:6008 in the browser window**
+
+### 4.2 Server that is not directly reachable
+
+**If the link does not open (for example, the server cannot be reached directly and requires a VPN), try one of the following methods:**
+
+1. Manually configure a proxy in the local computer's network settings. For example, on Windows 10, add the server address (e.g. 10.123.456.78) under [Manual proxy setup]
+
+![proxy](./doc/images/proxy.png)
+
+Then, on the server, run:
+
+```
+tensorboard --logdir out_path --bind_all --port 6008[optional, port number]
+```
+
+Finally, enter http://10.123.456.78:6008 in the browser window
+
+**Note: if a firewall is enabled on the server, this method does not work; disable the firewall or try one of the following methods**
+
+2. Alternatively, connect to the server with vscode and run in the vscode terminal:
+
+```
+tensorboard --logdir out_path
+```
+
+![tensorboard_2](./doc/images/tensorboard_2.png)
+
+Hold CTRL and click the link
+
+3. Alternatively, transfer the graph-building result (the vis files) from the server to the local computer, and install the tb_graph_ascend plugin locally to view the result
+
+In a local terminal, run:
+
+```
+tensorboard --logdir out_path
+```
-2. Open tensorboard in the browser
+Hold CTRL and click the link
-   Open the URL `http://localhost:6006` in the browser.
+## 5. Viewing in the browser
-   Note: if the files under the `--logdir` directory are very large or numerous, wait and then refresh the browser to see the loaded result.
+### 5.1 Opening the graph in the browser
-3. Starting tensorboard locally is recommended. If the web browser is not on the same machine that runs TensorBoard, a remote start is needed; see [Remote start](#413-远程查看数据). Users must assess the **security risk** themselves.
+Google Chrome is recommended. Enter the machine address and port number in the browser and press Enter; the TensorBoard page appears, and /#graph_ascend is appended automatically.
-## III. Viewing in the browser
+![vis_browser_1](./doc/images/vis_browser_1.png)
-**Note: this tool does not support accessing the same TensorBoard service from multiple browser windows at the same time; doing so may prevent the page from displaying correctly.**
+If you have switched to another TensorBoard feature and want to return to the hierarchical model visualization page, click **GRAPH_ASCEND** in the upper left
-### 3.1 Main interface
+![vis_browser_2](./doc/images/vis_browser_2.png)
-![Image description](./doc/images/main-page.png)
+### 5.2 Viewing the graph
-### 3.2 Controls:
+![vis_show_info.png](./doc/images/vis_show_info.png)
-- **Double-click a node to open it; single-click to select it.**
-- **A selected node has a blue border; in comparison mode, if it has a corresponding node, that node's border is light blue.**
-- **Keyboard W/S zoom in and out around the mouse position; A/D move left and right.**
-- **The mouse wheel scrolls up and down; the page can be dragged with the mouse.**
-- **In comparison mode, right-click to select a node, and to expand to and select the corresponding node on the other side.**
+MicroStep refers to the multiple forward and backward passes executed before one complete weight update; one complete training iteration (step) can be further divided into several smaller steps (micro steps). The hierarchical visualization tool treats one complete forward-backward pass identified in the model's top-level structure as one micro step.
-![Image description](./doc/images/operator-image.png)
+### 5.3 Name search
-### 3.3 Name search
+![vis_search_info.png](./doc/images/vis_search_info.png)
-![Image description](./doc/images/vis_search_info.png)
+### 5.4 Precision filter
-### 3.4 Precision filter / overflow filter
+![vis_precision_info.png](./doc/images/vis_precision_info.png)
-Note: precision and overflow filtering are not available in single-graph mode; the figure below shows the two-graph comparison mode.
+### 5.5 Unmatched node filter
-![Image description](./doc/images/vis_precision_info.png)
+Nodes that do not satisfy the matching rules are unmatched nodes and are shown in gray. This is useful for finding structural differences between two models.
-### 3.5 Unmatched node filter
+![vis_unmatch_info.png](./doc/images/vis_unmatch_info.png)
-See the matching description: nodes that do not satisfy the matching rules are unmatched nodes and are shown in gray. This is useful for finding structural differences between two models.
+### 5.6 Manual node matching
-![Image description](./doc/images/vis_unmatch_info.png)
+In the browser interface, use the mouse to select two gray nodes and match them. Real data mode is not supported yet.
-### 3.6 Manual node matching
+![vis_match_info.png](./doc/images/vis_match_info.png)
-In the browser interface, use the mouse to select two gray nodes and match them. Real data mode is not supported yet.
-If "Operate on the selected node and its children" is checked:
-Clicking Match matches the two nodes and their children one by one by Module name; clicking Unmatch clears the match relations of the children.
-Otherwise:
-Clicking Match matches only the two nodes; clicking Unmatch clears the match relation of those nodes
-Note: after matching, click Save to persist the result to the source file
+## 6. Graph comparison notes
-![Image description](./doc/images/vis_match_info.png)
+### 6.1 Color
-### 3.7 Generating a match configuration file
+The darker the color, the larger the precision difference and the more suspicious the node; see the color legend in the lower-left corner of the browser page for details.
-The match relations of already matched nodes can be saved to a configuration file, and data can be read from a configuration file to perform matching.
-By default it is saved in the current directory as `[current file name].vis.config`. Each time the file is switched, the current directory is scanned for configuration files with the .vis.config suffix and the configuration file list is updated.
-Note: after matching, click Save to persist the result to the source file
-![Image description](./doc/images/vis_save_match_info.png)
+#### 6.1.1 Real data mode
-### 3.8 User-defined precision metric configuration
+The difference between the minimum One Thousandth Err Ratio over all inputs of a node and the minimum One Thousandth Err Ratio over all its outputs reflects how much that metric drops. **The larger this value, the larger the precision difference between the two models, and the darker the corresponding color in the graph**.
-![Image description](./doc/images/vis_update_precision.png)
+`One Thousandth Err Ratio precision metric: each element of the Tensor is compared with the corresponding benchmark data; the metric is the proportion of elements whose relative error is less than one thousandth. The closer to 1, the better`
+
+If the maximum (MAX) or minimum (MIN) in the output metrics of the debug-side (NPU) node contains nan/inf/-inf, the node is marked directly with the darkest color.
+
+#### 6.1.2 Statistics mode
+
+The relative error of the output statistics of a node. **The larger this value, the larger the precision difference between the two models, and the darker the corresponding color in the graph**.
+
+`Relative error: abs((npu statistic - bench statistic) / bench statistic)`
+
+If the maximum (MAX) or minimum (MIN) in the output metrics of the debug-side (NPU) node contains nan/inf/-inf, the node is marked directly with the darkest color.
+
+#### 6.1.3 md5 mode
+
+The md5 value of any input or output of the node differs.
+
+### 6.2 Metrics
+
+Precision comparison evaluates API precision at three levels, in order: real data mode, statistics mode and MD5 mode. Each mode reports different metrics.
+
+**Common metrics**:
+
+- name: parameter name, e.g. input.0
+- type: type, e.g. torch.Tensor
+- dtype: data type, e.g. torch.float32
+- shape: tensor shape, e.g. [32, 1, 32]
+- Max: maximum
+- Min: minimum
+- Mean: mean
+- Norm: L2 norm
+
+**Real data mode metrics**:
+
+- Cosine: tensor cosine similarity
+- EucDist: tensor Euclidean distance
+- MaxAbsErr: tensor maximum absolute error
+- MaxRelativeErr: tensor maximum relative error
+- One Thousandth Err Ratio: proportion of tensor elements whose relative error is less than one thousandth
+- Five Thousandth Err Ratio: proportion of tensor elements whose relative error is less than five thousandths
+
+**Statistics mode metrics**
+
+- (Max, Min, Mean, Norm) diff: absolute error of the statistics
+- (Max, Min, Mean, Norm) RelativeErr: relative error of the statistics
+
+**MD5 mode metrics**
+
+- md5: CRC-32 value
 ## IV. Appendix
@@ -172,11 +254,13 @@
 - Append the `--bind_all` or `--host={server IP}` parameter to the start command to enable remote viewing, e.g.:
-  ```
-  tensorboard --logdir output_path --port=6006 --host=xxx.xxx.xxx.xxx
-  or
-  tensorboard --logdir output_path --port=6006 --bind_all
-  ```
+```
+
+tensorboard --logdir output_path --port=6006 --host=xxx.xxx.xxx.xxx
+or
+tensorboard --logdir output_path --port=6006 --bind_all
+
+```
 - When opening the interface in a browser, replace the host name `localhost` in the URL with the host's IP address, e.g. `http://xxx.xxx.xxx.xxx:6006`
@@ -189,3 +273,5 @@
 ### 4.3 Public network addresses
 [Public network addresses](./doc/公网地址说明.csv)
+
+
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/main-page.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/main-page.png
deleted file mode 100644
index b8e2a6dbcc5f55f3369406148dfc378890ccdc73..0000000000000000000000000000000000000000
Binary files a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/main-page.png and /dev/null differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/operator-image.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/operator-image.png
deleted file mode 100644
index b4463c05dc0e6a379d68592ec4129bd397ae0dd6..0000000000000000000000000000000000000000
Binary files a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/operator-image.png and /dev/null differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/proxy.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/proxy.png
new file mode 100644
index 0000000000000000000000000000000000000000..3033214904ca3a8a1f50f187a382c47c23f05786
Binary files /dev/null and b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/proxy.png differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/tensorboard_1.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/tensorboard_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..e99ff9ea47f0e3eb25ec324640589248398a7f5a
Binary files /dev/null and b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/tensorboard_1.png differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/tensorboard_2.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/tensorboard_2.png
new file mode 100644
index 0000000000000000000000000000000000000000..ed8b024a4b811bb24e9ae23f1b0ca8d04e229992
Binary files /dev/null and b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/tensorboard_2.png differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_browser_1.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_browser_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..ff10a9cd742a42a0133481bf20f83ff95ddf8a49
Binary files /dev/null and b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_browser_1.png differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_browser_2.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_browser_2.png
new file mode 100644
index 0000000000000000000000000000000000000000..1cc076fc66bf6fde67b92cd5f619bca74c4b840a
Binary files /dev/null and b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_browser_2.png differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_match_info.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_match_info.png
index 5785c81a2f308a4fb87db1c8528262e3b1821932..adc470d4bfe8b706b1b05ccb331246930bcfabb4 100644
Binary files a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_match_info.png and b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_match_info.png differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_precision_info.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_precision_info.png
index 79c6ff77f4fffedfcbaee47767d3f8a4f1b0d5b3..33497c110f49bcd09f0caa0e3b632ae973620c61 100644
Binary files a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_precision_info.png and b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_precision_info.png differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_save_match_info.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_save_match_info.png
deleted file mode 100644
index 3670076eb8dc4cc315cfc89acec3d1d8d739ed6e..0000000000000000000000000000000000000000
Binary files a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_save_match_info.png and /dev/null differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_search_info.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_search_info.png
index 7c51a804862591005725e1c2e1da0ff0ac152df1..13dba9b83b6c3b58ab67265827ef32bf5fb1822b 100644
Binary files a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_search_info.png and b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_search_info.png differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_show_info.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_show_info.png
new file mode 100644
index 0000000000000000000000000000000000000000..c08d59be266f96e2b473ee1cb7141120e0ee3aa4
Binary files /dev/null and b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_show_info.png differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_showcase.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_showcase.png
new file mode 100644
index 0000000000000000000000000000000000000000..fe71f73ad01410caaa47333723ce040ccc2d88dc
Binary files /dev/null and b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_showcase.png differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_unmatch_info.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_unmatch_info.png
index 5f698d109543e6171e2df28bafa83a09d3dd351d..6fb0c74a2a09986e22503dcc822c7b90506eb159 100644
Binary files a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_unmatch_info.png and b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_unmatch_info.png differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_update_precision.png b/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_update_precision.png
deleted file mode 100644
index b764fc983c0178e6f2f1d77807a6a4635a7dbd9e..0000000000000000000000000000000000000000
Binary files a/plugins/tensorboard-plugins/tb_graph_ascend/doc/images/vis_update_precision.png and /dev/null differ
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/server/app/service/graph_service_db.py b/plugins/tensorboard-plugins/tb_graph_ascend/server/app/service/graph_service_db.py
index 416af2ac82b12b95ca942bf52c1c612d6b50cebc..65e6b3c7eac12f911117b80838bc2d919694622d 100644
--- a/plugins/tensorboard-plugins/tb_graph_ascend/server/app/service/graph_service_db.py
+++ b/plugins/tensorboard-plugins/tb_graph_ascend/server/app/service/graph_service_db.py
@@ -212,7 +212,7 @@ class DbGraphService(GraphServiceStrategy):
             return {'success': True, 'data': result}
         except Exception as e:
             logger.error('get node info failed:' + str(e))
-            return {'success': False, 'error': '获取节点信息失败:' + str(e), 'data': None}
+            return {'success': False, 'error': '获取节点信息失败', 'data': None}
 
     def add_match_nodes(self, npu_node_name, bench_node_name, meta_data, is_match_children):
         try:
@@ -233,50 +233,23 @@ class DbGraphService(GraphServiceStrategy):
                 graph_data = self.repo.query_matched_nodes_info(npu_node_name, bench_node_name, rank, step)
                 match_result = MatchNodesController.process_task_add(graph_data, npu_node_name, bench_node_name, task)
-                # Process the match result
-                update_data = []
-                for item in match_result:
-                    if item.get('success') is True:
-                        nodes = item.get('data', [])
-                        for node in nodes:
-                            update_data.append(node)
-                if len(update_data) > 0:
-                    # DB: update node info in the database
-                    update_db_res = self.repo.update_nodes_info(update_data, rank, step)
-                    if not update_db_res:
-                        return {'success': False, 'error': '更新数据库失败(Update database failed) '}
-                    # View: call the update_hirarchy update method to refresh the graph in sync
-                    LayoutHierarchyModel.update_current_hierarchy_data(update_data)
-                    # Return: return the updated node info
-                    config_data = GraphState.get_global_value("config_data")
-                    result = {
-                        'success': True,
-                        'data': {
-                            'npuMatchNodes': config_data.get('npuMatchNodes', {}),
-                            'benchMatchNodes': config_data.get('benchMatchNodes', {}),
-                            'npuUnMatchNodes': config_data.get('npuUnMatchNodes', []),
-                            'benchUnMatchNodes': config_data.get('benchUnMatchNodes', [])
-                        }
-                    }
-                else:
-                    result = {'success': False, 'error': '选择的节点不可匹配(Selected nodes do not match) '}
-                return result
+                return self._generate_matched_result(match_result, rank, step)
             else:
                 return {'success': False, 'error': '任务类型不支持(Task type not supported)'}
         except Exception as e:
             logger.error(str(e))
-            return {'success': False, 'error': str(e), 'data': None}
+            return {'success': False, 'error': '匹配节点失败', 'data': None}
 
     def add_match_nodes_by_config(self, config_file_name, meta_data):
+        if not self.conn:
+            return {'success': False, 'error': 'database connection not init'}
+        result = {}
+        rank = meta_data.get('rank')
+        step = meta_data.get('step')
+        if rank is None or step is None:
+            return {'success': False, 'error': 'rank or step is null'}
+        task = self.config_info.get('task')
         try:
-            if not self.conn:
-                return {'success': False, 'error': 'database connection not init'}
-            result = {}
-            rank = meta_data.get('rank')
-            step = meta_data.get('step')
-            if rank is None or step is None:
-                return {'success': False, 'error': 'rank or step is null'}
-            task = self.config_info.get('task')
             match_node_links, error = GraphUtils.safe_load_data(meta_data.get('run'), config_file_name)
             graph_data = self.repo.query_matched_nodes_info_by_config(match_node_links, rank, step)
             if error:
@@ -285,37 +258,15 @@ class DbGraphService(GraphServiceStrategy):
             if task == 'md5' or task == 'summary':
                 match_result = MatchNodesController.process_task_add_child_layer_by_config(graph_data, match_node_links, task)
-                update_data = []
-                for item in match_result:
-                    if item.get('success') is True:
-                        nodes = item.get('data', [])
-                        for node in nodes:
-                            update_data.append(node)
-
-                if len(update_data) > 0:
-                    update_db_res = self.repo.update_nodes_info(update_data, rank, step)
-                    if not update_db_res:
-                        return {'success': False, 'error': '更新数据库失败(Update database failed) '}
-                    # View: call the update_hirarchy update method to refresh the graph in sync
-                    LayoutHierarchyModel.update_current_hierarchy_data(update_data)
-                    # Return: return the updated node info
-                    config_data = GraphState.get_global_value("config_data")
-                    result['success'] = True
-                    result['data'] = {
-                        'matchReslut': match_result,
-                        'npuMatchNodes': config_data.get('npuMatchNodes', {}),
-                        'benchMatchNodes': config_data.get('benchMatchNodes', {}),
-                        'npuUnMatchNodes': config_data.get('npuUnMatchNodes', []),
-                        'benchUnMatchNodes': config_data.get('benchUnMatchNodes', [])
-                    }
-                else:
-                    result = {'success': False, 'error': '选择的节点不可匹配(Selected nodes do not match) '}
+                result = self._generate_matched_result(match_result, rank, step)
+                if result.get('data'):
+                    result['data']['matchResult'] = [item.get('success', False) for item in match_result]
                 return result
             else:
                 return {'success': False, 'error': '任务类型不支持(Task type not supported)'}
         except Exception as e:
             logger.error(str(e))
-            return {'success': False, 'error': str(e), 'data': None}
+            return {'success': False, 'error': '匹配节点失败', 'data': None}
 
     def delete_match_nodes(self, npu_node_name, bench_node_name, meta_data, is_unmatch_children):
         try:
@@ -338,40 +289,12 @@ class DbGraphService(GraphServiceStrategy):
                 graph_data = self.repo.query_matched_nodes_info(npu_node_name, bench_node_name, rank, step)
                 match_result = MatchNodesController.process_task_delete(graph_data, npu_node_name, bench_node_name, task)
-                # Iterate over match_result, find the entries with success=true, and merge all their data fields (arrays) into update_db_data
-                update_data = []
-                for item in match_result:
-                    if item.get('success') is True:
-                        nodes = item.get('data', [])
-                        for node in nodes:
-                            update_data.append(node)
-
-                if len(update_data) > 0:
-                    # DB: update node info in the database
-                    update_db_res = self.repo.update_nodes_info(update_data, rank, step)
-                    if not update_db_res:
-                        return {'success': False, 'error': '更新数据库失败(Update database failed) '}
-                    # View: call the update_hirarchy update method to refresh the graph in sync
-                    LayoutHierarchyModel.update_current_hierarchy_data(update_data)
-                    # Return: return the updated node info
-                    config_data = GraphState.get_global_value("config_data")
-                    result = {
-                        'success': True,
-                        'data': {
-                            'npuMatchNodes': config_data.get('npuMatchNodes', {}),
-                            'benchMatchNodes': config_data.get('benchMatchNodes', {}),
-                            'npuUnMatchNodes': config_data.get('npuUnMatchNodes', []),
-                            'benchUnMatchNodes': config_data.get('benchUnMatchNodes', [])
-                        }
-                    }
-                else:
-                    result = {'success': False, 'error': '未找到可匹配的节点(Matched node not found) '}
-                return result
+                return self._generate_matched_result(match_result, rank, step)
             else:
                 return {'success': False, 'error': '任务类型不支持(Task type not supported) '}
         except Exception as e:
             logger.error('delete_match_nodes error: {}'.format(e))
-            return {'success': False, '操作失败': str(e), 'data': None}
+            return {'success': False, 'error': '删除匹配节点失败', 'data': None}
 
     def update_precision_error(self, meta_data, filter_value):
         try:
@@ -442,7 +365,7 @@ class DbGraphService(GraphServiceStrategy):
                 return {'success': False, 'error': '未找到可更新的节点(Matched node not found) '}
         except Exception as e:
             logger.error('update_precision_error error: {}'.format(e))
-            return {'success': False, 'error': str(e), 'data': None}
+            return {'success': False, 'error': '更新节点精度失败(Update node precision failed)', 'data': None}
 
     def update_colors(self, colors):
         try:
@@ -454,7 +377,7 @@ class DbGraphService(GraphServiceStrategy):
                 return {'success': False, 'error': '更新数据库失败(Update database failed) '}
             return {'success': True}
         except Exception as e:
-            return {'success': False, 'error': str(e), 'data': None}
+            return {'success': False, 'error': '更新颜色失败(Update color failed)', 'data': None}
 
     def save_matched_relations(self, meta_data):
         try:
@@ -477,5 +400,34 @@ class DbGraphService(GraphServiceStrategy):
 
         except Exception as e:
             logger.error('save_matched_relations error: {}'.format(e))
-            return {'success': False, 'error': str(e), 'data': None}
+            return {'success': False, 'error': '保存匹配关系失败(Save matched relations failed)', 'data': None}
 
+    def _generate_matched_result(self, match_result, rank, step):
+        update_data = []
+        for item in match_result:
+            if item.get('success') is True:
+                nodes = item.get('data', [])
+                for node in nodes:
+                    update_data.append(node)
+        if len(update_data) > 0:
+            # DB: update node info in the database
+            update_db_res = self.repo.update_nodes_info(update_data, rank, step)
+            if not update_db_res:
+                return {'success': False, 'error': '更新数据库失败(Update database failed) '}
+            # View: call the update_hirarchy update method to refresh the graph in sync
+            LayoutHierarchyModel.update_current_hierarchy_data(update_data)
+            # Return: return the updated node info
+            config_data = GraphState.get_global_value("config_data")
+            result = {
+                'success': True,
+                'data': {
+                    'npuMatchNodes': config_data.get('npuMatchNodes', {}),
+                    'benchMatchNodes': config_data.get('benchMatchNodes', {}),
+                    'npuUnMatchNodes': config_data.get('npuUnMatchNodes', []),
+                    'benchUnMatchNodes': config_data.get('benchUnMatchNodes', [])
+                }
+            }
+        else:
+            result = {'success': False, 'error': '选择的节点不可匹配(Selected nodes do not match) '}
+        return result
+
diff --git a/plugins/tensorboard-plugins/tb_graph_ascend/server/app/service/graph_service_vis.py b/plugins/tensorboard-plugins/tb_graph_ascend/server/app/service/graph_service_vis.py
index 9c3dfa33ae91aa92d8723bff98f80a0e672f6e24..052ce7eaa68c6b692efb08978d3019120dc1616e 100644
--- a/plugins/tensorboard-plugins/tb_graph_ascend/server/app/service/graph_service_vis.py
+++ b/plugins/tensorboard-plugins/tb_graph_ascend/server/app/service/graph_service_vis.py
@@ -69,7 +69,7 @@ class JsonGraphService(GraphServiceStrategy):
                 json_data = GraphUtils.safe_json_loads(buffer)
                 yield f"data: {json.dumps({'progress': 99, 'status': 'loading'})}\n\n"
             except json.JSONDecodeError as e:
-                yield f"data: {json.dumps({'progress': current_progress, 'error': str(e)})}\n\n"
+                yield f"data: {json.dumps({'progress': current_progress, 'error': 'Failed to parse JSON'})}\n\n"
             if json_data is not None:
                 # Validate and store
                 GraphState.set_global_value('current_file_data', json_data)
@@ -86,7 +86,7 @@ class JsonGraphService(GraphServiceStrategy):
         tag = self.tag
         graph_data, error_message = GraphUtils.get_graph_data({'run': run_name, 'tag': tag})
         if error_message or not graph_data:
-            return {'success': False, 'error': error_message}
+            return {'success': False, 'error': 'Failed to load file'}
         config = {}
         try:
             # Read global information at the tag level
@@ -123,7 +123,7 @@ class JsonGraphService(GraphServiceStrategy):
         micro_step = meta_data.get('microStep')
         graph_data, error_message = GraphUtils.get_graph_data({'run': self.run, 'tag': self.tag})
         if error_message or not graph_data:
-            return {'success': False, 'error': error_message}
+            return {'success': False, 'error': 'Failed to load file'}
         result = {}
         try:
             if not graph_data.get(NPU):
@@ -171,14 +171,14 @@ class JsonGraphService(GraphServiceStrategy):
             return {'success': True, 'data': result}
         except Exception as e:
             logger.error('get node list error:' + str(e))
-            return {'success': False, 'error': '获取节点列表失败:' + str(e)}
+            return {'success': False, 'error': 'Failed to get node list'}
 
     def change_node_expand_state(self, node_info, meta_data):
         try:
             graph_data, error_message = GraphUtils.get_graph_data(meta_data)
             if error_message:
-                return {'success': False, 'error': error_message}
+                return {'success': False, 'error': 'Failed to load file'}
             if self.repo is None:
                 return {'success': False, 'error': 'initlize graph json failed'}
             graph_type = node_info.get('nodeType')
@@ -207,7 +207,7 @@ class JsonGraphService(GraphServiceStrategy):
         # Iterate over all NPU nodes; return a node if its precision value is in values
         graph_data, error_message = GraphUtils.get_graph_data(meta_data)
         if error_message:
-            return {'success': False, 'error': error_message}
+            return {'success': False, 'error': 'Failed to load file'}
         precision = []
         is_filter_unmatch_nodes = True if '无匹配节点' in values else False
 
@@ -240,7 +240,7 @@ class JsonGraphService(GraphServiceStrategy):
         # Iterate over all NPU nodes; return a node if its precision value is in values
         graph_data, error_message = GraphUtils.get_graph_data(meta_data)
         if error_message:
-            return {'success': False, 'error': error_message}
+            return {'success': False, 'error': 'Failed to load file'}
         overflow = []
         try:
             # Single graph
@@ -258,13 +258,13 @@ class JsonGraphService(GraphServiceStrategy):
                 return {'success': False, 'error': '多图模式下不支持溢出检测'}
         except Exception as e:
             logger.error('search overflow node failed:' + str(e))
-            return {'success': False, 'error': '获取符合溢出检测节点失败:' + str(e)}
+            return {'success': False, 'error': '获取符合溢出检测节点失败'}
 
     def update_precision_error(self, meta_data, filter_value):
         try:
             graph_data, error_message = GraphUtils.get_graph_data(meta_data)
             if error_message:
-                return {'success': False, 'error': error_message}
+                return {'success': False, 'error': 'Failed to load file'}
             npu_node_list = graph_data.get(NPU, {}).get('node', {})
             for _, node_info in npu_node_list.items():
                 output_statistical_diff = node_info.get('output_data', None)
@@ -296,7 +296,7 @@ class JsonGraphService(GraphServiceStrategy):
             return {'success': True, 'data': {}}
         except Exception as e:
             logger.error('update precision error error:' + str(e))
-            return {'success': False, 'error': str(e)}
+            return {'success': False, 'error': '更新精度误差失败'}
 
     def update_hierarchy_data(self, graph_type):
         if (graph_type == NPU or graph_type == BENCH):
@@ -308,7 +308,7 @@ class JsonGraphService(GraphServiceStrategy):
     def get_node_info(self, node_info, meta_data):
         graph_data, error_message = GraphUtils.get_graph_data(meta_data)
         if error_message:
-            return {'success': False, 'error': error_message}
+            return {'success': False, 'error': 'Failed to load file'}
         try:
             graph_type = node_info.get('nodeType')
             node_name = node_info.get('nodeName')
@@ -330,12 +330,12 @@ class JsonGraphService(GraphServiceStrategy):
             return {'success': True, 'data': result}
         except Exception as e:
             logger.error('get node info error:' + str(e))
-            return {'success': False, 'error': '获取节点信息失败:' + str(e), 'data': None}
+            return {'success': False, 'error': '获取节点信息失败', 'data': None}
 
     def add_match_nodes(self, npu_node_name, bench_node_name, meta_data, is_match_children):
         graph_data, error_message = GraphUtils.get_graph_data(meta_data)
         if error_message:
-            return {'success': False, 'error': error_message}
+            return {'success': False, 'error': 'Failed to load file'}
         task = graph_data.get('task')
         result = {}
         try:
@@ -351,7 +351,7 @@ class JsonGraphService(GraphServiceStrategy):
             else:
                 return {'success': False, 'error': '任务类型不支持(Task type not supported) '}
         except Exception as e:
-            return {'success': False, '操作失败': str(e), 'data': None}
+            return {'success': False, 'error': '操作失败', 'data': None}
 
     def add_match_nodes_by_config(self, config_file_name, meta_data):
         graph_data, error_message = GraphUtils.get_graph_data(meta_data)
@@ -366,17 +366,20 @@ class JsonGraphService(GraphServiceStrategy):
             if task == 'md5' or task == 'summary':
                 match_result = MatchNodesController.process_task_add_child_layer_by_config(graph_data, match_node_links, task)
-                return self._generate_matched_result(match_result)
+                result = self._generate_matched_result(match_result)
+                if result.get('data'):
+                    result['data']['matchResult'] = [item.get('success', False) for item in match_result]
+                return result
             else:
                 return {'success': False, 'error': '任务类型不支持(Task type not supported)'}
         except Exception as e:
             logger.error(str(e))
-            return {'success': False, 'error': str(e), 'data': None}
+            return {'success': False, 'error': '操作失败', 'data': None}
 
     def delete_match_nodes(self, npu_node_name, bench_node_name, meta_data, is_unmatch_children):
         graph_data, error_message = GraphUtils.get_graph_data(meta_data)
         if error_message:
-            return {'success': False, 'error': error_message}
+            return {'success': False, 'error': 'Failed to load file'}
         task = graph_data.get('task')
         try:
             # Compute the error according to the task type
@@ -392,14 +395,14 @@ class JsonGraphService(GraphServiceStrategy):
             else:
                 return {'success': False, 'error': '任务类型不支持(Task type not supported) '}
         except Exception as e:
-            return {'success': False, '操作失败': str(e), 'data': None}
+            return {'success': False, 'error': '操作失败', 'data': None}
 
     def save_data(self, meta_data):
         if not meta_data:
             return {'success': False, 'error': '参数为空'}
         graph_data, error_message = GraphUtils.get_graph_data(meta_data)
         if error_message:
-            return {'success': False, 'error': error_message}
+            return {'success': False, 'error': 'Failed to load file'}
         try:
             _, error = GraphUtils.safe_save_data(graph_data, self.run, f"{self.tag}.vis")
@@ -427,7 +430,7 @@ class JsonGraphService(GraphServiceStrategy):
             GraphUtils.safe_save_data(first_file_data, self.run, f"{first_run_tag}.vis")
             return {'success': True, 'error': None, 'data': {}}
         except Exception as e:
-            return {'success': False, 'error': str(e), 'data': None}
+            return {'success': False, 'error': '更新颜色失败', 'data': None}
 
     def save_matched_relations(self, meta_data):
         run = meta_data.get('run')