[南航][Flownet2][Error "GeOp7_0GEOP::::DoRunAsync Failed" when debugging on ModelArts NPU from PyCharm]
DONE
#I3U3AT
Bug-Report
xyt24
Created on
2021-06-02 21:24
1. Symptom:

Debugging on ModelArts NPU from PyCharm fails with the following error.
(https://images.gitee.com/uploads/images/2021/0602/210129_a8d0d243_9131327.png)

[ERROR] TEFUSION(153,python):2021-06-02-18:50:58.279.546 ===========END OF DETAILED OP INFO OF NODE gradients/FlowNet2/fuse_conv0/Conv2D_grad/Conv2DBackpropFilter=============
[ERROR] FE(153,python):2021-06-02-18:50:58.281.649 [../../../../../../fusion_engine/optimizer/adapter/tbe_adapter/tbe_op_store_adapter.cc:581]1430 ParallelCompileOp:"Thread[281439610335712] recompile single op[gradients/FlowNet2/fuse_conv1/Conv2D_grad/Conv2DBackpropFilter] failed"
[ERROR] FE(153,python):2021-06-02-18:50:58.281.669 [../../../../../../fusion_engine/optimizer/adapter/tbe_adapter/tbe_op_store_adapter.cc:581]1430 ParallelCompileOp:"Thread[281439610335712] recompile single op[gradients/FlowNet2/fuse_conv1_1/Conv2D_grad/Conv2DBackpropFilter] failed"
[ERROR] FE(153,python):2021-06-02-18:50:58.281.676 [../../../../../../fusion_engine/optimizer/adapter/tbe_adapter/tbe_op_store_adapter.cc:581]1430 ParallelCompileOp:"Thread[281439610335712] recompile single op[gradients/FlowNet2/fuse_conv0/Conv2D_grad/Conv2DBackpropFilter] failed"
[ERROR] FE(153,python):2021-06-02-18:50:58.281.758 [../../../../../../fusion_engine/optimizer/graph_optimizer/op_compiler/op_compiler.cc:532]1430 CompileOp:"CompileOp failed, graph[partition0_rank74_new_sub_graph341]."
[ERROR] FE(153,python):2021-06-02-18:50:58.281.829 [../../../../../../fusion_engine/optimizer/graph_optimizer/fe_graph_optimizer.cc:672]1430 OptimizeFusedCompileOpAndCalcTensorSize:"CompileOp failed, graph name = partition0_rank74_new_sub_graph341."
[ERROR] FE(153,python):2021-06-02-18:50:58.281.834 [../../../../../../fusion_engine/optimizer/graph_optimizer/fe_graph_optimizer.cc:844]1430 OptimizeFusedGraph:"Failed to do optimize fused graph after UB match for graph[partition0_rank74_new_sub_graph341]."
[ERROR] GE(153,python):2021-06-02-18:50:58.282.063 [../../../../../../graphengine/ge/graph/optimize/graph_optimize.cc:118]1430 OptimizeSubGraph: ErrorNo: -1(failed) [OptimizeSubGraph][OptimizeFusedGraph]: graph optimize failed, ret:-1
[ERROR] GE(153,python):2021-06-02-18:50:58.282.078 [../../../../../../graphengine/ge/graph/manager/graph_manager.cc:2519]1430 ProcessSubGraphWithMultiThreads: ErrorNo: -1(failed) SubGraph optimize Failed AIcoreEngine
[ERROR] GE(153,python):2021-06-02-18:50:58.282.661 [../../../../../../graphengine/ge/graph/manager/graph_manager.cc:674]763 SetSubgraph: ErrorNo: -1(failed) Multiply optimize subgraph failed
[ERROR] GE(153,python):2021-06-02-18:50:58.282.712 [../../../../../../graphengine/ge/graph/manager/graph_manager.cc:3014]763 OptimizeSubgraph: ErrorNo: -1(failed) Graph set subgraph Failed
[EVENT] GE(153,python):2021-06-02-18:50:58.282.745 [../../../../../../graphengine/ge/graph/manager/graph_manager.cc:723]763 PreRunOptimizeSubGraph:[GEPERFTRACE] The time cost of GraphManager::OptimizeSubgraph is [84278749] micro second.
[ERROR] GE(153,python):2021-06-02-18:50:58.282.752 [../../../../../../graphengine/ge/graph/manager/graph_manager.cc:723]763 PreRunOptimizeSubGraph: ErrorNo: -1(failed) Failed to process GraphManager_OptimizeSubgraph
[ERROR] GE(153,python):2021-06-02-18:50:58.282.812 [../../../../../../graphengine/ge/graph/manager/graph_manager.cc:816]763 PreRun: ErrorNo: -1(failed) Run PreRunOptimizeSubGraph failed for graph:ge_default_20210602184920.
[ERROR] GE(153,python):2021-06-02-18:50:58.305.897 [../../../../../../graphengine/ge/graph/manager/graph_manager.cc:2879]763 ReturnError: ErrorNo: -1(failed) PreRun Failed, thread exit...
[EVENT] TDT(153,python):2021-06-02-18:50:58.617.156 [../../../../../../tdt/common/src/log.cpp:148]"HostSendPool: {blockSize: 3072K, totalNum: 4, freeNum: 4}" "HostRecvPool: {blockSize: 3072K, totalNum: 1, freeNum: 1}" "DeviceRecvPool: " "HostCtrlPool: {SendPool: 4, FreePool: 4}, {RecvPool: 1, FreePool: 1}",[../../../../../../tdt/common/src/memory_pool.cpp:689:GetHostPoolStatus]747
[EVENT] TDT(153,python):2021-06-02-18:50:59.617.311 [../../../../../../tdt/common/src/log.cpp:148]"HostSendPool: {blockSize: 3072K, totalNum: 4, freeNum: 4}" "HostRecvPool: {blockSize: 3072K, totalNum: 1, freeNum: 1}" "DeviceRecvPool: " "HostCtrlPool: {SendPool: 4, FreePool: 4}, {RecvPool: 1, FreePool: 1}",[../../../../../../tdt/common/src/memory_pool.cpp:689:GetHostPoolStatus]747
[EVENT] TDT(153,python):2021-06-02-18:51:00.617.455 [../../../../../../tdt/common/src/log.cpp:148]"HostSendPool: {blockSize: 3072K, totalNum: 4, freeNum: 4}" "HostRecvPool: {blockSize: 3072K, totalNum: 1, freeNum: 1}" "DeviceRecvPool: " "HostCtrlPool: {SendPool: 4, FreePool: 4}, {RecvPool: 1, FreePool: 1}",[../../../../../../tdt/common/src/memory_pool.cpp:689:GetHostPoolStatus]747
[ERROR] FMK(153,python):2021-06-02-18:51:01.306.100 [tf_adapter/common/adp_logger.cc:32][TF_ADAPTER] [tf_adapter/kernels/geop_npu.cc:701] GeOp7_0GEOP::::DoRunAsync Failed
2021-06-02 18:51:01.306160: F tf_adapter/kernels/geop_npu.cc:702] GeOp7_0GEOP::::DoRunAsync Failed

Preliminary analysis:

A similar issue was reported earlier: https://gitee.com/ascend/modelzoo/issues/I278GS?from=project-issue
The optical-flow part of this network uses the same warp and correlation operators, so this is probably the same kind of problem.

2. Software version:

PyCharm toolkit: frequently-used Ascend-Powered-Engine TF1.15-python3.7-aarch64

3. Log information

Atlas server:
Full log: https://pan.baidu.com/s/1zFDDsczUfruv1XgGyrMhjQ (extraction code: idhw)

ModelArts (job ID and device ID):
({ "pod_name": "job9fb41a83-job-ma-flownet2-npu-2021-0", "server_id": "192.168.0.5", "devices": [ { "device_id": "2", "device_ip": "192.3.163.238" })
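To narrow down which operators failed to compile, the FE entries in a log like the one above can be filtered with a small script. The following is only a sketch: the helper names (`failed_ops`, `failed_graphs`, `unique_in_order`) are hypothetical, the two regular expressions are inferred from the message wording quoted in this report, and other toolkit versions may phrase these messages differently.

```python
import re

# Patterns inferred from the FE messages quoted in this issue (assumption:
# the wording is stable enough to match with a plain regular expression).
RECOMPILE_RE = re.compile(r"recompile single op\[([^\]]+)\] failed")
GRAPH_RE = re.compile(r"CompileOp failed, graph\[([^\]]+)\]")


def unique_in_order(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def failed_ops(log_text):
    """Names of ops reported by 'recompile single op[...] failed' entries."""
    return unique_in_order(RECOMPILE_RE.findall(log_text))


def failed_graphs(log_text):
    """Names of graphs reported by 'CompileOp failed, graph[...]' entries."""
    return unique_in_order(GRAPH_RE.findall(log_text))


# Message parts copied from the log above; the timestamp and source-file
# prefixes are trimmed here for brevity.
SAMPLE_LOG = """
ParallelCompileOp:"Thread[281439610335712] recompile single op[gradients/FlowNet2/fuse_conv1/Conv2D_grad/Conv2DBackpropFilter] failed"
ParallelCompileOp:"Thread[281439610335712] recompile single op[gradients/FlowNet2/fuse_conv1_1/Conv2D_grad/Conv2DBackpropFilter] failed"
ParallelCompileOp:"Thread[281439610335712] recompile single op[gradients/FlowNet2/fuse_conv0/Conv2D_grad/Conv2DBackpropFilter] failed"
CompileOp:"CompileOp failed, graph[partition0_rank74_new_sub_graph341]."
"""

if __name__ == "__main__":
    for op in failed_ops(SAMPLE_LOG):
        print("failed op:", op)
    for graph in failed_graphs(SAMPLE_LOG):
        print("failed graph:", graph)
```

On the excerpt in this report, the ops it picks out are all `Conv2DBackpropFilter` gradient nodes under `FlowNet2/fuse_conv*`, which may be worth checking against the warp/correlation hypothesis in the preliminary analysis.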