name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
Functional programming with data sinking, dynamic shape scenario: "RuntimeError: Acl compile and execute failed, op_type_:Conv2DBackpropFilter"
Hardware Environment (Ascend/GPU/CPU):
/device ascend/
Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx) :
master_20230602121730_cf1239054ef820eb634c49883d243f4be2684bae/
-- Python version (e.g., Python 3.7.5) :
-- OS platform and distribution (e.g., Linux Ubuntu 16.04):
-- GCC/Compiler version (if compiled from source):
Execute Mode (Mandatory) (PyNative/Graph):
/mode graph
test_ms_cell_functional_programming_datasink_007
source /home/miniconda3/bin/activate ci
source ~/solution_test/env_set.source -e ascend
export DEVICE_TYPE=Ascend_Arm
export TRAIN_MODE=GRAPH_MODE
cd solution_test/cases/01frame_func/17cell_function_coding/support_data_sinking/resnet/
pytest -s test_ms_cell_functional_programming_datasink_007.py
Check whether the loss in the log matches expectations.
Training runs normally and the loss converges normally.
Traceback (most recent call last):
  File "train_functional.py", line 414, in <module>
    train_net(net)
  File "train_functional.py", line 312, in train_net
    loss = sink_process()
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/data_sink.py", line 220, in sink_process
    out = real_sink_fun()
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 627, in staging_specialize
    return executor(*args, **kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 105, in wrapper
    results = fn(*arg, **kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 346, in __call__
    output = self._graph_executor(tuple(new_inputs), phase)
RuntimeError: Acl compile and execute failed, op_type_:Conv2DBackpropFilter
----------------------------------------------------
- Ascend Error Message:
----------------------------------------------------
E50029: The op[conv2Dbp] input parameter[pads] should be [>= 0], actual the input is [[-1, -1, 0, 0, ]]
TraceBack (most recent call last):
op get pads is illegal[FUNC:VerifyConv2dbpPads][FILE:nn_calculation_ops.cc][LINE:2482]
The op[conv2Dbp] input parameter[pads] should be [>= 0], actual the input is [[-1, -1, 0, 0, ]]
padding and pads both check fail![FUNC:VerifyConv2dbpCommon][FILE:nn_calculation_ops.cc][LINE:2663]
Verifying Conv2DBackpropFilter failed.[FUNC:InferShapeAndType][FILE:infershape_pass.cc][LINE:137]
Call InferShapeAndType for node:Conv2DBackpropFilter(Conv2DBackpropFilter) failed[FUNC:Infer][FILE:infershape_pass.cc][LINE:119]
process pass InferShapePass on node:Conv2DBackpropFilter failed, ret:4294967295[FUNC:RunPassesOnNode][FILE:base_pass.cc][LINE:571]
build graph failed, graph id:0, ret:1343242270[FUNC:BuildModel][FILE:ge_generator.cc][LINE:1492]
[Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 1343242270[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
[Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
(Please search "Ascend Error Message" at https://www.mindspore.cn for error code description)
EZ9999: Inner Error!
EZ9999 The error from device(3), serial number is 3, there is an aicore error, core id is 0, error code = 0, dump info: pc start: 0x100012408226838c, current: 0x124082268d3c, vec error info: 0xcedbf1a, mte error info: 0x63, ifu error info: 0x26d37bfe2df80, ccu error info: 0, cube error info: 0x9f, biu error info: 0, aic error mask: 0x65000200d000288, para base: 0x1240c00d1c00, errorStr: time out or trap error.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:510]
TraceBack (most recent call last):
The extend info from device(3), serial number is 3, there is aicore error, core id is 0, aicore int: 0x1, aicore error2: 0, axi clamp ctrl: 0x5, axi clamp state: 0x3f00000000000717, biu status0: 0x101e44800000000, biu status1: 0x940002092a0000, clk gate mask: 0x1000, dbg addr: 0, ecc en: 0, mte ccu ecc 1bit error: 0, vector cube ecc 1bit error: 0, run stall: 0, dbg data0: 0, dbg data1: 0, dbg data2: 0, dbg data3: 0, dfx data: 0[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:541]
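For triage, the Ascend error block above can be reduced to its error codes, the reported `[FUNC:...][FILE:...][LINE:...]` call chain, and the offending `pads` values. The following is a minimal sketch, not part of MindSpore or CANN: the helper name `triage` and the regular expressions are assumptions made for illustration, and the sample text is a shortened excerpt of the log in this issue.

```python
import re

# Shortened excerpt of the Ascend error message quoted in this issue.
LOG = """\
E50029: The op[conv2Dbp] input parameter[pads] should be [>= 0], actual the input is [[-1, -1, 0, 0, ]]
op get pads is illegal[FUNC:VerifyConv2dbpPads][FILE:nn_calculation_ops.cc][LINE:2482]
Verifying Conv2DBackpropFilter failed.[FUNC:InferShapeAndType][FILE:infershape_pass.cc][LINE:137]
EZ9999: Inner Error!
"""

# Assumed patterns, derived only from the log format seen above.
CODE_RE = re.compile(r"^(E[0-9A-Z]\d{4}):", re.MULTILINE)
FRAME_RE = re.compile(r"\[FUNC:(\w+)\]\[FILE:([\w.]+)\]\[LINE:(\d+)\]")
PADS_RE = re.compile(r"parameter\[pads\].*?actual the input is \[\[([^\]]*)\]\]")


def triage(log):
    """Hypothetical helper: summarize an Ascend error message block."""
    codes = CODE_RE.findall(log)
    frames = [(func, path, int(line)) for func, path, line in FRAME_RE.findall(log)]
    match = PADS_RE.search(log)
    # The logged list ends with a trailing comma, so skip empty tokens.
    pads = [int(tok) for tok in match.group(1).split(",") if tok.strip()] if match else []
    # The log states that every pads entry must be >= 0; record the violators.
    negative = [i for i, value in enumerate(pads) if value < 0]
    return {"codes": codes, "frames": frames, "pads": pads, "negative_pad_indices": negative}


report = triage(LOG)
print(report)
```

On this excerpt the sketch flags the first two `pads` entries as negative, which is the condition `VerifyConv2dbpPads` rejects in the log.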
Owner: 樊瑞
Please assign maintainer to check this issue.
@chensijie_Remzz
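Test results with MindSpore master __commit_id__ = ''[sha1]:0b0c9a12,[branch]:(HEAD->master,origin/master,origin/HEAD)'' and CANN-06.15:
Note: the test timeout needs to be adjusted appropriately. In the screenshot above the setting was 500 iterations, and 283 were actually executed.
The MindSpore version is the same as the one you posted; the CANN package was replaced with the one dated 06.15.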
樊瑞(00488392) 2023-07-05 12:06
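It passed, but the run still takes rather long.
Regression needs to be done with a replaced run package.
Regression date: 2023.07.11
Regression version: commit_id = '[sha1]:b47655c13,[branch]:(HEAD,origin/master,origin/HEAD,master)'
Regression steps: follow the reproduction steps in this issue
Regression result: test case passed
Regression conclusion: regression passed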
INFO 2023-07-11 16:29:11 - test_ms_cell_functional_programming_datasink_007 - process_handle.py:is_process_exist:170 - No residual processes need to be cleaned.
INFO 2023-07-11 16:29:11 - test_ms_cell_functional_programming_datasink_007 - base.py:teardown:120 - The base teardown is running
======================== 1 passed in 4681.15s (1:18:01) ========================
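Since the earlier comment notes that the run still takes a long time, the pytest summary line can be checked against a time budget when the regression is repeated. This is a minimal sketch: `parse_summary` and the `BUDGET_SECONDS` value are hypothetical and not taken from the test suite.

```python
import re
from datetime import timedelta

# Summary line copied from the regression log above.
SUMMARY = "======================== 1 passed in 4681.15s (1:18:01) ========================"

# Hypothetical budget for illustration only; the issue does not define one.
BUDGET_SECONDS = 3600


def parse_summary(line):
    """Return (passed_count, elapsed_seconds) from a pytest summary line."""
    match = re.search(r"(\d+) passed in ([\d.]+)s", line)
    if match is None:
        raise ValueError("not a pytest 'passed' summary line")
    return int(match.group(1)), float(match.group(2))


passed, seconds = parse_summary(SUMMARY)
elapsed = str(timedelta(seconds=int(seconds)))  # whole seconds, h:mm:ss
over_budget = seconds > BUDGET_SECONDS
print(passed, elapsed, over_budget)
```

With the logged line, the parsed duration matches the `1:18:01` shown by pytest and exceeds the assumed one-hour budget.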