MindSpore / mindspore

[ST][MS][Functional programming] Functional programming with data sinking, dynamic-shape scenario: "RuntimeError: Acl compile and execute failed, op_type_:Conv2DBackpropFilter"

DONE
Bug-Report
Created on 2023-06-06 15:58

Describe the current behavior (Mandatory)

With functional programming and data sinking in a dynamic-shape scenario, training fails with "RuntimeError: Acl compile and execute failed, op_type_:Conv2DBackpropFilter".

Environment (Mandatory)

  • Hardware Environment (Ascend/GPU/CPU):

Please delete the backend not involved:
/device ascend/

  • Software Environment (Mandatory):
    -- MindSpore version (e.g., 1.7.0.Bxxx) :
    master_20230602121730_cf1239054ef820eb634c49883d243f4be2684bae/
    -- Python version (e.g., Python 3.7.5) :
    -- OS platform and distribution (e.g., Linux Ubuntu 16.04):
    -- GCC/Compiler version (if compiled from source):

  • Execute Mode (Mandatory) (PyNative/Graph):

Please delete the mode not involved:
/mode graph

Related testcase (Mandatory)

test_ms_cell_functional_programming_datasink_007

Steps to reproduce the issue (Mandatory)

source /home/miniconda3/bin/activate ci
source ~/solution_test/env_set.source -e ascend
export DEVICE_TYPE=Ascend_Arm
export TRAIN_MODE=GRAPH_MODE
cd solution_test/cases/01frame_func/17cell_function_coding/support_data_sinking/resnet/
pytest -s test_ms_cell_functional_programming_datasink_007.py
Check whether the loss in the log matches expectations.
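
The last step above, checking the loss in the log, could be automated roughly as follows. This is only a sketch: the log line format (`loss: <value>` or `loss is <value>`), the helper names `extract_losses` and `loss_looks_healthy`, and the "mean of the last few values is lower than the mean of the first few" criterion are all assumptions made for illustration, not the check the test case actually performs.

```python
import math
import re

# Assumed log format: "... loss: 2.31" or "... loss is 2.31" (hypothetical).
LOSS_RE = re.compile(
    r"loss(?:\s+is|\s*[:=])\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?|nan|inf)",
    re.IGNORECASE,
)


def extract_losses(log_text):
    """Return every loss value found in the log text, in order of appearance."""
    return [float(m.group(1)) for m in LOSS_RE.finditer(log_text)]


def loss_looks_healthy(losses, window=2):
    """Heuristic convergence check (an assumption, not the test's real criterion).

    Returns True when every value is finite and the mean of the last
    `window` values is lower than the mean of the first `window` values.
    """
    if len(losses) < 2 * window:
        return False
    if not all(math.isfinite(v) for v in losses):
        return False
    head = sum(losses[:window]) / window
    tail = sum(losses[-window:]) / window
    return tail < head


if __name__ == "__main__":
    sample = "step 1, loss: 2.31\nstep 2, loss: 1.90\nstep 3, loss: 1.20\nstep 4, loss: 0.80\n"
    print(loss_looks_healthy(extract_losses(sample)))
```

A NaN or Inf anywhere in the series makes the check fail, which separates a diverging run from one that is merely slow to converge.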

Describe the expected behavior (Mandatory)

Training runs normally and the loss converges as expected.

Related log / screenshot (Mandatory)

Traceback (most recent call last):
  File "train_functional.py", line 414, in <module>
    train_net(net)
  File "train_functional.py", line 312, in train_net
    loss = sink_process()
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/data_sink.py", line 220, in sink_process
    out = real_sink_fun()
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 627, in staging_specialize
    return executor(*args, **kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 105, in wrapper
    results = fn(*arg, **kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 346, in __call__
    output = self._graph_executor(tuple(new_inputs), phase)
RuntimeError: Acl compile and execute failed, op_type_:Conv2DBackpropFilter

----------------------------------------------------
- Ascend Error Message:
----------------------------------------------------
E50029: The op[conv2Dbp] input parameter[pads] should be [>= 0], actual the input is [[-1, -1, 0, 0, ]]
        TraceBack (most recent call last):
        op get pads is illegal[FUNC:VerifyConv2dbpPads][FILE:nn_calculation_ops.cc][LINE:2482]
        The op[conv2Dbp] input parameter[pads] should be [>= 0], actual the input is [[-1, -1, 0, 0, ]]
        padding and pads both check fail![FUNC:VerifyConv2dbpCommon][FILE:nn_calculation_ops.cc][LINE:2663]
        Verifying Conv2DBackpropFilter failed.[FUNC:InferShapeAndType][FILE:infershape_pass.cc][LINE:137]
        Call InferShapeAndType for node:Conv2DBackpropFilter(Conv2DBackpropFilter) failed[FUNC:Infer][FILE:infershape_pass.cc][LINE:119]
        process pass InferShapePass on node:Conv2DBackpropFilter failed, ret:4294967295[FUNC:RunPassesOnNode][FILE:base_pass.cc][LINE:571]
        build graph failed, graph id:0, ret:1343242270[FUNC:BuildModel][FILE:ge_generator.cc][LINE:1492]
        [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 1343242270[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
        build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]

(Please search "Ascend Error Message" at https://www.mindspore.cn for error code description)
EZ9999: Inner Error!
EZ9999  The error from device(3), serial number is 3, there is an aicore error, core id is 0, error code = 0, dump info: pc start: 0x100012408226838c, current: 0x124082268d3c, vec error info: 0xcedbf1a, mte error info: 0x63, ifu error info: 0x26d37bfe2df80, ccu error info: 0, cube error info: 0x9f, biu error info: 0, aic error mask: 0x65000200d000288, para base: 0x1240c00d1c00, errorStr: time out or trap error.[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:510]
        TraceBack (most recent call last):
        The extend info from device(3), serial number is 3, there is aicore error, core id is 0, aicore int: 0x1, aicore error2: 0, axi clamp ctrl: 0x5, axi clamp state: 0x3f00000000000717, biu status0: 0x101e44800000000, biu status1: 0x940002092a0000, clk gate mask: 0x1000, dbg addr: 0, ecc en: 0, mte ccu ecc 1bit error: 0, vector cube ecc 1bit error: 0, run stall: 0, dbg data0: 0, dbg data1: 0, dbg data2: 0, dbg data3: 0, dfx data: 0[FUNC:PrintCoreErrorInfo][FILE:device_error_proc.cc][LINE:541]
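
The key line in the log above is the E50029 message, which reports that `pads` contained negative entries. When triaging many such logs, a small parser can pull out the error code, operator, parameter, and offending values. The sketch below is an illustration only: the regular expression is derived from the single message shown in this issue, and the helper name `parse_param_error` is hypothetical. Other Ascend error messages may be worded differently.

```python
import re

# Pattern derived from the one E50029 line in this issue; not an official format.
PARAM_ERR_RE = re.compile(
    r"(?P<code>E\d+): The op\[(?P<op>[^\]]+)\] input parameter\[(?P<param>[^\]]+)\] "
    r"should be \[(?P<expected>[^\]]+)\], actual the input is \[\[(?P<actual>[^\]]*)\]\]"
)


def parse_param_error(line):
    """Parse a parameter-check error line; return None if it does not match."""
    m = PARAM_ERR_RE.search(line)
    if m is None:
        return None
    # The value list in the log ends with a trailing comma, so skip empty tokens.
    values = [int(tok) for tok in m.group("actual").split(",") if tok.strip()]
    return {
        "code": m.group("code"),
        "op": m.group("op"),
        "param": m.group("param"),
        "expected": m.group("expected"),
        "values": values,
        # Positions of entries that violate a ">= 0" style constraint.
        "negative_positions": [i for i, v in enumerate(values) if v < 0],
    }


if __name__ == "__main__":
    line = (
        "E50029: The op[conv2Dbp] input parameter[pads] should be [>= 0], "
        "actual the input is [[-1, -1, 0, 0, ]]"
    )
    print(parse_param_error(line))
```

For the message in this issue, the parser reports the first two `pads` entries as negative. Which spatial sides those positions correspond to is not stated in the log and is left open here.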

Special notes for this issue (Optional)

Owner: 樊瑞

Comments (6)

chensijie_Remzz created the Bug-Report
chensijie_Remzz added the kind/bug label
chensijie_Remzz added the attr/function label
chensijie_Remzz added the v2.1.0 label
chensijie_Remzz added the stage/func-debug label
chensijie_Remzz added the sig/ds label
chensijie_Remzz added collaborator fary86

Please assign maintainer to check this issue.
@chensijie_Remzz

Please add labels (comp or sig), also you can visit https://gitee.com/mindspore/community/blob/master/sigs/dx/docs/labels.md to find more.
To get your code reviewed as soon as possible, please add a component (comp) or special interest group (sig) label to the Pull Request. Labeled PRs are pushed directly to the responsible owner for review.
For example, if you are submitting code for the data component, you can comment:
//comp/data
You can also invite the data SIG to review the code by writing:
//sig/data
In addition, you can mark the type of the PR, for example as a bugfix or a feature request:
//kind/bug or //kind/feature
Congratulations, you now know how to add labels with commands. Go ahead and add labels in the comments below!

duanjiali added collaborator duanjiali
duanjiali changed assignee from duanjiali to fary86
duanjiali removed collaborator fary86
fary86 changed task status from TODO to WIP

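[Screenshot]
Test results with MindSpore master __commit_id__ = ''[sha1]:0b0c9a12,[branch]:(HEAD->master,origin/master,origin/HEAD)'' and CANN-06.15
Note: the timeout needs to be adjusted appropriately when testing. For the screenshot above it was set to 500 times, and 283 were actually executed.
[Screenshot]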

fary86 changed task status from WIP to VALIDATION
fary86 added the rct/newfeature label
fary86 added the ctl/componenttest label
fary86 removed the ctl/componenttest label
fary86 removed the ctl/componenttest label
fary86 added the ctl/solutiontest label
fary86 added the rca/algorithm label
fary86 changed milestone from B-SIG-ASCEND to B-ComponentTest
fary86 added collaborator fary86
fary86 changed assignee from fary86 to chensijie_Remzz

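Keep the MindSpore version the same as the one you posted, and switch the CANN package to the 06.15 build.
樊瑞(00488392) 2023-07-05 12:06
It PASSED, but the run still takes rather long.

The run package needs to be replaced before regression testing.

Regression date: 2023.07.11

Regression version: commit_id = '[sha1]:b47655c13,[branch]:(HEAD,origin/master,origin/HEAD,master)'

Regression steps: follow the reproduction steps in this issue

Regression result: test case passed

Regression conclusion: regression passed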

INFO 2023-07-11 16:29:11 - test_ms_cell_functional_programming_datasink_007 - process_handle.py:is_process_exist:170 - No residual processes need to be cleaned.
INFO 2023-07-11 16:29:11 - test_ms_cell_functional_programming_datasink_007 - base.py:teardown:120 - The base teardown is running


======================== 1 passed in 4681.15s (1:18:01) ========================
chensijie_Remzz changed task status from VALIDATION to DONE
