MindSpore / mindspore

[MS][ST][DYN] In graph mode with dynamic shapes, a Tile operator infer problem causes ApplyMomentum to fail with "variable_shape which is [const vector]{1} , but got [const vector]{1, 1, 1, 1}"

DONE
Bug-Report
Created on 2023-09-20 15:53

Describe the current behavior (Mandatory)

In graph mode with dynamic shapes, the Tile operator's infer step is faulty, which causes the ApplyMomentum operator to fail with: variable_shape which is [const vector]{1} , but got [const vector]{1, 1, 1, 1}

Environment (Mandatory)

  • Hardware Environment (Ascend/GPU/CPU):

Please delete the backend not involved:
/device ascend/GPU/CPU/

  • Software Environment (Mandatory):
    -- MindSpore version (e.g., 1.7.0.Bxxx) :
    -- Python version (e.g., Python 3.7.5) :
    -- OS platform and distribution (e.g., Linux Ubuntu 16.04):
    -- GCC/Compiler version (if compiled from source):

  • Execute Mode (Mandatory) (PyNative/Graph):

Please delete the mode not involved:

/mode graph

Related testcase (Mandatory)

solution_test/cases/03subject_test/02usability/model_develop/dynamic_shape/test_ms_dynamic_shape_hw_dynamic_sink_mode_0001/test_ms_dynamic_shape_hw_dynamic_sink_mode_0001.py

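Steps to reproduce the issue (Mandatory)

  1. Copy the custom network model into the current directory
  2. Set the H and W dimensions to dynamic (set the corresponding parameters in the configuration file to None)
  3. Run network training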
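
As an illustration of step 2, the sketch below shows one way a configuration with `None` for the H and W dimensions could be turned into an NCHW shape description. The field names (`batch_size`, `channels`, `image_height`, `image_width`) and both helper functions are hypothetical; the actual configuration file of the test case is not shown in this issue.

```python
def build_input_shape(config):
    """Return an NCHW shape list in which None marks a dynamic dimension."""
    return [
        config["batch_size"],
        config["channels"],
        config["image_height"],  # None means the height is dynamic
        config["image_width"],   # None means the width is dynamic
    ]


def dynamic_axes(shape):
    """Return the indices of the dimensions that are left dynamic."""
    return [axis for axis, dim in enumerate(shape) if dim is None]


# Hypothetical configuration values, for illustration only.
config = {"batch_size": 32, "channels": 3, "image_height": None, "image_width": None}
shape = build_input_shape(config)
print(shape)
print(dynamic_axes(shape))
```

In MindSpore, a shape list containing `None` can typically be passed as the `shape` argument of a placeholder `Tensor` that is handed to `Cell.set_inputs`; whether this test case uses that mechanism is an assumption.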

Describe the expected behavior (Mandatory)

Execution succeeds without any exception.

Related log / screenshot (Mandatory)

Traceback (most recent call last):
  File "../test_ms_dynamic_shape_hw_dynamic_sink_mode_0001_GRAPH_MODE/train_dynamic_net.py", line 393, in <module>
    train_net_with_model()
  File "../test_ms_dynamic_shape_hw_dynamic_sink_mode_0001_GRAPH_MODE/train_dynamic_net.py", line 232, in train_net_with_model
    sink_size=config.sink_size)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 1070, in train
    initial_epoch=initial_epoch)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 113, in wrapper
    func(self, *args, **kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 623, in _train
    cb_params, sink_size, initial_epoch, valid_infos)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 707, in _train_dataset_sink_process
    outputs = train_network(*inputs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 670, in __call__
    out = self.compile_and_run(*args, **kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 1013, in compile_and_run
    return _cell_graph_executor(self, *new_args, phase=self.phase)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 1560, in __call__
    return self.run(obj, *args, phase=phase)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 1599, in run
    return self._exec_pip(obj, *args, phase=phase_real)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 115, in wrapper
    results = fn(*arg, **kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 1579, in _exec_pip
    return self._graph_executor(args, phase)
ValueError: For primitive[ApplyMomentum], the gradient_shape must be equal to variable_shape which is [const vector]{1} , but got [const vector]{1, 1, 1, 1}.

----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/core/utils/check_convert_utils.h:192 CheckValue

Special notes for this issue (Optional)

Comments (7)

田桐 created the Bug-Report
田桐 added the label kind/bug
田桐 added the label sig/ds
田桐 added the label attr/function
田桐 added the label stage/func-debug
田桐 added the label v2.2.0
田桐 added collaborator leiwei2

Please assign maintainer to check this issue.
@田桐

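田桐 changed the priority from Major to Critical

This issue causes a large number of test cases to fail, so it has been raised to Critical.

Cause: the Tile operator's infer-shape logic was written too generically. As a result, during backward graph generation the shape of dout does not equal the shape of out, and the backward derivation goes wrong.

Solution: refine the Tile operator's infer-shape logic, and add a warning to backward generation in dynamic-shape scenarios.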
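
The cause and solution above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MindSpore's actual infer-shape code: `-1` is assumed to mark a dimension unknown at compile time, rank alignment follows `numpy.tile` conventions, and all three function names are hypothetical.

```python
import warnings

DYNAMIC_DIM = -1  # assumption: -1 marks a dimension unknown at compile time


def infer_tile_shape(input_shape, multiples):
    """Infer a tile output shape, keeping unknown dimensions unknown."""
    rank = max(len(input_shape), len(multiples))
    # Left-pad the shorter sequence with 1s so both have the same rank.
    shape = [1] * (rank - len(input_shape)) + list(input_shape)
    mults = [1] * (rank - len(multiples)) + list(multiples)
    # A dynamic input dimension stays dynamic; known dimensions are multiplied.
    return [DYNAMIC_DIM if dim == DYNAMIC_DIM else dim * mult
            for dim, mult in zip(shape, mults)]


def shapes_compatible(out_shape, dout_shape):
    """Return True when ranks match and every known dimension agrees."""
    if len(out_shape) != len(dout_shape):
        return False
    return all(a == b or DYNAMIC_DIM in (a, b)
               for a, b in zip(out_shape, dout_shape))


def check_dout(out_shape, dout_shape):
    """Warn early when dout cannot match the forward output shape."""
    ok = shapes_compatible(out_shape, dout_shape)
    if not ok:
        warnings.warn(
            f"dout shape {dout_shape} does not match out shape {out_shape}")
    return ok


out_shape = infer_tile_shape([1, 3, DYNAMIC_DIM, DYNAMIC_DIM], [2, 1, 1, 1])
print(out_shape)
print(check_dout(out_shape, [2, 3, 8, 8]))
print(check_dout([1], [1, 1, 1, 1]))  # rank mismatch, as in the reported error
```

The last call mirrors the shapes in the reported error message: a rank-1 shape compared against a rank-4 shape is flagged at the point where the mismatch is introduced, rather than surfacing later in the ApplyMomentum check.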

i-robot added the label foruda

Self-verification: PASS

The fix has been merged; requesting regression testing.

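NaCN changed the task status from TODO to VALIDATION
NaCN added the label rct/bugfix
NaCN added the label rca/algorithm
NaCN added the label ctl/codereview
NaCN added collaborator NaCN
NaCN changed the assignee from NaCN to 田桐
NaCN changed the milestone from B-SIG-AKG to B-SolutionTest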

Regression date: 2023.9.27
Regression version: master 0926
Regression result: [screenshot]
Regression conclusion: regression passed

田桐 changed the task status from VALIDATION to DONE
