name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
[resnet18][pynative][910][Intermittent] Network training fails: The pointer[auto_grad_meta_data] is null
resnet18 network repository: https://gitee.com/mindspore/models/tree/master/official/cv/ResNet
Hardware environment:
/device Ascend
pass version:
mindspore version: r2.1_20230719161523_c433b910
run package: HISI_C30/20230715
Execution mode:
/mode pynative
test_ms_resnet18_cifar10_pynative_train_infer_910_8p_0001
cd official/cv/ResNet/;
bash run_standalone_train.sh [DATASET_PATH] [CONFIG_PATH] [RESUME_CKPT] (optional)
set mode=pynative
Expected result: the resnet18 network trains successfully and meets the performance target.
Traceback (most recent call last):
  File "train.py", line 234, in <module>
    train_net()
  File "/home/jenkins/workspace/TDT_deployment/solution_test/cases/02network/00cv/resnet18/pynative/test_ms_resnet18_cifar10_pynative_train_infer_910_8p_0001/scripts/train_parallel1/src/model_utils/moxing_adapter.py", line 104, in wrapped_func
    run_func(*args, **kwargs)
  File "train.py", line 228, in train_net
    sink_size=dataset.get_dataset_size(), dataset_sink_mode=dataset_sink_mode)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 1066, in train
    initial_epoch=initial_epoch)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 113, in wrapper
    func(self, *args, **kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 620, in _train
    cb_params, sink_size, initial_epoch, valid_infos)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/model.py", line 703, in _train_dataset_sink_process
    outputs = train_network(*inputs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 664, in __call__
    raise err
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 660, in __call__
    output = self._run_construct(args, kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 444, in _run_construct
    output = self.construct(*cast_inputs, **kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/train/dataset_helper.py", line 101, in construct
    return self.network(*outputs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 664, in __call__
    raise err
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 660, in __call__
    output = self._run_construct(args, kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 444, in _run_construct
    output = self.construct(*cast_inputs, **kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/wrap/cell_wrapper.py", line 423, in construct
    loss = self.network(*inputs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 664, in __call__
    raise err
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/nn/cell.py", line 661, in __call__
    _pynative_executor.end_graph(self, output, *args, **kwargs)
  File "/home/miniconda3/envs/ci/lib/python3.7/site-packages/mindspore/common/api.py", line 1304, in end_graph
    self._executor.end_graph(obj, output, *args, *(kwargs.values()))
RuntimeError: The pointer[auto_grad_meta_data] is null.
----------------------------------------------------
- Framework Unexpected Exception Raised:
----------------------------------------------------
This exception is caused by framework's unexpected error. Please create an issue at https://gitee.com/mindspore/mindspore/issues to get help.
----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/pipeline/pynative/grad/auto_grad.cc:1744 MapParameter
Routing this to 罗超.
Please assign a maintainer to check this issue.
@sunjiawei999
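Built a package from the latest 2.1 code and tested on multiple machines; the problem has not been reproduced yet.
Same problem as this issue: https://e.gitee.com/mind_spore/dashboard?issue=I7BHKY
So far neither testing nor development has reproduced the problem again.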
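2023/7/27 CCB:
Reason for deferral: the problem was not reproduced in repeated tests on multiple machines, so it will stay under observation. By CCB decision, it is deferred as an intermittent issue.
Impact: resnet18 training fails intermittently in dynamic graph (PyNative) mode on Ascend.
Workaround: the problem is intermittent; if it occurs in a user's network, rerun the training.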
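The "rerun the training" workaround could be automated along the following lines. This is only a minimal sketch under stated assumptions: `run_with_retry`, `ERROR_SIGNATURE`, and `train_fn` are hypothetical names, not part of MindSpore or the ResNet scripts, and `train_fn` is assumed to wrap whatever starts the training job (ideally in a fresh process, since it has not been verified that an in-process retry is safe after this framework error). Only the error reported in this issue triggers a retry; any other error is re-raised unchanged.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Substring of the intermittent error message reported in this issue.
ERROR_SIGNATURE = "auto_grad_meta_data"


def run_with_retry(train_fn: Callable[[], T], max_attempts: int = 3) -> T:
    """Call train_fn, retrying only when the known intermittent error occurs.

    train_fn is a hypothetical zero-argument callable that launches training.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return train_fn()
        except RuntimeError as err:
            # Re-raise unrelated errors, and give up after the last attempt.
            if ERROR_SIGNATURE not in str(err) or attempt == max_attempts:
                raise
            print(f"attempt {attempt} hit the intermittent error, retrying")
    raise AssertionError("unreachable")
```

Keeping `max_attempts` small is deliberate: if the error shows up on every attempt, it is probably no longer the intermittent case described here and should be reported with logs instead of being retried.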
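2023/9/9 CCB:
The issue meets the downgrade criteria for the 2.1.1 release exit. It has not been reproduced since July, so it is downgraded to a normal-severity issue and kept under tracking.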
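Network: LSTM
Version: r2.2_20231102_188a4d04_2023-11-06 20:22:02
The same problem also occurred here: The pointer[auto_grad_meta_data] is null
CCB conclusion: the problem has consistently not been reproduced; downgraded to unimportant.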