Ascend / modelzoo

[Zhongzhi] [University of Electronic Science and Technology of China] [ID1383] [PVANet] Error when training the migrated PVANet model

DONE
Bug-Report
Opened this issue  
2022-06-01 16:40

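I. Problem description (with error log context):
Remote training of the migrated code through the ModelArts PyCharm Toolkit fails with an error.
After adding the CANN image as suggested in 颜亚文's reply, the run still fails.
The error says the model variables are uninitialized; the unmigrated code runs correctly on a local GPU.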

Part of the error output from the migrated training run is shown below:
[Image]

II. Software versions:
-- CANN version (e.g., CANN 3.0.x, 5.x.x):
CANN 5.1.rc2.alpha003
-- TensorFlow/PyTorch/MindSpore version:
TensorFlow 1.15
-- Python version (e.g., Python 3.7.5):
Python 3.7.5
-- OS version (e.g., Ubuntu 18.04):
Euler 2.8.3

III. Steps to reproduce:
Original reference code: https://github.com/LongJun123456/Faster-rcnn-tensorflow
OBS path: obs://faster-rcnn-npu/Faster-rcnn-npu_20220516190824/
OBS share URL:
URL:
https://e-share.obs-website.cn-north-1.myhuaweicloud.com?token=EPoRhRvcj2Bp2hILFRhYSRAM1CluVIEU8A/aCu+LhJHUuG+n2vUsEptzY9rJAxMawEAT2s3JYiHI2SflbyTCNk3ZNc2MgfJC0nSiWIHBNJhrFc3ik5aTIm8ogpqLOAlaJyeFWL4FJIAdgFv3bQ3xl1uqHcrPzmbQYTmNCWOsJeBaTkjKFmSByyebin0kjUH9PV7/1b8sAc4kdBoMZXySDnmReYLYO2BVWpFWkT6Y52kmdxmg6IbUWW4YPXcdVlJZiS4e+Pdbeavy5FDmxEqb4DhyS2UCgRiSfYKRvDw1YEH5UCDOCvzWHnoHc7VVT0FYsUzTtXiyrJlLzrs1X7zUIJX+BLTziUMpYhAnCQ9MoIJYOyJfiTBYWKuZSi65Jzbilnl7f/3u9+cF9zhFF9O2Cyw9616Phy5yCfbeKriVd77aXYiy0cWjTvfCJlK2SvhZdD5lNz2uKYPkPyds/HYqwPT3SrvdYvDZsJApSjmZ/dpDgBr3J84ZzcTnRJhkjYyeUIFLLBxrj/fHdcVoNvSbc9/IbZEBazkI5GHCUJAJqDQX0L31e4Lgph/DKR8+RjpcGtu7EydF1Ng7Ve8/l8zVy6SGbMUVDQh/kTsKYO8wCQ8l6dnGW/Rk3/MEiyVf84nBYTIYzdAIF7BAZii1tdKmKA==

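Access code:
123456

Valid until: 2022/06/02 17:00:29

1. The project code is in the MA-new-Faster-rcnn-npu_20220516190824-06-01-22-52/code/ folder of the OBS link; the dataset and pretrained model are in the dataset/ folder of the OBS link.
2. The original code was migrated with the automatic migration tool.
3. The input and output paths in the code were changed to be passed in through argparse.
4. Remote training on the VOC2007 dataset then fails with the error.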
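
Step 3 above (passing the input and output paths through argparse) could look roughly like the sketch below. The flag names `--data_url` and `--train_url` and the default paths are assumptions for illustration, not taken from the project code; substitute whatever names the training job actually passes.

```python
import argparse


def parse_paths(argv=None):
    """Parse the input and output locations from the command line.

    The flag names --data_url and --train_url are assumptions for this
    sketch; they are not taken from the project code in this issue.
    """
    parser = argparse.ArgumentParser(description="Training path arguments (sketch)")
    parser.add_argument("--data_url", type=str, default="./dataset",
                        help="directory holding the dataset and pretrained model")
    parser.add_argument("--train_url", type=str, default="./output",
                        help="directory for checkpoints and logs")
    # parse_known_args tolerates extra flags that a training platform may append
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Calling `parse_paths()` with no argument reads `sys.argv`, so the training script can use the returned `args.data_url` and `args.train_url` wherever paths were previously hard-coded.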

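IV. Log information:
The logs cannot be uploaded at the moment; they will be uploaded later.
The error logs are in the MA-new-Faster-rcnn-npu_20220516190824-06-01-22-52/log/ folder of the OBS link.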

Comments (5)

peacehow created Bug-Report
peacehow changed description
wangxiaodan1103 changed issue state from TODO to Analysing
wangxiaodan1103 set assignee to chenhu
peacehow changed description
peacehow changed description
peacehow changed title

ascend-share/5.1.rc2.alpha003_tensorflow-ascend910-cp37-euleros2.8-aarch64-training:1.15.0-21.0.2_0520

Please try running with the latest image.

peacehow changed description

Training still fails after switching to this image. Could you please help analyze the possible cause?

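Whenever tf.Variable() is used, the variables must be initialized before they are used, i.e.:

import tensorflow as tf

with tf.Session() as sess:
    # Run the initializer for all global variables before any other op
    sess.run(tf.global_variables_initializer())

Please check whether the script does this correctly.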

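@peacehow The current version does not support using tf.py_func inside the model definition. Try doing the PVANet migration on top of the FastRCNN implementation available on Gitee instead.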
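
To see where the migrated code is affected by the tf.py_func limitation mentioned above, the source tree can be scanned for such calls. The following is a minimal sketch using only the Python standard library; the helper names `find_py_func_calls` and `scan_tree` are hypothetical, and the regular expression is a rough text match rather than a full parse, so it will miss aliased imports.

```python
import re
from pathlib import Path

# Matches tf.py_func(...) and the tf.compat.v1.py_func(...) spelling.
PY_FUNC_PATTERN = re.compile(r"\btf(?:\.compat\.v1)?\.py_func\s*\(")


def find_py_func_calls(source_text):
    """Return (line_number, stripped_line) pairs for lines that call tf.py_func."""
    hits = []
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crude removal of trailing comments
        if PY_FUNC_PATTERN.search(code):
            hits.append((lineno, line.strip()))
    return hits


def scan_tree(root):
    """Scan every .py file under root; map file path -> list of hits."""
    report = {}
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = find_py_func_calls(text)
        if hits:
            report[str(path)] = hits
    return report
```

Running `scan_tree("code/")` on the project folder would list each file and line that needs to be rewritten or moved out of the model definition before training on the NPU.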

颜亚文 changed issue state from Analysing to DONE
