Converting a Faster_rcnn Caffe model (produced with the PytorchToCaffe tool) to an om model fails with Segmentation fault (core dumped)

DONE
Bug-Report
Created: 2020-11-13 00:37

Environment

  • Hardware Environment(Ascend/GPU/CPU): Atlas 200 DK

  • Software Environment:
    -- MindSpore version (source or binary): not used; I used the ATC command-line tool
    -- Python version : Python 3.7.5 (the environment was installed following the 200DK tutorial)
    -- OS platform and distribution: Linux Ubuntu 18.04

Steps to reproduce the issue

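(1) Use the PytorchToCaffe tool to convert part of a faster_rcnn PyTorch model into a Caffe model
(1.1) Modify the PytorchToCaffe tool so that it supports Proposal, an extended operator supported by the Ascend AI processor
Note: I use this tool because the department we work with in our collaboration with Huawei also uses a modified version of it.

(1.1.1) First, in PytorchToCaffe/Caffe/caffe.proto, add the LayerParameter and ProposalParameter definitions as described in the tutorial, then run protoc --python_out ./ caffe.proto so that the change takes effect.

(1.1.2) In PytorchToCaffe/pytorch_to_caffe.py, add support for the Proposal layer. The added code is as follows: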

    def _proposal(raw, cls_prob, bbox_delta):
        rois, actual_rois_num = raw(cls_prob, bbox_delta)
        
        bottom_blobs=[]
        bottom_blobs.append(log.blobs(cls_prob))
        bottom_blobs.append(log.blobs(bbox_delta))
        
        top_blobs=log.add_blobs([rois, actual_rois_num],name='proposal_blob')
        
        layer_name = log.add_layer(name='Proposal')
        
        layer = caffe_net.Layer_param(name=layer_name, type='Proposal',
                                          bottom=bottom_blobs, top=top_blobs)
        layer.proposal_param()
        log.cnet.add_layer(layer)
        return rois, actual_rois_num
        
    F.proposal = Rp(F.proposal, _proposal)

(1.1.3) Add the following code to PyTorch's torch.nn.functional.py:

    def proposal(cls_prob, bbox_delta, img_inf0=None):
        batch_size = cls_prob.shape[0]
        batch_size = bbox_delta.shape[0]
        rois = torch.randn([batch_size, 5, 19])
        actual_rois_num = torch.randn([batch_size, 8])
        return rois, actual_rois_num

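The conversion does not carry over the computation performed inside a PyTorch layer, so in this function I only make sure that the input and output dimensions of Proposal are correct and do no actual computation. Is this the right way to handle it?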
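The reasoning in the question above can be illustrated with a small, self-contained sketch of a tracing-style converter. It does not use PyTorch or the PytorchToCaffe code; `Replace`, `FakeTensor`, `traced_layers`, and the input shapes are hypothetical names and values chosen for illustration. The only point it makes is that a hook which wraps the original function records layer types and shapes, not the arithmetic inside the function.

```python
# Hypothetical stand-in for a tracing converter; not the PytorchToCaffe code.
traced_layers = []  # each entry: (layer_type, input_shapes, output_shapes)


class Replace:
    """Wrap `raw` so that every call is routed through `hook(raw, *args)`."""

    def __init__(self, raw, hook):
        self.raw = raw
        self.hook = hook

    def __call__(self, *args, **kwargs):
        return self.hook(self.raw, *args, **kwargs)


class FakeTensor:
    """Shape-only placeholder so the sketch runs without PyTorch."""

    def __init__(self, shape):
        self.shape = tuple(shape)


def proposal(cls_prob, bbox_delta):
    # Stub: returns outputs with the shapes used in the post, computes nothing.
    batch = cls_prob.shape[0]
    return FakeTensor((batch, 5, 19)), FakeTensor((batch, 8))


def _proposal_hook(raw, cls_prob, bbox_delta):
    # Call the original function, then record only the layer type and shapes.
    rois, actual_rois_num = raw(cls_prob, bbox_delta)
    traced_layers.append((
        "Proposal",
        [cls_prob.shape, bbox_delta.shape],
        [rois.shape, actual_rois_num.shape],
    ))
    return rois, actual_rois_num


# Monkey-patch: later calls to `proposal` go through the hook.
proposal = Replace(proposal, _proposal_hook)

# Illustrative input shapes (assumed, not taken from the real model).
rois, num = proposal(FakeTensor((1, 18, 38, 50)), FakeTensor((1, 36, 38, 50)))
print(traced_layers)
```

In this sketch the traced record would be identical whether the stub did real proposal generation or not, which is the behavior the question describes. Whether the stub's output shapes match what the target operator produces is a separate matter that the sketch does not check.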

(1.1.4) In Caffe/layer_param.py, add a method to class Layer_param() that generates the hyperparameters of the Proposal layer:

    def proposal_param(self):
        proposal_param = pb.ProposalParameter()
        proposal_param.feat_stride = 16
        proposal_param.base_size = 16 
        proposal_param.min_size = 16
        proposal_param.pre_nms_topn = 3000
        proposal_param.post_nms_topn = 304
        proposal_param.iou_threshold = 0.7
        proposal_param.output_actual_rois_num = True
        self.param.proposal_param.CopyFrom(proposal_param)

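(1.1.5) Use the PytorchToCaffe tool modified as above to convert part of the Faster_rcnn model into a Caffe model.
For the PyTorch model definition and conversion code, see this link. The converted .prototxt file is Attachment 1 (toy03.prototxt). The .caffemodel file is too large, so I am not uploading it for now. For the maintainers' convenience, the model is visualized below:
[Image: model to be converted]
As you can see, this "partial Faster_rcnn model" is the Faster_RCNN backbone plus the Region Proposal Network.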

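(2) Convert the Caffe model to an om model with the ATC command; this step fails
(2.1) Run the following script to convert the model
The aipp_faster_rcnn.cfg file passed via --insert_op_conf="aipp_faster_rcnn.cfg" is the sample file provided in the community example.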

#!/bin/sh
export PATH=/usr/local/python3.7.5/bin:$PATH:/usr/local/Ascend/ascend-toolkit/20.0.RC1/atc/ccec_compiler/bin:/usr/local/Ascend/ascend-toolkit/20.0.RC1/atc/bin
export PYTHONPATH=$PYTHONPATH:/usr/local/Ascend/ascend-toolkit/20.0.RC1/atc/python/site-packages/te.egg:/usr/local/Ascend/ascend-toolkit/20.0.RC1/atc/python/site-packages/te:/usr/local/Ascend/ascend-toolkit/20.0.RC1/atc/python/site-packages/topi.egg:/usr/local/Ascend/ascend-toolkit/20.0.RC1/atc/python/site-packages/topi:/usr/local/Ascend/ascend-toolkit/20.0.RC1/atc/python/site-packages/auto_tune.egg:/usr/local/Ascend/ascend-toolkit/20.0.RC1/atc/python/site-packages/schedule_search.egg
export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/20.0.RC1/atc/lib64:/usr/local/Ascend/ascend-toolkit/20.0.RC1/driver/lib64:/usr/local/Ascend/ascend-toolkit/20.0.RC1/add-ons:/usr/local/python3.7.5/lib
export SLOG_PRINT_TO_STDOUT=1
export ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/20.0.RC1/opp
/usr/local/Ascend/ascend-toolkit/20.0.RC1/atc/bin/atc \
--model="faster_rcnn.prototxt" \
--weight="faster_rcnn.caffemodel" \
--framework=0 \
--output="/root/modelzoo/exp09/device/faster_rcnn" \
--soc_version=Ascend310 \
--insert_op_conf="aipp_faster_rcnn.cfg"
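For scripted experiments, the same command can be assembled in Python. This is a minimal sketch that assumes only the flags appearing in the script above; `build_atc_command` is a hypothetical helper, not part of the Ascend toolkit, and it only builds the argument list without running ATC.

```python
import shlex


def build_atc_command(model, weight, output, soc_version="Ascend310",
                      insert_op_conf=None, atc_bin="atc"):
    """Build the ATC argument list for a Caffe model (hypothetical helper)."""
    cmd = [
        atc_bin,
        f"--model={model}",
        f"--weight={weight}",
        "--framework=0",  # same value the script above passes for the Caffe model
        f"--output={output}",
        f"--soc_version={soc_version}",
    ]
    if insert_op_conf is not None:
        cmd.append(f"--insert_op_conf={insert_op_conf}")
    return cmd


with_aipp = build_atc_command(
    "faster_rcnn.prototxt", "faster_rcnn.caffemodel",
    "/root/modelzoo/exp09/device/faster_rcnn",
    insert_op_conf="aipp_faster_rcnn.cfg",
)
without_aipp = build_atc_command(
    "faster_rcnn.prototxt", "faster_rcnn.caffemodel",
    "/root/modelzoo/exp09/device/faster_rcnn",
)

# Print a shell-quoted form; pass the list to subprocess.run to execute it.
print(" ".join(shlex.quote(arg) for arg in with_aipp))
```

Building a variant without `--insert_op_conf` is one way to check whether the failure depends on the AIPP configuration, although nothing in this report says that it does.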

(2.2) The following error occurs:

[EVENT] FE(6056,atc):2020-11-12-22:54:02.317.747 [fusion_engine/graph_optimizer/fe_graph_optimizer.cpp:246]OptimizeOriginalGraph:"[FE_PERFORMANCE]The time cost of FEGraphOptimizer::OptimizeQuantGraph is [1064293] micro second."
Segmentation fault (core dumped)
root@UbuntuForAscend:/home/pyf/Downloads/Convert_faster_rcnn_exp# 2020-11-12 22:54:03,732 6067 PCOMPILE Master process dead. worker process quiting..
2020-11-12 22:54:03,762 6062 PCOMPILE Master process dead. worker process quiting..
2020-11-12 22:54:03,833 6060 PCOMPILE Master process dead. worker process quiting..
2020-11-12 22:54:03,863 6061 PCOMPILE Master process dead. worker process quiting..
2020-11-12 22:54:03,889 6065 PCOMPILE Master process dead. worker process quiting..
2020-11-12 22:54:03,902 6066 PCOMPILE Master process dead. worker process quiting..
2020-11-12 22:54:03,930 6063 PCOMPILE Master process dead. worker process quiting..
2020-11-12 22:54:03,946 6064 PCOMPILE Master process dead. worker process quiting..
/usr/local/python3.7.5/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 35 leaked semaphores to clean up at shutdown
  len(cache))

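The complete error log is in Attachment 2 (log.txt).
With the GE graph dump switch enabled (export DUMP_GE_GRAPH=1), the .pbtxt and .txt graphs generated in the current directory are in Attachment 3 (附件3.rar).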

Special notes for this issue

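I tried removing the Proposal layer, and the conversion then succeeds. My preliminary conclusion is that the problem lies either in the atc conversion tool or in the .caffemodel produced by the PyTorch-to-Caffe conversion.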

Attachments

See the network drive:
https://cloud.tsinghua.edu.cn/d/31db2689e4db4a9f9083/

For the .caffemodel file, see:
https://cloud.tsinghua.edu.cn/d/2f869e9438c14ac5be31/

Comments (14)

yifanpu001 created the Bug-Report

We will analyze your logs as soon as possible.

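The logs and your steps do not show why the error occurs. Could you provide the weight file so that we can try the conversion ourselves and gather more information? Thanks.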

@zhutian Sure, one moment. I will send it to you at noon.

@zhutian Hello, the weight file has been uploaded to https://cloud.tsinghua.edu.cn/d/2f869e9438c14ac5be31/

Is it accessible from an external network? Opening it shows the following:
[Image]

@zhengtao Yes, it is accessible. Try refreshing the page.

@zhengtao In my reply above, the Chinese characters were automatically absorbed into the link. Try this one:
https://cloud.tsinghua.edu.cn/d/2f869e9438c14ac5be31/

OK, the download works this time. We will try to reproduce the problem and will let you know as soon as there is progress!

@zhengtao Thank you, I appreciate the help.

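Version 20.1 has now been released. You can download the development environment package for this version from the resource center at https://www.huaweicloud.com/ascend/home.html.
On version 20.1, I converted your model with the following commands: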
export PATH=/usr/local/python3.7.5/bin:$PATH:/home/c75/Ascend/ascend-toolkit/20.1.rc1/atc/ccec_compiler/bin:/home/c75/Ascend/ascend-toolkit/20.1.rc1/atc/bin
export PYTHONPATH=$PYTHONPATH:/home/c75/Ascend/ascend-toolkit/20.1.rc1/atc/python/site-packages:/home/c75/Ascend/ascend-toolkit/20.1.rc1/atc/python/site-packages/auto_tune.egg/auto_tune:/home/c75/Ascend/ascend-toolkit/20.1.rc1/atc/python/site-packages/schedule_search.egg:/home/c75/Ascend/ascend-toolkit/20.1.rc1/opp/op_impl/built-in/ai_core/tbe
export LD_LIBRARY_PATH=/home/c75/Ascend/ascend-toolkit/20.1.rc1/atc/lib64:/home/c75/Ascend/ascend-toolkit/20.1.rc1/driver/lib64:/home/c75/Ascend/ascend-toolkit/20.1.rc1/add-ons:/usr/local/python3.7.5/lib:/home/c75/Ascend/ascend-toolkit/20.1.rc1/acllib/lib64
export SLOG_PRINT_TO_STDOUT=1
export ASCEND_OPP_PATH=/home/c75/Ascend/ascend-toolkit/20.1.rc1/opp

/home/c75/Ascend/ascend-toolkit/20.1.rc1/atc/bin/atc --input_shape="blob1:1,3,600,800" --weight="/home/c75/work/tinghua_fastercnn/toy_fasterrcnn_with_proposal.caffemodel" --check_report=/home/c75/modelzoo/toy03/device/network_analysis.report --input_format=NCHW --output="/home/c75/modelzoo/toy03/device/toy03" --soc_version=Ascend310 --framework=0 --model="/home/c75/work/tinghua_fastercnn/toy03.prototxt"

This fails with:
2020-11-13 02:46:12 E11019: Op[Proposal1]'s input[2] is not linked in weight file
This may mean that, in your weight file, input 2 of the proposal operator is not connected properly.

[DEBUG] GE(4625,atc):2020-11-13-02:57:24.766.886 [framework/domi/parser/caffe/caffe_parser.cc:991]4625 AddNode:Caffe layer name:Proposal1, layer type Proposal
[INFO] GE(4625,atc):2020-11-13-02:57:24.766.953 [framework/domi/parser/caffe/caffe_parser.cc:1144]4625 AddTensorDescToOpDescByIr:After GetOpDescFromOperator op[Proposal1] type[Proposal] have all input size: 3, caffe_input_size:2 blob_size 0 output size: 2
[INFO] GE(4625,atc):2020-11-13-02:57:24.767.018 [framework/domi/parser/caffe/caffe_parser.cc:1166]4625 AddTensorDescToOpDescByIr:op [Proposal1], type[Proposal], update input(0) with name cls_prob success
[INFO] GE(4625,atc):2020-11-13-02:57:24.767.050 [framework/domi/parser/caffe/caffe_parser.cc:1166]4625 AddTensorDescToOpDescByIr:op [Proposal1], type[Proposal], update input(1) with name bbox_delta success
[INFO] GE(4625,atc):2020-11-13-02:57:24.767.074 [framework/domi/parser/caffe/caffe_parser.cc:1175]4625 AddTensorDescToOpDescByIr:op [Proposal1], type[Proposal], update output(0) with name rois success
[INFO] GE(4625,atc):2020-11-13-02:57:24.767.090 [framework/domi/parser/caffe/caffe_parser.cc:1175]4625 AddTensorDescToOpDescByIr:op [Proposal1], type[Proposal], update output(1) with name actual_rois_num success
[INFO] GE(4625,atc):2020-11-13-02:57:24.767.104 [framework/domi/parser/caffe/caffe_parser.cc:1005]4625 AddNode:After AddTensorDescToOpDescByIr op[Proposal1] type[Proposal] have input size: 3, output size: 2
[DEBUG] GE(4625,atc):2020-11-13-02:57:24.767.140 [framework/domi/parser/caffe/caffe_custom_parser_adapter.cc:44]4625 ParseParams:Caffe layer name = Proposal1, layer type= Proposal, parse params
[INFO] GE(4625,atc):2020-11-13-02:57:24.767.207 [framework/domi/parser/caffe/caffe_parser.cc:1009]4625 AddNode:After op parser op[Proposal1] type[Proposal] have input size: 3, output size: 2
[INFO] GE(4625,atc):2020-11-13-02:57:24.767.220 [framework/domi/parser/caffe/caffe_parser.cc:1013]4625 AddNode:Enter caffe parser. op name:Proposal1, type:Proposal
[DEBUG] GE(4625,atc):2020-11-13-02:57:24.767.681 [framework/domi/parser/caffe/caffe_parser.cc:1214]4625 AddEdges:Start add edge: From view2:0 To Proposal1:1.
[DEBUG] GE(4625,atc):2020-11-13-02:57:24.767.750 [framework/domi/parser/caffe/caffe_parser.cc:1214]4625 AddEdges:Start add edge: From conv15:0 To Proposal1:0.
[INFO] GE(4625,atc):2020-11-13-02:57:24.768.389 [framework/domi/parser/caffe/caffe_parser.cc:1356]4625 AddEdge4Output:output in top_blob: Proposal1
[DEBUG] GE(4625,atc):2020-11-13-02:57:24.768.407 [framework/domi/parser/caffe/caffe_parser.cc:1361]4625 AddEdge4Output:Start add edge for out node: From Proposal1:0 To toy_fasterrcnn_Node_Output:0.
[INFO] GE(4625,atc):2020-11-13-02:57:24.768.413 [framework/domi/parser/caffe/caffe_parser.cc:1356]4625 AddEdge4Output:output in top_blob: Proposal1
[DEBUG] GE(4625,atc):2020-11-13-02:57:24.768.426 [framework/domi/parser/caffe/caffe_parser.cc:1361]4625 AddEdge4Output:Start add edge for out node: From Proposal1:1 To toy_fasterrcnn_Node_Output:1.
[DEBUG] GE(4625,atc):2020-11-13-02:57:24.768.675 [common/graph/./compute_graph.cc:745]4625 DFSTopologicalSorting:node_vec.push_back Proposal1
[INFO] GE(4625,atc):2020-11-13-02:57:24.768.728 [framework/domi/parser/caffe/caffe_parser.cc:2469]4625 GetLeafNodeTops:The top of out node [Proposal1] is [proposal_blob1]
[INFO] GE(4625,atc):2020-11-13-02:57:24.768.755 [framework/domi/parser/caffe/caffe_parser.cc:2469]4625 GetLeafNodeTops:The top of out node [Proposal1] is [proposal_blob2]
[INFO] GE(4625,atc):2020-11-13-02:57:24.769.204 [framework/domi/parser/caffe/caffe_parser.cc:1752]4625 Parse:out node name = Proposal1.
[INFO] GE(4625,atc):2020-11-13-02:57:24.769.265 [framework/domi/parser/caffe/caffe_parser.cc:1752]4625 Parse:out node name = Proposal1.
[INFO] GE(4625,atc):2020-11-13-02:57:24.769.271 [framework/domi/parser/caffe/caffe_parser.cc:1750]4625 Parse:node name = Proposal1.
[INFO] GE(4625,atc):2020-11-13-02:57:24.769.887 [common/graph/./compute_graph.cc:1033]4625 Dump:node name = conv15, out data node name = Proposal1.
[INFO] GE(4625,atc):2020-11-13-02:57:24.769.948 [common/graph/./compute_graph.cc:1033]4625 Dump:node name = view2, out data node name = Proposal1.
[INFO] GE(4625,atc):2020-11-13-02:57:24.769.954 [common/graph/./compute_graph.cc:1028]4625 Dump:node name = Proposal1.
[INFO] GE(4625,atc):2020-11-13-02:57:24.769.960 [common/graph/./compute_graph.cc:1033]4625 Dump:node name = Proposal1, out data node name = toy_fasterrcnn_Node_Output.
[INFO] GE(4625,atc):2020-11-13-02:57:24.769.966 [common/graph/./compute_graph.cc:1033]4625 Dump:node name = Proposal1, out data node name = toy_fasterrcnn_Node_Output.
[DEBUG] GE(4625,atc):2020-11-13-02:57:25.493.601 [framework/domi/parser/caffe/caffe_parser.cc:2078]4625 ParseLayerField:Parse result(name : Proposal1)
[INFO] GE(4625,atc):2020-11-13-02:57:25.493.628 [framework/domi/parser/caffe/caffe_parser.cc:2037]4625 ParseLayerParameter:Parse layer Proposal1
[DEBUG] GE(4625,atc):2020-11-13-02:57:25.493.642 [framework/domi/parser/caffe/caffe_parser.cc:2312]4625 ConvertLayerParameter:Caffe layer name: Proposal1 , layer type: Proposal.
[INFO] GE(4625,atc):2020-11-13-02:57:25.493.663 [framework/domi/parser/caffe/caffe_custom_parser_adapter.cc:79]4625 ParseWeights:layer: Proposal1 blobs_size: 0 bottom_size: 2
[ERROR] GE(4625,atc):2020-11-13-02:57:25.497.945 [framework/domi/parser/caffe/caffe_parser.cc:2356]4625 CheckNodes: ErrorNo: -1(failed) Op[Proposal1]'s input 2 is not linked.
E11019: Op[Proposal1]'s input[2] is not linked.

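Above is the log output related to your Proposal1. Judging from the line
GetOpDescFromOperator op[Proposal1] type[Proposal] have all input size: 3, caffe_input_size:2 blob_size 0 output size: 2
does your Proposal have 3 inputs while the Caffe model actually connects only two, leaving one input unused?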
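The comparison made in this reply can be automated over a longer log. The sketch below is a hedged illustration, not an official tool: it assumes the log line format shown above (`have all input size: N, caffe_input_size:M`), and `find_input_mismatches` is a hypothetical helper name.

```python
import re

# Matches the AddTensorDescToOpDescByIr line format quoted in this thread.
PATTERN = re.compile(
    r"op\[(?P<op>[^\]]+)\] type\[(?P<type>[^\]]+)\] "
    r"have all input size: (?P<ir>\d+), caffe_input_size:(?P<caffe>\d+)"
)


def find_input_mismatches(log_text):
    """Return (op_name, ir_inputs, caffe_inputs) where the two counts differ."""
    mismatches = []
    for match in PATTERN.finditer(log_text):
        ir_inputs = int(match.group("ir"))
        caffe_inputs = int(match.group("caffe"))
        if ir_inputs != caffe_inputs:
            mismatches.append((match.group("op"), ir_inputs, caffe_inputs))
    return mismatches


sample = (
    "AddTensorDescToOpDescByIr:After GetOpDescFromOperator op[Proposal1] "
    "type[Proposal] have all input size: 3, caffe_input_size:2 "
    "blob_size 0 output size: 2"
)
print(find_input_mismatches(sample))
```

A mismatch is only a hint: an operator may declare optional inputs, so a difference between the two counts does not by itself prove that the model is wrong.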

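@zhutian Hello. According to the official documentation, for detection models the .prototxt file must be edited manually: add a new input node img_info, and then add img_info as an input to layers such as Proposal. I have edited the .prototxt file by hand; it is at the link below. Could you take another look at what the problem is?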
https://cloud.tsinghua.edu.cn/d/450409d4b1124d4baf9a/

zhengtao set the assignee to zhutian

E19000: Path[/home/c75/Ascend/ascend-toolkit/20.1.rc1/x86_64-linux/opp/op_impl/custom/ai_core/tbe/config/ascend310]'s realpath is empty, errmsg[The file path does not exist.]
E11004: caffe net input_shape size[2] is not equal input size[3].
It looks like the img_info node you added has an input dimension of 2, whereas 3 is required. Where is the shape of this input defined?
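One quick check on the edited .prototxt is to compare the number of top-level `input:` entries with the number of `input_shape` blocks. This rests on an assumption not confirmed in this thread, namely that the two sizes reported by E11004 are these two counts. The sketch below uses a made-up prototxt header; only the `blob1` shape 1,3,600,800 and the name `img_info` come from the thread, and `count_inputs_and_shapes` is a hypothetical helper.

```python
import re


def count_inputs_and_shapes(prototxt_text):
    """Return (input names, number of input_shape blocks) found in the text."""
    inputs = re.findall(r'^\s*input:\s*"([^"]+)"', prototxt_text,
                        flags=re.MULTILINE)
    shapes = re.findall(r"^\s*input_shape\s*\{", prototxt_text,
                        flags=re.MULTILINE)
    return inputs, len(shapes)


# Illustrative header only; the real toy03.prototxt may look different.
sample_prototxt = """\
name: "toy_fasterrcnn"
input: "blob1"
input_shape { dim: 1 dim: 3 dim: 600 dim: 800 }
input: "img_info"
"""

names, shape_count = count_inputs_and_shapes(sample_prototxt)
if len(names) != shape_count:
    print(f"{len(names)} inputs {names} but {shape_count} input_shape block(s)")
```

If the counts differ in the real file, adding the missing `input_shape` block for the extra input would be the first thing to try under this reading of the error.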

Hello, has the problem been solved? Thanks.

zhutian changed the task status from TODO to DONE
吴定远 changed the linked repository from Ascend/modelzoo-his to Ascend/modelzoo
