一、问题现象(附报错日志上下文):
首先通过atc转换命令将pb模型转换为om模型后,然后使用msame工具对om模型进行推理验证时报出了om模型加载失败的错误。如下图所示:
二、软件版本:
-- CANN 版本 (e.g., CANN 3.0.x,5.x.x): cann5.0.4
--Tensorflow版本:1.15
--Python 版本 (e.g., Python 3.7.5):3.7.5
--操作系统版本 (e.g., Ubuntu 18.04):华为镜像服务器
四、日志信息:
使用msame推理是打印出error的输出如下:
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.188.665 [logger.cc:1477] 11353 KernelTaskEventLogProc: device_id=0, stream_id=7, task_id=1, kernel_name=(none), task_type=AiCpuKernel, task_launched_num=13
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.188.672 [npu_driver.cc:376] 11353 CommandSend: Command send success, device_id=0, ts_id=0, sq_id=7, reportCount=1, command=0xffc0070040, cmdCount=1.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.188.682 [api_impl.cc:455] 11350 StreamSynchronize: stream_id=7
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.188.689 [osal.cc:262] 11350 CreateNotifier: Runtime_alloc_size 104
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.188.702 [npu_driver.cc:350] 11353 CommandOccupy: sqId=7, deviceId=0, tsId=0, command=0xffc0070080, cmdCount=1.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.188.708 [engine.cc:840] 11353 SendingRun: CommandOccupy. sqId=7, cqId=0, deviceId=0, retCode=0.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.188.713 [task.cc:1652] 11353 ToCommandBody: streamId=7, task_id=2, eventid=1031, flag=0xe, Task flag=0xe, timelineBase=0xffffffffffffffff, offset=0xffffffff, waitCqflag=0, waitCqId=0
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.188.720 [stream.cc:883] 11353 AddTaskToStream: recorded public task to stream, stream_id=7, task_id=2, task_type=2, head=1, tail=3
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.188.726 [logger.cc:1520] 11353 TaskLaunchedEx: device_id=0, stream_id=7, task_id=2, event_id=1031,task_type=EventRecord, task_launched_num=14
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.188.732 [npu_driver.cc:376] 11353 CommandSend: Command send success, device_id=0, ts_id=0, sq_id=7, reportCount=2, command=0xffc0070080, cmdCount=1.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.190.375 [engine.cc:651] 11354 ProcessTaskReport: RTS_DRIVER: report receive, stream_id=7, sq_id=7, task_id=1, sq_head=3
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.190.386 [task.cc:872] 11354 SetResult: Kernel payLoad=2416312362, highTaskId=0, deviceId=0, rtCode=0x2a, errorTaskId=576, errorStreamId=96
[ERROR] RUNTIME(11350,msame):2021-11-17-13:01:59.190.397 [engine.cc:964]11354 ReportExceptProc:Task exception! device_id=0, stream_id=7, task_id=1, type=1, retCode=0x2a.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.190.408 [device_error_proc.cc:625] 11354 ProcErrorInfo: Begin to process device error info.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.191.933 [device_error_proc.cc:573] 11354 ProcessOneElementInRingBuffer: it needs process 1 error messages, tschVer=2, headSize=24, elementSize=11350.
[ERROR] RUNTIME(11350,msame):2021-11-17-13:01:59.191.948 [device_error_proc.cc:554]11354 ProcessAicpuErrorInfo:An exception occurred during AICPU execution, stream_id:0, task_id:0, errcode:4294967303, msg:aicpu execute failed.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.191.955 [device_error_proc.cc:665] 11354 ProcErrorInfo: finished to process device error info, retCode=0.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.074 [engine.cc:973] 11354 ReportExceptProc: excptCallBack_ is null.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.086 [stream.cc:929] 11354 TryDelRecordedTask: del public task from stream, stream_id=7, tailTaskId=1, delTaskId=1, head=2, tail=3
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.094 [logger.cc:1583] 11354 TaskFinished: device_id=0, stream_id=7, task_id=1, task_type=1,task_finish_num=13
[ERROR] RUNTIME(11350,msame):2021-11-17-13:01:59.192.114 [task.cc:806]11354 PreCheckTaskErr:Kernel task happen error, retCode=0x2a, [aicpu exception].
[ERROR] RUNTIME(11350,msame):2021-11-17-13:01:59.192.134 [task.cc:679]11354 GetFirstExtendInfoForAicpuTask:Get extend info for tf kernel, device_id=0, stream_id=7, task_id=1, extInfoLen=0, extInfoAddr=0. extInfoLen is invalid, valid range is [8, 18446744073709551615].
[ERROR] RUNTIME(11350,msame):2021-11-17-13:01:59.192.184 [task.cc:615]11354 PrintAicpuErrorInfo:Aicpu kernel execute failed, device_id=0, stream_id=7, task_id=1, fault so_name=, fault kernel_name=, fault op_name=, extend_info=.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.192 [task.cc:108] 11354 TaskFailCallBack: rtCode=0x715002a,[aicpu exception], errorTaskId=1, errorStreamId=7
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.200 [engine.cc:577] 11354 ProcessTask: ProcessTask.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.207 [engine.cc:651] 11354 ProcessTaskReport: RTS_DRIVER: report receive, stream_id=7, sq_id=7, task_id=2, sq_head=3
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.216 [engine.cc:651] 11354 ProcessTaskReport: RTS_DRIVER: report receive, stream_id=7, sq_id=7, task_id=2, sq_head=3
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.222 [stream.cc:929] 11354 TryDelRecordedTask: del public task from stream, stream_id=7, tailTaskId=2, delTaskId=2, head=3, tail=3
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.228 [logger.cc:1583] 11354 TaskFinished: device_id=0, stream_id=7, task_id=2, task_type=2,task_finish_num=14
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.239 [engine.cc:577] 11354 ProcessTask: ProcessTask.
[ERROR] RUNTIME(11350,msame):2021-11-17-13:01:59.192.256 [stream.cc:704]11350 GetError:[LOAD][DEFAULT]Stream Synchronize failed, stream_id=7 retCode=0x2a, [aicpu exception].
[ERROR] RUNTIME(11350,msame):2021-11-17-13:01:59.192.273 [logger.cc:285]11350 StreamSynchronize:[LOAD][DEFAULT]Stream synchronize failed, stream = 0x55ab46e6ff10
[ERROR] RUNTIME(11350,msame):2021-11-17-13:01:59.192.299 [api_c.cc:597]11350 rtStreamSynchronize:[LOAD][DEFAULT]ErrCode=507018, desc=[aicpu exception], InnerCode=0x715002a
[ERROR] RUNTIME(11350,msame):2021-11-17-13:01:59.192.307 [error_message_manage.cc:41]11350 ReportFuncErrorReason:[LOAD][DEFAULT]rtStreamSynchronize execute failed, reason=[aicpu exception]
[ERROR] GE(11350,msame):2021-11-17-13:01:59.192.337 [model_manager.cc:163]11350 KernelLaunchEx: ErrorNo: 1343225859(Failed to call runtime API!) [LOAD][DEFAULT][Call][RtStreamSynchronize] failed. ret:0x7BC8A
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.351 [logger.cc:356] 11350 DevFree: Device memory free. mem=0x108080059000
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.362 [stream.cc:609] 11350 TearDown: stream_id=7, delay recycle task num=0.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.379 [npu_driver.cc:350] 11353 CommandOccupy: sqId=7, deviceId=0, tsId=0, command=0xffc00700c0, cmdCount=1.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.385 [engine.cc:840] 11353 SendingRun: CommandOccupy. sqId=7, cqId=0, deviceId=0, retCode=0.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.392 [stream.cc:883] 11353 AddTaskToStream: recorded public task to stream, stream_id=7, task_id=3, task_type=6, head=3, tail=4
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.398 [logger.cc:1572] 11353 TaskLaunchedEx: device_id=0, stream_id=7, task_id=3, task_type=6,task_launched_num=15
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.404 [npu_driver.cc:376] 11353 CommandSend: Command send success, device_id=0, ts_id=0, sq_id=7, reportCount=1, command=0xffc00700c0, cmdCount=1.
[ERROR] GE(11350,msame):2021-11-17-13:01:59.192.420 [kernel_ex_task_info.cc:173]11350 Init: ErrorNo: 507018() [LOAD][DEFAULT][Create][AicpuSession] error. session id:15056924531701601793
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.440 [engine.cc:651] 11354 ProcessTaskReport: RTS_DRIVER: report receive, stream_id=7, sq_id=7, task_id=3, sq_head=4
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.446 [stream.cc:929] 11354 TryDelRecordedTask: del public task from stream, stream_id=7, tailTaskId=3, delTaskId=3, head=4, tail=4
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.452 [logger.cc:1583] 11354 TaskFinished: device_id=0, stream_id=7, task_id=3, task_type=6,task_finish_num=15
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.461 [stream.cc:381] 11354 FreeStreamId: free stream_id=7,
[ERROR] GE(11350,msame):2021-11-17-13:01:59.192.483 [davinci_model.cc:3228]11350 InitTaskInfo: ErrorNo: 507018() [LOAD][DEFAULT][Init][Task] index:62 failed, ret:507018.
[ERROR] GE(11350,msame):2021-11-17-13:01:59.192.491 [davinci_model.cc:489]11350 DoTaskSink: ErrorNo: 4294967295(failed) [LOAD][DEFAULT][Init][TaskInfo] failed, model_id:1.
[ERROR] GE(11350,msame):2021-11-17-13:01:59.192.499 [davinci_model.cc:643]11350 Init: ErrorNo: 4294967295(failed) [LOAD][DEFAULT][Call][DoTaskSink] failed, model_id:1.
[ERROR] GE(11350,msame):2021-11-17-13:01:59.192.506 [model_manager.cc:1400]11350 LoadModelOffline: ErrorNo: 4294967295(failed) [LOAD][DEFAULT][Init][DavinciModel] failed, ret:507018.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.535 [npu_driver.cc:506] 11354 NormalSqCqFree: Normal sqcq free start, sqId=7, cqId=0, deviceId=0, tsId=0.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.545 [stream.cc:168] 11354 DeAllocStreamSqCq: [SqCqManage]success to release sq, sq_id=7, cq_id=0, stream_id=7, is_sq_need_release=1, is_cq_need_reuse=1.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.554 [npu_driver.cc:275] 11354 StreamIdFree: Stream id free success, stream_id=7, device_id=0, tsId=0
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.569 [engine.cc:577] 11354 ProcessTask: ProcessTask.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.587 [logger.cc:764] 11350 ModelUnbindStream: model unbind stream, model_id=0, stream_id=6,
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.595 [osal.cc:262] 11350 CreateNotifier: Runtime_alloc_size 104
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.612 [npu_driver.cc:350] 11353 CommandOccupy: sqId=6, deviceId=0, tsId=0, command=0xffc0060080, cmdCount=1.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.617 [engine.cc:840] 11353 SendingRun: CommandOccupy. sqId=6, cqId=0, deviceId=0, retCode=0.
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.623 [task.cc:1652] 11353 ToCommandBody: streamId=6, task_id=2, eventid=1030, flag=0xe, Task flag=0xe, timelineBase=0xffffffffffffffff, offset=0xffffffff, waitCqflag=0, waitCqId=0
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.192.633 [stream.cc:874] 11353 AddTaskToStream: recorded persistent task to stream, stream_id=6, task_id=2, task_type=2, head=0, tail=1, delay recycle task num=1
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.206.618 [logger.cc:418] 11350 ManagedMemFree: managed memory free. mem = 0x10804006f200
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.206.623 [logger.cc:356] 11350 DevFree: Device memory free. mem=0x108080055000
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.206.628 [logger.cc:418] 11350 ManagedMemFree: managed memory free. mem = 0x10804006f400
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.206.633 [logger.cc:356] 11350 DevFree: Device memory free. mem=0x108080056000
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.206.638 [logger.cc:418] 11350 ManagedMemFree: managed memory free. mem = 0x10804006f600
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.206.643 [logger.cc:356] 11350 DevFree: Device memory free. mem=0x108080057000
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.268.532 [logger.cc:418] 11350 ManagedMemFree: managed memory free. mem = 0x10804006f800
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.268.554 [logger.cc:356] 11350 DevFree: Device memory free. mem=0x108080058000
[ERROR] GE(11350,msame):2021-11-17-13:01:59.280.191 [graph_loader.cc:230]11350 LoadModelFromData: ErrorNo: 507018() [LOAD][DEFAULT][Load][Model] failed, model_id:1.
[ERROR] ASCENDCL(11350,msame):2021-11-17-13:01:59.283.041 [model.cpp:226]11350 ModelLoadFromFileWithMem: [LOAD][DEFAULT][Model][FromData]load model from data failed, ge result[507018]
[ERROR] load model from file failed, model file is /home/HwHiAiUser/AscendProjects/modelnet10.om
[ERROR] load model from file failed
[WARN] no model had been loaded, unload failed
[ERROR] Sample process failed
[INFO] ASCENDCL(11350,msame):2021-11-17-13:01:59.287.416 [stream.cpp:46]11350 aclrtDestroyStream: start to execute aclrtDestroyStream
[INFO] GE(11350,msame):2021-11-17-13:01:59.287.503 [single_op_manager.cc:65]11350 ReleaseResource:ReleaseResource in. resource id = 0x55ab46718a70
[INFO] GE(11350,msame):2021-11-17-13:01:59.287.584 [graph_caching_allocator.cc:372]11350 TryFreeBlocks:Try free blocks.
[INFO] GE(11350,msame):2021-11-17-13:01:59.287.648 [graph_caching_allocator.cc:328]11350 FreeCachedBlocks:Free cached blocks
[EVENT] GE(11350,msame):2021-11-17-13:01:59.287.707 [graph_caching_allocator.cc:440]11350 PrintStatics:Called counts[malloc: 0 free: 0].
[EVENT] GE(11350,msame):2021-11-17-13:01:59.287.772 [graph_caching_allocator.cc:390]11350 PrintCount:Malloc total[size: 0 count: 0].
[EVENT] GE(11350,msame):2021-11-17-13:01:59.287.830 [graph_caching_allocator.cc:390]11350 PrintCount: Using total[size: 0 count: 0].
[EVENT] GE(11350,msame):2021-11-17-13:01:59.287.882 [graph_caching_allocator.cc:390]11350 PrintCount: Free total[size: 0 count: 0].
[INFO] ASCENDCL(11350,msame):2021-11-17-13:01:59.287.936 [stream_executor.cpp:181]11350 RemoveExecutor: To remove executor by stream = 94194109614704
[INFO] ASCENDCL(11350,msame):2021-11-17-13:01:59.288.004 [stream.cpp:51]11350 aclrtDestroyStream: release singleop success
[INFO] RUNTIME(11350,msame):2021-11-17-13:01:59.288.235 [stream.cc:609] 11350 TearDown: stream_id=4, delay recycle task num=0.
[INFO] ASCENDCL(11350,msame):2021-11-17-13:01:59.288.316 [stream.cpp:57]11350 aclrtDestroyStream: aclrtDestroyStream success
[INFO] end to destroy stream
使用的atc转换命令为:
atc --model=/home/HwHiAiUser/AscendProjects/VSL/pb_model/modelnet10.pb --framework=3 --output=/home/HwHiAiUser/AscendProjects/modelnet10 --soc_version=Ascend310 --input_shape="Placeholder:100,30,30,30,1" --log=info --out_nodes="latent_feature:0"
使用的msame推理命令为:
./msame --model "/home/HwHiAiUser/AscendProjects/modelnet10.om" --input "/home/HwHiAiUser/AscendProjects/VSL/ModelNet10.bin" --output "/home/HwHiAiUser/AscendProjects/VSL/om/" --outfmt TXT --loop 1
提取码:
123456
*有效期至: 2022/11/12 13:28:32 GMT+08:00
msame版本配套问题,当前ModelArts的离线推理环境是5.0.3前的版本,CANN包和msame需要使用配套版本。
此处可能存在不合适展示的内容,页面不予展示。您可通过相关编辑功能自查并修改。
如您确认内容无涉及 不当用语 / 纯广告导流 / 暴力 / 低俗色情 / 侵权 / 盗版 / 虚假 / 无价值内容或违法国家有关法律法规的内容,可点击提交进行申诉,我们将尽快为您处理。
调整版本后,离线推理流程已经通过。
登录 后才可以发表评论