Environment: single-node 910B machine with 16 NPUs
1. Problem symptom (error log context attached):
A 3-concurrency stress test against the APE large model fails with the errors below.
(py39) root@gzxj-sys-rpm46kwprrx:~/APE# ./run_test.sh
/root/miniconda3/envs/py39/lib/python3.9/site-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
warnings.warn(
/root/miniconda3/envs/py39/lib/python3.9/site-packages/torchvision/transforms/functional_pil.py:5: UserWarning: The torchvision.transforms.functional_pil module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
warnings.warn(
[04/10 06:46:19 detectron2]: Arguments: Namespace(config_file='configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py', webcam=False, video_input=None, input=None, output=None, confidence_threshold=0.1, opts=['train.init_checkpoint=/root/APE/model_final.pth', 'model.model_language.cache_dir=', 'model.model_vision.select_box_nums_for_evaluation=500', 'model.model_vision.text_feature_bank_reset=True', 'model.model_vision.backbone.net.xattn=False'], text_prompt=None, with_box=True, with_mask=False, with_sseg=False)
Please 'pip install xformers'
Please 'pip install xformers'
Please 'pip install apex'
Please 'pip install xformers'
=========== args.opts ============ ['train.init_checkpoint=/root/APE/model_final.pth', 'model.model_language.cache_dir=', 'model.model_vision.select_box_nums_for_evaluation=500', 'model.model_vision.text_feature_bank_reset=True', 'model.model_vision.backbone.net.xattn=False']
ANTLR runtime and generated code versions disagree: 4.9.3!=4.8
ANTLR runtime and generated code versions disagree: 4.9.3!=4.8
======== shape of rope freq torch.Size([1024, 64]) ========
======== shape of rope freq torch.Size([4096, 64]) ========
[04/10 06:46:24 ape.data.detection_utils]: Using builtin metadata 'image_count' for dataset '['lvis_v1_train+coco_panoptic_separated']'
[04/10 06:46:24 ape.modeling.ape_deta.deformable_criterion]: fed_loss_cls_weights: torch.Size([1203]) num_classes: 1256
[04/10 06:46:24 ape.modeling.ape_deta.deformable_criterion]: pad fed_loss_cls_weights with type cat and value 0
[04/10 06:46:24 ape.modeling.ape_deta.deformable_criterion]: pad fed_loss_classes with tensor([1203, 1204, 1205, 1206, 1207, 1208, 1209, 1210, 1211, 1212, 1213, 1214,
1215, 1216, 1217, 1218, 1219, 1220, 1221, 1222, 1223, 1224, 1225, 1226,
1227, 1228, 1229, 1230, 1231, 1232, 1233, 1234, 1235, 1236, 1237, 1238,
1239, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1247, 1248, 1249, 1250,
1251, 1252, 1253, 1254, 1255])
[04/10 06:46:24 ape.modeling.ape_deta.deformable_criterion]: fed_loss_cls_weights: tensor([ 1.0000, 1.0000, 3.1623, 7.3485, 43.8520, 25.0998, 5.5678, 8.3066,
2.6458, 3.3166, 1.0000, 5.4772, 7.0711, 6.7082, 5.2915, 10.6771,
13.8924, 4.5826, 9.5394, 5.5678, 38.3275, 43.8634, 9.3274, 8.7750,
3.3166, 6.8557, 4.5826, 6.8557, 8.3666, 42.8719, 4.3589, 23.0434,
3.3166, 46.6798, 10.6301, 5.0990, 2.2361, 7.4833, 8.5440, 5.6569,
11.3137, 24.9600, 3.4641, 7.2111, 3.3166, 41.0731, 9.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
0.0000, 0.0000, 0.0000, 0.0000])
[04/10 06:46:24 ape.modeling.ape_deta.deformable_criterion]: fed_loss_cls_weights: torch.Size([1256]) num_classes: 1256
[04/10 06:46:24 ape.data.detection_utils]: Using builtin metadata 'image_count' for dataset '['openimages_v6_train_bbox_nogroup']'
[04/10 06:46:24 ape.modeling.ape_deta.deformable_criterion]: fed_loss_cls_weights: torch.Size([601]) num_classes: 601
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_id: 0
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_name: lvis_v1_train+coco_panoptic_separated
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing+stuff
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_id: 1
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_name: objects365_train_fixname
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_id: 2
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_name: openimages_v6_train_bbox_nogroup
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_id: 3
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_name: visualgenome_77962_box_and_region
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_id: 4
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_name: sa1b_6m
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_id: 5
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_name: refcoco-mixed_group-by-image
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_id: 6
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_name: gqa_region_train
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_id: 7
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_name: phrasecut_train
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_id: 8
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_name: flickr30k_separateGT_train
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_id: 9
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_name: refcoco-mixed
[04/10 06:46:24 ape.modeling.ape_deta.deformable_detr]: dataset_entity: thing
[04/10 06:47:13 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /root/APE/model_final.pth ...
[04/10 06:47:13 fvcore.common.checkpoint]: [Checkpointer] Loading from /root/APE/model_final.pth ...
Namespace(config_file='configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py', webcam=False, video_input=None, input=None, output=None, confidence_threshold=0.1, opts=['train.init_checkpoint=/root/APE/model_final.pth', 'model.model_language.cache_dir=', 'model.model_vision.select_box_nums_for_evaluation=500', 'model.model_vision.text_feature_bank_reset=True', 'model.model_vision.backbone.net.xattn=False'], text_prompt=None, with_box=True, with_mask=False, with_sseg=False)
INFO: Started server process [75357]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8198 (Press CTRL+C to quit)
/root/APE/ape/modeling/text/clip_wrapper_eva02.py:117: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/_internal/cpython-3.9.0/lib/python3.9/site-packages/torch/include/ATen/core/LegacyTypeDispatch.h:74.)
attention_mask[i, : end_token_idx[i] + 1] = 1
/root/miniconda3/envs/py39/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
[WARNING] nms proposals (0) < 900, running naive topk
[WARNING] nms proposals (0) < 900, running naive topk
INFO: 10.92.54.160:60802 - "POST /infer HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 407, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
return await self.app(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/anyio/to_thread.py", line 28, in run_sync
return await get_asynclib().run_sync_in_worker_thread(func, *args, cancellable=cancellable,
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 818, in run_sync_in_worker_thread
return await future
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 754, in run
result = context.run(func, *args)
File "/root/APE/demo/api.py", line 144, in interface
predictions, visualized_output, visualized_outputs, metadata = demo.run_on_image(
File "/root/APE/demo/predictor_lazy.py", line 212, in run_on_image
predictions = self.predictor(image, text_prompt, mask_prompt)
File "/root/APE/ape/engine/defaults.py", line 99, in __call__
predictions = self.model([inputs])[0]
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/APE/ape/modeling/ape_deta/ape_deta.py", line 39, in forward
losses = self.model_vision(batched_inputs, do_postprocess=do_postprocess)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/APE/ape/modeling/ape_deta/deformable_detr_segm_vl.py", line 428, in forward
) = self.transformer(
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/APE/ape/modeling/ape_deta/deformable_transformer_vl.py", line 605, in forward
keep_inds_topk = keep_inds[keep_inds_mask]
RuntimeError: InnerRun:/usr1/02/workspace/j_ywhtRpPk/pytorch/torch_npu/csrc/framework/OpParamMaker.cpp:219 NPU error, error code is 500002
[Error]: A GE error occurs in the system.
Rectify the fault based on the error information in the log, or you can ask us at follwing gitee link by issues: https://gitee.com/ascend/pytorch/issue
EH9999: Inner Error!
EH9999 [Exec][Op]Execute op failed. op type = NonZero, ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last):
[WARNING] nms proposals (0) < 900, running naive topk
INFO: 10.92.54.160:60898 - "POST /infer HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 407, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
return await self.app(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/anyio/to_thread.py", line 28, in run_sync
return await get_asynclib().run_sync_in_worker_thread(func, *args, cancellable=cancellable,
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 818, in run_sync_in_worker_thread
return await future
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 754, in run
result = context.run(func, *args)
File "/root/APE/demo/api.py", line 144, in interface
predictions, visualized_output, visualized_outputs, metadata = demo.run_on_image(
File "/root/APE/demo/predictor_lazy.py", line 212, in run_on_image
predictions = self.predictor(image, text_prompt, mask_prompt)
File "/root/APE/ape/engine/defaults.py", line 99, in __call__
predictions = self.model([inputs])[0]
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/APE/ape/modeling/ape_deta/ape_deta.py", line 39, in forward
losses = self.model_vision(batched_inputs, do_postprocess=do_postprocess)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/APE/ape/modeling/ape_deta/deformable_detr_segm_vl.py", line 428, in forward
) = self.transformer(
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/APE/ape/modeling/ape_deta/deformable_transformer_vl.py", line 605, in forward
keep_inds_topk = keep_inds[keep_inds_mask]
RuntimeError: InnerRun:/usr1/02/workspace/j_ywhtRpPk/pytorch/torch_npu/csrc/framework/OpParamMaker.cpp:219 NPU error, error code is 500002
[Error]: A GE error occurs in the system.
Rectify the fault based on the error information in the log, or you can ask us at follwing gitee link by issues: https://gitee.com/ascend/pytorch/issue
EH9999: Inner Error!
EH9999 [Exec][Op]Execute op failed. op type = NonZero, ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last):
[04/10 06:53:56 detectron2]: ./tmp/1712731968.7509215.jpg: detected 7 instances in 67.40s
INFO: 10.92.54.160:60698 - "POST /infer HTTP/1.1" 200 OK
[WARNING] nms proposals (0) < 900, running naive topk
[WARNING] nms proposals (0) < 900, running naive topk
[WARNING] nms proposals (0) < 900, running naive topk
[04/10 06:54:44 detectron2]: ./tmp/1712732025.6573904.jpg: detected 14 instances in 58.55s
INFO: 10.92.54.160:35880 - "POST /infer HTTP/1.1" 200 OK
[04/10 06:54:45 detectron2]: ./tmp/1712732026.2869112.jpg: detected 14 instances in 59.58s
INFO: 10.92.54.160:36146 - "POST /infer HTTP/1.1" 200 OK
[04/10 06:54:56 detectron2]: ./tmp/1712732039.161774.jpg: detected 14 instances in 57.36s
INFO: 10.92.54.160:36822 - "POST /infer HTTP/1.1" 200 OK
[WARNING] nms proposals (0) < 900, running naive topk
[WARNING] nms proposals (0) < 900, running naive topk
[WARNING] nms proposals (0) < 900, running naive topk
[04/10 06:55:44 detectron2]: ./tmp/1712732085.083786.jpg: detected 14 instances in 59.38s
INFO: 10.92.54.160:39896 - "POST /infer HTTP/1.1" 200 OK
EH9999: Inner Error!
rtStreamSynchronize execute failed, reason=[vector core exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:50]
EH9999 synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last):
E90003: Compile operator failed, cause: Template constraint, detailed information: check_op_cap func failed, check_type: op_select_format, op_type:LinSpace failed, failure details:
Compile_info: empty_compile_info
Inputs: {'shape': (1,), 'ori_shape': (), 'format': 'ND', 'sub_format': 0, 'ori_format': 'ND', 'dtype': 'float32', 'addr_type': 0, 'ddr_base_prop': 0, 'total_shape': [1], 'slice_offset': (), 'L1_addr_offset': 0, 'L1_fusion_type': -1, 'L1_workspace_size': -1, 'valid_shape': (), 'split_index': 0, 'is_first_layer': False, 'range': (), 'ori_range': (), 'atomic_type': '', 'input_c_values': -1}
{'shape': (1,), 'ori_shape': (), 'format': 'ND', 'sub_format': 0, 'ori_format': 'ND', 'dtype': 'float32', 'addr_type': 0, 'ddr_base_prop': 0, 'total_shape': [1], 'slice_offset': (), 'L1_addr_offset': 0, 'L1_fusion_type': -1, 'L1_workspace_size': -1, 'valid_shape': (), 'split_index': 0, 'is_first_layer': False, 'range': (), 'ori_range': (), 'atomic_type': '', 'input_c_values': -1}
{'shape': (1,), 'ori_shape': (1,), 'format': 'ND', 'sub_format': 0, 'ori_format': 'ND', 'dtype': 'int32', 'addr_type': 0, 'ddr_base_prop': 0, 'total_shape': [1], 'slice_offset': (), 'L1_addr_offset': 0, 'L1_fusion_type': -1, 'L1_workspace_size': -1, 'valid_shape': (), 'split_index': 0, 'is_first_layer': False, 'range': (), 'ori_range': (), 'atomic_type': '', 'input_c_values': -1}
Outputs: {'shape': (-2,), 'ori_shape': (-2,), 'format': 'ND', 'sub_format': 0, 'ori_format': 'ND', 'dtype': 'float32', 'addr_type': 0, 'ddr_base_prop': 0, 'total_shape': [-2], 'slice_offset': (), 'L1_addr_offset': 0, 'L1_fusion_type': -1, 'L1_workspace_size': -1, 'valid_shape': (), 'split_index': 0, 'range': (), 'ori_range': (), 'atomic_type': '', 'input_c_values': -1}
Attrs: [].
TraceBack (most recent call last):
The error from device(chipId:1, dieId:0), serial number is 12, there is an aivec error exception, core id is 34, error code = 0x800000, dump info: pc start: 0x1240c140d0a8, current: 0x1240c140d300, vec error info: 0xd115465300, mte error info: 0x3403000096, ifu error info: 0x23c9f37f1f880, ccu error info: 0x5f082e8a43b6e9e3, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x1241803e6400.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1165]
The extend info: errcode:(0x800000, 0, 0) errorStr: The DDR address of the MTE instruction is out of range. fixp_error0 info: 0x3000096, fixp_error1 info: 0x34 fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1177]
Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:task_info.cc][LINE:1677]
AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1454]
Aicore kernel execute failed, device_id=1, stream_id=2, report_stream_id=2, task_id=49049, flip_num=0, fault kernel_name=Cast_e87590d11ccda8b259ab6b1ea7212319_high_performance_210000000, program id=121, hash=3394887288916785353.[FUNC:GetError][FILE:stream.cc][LINE:1454]
[AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1454]
rtStreamSynchronizeWithTimeout execute failed, reason=[vector core exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:50]
Assert ((rt_ret) == 0) failed[FUNC:DoRtStreamSyncWithTimeout][FILE:utils.cc][LINE:40]
[Exec][Op]Execute op failed. op type = NonMaxSuppressionV3, ge result = 1343225857[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
./run_test.sh: line 54: 75357 Aborted (core dumped) python3.9 demo/api.py --config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py --with-box --opts train.init_checkpoint="/root/APE/model_final.pth" model.model_language.cache_dir="" model.model_vision.select_box_nums_for_evaluation=500 model.model_vision.text_feature_bank_reset=True model.model_vision.backbone.net.xattn=False
2. Software versions:
-- CANN version: 7.0.RC1.10
-- Python version: 3.9
-- OS version: Ubuntu 18.04
-- arch: x86_64
3. Test steps
APE was adapted to run on the 910B and tested with the following FastAPI script:
# Copyright (c) Facebook, Inc. and its affiliates.
import argparse
import json
import multiprocessing as mp
import os
import tempfile
import time
import warnings
from collections import abc
import sys

import numpy as np
import tqdm
import torch
import torch_npu

from detectron2.config import LazyConfig, get_cfg
from detectron2.data.detection_utils import read_image
from detectron2.evaluation.coco_evaluation import instances_to_coco_json
# from detectron2.projects.deeplab import add_deeplab_config
# from detectron2.projects.panoptic_deeplab import add_panoptic_deeplab_config
from detectron2.utils.logger import setup_logger

from predictor_lazy import VisualizationDemo

import base64
import io
import gc
import uvicorn
import requests
from ctypes import *
from PIL import Image
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# constants
WINDOW_NAME = "APE"


def setup_cfg(args):
    # load config from file and command-line arguments
    cfg = LazyConfig.load(args.config_file)
    print("=========== args.opts ============", args.opts)
    cfg = LazyConfig.apply_overrides(cfg, args.opts)
    if "output_dir" in cfg.model:
        cfg.model.output_dir = cfg.train.output_dir
    if "model_vision" in cfg.model and "output_dir" in cfg.model.model_vision:
        cfg.model.model_vision.output_dir = cfg.train.output_dir
    if "train" in cfg.dataloader:
        if isinstance(cfg.dataloader.train, abc.MutableSequence):
            for i in range(len(cfg.dataloader.train)):
                if "output_dir" in cfg.dataloader.train[i].mapper:
                    cfg.dataloader.train[i].mapper.output_dir = cfg.train.output_dir
        else:
            if "output_dir" in cfg.dataloader.train.mapper:
                cfg.dataloader.train.mapper.output_dir = cfg.train.output_dir
    if "model_vision" in cfg.model:
        cfg.model.model_vision.test_score_thresh = args.confidence_threshold
    else:
        cfg.model.test_score_thresh = args.confidence_threshold
    # default_setup(cfg, args)
    setup_logger(name="ape")
    setup_logger(name="timm")
    return cfg


def get_parser():
    parser = argparse.ArgumentParser(description="Detectron2 demo for builtin configs")
    parser.add_argument(
        "--config-file",
        default="configs/quick_schedules/mask_rcnn_R_50_FPN_inference_acc_test.yaml",
        metavar="FILE",
        help="path to config file",
    )
    parser.add_argument("--webcam", action="store_true", help="Take inputs from webcam.")
    parser.add_argument("--video-input", help="Path to video file.")
    parser.add_argument(
        "--input",
        nargs="+",
        help="A list of space separated input images; "
        "or a single glob pattern such as 'directory/*.jpg'",
    )
    parser.add_argument(
        "--output",
        help="A file or directory to save output visualizations. "
        "If not given, will show output in an OpenCV window.",
    )
    parser.add_argument(
        "--confidence-threshold",
        type=float,
        default=0.1,
        help="Minimum score for instance predictions to be shown",
    )
    parser.add_argument(
        "--opts",
        help="Modify config options using the command-line 'KEY VALUE' pairs",
        default=[],
        nargs=argparse.REMAINDER,
    )
    parser.add_argument("--text-prompt", default=None)
    parser.add_argument("--with-box", action="store_true", help="show box of instance")
    parser.add_argument("--with-mask", action="store_true", help="show mask of instance")
    parser.add_argument("--with-sseg", action="store_true", help="show mask of class")
    return parser


class Req(BaseModel):
    image: str
    text: str


@app.post('/infer')
def interface(req: Req):
    image, text_prompt = req.image, req.text
    if not image or not text_prompt or '<' in text_prompt or '>' in text_prompt:
        return {"error": "input error"}
    image = Image.open(io.BytesIO(base64.b64decode(image))).convert("RGB")
    fn = time.time()
    try:
        images = []
        os.makedirs('./tmp', exist_ok=True)
        image_path = f"./tmp/{fn}.jpg"
        image.save(image_path)
        images.append(image_path)
        for path in tqdm.tqdm(images, disable=not args.output):
            # use PIL, to be consistent with evaluation
            try:
                img = read_image(path, format="BGR")
            except Exception as e:
                print("*" * 60)
                print("fail to open image: ", e)
                print("*" * 60)
                continue
            start_time = time.time()
            predictions, visualized_output, visualized_outputs, metadata = demo.run_on_image(
                img,
                text_prompt=text_prompt,
                with_box=args.with_box,
                with_mask=args.with_mask,
                with_sseg=args.with_sseg,
            )
            logger.info(
                "{}: {} in {:.2f}s".format(
                    path,
                    "detected {} instances".format(len(predictions["instances"]))
                    if "instances" in predictions
                    else "finished",
                    time.time() - start_time,
                )
            )
            results = []
            if "instances" in predictions:
                results = instances_to_coco_json(
                    predictions["instances"].to(demo.cpu_device), path
                )
                for result in results:
                    result["category_name"] = metadata.thing_classes[result["category_id"]]
                    result["image_name"] = result["image_id"]
            if args.output:
                os.makedirs(args.output, exist_ok=True)
                if os.path.isdir(args.output):
                    assert os.path.isdir(args.output), args.output
                    out_filename = os.path.join(args.output, os.path.basename(path))
                else:
                    assert len(args.input) == 1, "Please specify a directory with args.output"
                    out_filename = args.output
                out_filename = out_filename.replace(".webp", ".png")
                out_filename = out_filename.replace(".crdownload", ".png")
                out_filename = out_filename.replace(".jfif", ".png")
                visualized_output.save(out_filename)
                for i in range(len(visualized_outputs)):
                    out_filename = (
                        os.path.join(args.output, os.path.basename(path)) + "." + str(i) + ".png"
                    )
                    visualized_outputs[i].save(out_filename)
                with open(out_filename + ".json", "w") as outp:
                    json.dump(results, outp)
        gc.collect()
    finally:
        os.remove(f'./tmp/{fn}.jpg')
    return {'result': results}


if __name__ == "__main__":
    import setproctitle
    setproctitle.setproctitle("APE")
    torch.npu.set_device('npu:1')
    # init model
    mp.set_start_method("spawn", force=True)
    args = get_parser().parse_args()
    setup_logger(name="fvcore")
    setup_logger(name="ape")
    logger = setup_logger()
    logger.info("Arguments: " + str(args))
    cfg = setup_cfg(args)
    demo = VisualizationDemo(cfg, args=args)
    uvicorn.run(app, port=8198, host="0.0.0.0")
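For reference, the /infer endpoint above takes a JSON body matching the Req model: a base64-encoded image string plus a text prompt. A minimal client-side sketch of building that payload (the fake JPEG bytes and the localhost URL are illustrative placeholders, not part of the original test setup):

```python
import base64

def build_infer_payload(image_bytes: bytes, text: str) -> dict:
    # Matches the Req model above: "image" is the raw file bytes, base64-encoded.
    return {"image": base64.b64encode(image_bytes).decode("ascii"), "text": text}

# Placeholder bytes standing in for a real JPEG read from disk.
payload = build_infer_payload(b"\xff\xd8\xff\xe0fake-jpeg-bytes", "person")
# requests.post("http://127.0.0.1:8198/infer", json=payload)
```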
After starting the script, load was applied with JMeter. With 1 or 2 concurrent requests there were no problems; from 3 concurrent requests onward the errors above appear and the process crashes.
Could you provide the plog logs?
Here are the logs under ~/ascend/log/debug/plog:
[EVENT] PROFILING(75357,python3.9):2024-04-10-06:46:13.281.165 [msprof_callback_impl.cpp:324] >>> (tid:75357) Started to register profiling ctrl callback.
[EVENT] PROFILING(75357,python3.9):2024-04-10-06:46:13.281.241 [msprof_callback_impl.cpp:331] >>> (tid:75357) Started to register profiling hash id callback.
[EVENT] PROFILING(75357,python3.9):2024-04-10-06:46:13.281.244 [prof_atls_plugin.cpp:83] >>> (tid:75357) RegisterProfileCallback, callback type is 7
[EVENT] PROFILING(75357,python3.9):2024-04-10-06:46:13.281.246 [msprof_callback_impl.cpp:338] >>> (tid:75357) Started to register profiling enable host freq callback.
[EVENT] PROFILING(75357,python3.9):2024-04-10-06:46:13.281.248 [prof_atls_plugin.cpp:83] >>> (tid:75357) RegisterProfileCallback, callback type is 8
[EVENT] RUNTIME(75357,python3.9):2024-04-10-06:46:13.326.030 [runtime.cc:4300] 75357 GetVisibleDevices: ASCEND_RT_VISIBLE_DEVICES param was not set
[EVENT] PROFILING(75357,python3.9):2024-04-10-06:46:13.326.609 [prof_atls_plugin.cpp:160] >>> (tid:75357) Module[7] register callback of ctrl handle.
[EVENT] PROFILING(75357,cqy-APE):2024-04-10-06:46:15.262.283 [prof_atls_plugin.cpp:160] >>> (tid:75357) Module[48] register callback of ctrl handle.
[EVENT] PROFILING(75357,cqy-APE):2024-04-10-06:46:15.262.792 [prof_atls_plugin.cpp:160] >>> (tid:75357) Module[45] register callback of ctrl handle.
[EVENT] PROFILING(75357,cqy-APE):2024-04-10-06:46:15.331.890 [prof_atls_plugin.cpp:160] >>> (tid:75357) Module[6] register callback of ctrl handle.
[EVENT] PROFILING(75357,cqy-APE):2024-04-10-06:46:15.515.199 [msprof_callback_impl.cpp:79] >>> (tid:75357) MsprofCtrlCallback called, type: 255
[EVENT] PROFILING(75357,cqy-APE):2024-04-10-06:46:15.515.328 [ai_drv_dev_api.cpp:333] >>> (tid:75357) Succeeded to DrvGetApiVersion version: 0x71a09
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:46:15.735.923 [device.cc:341] 75357 Init: isDoubledie:0, topologytype:0
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:46:15.748.754 [npu_driver.cc:5301] 75487 GetDeviceStatus: GetDeviceStatus status=1.
[TRACE] GE(75357,cqy-APE):2024-04-10-06:46:15.752.580 [status:INIT] [ge_api.cc:206]75357 GEInitializeImpl:GEInitialize start
[EVENT] PROFILING(75357,cqy-APE):2024-04-10-06:46:15.807.632 [msprof_callback_impl.cpp:79] >>> (tid:75357) MsprofCtrlCallback called, type: 255
[EVENT] PROFILING(75357,cqy-APE):2024-04-10-06:46:15.807.643 [ai_drv_dev_api.cpp:333] >>> (tid:75357) Succeeded to DrvGetApiVersion version: 0x71a09
[TRACE] GE(75357,cqy-APE):2024-04-10-06:46:15.834.967 [status:RUNNING] [ge_api.cc:270]75357 GEInitializeImpl:Initializing environment
[EVENT] TUNE(75357,cqy-APE):2024-04-10-06:46:16.106.948 [cann_kb_pyfunc_mgr.cpp:72][CANNKB][Tid:75357]"CannKbPyfuncMgr: Enter PyObjectInit, reference_ is 0!"
[EVENT] TUNE(75357,cqy-APE):2024-04-10-06:46:16.106.966 [handle_manager.cpp:115][CANNKB][Tid:75357]"Start to run init functions to load dynamic python lib!"
[EVENT] TUNE(75357,cqy-APE):2024-04-10-06:46:16.107.027 [handle_manager.cpp:407][CANNKB][Tid:75357]"Init functions of loading dynamic python lib end!"
[EVENT] TUNE(75357,cqy-APE):2024-04-10-06:46:16.107.032 [cann_kb_pyfunc_mgr.cpp:24][CANNKB][Tid:75357]"CANN_KB_Py has already been initialized."
[EVENT] TUNE(75357,cqy-APE):2024-04-10-06:46:16.911.819 [cann_kb_pyfunc_mgr.cpp:117][CANNKB][Tid:75357]"CannKbPyfuncMgr: Run PyObjectInit successfully!"
[EVENT] TBE(75357,cqy-APE):2024-04-10-06:46:16.979.829 [../../../../../../latest/python/site-packages/tbe/common/repository_manager/utils/repository_manager_log.py:30][log] [../../../../../../latest/python/site-packages/tbe/common/repository_manager/route.py:312][repository_manager] get_compiler_core_num core_num = [8].
[EVENT] TBE(75357,cqy-APE):2024-04-10-06:46:18.250.012 [../../../../../../latest/python/site-packages/te_fusion/parallel_compilation.py:552][init] The time cost of random buffer compile is [84758] micro second.
[EVENT] TBE(75357,cqy-APE):2024-04-10-06:46:18.262.724 [../../../../../../latest/python/site-packages/te_fusion/parallel_compilation.py:552][init] The time cost of random buffer compile is [5776] micro second.
[TRACE] GE(75357,cqy-APE):2024-04-10-06:46:19.208.553 [status:STOP] [ge_api.cc:313]75357 GEInitializeImpl:GEInitialize finished
[EVENT] PROFILING(75357,cqy-APE):2024-04-10-06:47:13.517.563 [prof_atls_plugin.cpp:160] >>> (tid:75357) Module[61] register callback of ctrl handle.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:49.077.231 [logger.cc:1046] 75975 ModelBindStream: model_id=320, stream_id=1024, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:49.078.393 [logger.cc:1059] 75975 ModelUnbindStream: model_id=320, stream_id=1024,
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:49.265.095 [logger.cc:1046] 75975 ModelBindStream: model_id=1600, stream_id=832, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:49.266.025 [logger.cc:1059] 75975 ModelUnbindStream: model_id=1600, stream_id=832,
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:49.631.236 [logger.cc:1046] 75975 ModelBindStream: model_id=1600, stream_id=832, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:49.632.042 [logger.cc:1059] 75975 ModelUnbindStream: model_id=1600, stream_id=832,
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:49.813.736 [logger.cc:1046] 75975 ModelBindStream: model_id=1600, stream_id=832, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:49.814.654 [logger.cc:1059] 75975 ModelUnbindStream: model_id=1600, stream_id=832,
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:50.041.245 [logger.cc:1046] 75975 ModelBindStream: model_id=1600, stream_id=832, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:50.042.079 [logger.cc:1059] 75975 ModelUnbindStream: model_id=1600, stream_id=832,
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:50.186.111 [logger.cc:1046] 75975 ModelBindStream: model_id=1600, stream_id=832, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:50.186.886 [logger.cc:1059] 75975 ModelUnbindStream: model_id=1600, stream_id=832,
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:50.300.641 [logger.cc:1046] 75975 ModelBindStream: model_id=1600, stream_id=832, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:50.301.319 [logger.cc:1059] 75975 ModelUnbindStream: model_id=1600, stream_id=832,
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:50.389.288 [logger.cc:1046] 75975 ModelBindStream: model_id=1600, stream_id=832, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:50.389.956 [logger.cc:1059] 75975 ModelUnbindStream: model_id=1600, stream_id=832,
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:50.513.523 [logger.cc:1046] 75975 ModelBindStream: model_id=1600, stream_id=832, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:50.514.358 [logger.cc:1059] 75975 ModelUnbindStream: model_id=1600, stream_id=832,
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:50.604.153 [logger.cc:1046] 75975 ModelBindStream: model_id=1600, stream_id=832, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:50.604.877 [logger.cc:1059] 75975 ModelUnbindStream: model_id=1600, stream_id=832,
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:50.933.862 [logger.cc:1046] 75975 ModelBindStream: model_id=1600, stream_id=832, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:52:50.934.377 [logger.cc:1059] 75975 ModelUnbindStream: model_id=1600, stream_id=832,
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:53:39.686.132 [logger.cc:1046] 75975 ModelBindStream: model_id=1600, stream_id=832, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:53:39.687.823 [logger.cc:1059] 75975 ModelUnbindStream: model_id=1600, stream_id=832,
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:53:42.312.607 [logger.cc:1046] 75977 ModelBindStream: model_id=1856, stream_id=1408, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:53:42.313.745 [logger.cc:1059] 75977 ModelUnbindStream: model_id=1856, stream_id=1408,
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:53:42.326.588 [logger.cc:1046] 76243 ModelBindStream: model_id=1856, stream_id=1216, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:53:42.327.475 [logger.cc:1059] 76243 ModelUnbindStream: model_id=1856, stream_id=1216,
[ERROR] GE(75357,cqy-APE):2024-04-10-06:53:42.329.551 [infer_shape.cc:223]76243 SetOutputShape: ErrorNo: 4294967295(failed) [FINAL][FINAL][SetOutputShape][SetOutputShape_NonZero13_4430]Node[NonZero13] output[0] dim_num=[4294967295] is greater than MaxDimNum[8]
[ERROR] ASCENDCL(75357,cqy-APE):2024-04-10-06:53:42.329.637 [op_executor.cpp:377]76243 DoExecuteAsync: [FINAL][FINAL][Exec][Op]Execute op failed. op type = NonZero, ge result = 4294967295
[ERROR] GE(75357,cqy-APE):2024-04-10-06:53:45.779.732 [infer_shape.cc:223]76426 SetOutputShape: ErrorNo: 4294967295(failed) [SetOutputShape][SetOutputShape_NonZero13_4430]Node[NonZero13] output[0] dim_num=[4294967295] is greater than MaxDimNum[8]
[ERROR] ASCENDCL(75357,cqy-APE):2024-04-10-06:53:45.779.749 [op_executor.cpp:377]76426 DoExecuteAsync: [Exec][Op]Execute op failed. op type = NonZero, ge result = 4294967295
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:53:45.997.279 [logger.cc:1046] 75975 ModelBindStream: model_id=1600, stream_id=832, flag=0.
[EVENT] RUNTIME(75357,cqy-APE):2024-04-10-06:53:45.998.073 [logger.cc:1059] 75975 ModelUnbindStream: model_id=1600, stream_id=832,
The logs show an operator error (shape inference on the NonZero op fails). Could you extract a minimal reproduction case?
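For what it's worth, the failing `dim_num=[4294967295]` in the GE error is `0xFFFFFFFF`, i.e. `-1` reinterpreted as an unsigned 32-bit integer — a sentinel for a failed or uninitialized shape inference on the dynamic-shape `NonZero` output, not a real dimension count. A quick check of that reinterpretation (plain Python, no NPU needed):

```python
import struct

# Pack -1 as a signed 32-bit int, then reread the same bytes as
# unsigned: this yields the dim_num value from the GE error message.
dim_num = struct.unpack("<I", struct.pack("<i", -1))[0]
print(dim_num)  # 4294967295
```

Given that, a minimal reproduction would likely be several threads concurrently issuing `torch.nonzero` on NPU tensors, which exercises the same dynamic-shape inference path that fails here under load.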
Has this issue been resolved? I'm running into a similar situation.