
Ascend/pytorch


Problem during model inference

Status: DONE
Type: Requirement
Created: 2024-06-20 16:34

OS: eulerosv2r10.aarch64
Hardware: Ascend 910B4, Kunpeng-920
Runtime environment:
langchain-0.0.354
sentence-transformers-2.10.0
torch-2.1.2
torch-npu-2.1.0rc1.post20231013
torchvision-0.16.2
tornado-6.3.2
cann-7.0.1
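When reporting an issue like this, printing the installed versions is less error-prone than typing them by hand. Below is a minimal sketch using only the standard library's `importlib.metadata`; the package list mirrors the environment above, and the distribution names (for example `torch-npu`) are assumptions about how the packages were installed. The CANN version is not a Python package, so it still has to be read from the toolkit installation.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed from the environment list in this issue.
PACKAGES = [
    "langchain",
    "sentence-transformers",
    "torch",
    "torch-npu",
    "torchvision",
    "tornado",
]


def collect_environment(packages=PACKAGES):
    """Return installed versions of `packages` plus basic platform info."""
    report = {"platform": platform.platform(), "machine": platform.machine()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```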
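Scenario:
The service starts successfully, and the LLM question-answering capability has already been exercised through the API, but one of the calls failed with the error below.
Error output: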
xxx
...
File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/functional.py", line 2233, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Sync:/usr1/02/workspace/j_yxiCvvHE/pytorch/torch_npu/csrc/framework/OpCommand.cpp:162 NPU error, error code is 507035
[Error]: The vector core execution is abnormal.
Rectify the fault based on the error information in the log, or you can ask us at follwing gitee link by issues: https://gitee.com/ascend/pytorch/issue
EZ9999: Inner Error!
EZ9999 The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 0, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x10bf50ab4, mte error info: 0xc9501de1e7, ifu error info: 0x61db14d4a4b40, ccu error info: 0x3a7eb41945800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
TraceBack (most recent call last):
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x1de1e7, fixp_error1 info: 0xc9 fsmId:0, tslot:0, thread:0, ctxid:0, blk:36, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 1, error code = 0x4000000000000000, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x8b00000099, mte error info: 0x9ecf738099, ifu error info: 0x8f1602bdff40, ccu error info: 0x93f0783b45800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0x4000000000000000, 0x4000, 0) errorStr: VEC instruction error: the ub address out of bounds.CCU instruction address check error. fixp_error0 info: 0xf738099, fixp_error1 info: 0x9e fsmId:0, tslot:0, thread:0, ctxid:0, blk:37, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 2, error code = 0x4000000000000000, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x8b00000099, mte error info: 0x5f994b3ced, ifu error info: 0x7f70ce860400, ccu error info: 0x1808f3f145800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0x4000000000000000, 0x4000, 0) errorStr: VEC instruction error: the ub address out of bounds.CCU instruction address check error. fixp_error0 info: 0x94b3ced, fixp_error1 info: 0x5f fsmId:0, tslot:0, thread:0, ctxid:0, blk:38, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 3, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e74, vec error info: 0x81167c0060, mte error info: 0xf332e28f2c, ifu error info: 0x57f4e88ed5800, ccu error info: 0x8fffa99345800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x2e28f2c, fixp_error1 info: 0xf3 fsmId:0, tslot:0, thread:0, ctxid:0, blk:39, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 4, error code = 0x4000000000000000, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x8b00000099, mte error info: 0xc18ddfb2b1, ifu error info: 0x442e7c0868400, ccu error info: 0x11f6a4b045800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0x4000000000000000, 0x4000, 0) errorStr: VEC instruction error: the ub address out of bounds.CCU instruction address check error. fixp_error0 info: 0xddfb2b1, fixp_error1 info: 0xc1 fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 5, error code = 0x4000000000000000, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x8b00000099, mte error info: 0x7b77dfe04, ifu error info: 0x51f563c907800, ccu error info: 0xc680b83a45800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0x4000000000000000, 0x4000, 0) errorStr: VEC instruction error: the ub address out of bounds.CCU instruction address check error. fixp_error0 info: 0x77dfe04, fixp_error1 info: 0x7 fsmId:1, tslot:0, thread:0, ctxid:0, blk:1, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 6, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x401dd74fb3, mte error info: 0x73c0a5b9ef, ifu error info: 0x561f6722ccec0, ccu error info: 0xba01913045800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xa5b9ef, fixp_error1 info: 0x73 fsmId:1, tslot:0, thread:0, ctxid:0, blk:2, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 7, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x610abe99d0, mte error info: 0x308eb0f53e, ifu error info: 0x5260850379500, ccu error info: 0x2c90888a45800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xeb0f53e, fixp_error1 info: 0x30 fsmId:1, tslot:0, thread:0, ctxid:0, blk:3, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 8, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x480637358b, mte error info: 0xabc21a0039, ifu error info: 0x43fe3c9effc40, ccu error info: 0xf7413af345800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x21a0039, fixp_error1 info: 0xab fsmId:1, tslot:0, thread:0, ctxid:0, blk:4, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 9, error code = 0x4000000000000000, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x8b00000099, mte error info: 0x173ff099a9, ifu error info: 0x58de040818500, ccu error info: 0x6e1c11d645800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0x4000000000000000, 0x4000, 0) errorStr: VEC instruction error: the ub address out of bounds.CCU instruction address check error. fixp_error0 info: 0xff099a9, fixp_error1 info: 0x17 fsmId:1, tslot:0, thread:0, ctxid:0, blk:5, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 10, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0xe906566789, mte error info: 0xf09fd37d39, ifu error info: 0x6363df069200, ccu error info: 0xbae030a445800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xfd37d39, fixp_error1 info: 0xf0 fsmId:1, tslot:0, thread:0, ctxid:0, blk:6, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 11, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x7a0abc6e7c, mte error info: 0x9f773ef8a6, ifu error info: 0x77f747c415300, ccu error info: 0x2632103345800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x73ef8a6, fixp_error1 info: 0x9f fsmId:1, tslot:0, thread:0, ctxid:0, blk:7, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 12, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x731513b032, mte error info: 0xfdffd91378, ifu error info: 0x7452a66bd2640, ccu error info: 0xea01083c45800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xfd91378, fixp_error1 info: 0xfd fsmId:1, tslot:0, thread:0, ctxid:0, blk:8, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 13, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x7715bedb00, mte error info: 0xbb259b1177, ifu error info: 0x7bb2cda8ee600, ccu error info: 0xc8fb992045800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x59b1177, fixp_error1 info: 0xbb fsmId:1, tslot:0, thread:0, ctxid:0, blk:9, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 14, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x4014571f8a, mte error info: 0xffe27f7321, ifu error info: 0x427609684b500, ccu error info: 0x1353187845800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x27f7321, fixp_error1 info: 0xff fsmId:1, tslot:0, thread:0, ctxid:0, blk:10, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 15, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x3a0aa0dd28, mte error info: 0xb7f86a0df0, ifu error info: 0x2cc044738cd40, ccu error info: 0xd8e22bf145800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x86a0df0, fixp_error1 info: 0xb7 fsmId:1, tslot:0, thread:0, ctxid:0, blk:11, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 20, error code = 0x4000000000000000, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x8b00000099, mte error info: 0x3ff5bde31f, ifu error info: 0x17feaeb406e00, ccu error info: 0x8bd9216845800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0x4000000000000000, 0x4000, 0) errorStr: VEC instruction error: the ub address out of bounds.CCU instruction address check error. fixp_error0 info: 0x5bde31f, fixp_error1 info: 0x3f fsmId:1, tslot:0, thread:0, ctxid:0, blk:12, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 21, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0xe90f9be718, mte error info: 0xbffcd31f4, ifu error info: 0x19eca0d9c0400, ccu error info: 0x41819e4345800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xfcd31f4, fixp_error1 info: 0xb fsmId:1, tslot:0, thread:0, ctxid:0, blk:13, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 22, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x421ab23085, mte error info: 0x930f9961df, ifu error info: 0x471e15b9be0c0, ccu error info: 0x2e0e740745800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xf9961df, fixp_error1 info: 0x93 fsmId:1, tslot:0, thread:0, ctxid:0, blk:14, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 23, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0xd911258a3c, mte error info: 0x8bbb1ffd33, ifu error info: 0x4db20dc544e00, ccu error info: 0x77f98bf345800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xb1ffd33, fixp_error1 info: 0x8b fsmId:1, tslot:0, thread:0, ctxid:0, blk:15, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 24, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0xff08d60f0d, mte error info: 0xfdd7fd91fd, ifu error info: 0x5d63b2294240, ccu error info: 0xb9e408cc45800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x7fd91fd, fixp_error1 info: 0xfd fsmId:1, tslot:0, thread:0, ctxid:0, blk:16, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 25, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0xf813c7af94, mte error info: 0xbb5f90fcbd, ifu error info: 0x29f1fe8456000, ccu error info: 0xececf99145800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xf90fcbd, fixp_error1 info: 0xbb fsmId:1, tslot:0, thread:0, ctxid:0, blk:17, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 26, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x7011114019, mte error info: 0xe1f7f8718f, ifu error info: 0x49bbff300bec0, ccu error info: 0x61319c9445800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x7f8718f, fixp_error1 info: 0xe1 fsmId:1, tslot:0, thread:0, ctxid:0, blk:18, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 27, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x7a06173408, mte error info: 0x16ac035020, ifu error info: 0x7e4de96440000, ccu error info: 0x9a6f06f745800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xc035020, fixp_error1 info: 0x16 fsmId:1, tslot:0, thread:0, ctxid:0, blk:19, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 32, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x43111cd8eb, mte error info: 0x77aff3e9ff, ifu error info: 0x5c08005e061c0, ccu error info: 0x1f0c06c945800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xff3e9ff, fixp_error1 info: 0x77 fsmId:1, tslot:0, thread:0, ctxid:0, blk:20, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 33, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0xfb04910480, mte error info: 0x7e79bfe047, ifu error info: 0x6ac1246df1140, ccu error info: 0xbbbc3fed45800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x9bfe047, fixp_error1 info: 0x7e fsmId:1, tslot:0, thread:0, ctxid:0, blk:21, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 34, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222ef8, vec error info: 0x7114150e12, mte error info: 0xa0cef1f7d1, ifu error info: 0x4e7501d48eb80, ccu error info: 0x335021e9458000b4, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xef1f7d1, fixp_error1 info: 0xa0 fsmId:1, tslot:0, thread:0, ctxid:0, blk:22, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 35, error code = 0x4000000000000000, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x8b00000099, mte error info: 0xdbfb2fbdfd, ifu error info: 0xabaf570444c0, ccu error info: 0x7ccb196245800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0x4000000000000000, 0x4000, 0) errorStr: VEC instruction error: the ub address out of bounds.CCU instruction address check error. fixp_error0 info: 0xb2fbdfd, fixp_error1 info: 0xdb fsmId:1, tslot:0, thread:0, ctxid:0, blk:23, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 36, error code = 0x4000000000000000, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x8b00000099, mte error info: 0x17f3fe2bdd, ifu error info: 0x77570982449c0, ccu error info: 0x117da91f45800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0x4000000000000000, 0x4000, 0) errorStr: VEC instruction error: the ub address out of bounds.CCU instruction address check error. fixp_error0 info: 0x3fe2bdd, fixp_error1 info: 0x17 fsmId:1, tslot:0, thread:0, ctxid:0, blk:24, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 37, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x7915e4282c, mte error info: 0xb95a72fa5, ifu error info: 0x5e73b53638000, ccu error info: 0x720181c845800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x5a72fa5, fixp_error1 info: 0xb fsmId:1, tslot:0, thread:0, ctxid:0, blk:25, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 38, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0xf1125f8b26, mte error info: 0xfcfb1eec5f, ifu error info: 0x1b95dd62d5580, ccu error info: 0x388d140545800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xb1eec5f, fixp_error1 info: 0xfc fsmId:1, tslot:0, thread:0, ctxid:0, blk:26, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 39, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x291ddc262a, mte error info: 0xd9f1e53c69, ifu error info: 0x69781d812d6c0, ccu error info: 0x48e814c545800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x1e53c69, fixp_error1 info: 0xd9 fsmId:1, tslot:0, thread:0, ctxid:0, blk:27, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 40, error code = 0x4000000000000000, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x8b00000099, mte error info: 0xfff3edfeaf, ifu error info: 0x421a19c375e80, ccu error info: 0xbe89013045800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0x4000000000000000, 0x4000, 0) errorStr: VEC instruction error: the ub address out of bounds.CCU instruction address check error. fixp_error0 info: 0x3edfeaf, fixp_error1 info: 0xff fsmId:1, tslot:0, thread:0, ctxid:0, blk:28, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 41, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0xd710978c08, mte error info: 0x13c33f0ba8, ifu error info: 0x5cec1dcde2140, ccu error info: 0x92b4b8fa45800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x33f0ba8, fixp_error1 info: 0x13 fsmId:1, tslot:0, thread:0, ctxid:0, blk:29, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 42, error code = 0x4000000000000000, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x8b00000099, mte error info: 0x6f0878b23e, ifu error info: 0x6e248a5905040, ccu error info: 0x9450820d45800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0x4000000000000000, 0x4000, 0) errorStr: VEC instruction error: the ub address out of bounds.CCU instruction address check error. fixp_error0 info: 0x878b23e, fixp_error1 info: 0x6f fsmId:1, tslot:0, thread:0, ctxid:0, blk:30, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 43, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0xe31ad67105, mte error info: 0x2377fe3dec, ifu error info: 0x4f34cda388400, ccu error info: 0x6f205fb045800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x7fe3dec, fixp_error1 info: 0x23 fsmId:1, tslot:0, thread:0, ctxid:0, blk:31, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 44, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e74, vec error info: 0x7310c02544, mte error info: 0x9c72aa316f, ifu error info: 0x73ce9d0322780, ccu error info: 0x1abe986445800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0x2aa316f, fixp_error1 info: 0x9c fsmId:1, tslot:0, thread:0, ctxid:0, blk:32, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 45, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x64150fbd3b, mte error info: 0xdff0ff2ff0, ifu error info: 0x128971cffc400, ccu error info: 0xb708312d45800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xff2ff0, fixp_error1 info: 0xdf fsmId:1, tslot:0, thread:0, ctxid:0, blk:33, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 46, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0xef035f320c, mte error info: 0x13ff389bff, ifu error info: 0x1519ca2fcc440, ccu error info: 0x6df4603f45800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xf389bff, fixp_error1 info: 0x13 fsmId:1, tslot:0, thread:0, ctxid:0, blk:34, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
The error from device(chipId:7, dieId:0), serial number is 1, there is an aivec error exception, core id is 47, error code = 0, dump info: pc start: 0x1240c02226d0, current: 0x1240c0222e80, vec error info: 0x7004415801, mte error info: 0x3adf9cbd70, ifu error info: 0x6e75160bfb900, ccu error info: 0x34f00ef045800099, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd000288, para base: 0x12410071a000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1138]
The extend info: errcode:(0, 0x4000, 0) errorStr: CCU instruction address check error. fixp_error0 info: 0xf9cbd70, fixp_error1 info: 0x3a fsmId:1, tslot:0, thread:0, ctxid:0, blk:35, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1150]
Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:task_info.cc][LINE:1612]
AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1483]
Aicore kernel execute failed, device_id=0, stream_id=2, report_stream_id=2, task_id=27752, flip_num=6420, fault kernel_name=GatherV2_2d13e5f1492a4db5869bd6ff10d0d1ac_high_precision_900011010, program id=11, hash=6982118785180833281.[FUNC:GetError][FILE:stream.cc][LINE:1483]
[AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1483]
rtStreamSynchronizeWithTimeout execute failed, reason=[vector core exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:50]
synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
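The Python traceback ends in `torch.embedding`, and the kernel named in the log is `GatherV2`, with several vector cores reporting "the ub address out of bounds". The thread does not establish the root cause, but because earlier calls succeeded, one input-dependent hypothesis worth ruling out is a token ID outside the range of the embedding table. The following is a minimal sketch of such a check, written against plain Python lists so it runs without an NPU; `find_out_of_range_ids` and `check_before_embedding` are hypothetical helper names, not part of torch or torch_npu.

```python
from typing import List, Sequence, Tuple


def find_out_of_range_ids(
    batch: Sequence[Sequence[int]], num_embeddings: int
) -> List[Tuple[int, int, int]]:
    """Return (row, column, token_id) for every id that an embedding table
    with `num_embeddings` rows cannot index, i.e. ids outside [0, num_embeddings)."""
    bad = []
    for row, seq in enumerate(batch):
        for col, tok in enumerate(seq):
            tok = int(tok)
            if not 0 <= tok < num_embeddings:
                bad.append((row, col, tok))
    return bad


def check_before_embedding(batch: Sequence[Sequence[int]], num_embeddings: int) -> None:
    """Raise a readable Python error before the ids reach the device kernel."""
    bad = find_out_of_range_ids(batch, num_embeddings)
    if bad:
        row, col, tok = bad[0]
        raise ValueError(
            f"{len(bad)} token id(s) out of range for an embedding with "
            f"{num_embeddings} rows; first at row {row}, column {col}: {tok}"
        )


if __name__ == "__main__":
    # Hypothetical batch: id 32000 does not fit a table of 32000 rows (valid: 0..31999).
    batch = [[1, 15043, 29871], [1, 32000]]
    print(find_out_of_range_ids(batch, num_embeddings=32000))  # -> [(1, 1, 32000)]
```

With PyTorch tensors, one option is to pass `input_ids.tolist()`; for a Hugging Face `transformers` model, `model.get_input_embeddings().num_embeddings` gives the table size. If the check passes on the failing input, this hypothesis can be dropped and the plog becomes the main lead.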

Comments (2)

Loki created the requirement 11 months ago
Loki edited the description 11 months ago

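The current error points to an operator problem. Please provide the detailed plog logs to help with further diagnosis, or try upgrading CANN to see whether the problem still reproduces.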
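Preparing the requested plog could look roughly like the sketch below. It rests on assumptions: the environment variable names (`ASCEND_GLOBAL_LOG_LEVEL`, `ASCEND_SLOG_PRINT_TO_STDOUT`, `ASCEND_PROCESS_LOG_PATH`, `ASCEND_LAUNCH_BLOCKING`) are taken to be the CANN / torch_npu debugging switches, and the default log root `~/ascend/log` and the `plog-*.log` file pattern are assumed defaults, so verify them against the CANN 7.0.1 documentation for this installation. The variables must be set in the environment that launches the service, before torch_npu initializes the device.

```python
import os
from pathlib import Path
from typing import List

# Assumed CANN / torch_npu debugging switches; export them in the shell (or
# process environment) that starts the service, before torch_npu is imported.
DEBUG_ENV = {
    "ASCEND_GLOBAL_LOG_LEVEL": "1",      # assumed scale: 0=DEBUG, 1=INFO, 2=WARNING, 3=ERROR
    "ASCEND_SLOG_PRINT_TO_STDOUT": "0",  # keep device logs in plog files, not stdout
    "ASCEND_LAUNCH_BLOCKING": "1",       # synchronous launch so the failing op surfaces at its call site
}


def log_root() -> Path:
    """Return ASCEND_PROCESS_LOG_PATH if set, otherwise the assumed default ~/ascend/log."""
    override = os.environ.get("ASCEND_PROCESS_LOG_PATH")
    return Path(override) if override else Path.home() / "ascend" / "log"


def find_plogs(root: Path, newest: int = 5) -> List[Path]:
    """Return up to `newest` plog files under `root`, most recently modified first."""
    if not root.is_dir():
        return []
    files = [p for p in root.rglob("plog-*.log") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:newest]


if __name__ == "__main__":
    for name, value in DEBUG_ENV.items():
        print(f"export {name}={value}")
    for path in find_plogs(log_root()):
        print(path)
```

Synchronous launching slows execution and INFO-level logging produces large files, so both settings are meant for reproducing the failure rather than for normal serving.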

huangyunlong changed the task status from TODO to DONE 10 months ago
