Repository: Ascend / MindSpeed-LLM
Running Qwen2.5-7b with MindIE fails: RuntimeError: call aclnnInplaceZero failed, detail:EZ9999: Inner Error!
Status: DONE · #IBHHBV · Type: Bug · Author: 禾了个禾 · Created: 2025-01-13 17:22
**Steps to reproduce:**

1. Edit the service config: `vim /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json`
2. Check the NPUs: `npu-smi info`
3. Launch the service: `/usr/local/Ascend/mindie/latest/mindie-service/bin/mindieservice_daemon`

**Error log:**

```text
2025-01-10 08:44:33,158 [INFO] [pid: 4601] cpu_binding.py-205: rank_id: 1, device_id: 1, numa_id: 2, shard_devices: [0, 1], cpus: [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
2025-01-10 08:44:33,160 [INFO] [pid: 4601] cpu_binding.py-231: process 4601, new_affinity is [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95], cpu count 16
2025-01-10 08:44:33,345 [INFO] [pid: 4599] cpu_binding.py-205: rank_id: 0, device_id: 0, numa_id: 2, shard_devices: [0, 1], cpus: [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
2025-01-10 08:44:33,346 [INFO] [pid: 4599] cpu_binding.py-231: process 4599, new_affinity is [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79], cpu count 16
2025-01-10 08:44:33,723 [INFO] [pid: 4599] logging.py-180: model_runner.quantize: None , model_runner.kv_quant_type: None , model_runner.dytpe: torch.bfloat16
2025-01-10 08:44:34,047 [ERROR] model.py:39 - [Model] >>> Exception:Initialize:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:221 NPU function error: c10_npu::SetDevice(device_id_), error code is 107001
[ERROR] 2025-01-10-08:44:34 (PID:4603, Device:0, RankID:-1) ERR00100 PTA call acl api failed
[Error]: Invalid device ID. Check whether the device ID is valid.
EE1001: [PID: 4603] 2025-01-10-08:44:33.431.327 The argument is invalid.Reason: Set device failed, invalid device, set device=2, valid device range is [0, 2)
        Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
        TraceBack (most recent call last):
        rtSetDevice execute failed, reason=[device id error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        open device 2 failed, runtime result = 107001.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
Traceback (most recent call last):
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/model.py", line 37, in initialize
    return self.python_model.initialize(config)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 144, in initialize
    self.generator = Generator(
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 66, in __init__
    self.generator_backend = get_generator_backend(model_config)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/__init__.py", line 17, in get_generator_backend
    return generator_cls(model_config)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 48, in __init__
    super().__init__(model_config)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_backend.py", line 87, in __init__
    self.model_wrapper = get_model_wrapper(model_config, backend_type)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/__init__.py", line 15, in get_model_wrapper
    return wrapper_cls(**model_config)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 19, in __init__
    self.model_runner = ModelRunner(
  File "/usr/local/Ascend/llm_model/atb_llm/runner/model_runner.py", line 103, in __init__
    self.process_group, self.device = initialize_distributed(self.rank, self.npu_id, world_size)
  File "/usr/local/Ascend/llm_model/atb_llm/utils/dist.py", line 78, in initialize_distributed
    torch.npu.set_device(device)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/torch_npu/npu/utils.py", line 57, in set_device
    torch_npu._C._npu_setDevice(device_id)
RuntimeError: Initialize:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:221 NPU function error: c10_npu::SetDevice(device_id_), error code is 107001
[ERROR] 2025-01-10-08:44:34 (PID:4603, Device:0, RankID:-1) ERR00100 PTA call acl api failed
[Error]: Invalid device ID. Check whether the device ID is valid.
EE1001: [PID: 4603] 2025-01-10-08:44:33.431.327 The argument is invalid.Reason: Set device failed, invalid device, set device=2, valid device range is [0, 2)
        Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
        TraceBack (most recent call last):
        rtSetDevice execute failed, reason=[device id error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        open device 2 failed, runtime result = 107001.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
2025-01-10 08:44:34,051 [ERROR] model.py:42 - [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
2025-01-10 08:44:34,263 [ERROR] model.py:39 - [Model] >>> Exception:Initialize:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:221 NPU function error: c10_npu::SetDevice(device_id_), error code is 107001
[ERROR] 2025-01-10-08:44:34 (PID:4611, Device:0, RankID:-1) ERR00100 PTA call acl api failed
[Error]: Invalid device ID. Check whether the device ID is valid.
EE1001: [PID: 4611] 2025-01-10-08:44:33.639.173 The argument is invalid.Reason: Set device failed, invalid device, set device=3, valid device range is [0, 2)
        Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
        TraceBack (most recent call last):
        rtSetDevice execute failed, reason=[device id error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        open device 3 failed, runtime result = 107001.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
Traceback (most recent call last):
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/model.py", line 37, in initialize
    return self.python_model.initialize(config)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 144, in initialize
    self.generator = Generator(
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 66, in __init__
    self.generator_backend = get_generator_backend(model_config)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/__init__.py", line 17, in get_generator_backend
    return generator_cls(model_config)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 48, in __init__
    super().__init__(model_config)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_backend.py", line 87, in __init__
    self.model_wrapper = get_model_wrapper(model_config, backend_type)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/__init__.py", line 15, in get_model_wrapper
    return wrapper_cls(**model_config)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 19, in __init__
    self.model_runner = ModelRunner(
  File "/usr/local/Ascend/llm_model/atb_llm/runner/model_runner.py", line 103, in __init__
    self.process_group, self.device = initialize_distributed(self.rank, self.npu_id, world_size)
  File "/usr/local/Ascend/llm_model/atb_llm/utils/dist.py", line 78, in initialize_distributed
    torch.npu.set_device(device)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/torch_npu/npu/utils.py", line 57, in set_device
    torch_npu._C._npu_setDevice(device_id)
RuntimeError: Initialize:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:221 NPU function error: c10_npu::SetDevice(device_id_), error code is 107001
[ERROR] 2025-01-10-08:44:34 (PID:4611, Device:0, RankID:-1) ERR00100 PTA call acl api failed
[Error]: Invalid device ID. Check whether the device ID is valid.
EE1001: [PID: 4611] 2025-01-10-08:44:33.639.173 The argument is invalid.Reason: Set device failed, invalid device, set device=3, valid device range is [0, 2)
        Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
        TraceBack (most recent call last):
        rtSetDevice execute failed, reason=[device id error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        open device 3 failed, runtime result = 107001.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
2025-01-10 08:44:34,267 [ERROR] model.py:42 - [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
2025-01-10 08:44:39,705 [INFO] [pid: 4599] dist.py-79: initialize_distributed has been Set
2025-01-10 08:44:39,706 [INFO] [pid: 4599] logging.py-180: init tokenizer done: Qwen2TokenizerFast(name_or_path='/home/ma-user/work/MindSpeed-LLM/model_from_hf/qwen2.5_7b_hf', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'eos_token': '<|endoftext|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False), added_tokens_decoder={
  151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151646: AddedToken("<|object_ref_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151647: AddedToken("<|object_ref_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151648: AddedToken("<|box_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151649: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151657: AddedToken("<tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151658: AddedToken("</tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151659: AddedToken("<|fim_prefix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151660: AddedToken("<|fim_middle|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151661: AddedToken("<|fim_suffix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151662: AddedToken("<|fim_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151663: AddedToken("<|repo_name|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151664: AddedToken("<|file_sep|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
}
2025-01-10 08:44:39,711 [INFO] [pid: 4599] logging.py-180: NPUSocInfo(soc_name='', soc_version=202, need_nz=True, matmul_nd_nz=False)
2025-01-10 08:44:39,769 [INFO] [pid: 4599] flash_causal_qwen2.py-103: >>>> qwen_DecoderModel is called.
2025-01-10 08:44:43,269 [INFO] [pid: 4601] dist.py-79: initialize_distributed has been Set
2025-01-10 08:44:43,300 [INFO] [pid: 4601] flash_causal_qwen2.py-103: >>>> qwen_DecoderModel is called.
2025-01-10 08:44:45,299 [INFO] [pid: 4599] logging.py-180: model: FlashQwen2ForCausalLM(
  (rotary_embedding): PositionRotaryEmbedding()
  (attn_mask): AttentionMask()
  (transformer): FlashQwenModel(
    (wte): TensorParallelEmbedding()
    (h): ModuleList(
      (0-27): 28 x FlashQwenLayer(
        (attn): FlashQwenAttention(
          (rotary_emb): PositionRotaryEmbedding()
          (c_attn): TensorParallelColumnLinear(
            (linear): FastLinear()
          )
          (c_proj): TensorParallelRowLinear(
            (linear): FastLinear()
          )
        )
        (mlp): QwenMLP(
          (act): SiLU()
          (w2_w1): TensorParallelColumnLinear(
            (linear): FastLinear()
          )
          (c_proj): TensorParallelRowLinear(
            (linear): FastLinear()
          )
        )
        (ln_1): QwenRMSNorm()
        (ln_2): QwenRMSNorm()
      )
    )
    (ln_f): QwenRMSNorm()
  )
  (lm_head): TensorParallelHead(
    (linear): FastLinear()
  )
)
2025-01-10 08:44:45,708 [ERROR] model.py:39 - [Model] >>> Exception:call aclnnInplaceZero failed, detail:EZ9999: Inner Error!
EZ9999: [PID: 4599] 2025-01-10-08:44:45.689.087 Parse dynamic kernel config fail.
        TraceBack (most recent call last):
        AclOpKernelInit failed opType ZerosLike
        ADD_TO_LAUNCHER_LIST_AICORE failed.
[ERROR] 2025-01-10-08:44:45 (PID:4599, Device:0, RankID:-1) ERR01100 OPS call acl api failed
Traceback (most recent call last):
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/model.py", line 37, in initialize
    return self.python_model.initialize(config)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 144, in initialize
    self.generator = Generator(
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 101, in __init__
    self.warm_up(max_prefill_tokens, max_seq_len, max_input_len, max_iter_times, inference_mode)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 245, in warm_up
    self.generator_backend.warm_up(model_inputs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_backend.py", line 128, in warm_up
    _ = self.forward(model_inputs, **kwargs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/utils/decorators/time_decorator.py", line 38, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 92, in forward
    logits = self.model_wrapper.forward(model_inputs, self.cache_pool.npu_cache, **kwargs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 65, in forward
    logits = self.forward_tensor(
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 92, in forward_tensor
    logits = self.model_runner.forward(
  File "/usr/local/Ascend/llm_model/atb_llm/runner/model_runner.py", line 157, in forward
    return self.model.forward(**kwargs)
  File "/usr/local/Ascend/llm_model/atb_llm/models/base/flash_causal_lm.py", line 380, in forward
    self.init_ascend_weight()
  File "/usr/local/Ascend/llm_model/atb_llm/models/qwen2/flash_causal_qwen2.py", line 136, in init_ascend_weight
    weight_wrapper = self.get_weights()
  File "/usr/local/Ascend/llm_model/atb_llm/models/qwen2/flash_causal_qwen2.py", line 120, in get_weights
    weight_wrapper = WeightWrapper(self.soc_info, self.tp_rank, attn_wrapper, mlp_wrapper)
  File "/usr/local/Ascend/llm_model/atb_llm/utils/data/weight_wrapper.py", line 49, in __init__
    self.placeholder = torch.zeros(1, dtype=torch.float16, device="npu")
RuntimeError: call aclnnInplaceZero failed, detail:EZ9999: Inner Error!
EZ9999: [PID: 4599] 2025-01-10-08:44:45.689.087 Parse dynamic kernel config fail.
        TraceBack (most recent call last):
        AclOpKernelInit failed opType ZerosLike
        ADD_TO_LAUNCHER_LIST_AICORE failed.
[ERROR] 2025-01-10-08:44:45 (PID:4599, Device:0, RankID:-1) ERR01100 OPS call acl api failed
2025-01-10 08:44:45,712 [ERROR] model.py:42 - [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
2025-01-10 08:44:50,279 [ERROR] model.py:39 - [Model] >>> Exception:call aclnnInplaceZero failed, detail:EZ9999: Inner Error!
EZ9999: [PID: 4601] 2025-01-10-08:44:50.271.416 Parse dynamic kernel config fail.
        TraceBack (most recent call last):
        AclOpKernelInit failed opType ZerosLike
        ADD_TO_LAUNCHER_LIST_AICORE failed.
[ERROR] 2025-01-10-08:44:50 (PID:4601, Device:1, RankID:-1) ERR01100 OPS call acl api failed
Traceback (most recent call last):
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/model.py", line 37, in initialize
    return self.python_model.initialize(config)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 144, in initialize
    self.generator = Generator(
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 101, in __init__
    self.warm_up(max_prefill_tokens, max_seq_len, max_input_len, max_iter_times, inference_mode)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 245, in warm_up
    self.generator_backend.warm_up(model_inputs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_backend.py", line 128, in warm_up
    _ = self.forward(model_inputs, **kwargs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/utils/decorators/time_decorator.py", line 38, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 92, in forward
    logits = self.model_wrapper.forward(model_inputs, self.cache_pool.npu_cache, **kwargs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 65, in forward
    logits = self.forward_tensor(
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 92, in forward_tensor
    logits = self.model_runner.forward(
  File "/usr/local/Ascend/llm_model/atb_llm/runner/model_runner.py", line 157, in forward
    return self.model.forward(**kwargs)
  File "/usr/local/Ascend/llm_model/atb_llm/models/base/flash_causal_lm.py", line 380, in forward
    self.init_ascend_weight()
  File "/usr/local/Ascend/llm_model/atb_llm/models/qwen2/flash_causal_qwen2.py", line 136, in init_ascend_weight
    weight_wrapper = self.get_weights()
  File "/usr/local/Ascend/llm_model/atb_llm/models/qwen2/flash_causal_qwen2.py", line 120, in get_weights
    weight_wrapper = WeightWrapper(self.soc_info, self.tp_rank, attn_wrapper, mlp_wrapper)
  File "/usr/local/Ascend/llm_model/atb_llm/utils/data/weight_wrapper.py", line 49, in __init__
    self.placeholder = torch.zeros(1, dtype=torch.float16, device="npu")
RuntimeError: call aclnnInplaceZero failed, detail:EZ9999: Inner Error!
EZ9999: [PID: 4601] 2025-01-10-08:44:50.271.416 Parse dynamic kernel config fail.
        TraceBack (most recent call last):
        AclOpKernelInit failed opType ZerosLike
        ADD_TO_LAUNCHER_LIST_AICORE failed.
[ERROR] 2025-01-10-08:44:50 (PID:4601, Device:1, RankID:-1) ERR01100 OPS call acl api failed
2025-01-10 08:44:50,287 [ERROR] model.py:42 - [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
2025-01-10 08:44:50.383812 4400 error [llm_infer_model_instance.cpp:45] Initialize modelBackends_ failed.
2025-01-10 08:44:50.384340 4400 error [llm_infer_model.cpp:20] Init model instance failed.
LLMInferEngine failed to init LLMInferModels
ERR: Failed to init endpoint! Please check the service log or console output.
Killed
```
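The first failures in the log (`EE1001 ... set device=2, valid device range is [0, 2)` and the same for device 3) say the service tried to open NPU IDs 2 and 3 while only two devices (IDs 0 and 1) are visible to the process. Below is a minimal sketch of the kind of pre-launch sanity check that would surface this; the config keys `BackendConfig` and `npuDeviceIds` are assumptions about the `config.json` layout, not verified MindIE field names:

```python
import json

def check_npu_device_ids(config_path: str, visible_device_count: int) -> list[str]:
    """Flag device IDs in the service config that fall outside
    [0, visible_device_count), the range reported by EE1001.

    Note: "BackendConfig" / "npuDeviceIds" are hypothetical key names
    standing in for wherever the real config.json lists its devices.
    """
    with open(config_path) as f:
        config = json.load(f)

    problems = []
    # npuDeviceIds is assumed to be a list of device-ID groups, e.g. [[2, 3]].
    device_groups = config.get("BackendConfig", {}).get("npuDeviceIds", [])
    for group in device_groups:
        for dev in group:
            if not 0 <= dev < visible_device_count:
                problems.append(
                    f"device id {dev} is outside the valid range "
                    f"[0, {visible_device_count})"
                )
    return problems
```

With two visible NPUs and a config requesting `[[2, 3]]`, this reports both IDs as invalid, matching the two `rtSetDevice` failures above; the fix is to either list only visible IDs (e.g. `[[0, 1]]`) or expose more devices to the container/process.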
vim /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json  nbpu-smi info  命令: /usr/local/Ascend/mindie/latest/mindie-service/bin/mindieservice_daemon 报错日志: 2025-01-10 08:44:33,158 [INFO] [pid: 4601] cpu_binding.py-205: rank_id: 1, device_id: 1, numa_id: 2, shard_devices: [0, 1], cpus: [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95] 2025-01-10 08:44:33,160 [INFO] [pid: 4601] cpu_binding.py-231: process 4601, new_affinity is [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95], cpu count 16 2025-01-10 08:44:33,345 [INFO] [pid: 4599] cpu_binding.py-205: rank_id: 0, device_id: 0, numa_id: 2, shard_devices: [0, 1], cpus: [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95] 2025-01-10 08:44:33,346 [INFO] [pid: 4599] cpu_binding.py-231: process 4599, new_affinity is [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79], cpu count 16 2025-01-10 08:44:33,723 [INFO] [pid: 4599] logging.py-180: model_runner.quantize: None , model_runner.kv_quant_type: None , model_runner.dytpe: torch.bfloat16 2025-01-10 08:44:34,047 [ERROR] model.py:39 - [Model] >>> Exception:Initialize:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:221 NPU function error: c10_npu::SetDevice(device_id_), error code is 107001 [ERROR] 2025-01-10-08:44:34 (PID:4603, Device:0, RankID:-1) ERR00100 PTA call acl api failed [Error]: Invalid device ID. Check whether the device ID is valid. EE1001: [PID: 4603] 2025-01-10-08:44:33.431.327 The argument is invalid.Reason: Set device failed, invalid device, set device=2, valid device range is [0, 2) Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship. 
TraceBack (most recent call last): rtSetDevice execute failed, reason=[device id error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] open device 2 failed, runtime result = 107001.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] Traceback (most recent call last): File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/model.py", line 37, in initialize return self.python_model.initialize(config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 144, in initialize self.generator = Generator( ^^^^^^^^^^ File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 66, in __init__ self.generator_backend = get_generator_backend(model_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/__init__.py", line 17, in get_generator_backend return generator_cls(model_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 48, in __init__ super().__init__(model_config) File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_backend.py", line 87, in __init__ self.model_wrapper = get_model_wrapper(model_config, backend_type) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/__init__.py", line 15, in get_model_wrapper return wrapper_cls(**model_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 19, in __init__ self.model_runner = ModelRunner( ^^^^^^^^^^^^ File "/usr/local/Ascend/llm_model/atb_llm/runner/model_runner.py", line 103, in __init__ self.process_group, self.device = 
initialize_distributed(self.rank, self.npu_id, world_size) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/Ascend/llm_model/atb_llm/utils/dist.py", line 78, in initialize_distributed torch.npu.set_device(device) File "/usr/local/python3.11.10/lib/python3.11/site-packages/torch_npu/npu/utils.py", line 57, in set_device torch_npu._C._npu_setDevice(device_id) RuntimeError: Initialize:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:221 NPU function error: c10_npu::SetDevice(device_id_), error code is 107001 [ERROR] 2025-01-10-08:44:34 (PID:4603, Device:0, RankID:-1) ERR00100 PTA call acl api failed [Error]: Invalid device ID. Check whether the device ID is valid. EE1001: [PID: 4603] 2025-01-10-08:44:33.431.327 The argument is invalid.Reason: Set device failed, invalid device, set device=2, valid device range is [0, 2) Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship. TraceBack (most recent call last): rtSetDevice execute failed, reason=[device id error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] open device 2 failed, runtime result = 107001.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] 2025-01-10 08:44:34,051 [ERROR] model.py:42 - [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'} 2025-01-10 08:44:34,263 [ERROR] model.py:39 - [Model] >>> Exception:Initialize:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:221 NPU function error: c10_npu::SetDevice(device_id_), error code is 107001 [ERROR] 2025-01-10-08:44:34 (PID:4611, Device:0, RankID:-1) ERR00100 PTA call acl api failed [Error]: Invalid device ID. Check whether the device ID is valid. EE1001: [PID: 4611] 2025-01-10-08:44:33.639.173 The argument is invalid.Reason: Set device failed, invalid device, set device=3, valid device range is [0, 2) Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship. 
TraceBack (most recent call last): rtSetDevice execute failed, reason=[device id error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] open device 3 failed, runtime result = 107001.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] Traceback (most recent call last): File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/model.py", line 37, in initialize return self.python_model.initialize(config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 144, in initialize self.generator = Generator( ^^^^^^^^^^ File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 66, in __init__ self.generator_backend = get_generator_backend(model_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/__init__.py", line 17, in get_generator_backend return generator_cls(model_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 48, in __init__ super().__init__(model_config) File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_backend.py", line 87, in __init__ self.model_wrapper = get_model_wrapper(model_config, backend_type) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/__init__.py", line 15, in get_model_wrapper return wrapper_cls(**model_config) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 19, in __init__ self.model_runner = ModelRunner( ^^^^^^^^^^^^ File "/usr/local/Ascend/llm_model/atb_llm/runner/model_runner.py", line 103, in __init__ self.process_group, self.device = 
    initialize_distributed(self.rank, self.npu_id, world_size)
  File "/usr/local/Ascend/llm_model/atb_llm/utils/dist.py", line 78, in initialize_distributed
    torch.npu.set_device(device)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/torch_npu/npu/utils.py", line 57, in set_device
    torch_npu._C._npu_setDevice(device_id)
RuntimeError: Initialize:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:221 NPU function error: c10_npu::SetDevice(device_id_), error code is 107001
[ERROR] 2025-01-10-08:44:34 (PID:4611, Device:0, RankID:-1) ERR00100 PTA call acl api failed
[Error]: Invalid device ID. Check whether the device ID is valid.
EE1001: [PID: 4611] 2025-01-10-08:44:33.639.173 The argument is invalid. Reason: Set device failed, invalid device, set device=3, valid device range is [0, 2)
        Solution: 1. Check the input parameter range of the function. 2. Check the function invocation relationship.
        TraceBack (most recent call last):
        rtSetDevice execute failed, reason=[device id error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        open device 3 failed, runtime result = 107001.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
2025-01-10 08:44:34,267 [ERROR] model.py:42 - [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
2025-01-10 08:44:39,705 [INFO] [pid: 4599] dist.py-79: initialize_distributed has been Set
2025-01-10 08:44:39,706 [INFO] [pid: 4599] logging.py-180: init tokenizer done: Qwen2TokenizerFast(name_or_path='/home/ma-user/work/MindSpeed-LLM/model_from_hf/qwen2.5_7b_hf', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'eos_token': '<|endoftext|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False), added_tokens_decoder={
  151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151646: AddedToken("<|object_ref_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151647: AddedToken("<|object_ref_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151648: AddedToken("<|box_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151649: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151657: AddedToken("<tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151658: AddedToken("</tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151659: AddedToken("<|fim_prefix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151660: AddedToken("<|fim_middle|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151661: AddedToken("<|fim_suffix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151662: AddedToken("<|fim_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151663: AddedToken("<|repo_name|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151664: AddedToken("<|file_sep|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
}
2025-01-10 08:44:39,711 [INFO] [pid: 4599] logging.py-180: NPUSocInfo(soc_name='', soc_version=202, need_nz=True, matmul_nd_nz=False)
2025-01-10 08:44:39,769 [INFO] [pid: 4599] flash_causal_qwen2.py-103: >>>> qwen_DecoderModel is called.
2025-01-10 08:44:43,269 [INFO] [pid: 4601] dist.py-79: initialize_distributed has been Set
2025-01-10 08:44:43,300 [INFO] [pid: 4601] flash_causal_qwen2.py-103: >>>> qwen_DecoderModel is called.
2025-01-10 08:44:45,299 [INFO] [pid: 4599] logging.py-180: model: FlashQwen2ForCausalLM(
  (rotary_embedding): PositionRotaryEmbedding()
  (attn_mask): AttentionMask()
  (transformer): FlashQwenModel(
    (wte): TensorParallelEmbedding()
    (h): ModuleList(
      (0-27): 28 x FlashQwenLayer(
        (attn): FlashQwenAttention(
          (rotary_emb): PositionRotaryEmbedding()
          (c_attn): TensorParallelColumnLinear(
            (linear): FastLinear()
          )
          (c_proj): TensorParallelRowLinear(
            (linear): FastLinear()
          )
        )
        (mlp): QwenMLP(
          (act): SiLU()
          (w2_w1): TensorParallelColumnLinear(
            (linear): FastLinear()
          )
          (c_proj): TensorParallelRowLinear(
            (linear): FastLinear()
          )
        )
        (ln_1): QwenRMSNorm()
        (ln_2): QwenRMSNorm()
      )
    )
    (ln_f): QwenRMSNorm()
  )
  (lm_head): TensorParallelHead(
    (linear): FastLinear()
  )
)
2025-01-10 08:44:45,708 [ERROR] model.py:39 - [Model] >>> Exception:call aclnnInplaceZero failed, detail:EZ9999: Inner Error!
EZ9999: [PID: 4599] 2025-01-10-08:44:45.689.087 Parse dynamic kernel config fail.
        TraceBack (most recent call last):
        AclOpKernelInit failed opType ZerosLike
        ADD_TO_LAUNCHER_LIST_AICORE failed.
[ERROR] 2025-01-10-08:44:45 (PID:4599, Device:0, RankID:-1) ERR01100 OPS call acl api failed
Traceback (most recent call last):
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/model.py", line 37, in initialize
    return self.python_model.initialize(config)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 144, in initialize
    self.generator = Generator(
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 101, in __init__
    self.warm_up(max_prefill_tokens, max_seq_len, max_input_len, max_iter_times, inference_mode)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 245, in warm_up
    self.generator_backend.warm_up(model_inputs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_backend.py", line 128, in warm_up
    _ = self.forward(model_inputs, **kwargs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/utils/decorators/time_decorator.py", line 38, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 92, in forward
    logits = self.model_wrapper.forward(model_inputs, self.cache_pool.npu_cache, **kwargs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 65, in forward
    logits = self.forward_tensor(
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 92, in forward_tensor
    logits = self.model_runner.forward(
  File "/usr/local/Ascend/llm_model/atb_llm/runner/model_runner.py", line 157, in forward
    return self.model.forward(**kwargs)
  File "/usr/local/Ascend/llm_model/atb_llm/models/base/flash_causal_lm.py", line 380, in forward
    self.init_ascend_weight()
  File "/usr/local/Ascend/llm_model/atb_llm/models/qwen2/flash_causal_qwen2.py", line 136, in init_ascend_weight
    weight_wrapper = self.get_weights()
  File "/usr/local/Ascend/llm_model/atb_llm/models/qwen2/flash_causal_qwen2.py", line 120, in get_weights
    weight_wrapper = WeightWrapper(self.soc_info, self.tp_rank, attn_wrapper, mlp_wrapper)
  File "/usr/local/Ascend/llm_model/atb_llm/utils/data/weight_wrapper.py", line 49, in __init__
    self.placeholder = torch.zeros(1, dtype=torch.float16, device="npu")
RuntimeError: call aclnnInplaceZero failed, detail:EZ9999: Inner Error!
EZ9999: [PID: 4599] 2025-01-10-08:44:45.689.087 Parse dynamic kernel config fail.
        TraceBack (most recent call last):
        AclOpKernelInit failed opType ZerosLike
        ADD_TO_LAUNCHER_LIST_AICORE failed.
[ERROR] 2025-01-10-08:44:45 (PID:4599, Device:0, RankID:-1) ERR01100 OPS call acl api failed
2025-01-10 08:44:45,712 [ERROR] model.py:42 - [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
2025-01-10 08:44:50,279 [ERROR] model.py:39 - [Model] >>> Exception:call aclnnInplaceZero failed, detail:EZ9999: Inner Error!
EZ9999: [PID: 4601] 2025-01-10-08:44:50.271.416 Parse dynamic kernel config fail.
        TraceBack (most recent call last):
        AclOpKernelInit failed opType ZerosLike
        ADD_TO_LAUNCHER_LIST_AICORE failed.
[ERROR] 2025-01-10-08:44:50 (PID:4601, Device:1, RankID:-1) ERR01100 OPS call acl api failed
Traceback (most recent call last):
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/model.py", line 37, in initialize
    return self.python_model.initialize(config)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 144, in initialize
    self.generator = Generator(
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 101, in __init__
    self.warm_up(max_prefill_tokens, max_seq_len, max_input_len, max_iter_times, inference_mode)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 245, in warm_up
    self.generator_backend.warm_up(model_inputs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_backend.py", line 128, in warm_up
    _ = self.forward(model_inputs, **kwargs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/utils/decorators/time_decorator.py", line 38, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 92, in forward
    logits = self.model_wrapper.forward(model_inputs, self.cache_pool.npu_cache, **kwargs)
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 65, in forward
    logits = self.forward_tensor(
  File "/usr/local/python3.11.10/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 92, in forward_tensor
    logits = self.model_runner.forward(
  File "/usr/local/Ascend/llm_model/atb_llm/runner/model_runner.py", line 157, in forward
    return self.model.forward(**kwargs)
  File "/usr/local/Ascend/llm_model/atb_llm/models/base/flash_causal_lm.py", line 380, in forward
    self.init_ascend_weight()
  File "/usr/local/Ascend/llm_model/atb_llm/models/qwen2/flash_causal_qwen2.py", line 136, in init_ascend_weight
    weight_wrapper = self.get_weights()
  File "/usr/local/Ascend/llm_model/atb_llm/models/qwen2/flash_causal_qwen2.py", line 120, in get_weights
    weight_wrapper = WeightWrapper(self.soc_info, self.tp_rank, attn_wrapper, mlp_wrapper)
  File "/usr/local/Ascend/llm_model/atb_llm/utils/data/weight_wrapper.py", line 49, in __init__
    self.placeholder = torch.zeros(1, dtype=torch.float16, device="npu")
RuntimeError: call aclnnInplaceZero failed, detail:EZ9999: Inner Error!
EZ9999: [PID: 4601] 2025-01-10-08:44:50.271.416 Parse dynamic kernel config fail.
        TraceBack (most recent call last):
        AclOpKernelInit failed opType ZerosLike
        ADD_TO_LAUNCHER_LIST_AICORE failed.
[ERROR] 2025-01-10-08:44:50 (PID:4601, Device:1, RankID:-1) ERR01100 OPS call acl api failed
2025-01-10 08:44:50,287 [ERROR] model.py:42 - [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
2025-01-10 08:44:50.383812 4400 error [llm_infer_model_instance.cpp:45] Initialize modelBackends_ failed.
2025-01-10 08:44:50.384340 4400 error [llm_infer_model.cpp:20] Init model instance failed.
LLMInferEngine failed to init LLMInferModels
ERR: Failed to init endpoint! Please check the service log or console output.
Killed
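The log actually shows two distinct failures. First, some ranks call `torch.npu.set_device` with device IDs 2 and 3 while only two NPUs are visible ("valid device range is [0, 2)"), which suggests the parallelism configured in `/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json` (if I read the schema right, the `worldSize` / `npuDeviceIds` fields) requests more devices than the process can see. Second, the ranks that do get a valid device then fail in `torch.zeros(..., device="npu")` with "Parse dynamic kernel config fail", which on a 310P-class SoC (`soc_version=202`, `need_nz=True`) typically points at a CANN toolkit/kernels package mismatch rather than a model bug. The snippet below is an illustrative pre-flight check only, not MindIE code; `check_device_ids` is a hypothetical helper that mirrors the runtime's EE1001 validation so the mismatch is caught before the service starts:

```python
# Hypothetical pre-flight check mirroring the runtime's device-ID validation.
# `device_ids` stands in for the device IDs the config assigns to ranks;
# `visible_count` for the number of NPUs actually exposed to the process
# (e.g. after ASCEND_RT_VISIBLE_DEVICES filtering).

def check_device_ids(device_ids, visible_count):
    """Raise early if any requested device falls outside [0, visible_count)."""
    bad = [d for d in device_ids if not 0 <= d < visible_count]
    if bad:
        raise ValueError(
            f"invalid device(s) {bad}: valid device range is [0, {visible_count})"
        )

# Two visible NPUs, but ranks are mapped to devices 0..3 -- this reproduces
# the shape of the EE1001 "set device=3, valid device range is [0, 2)" error.
try:
    check_device_ids([0, 1, 2, 3], visible_count=2)
except ValueError as e:
    print(e)  # invalid device(s) [2, 3]: valid device range is [0, 2)
```

With only two visible NPUs, the device list in the config should presumably stay within `[0, 1]` and the world size match its length; the remaining `aclnnInplaceZero` failure is worth checking against the installed CANN kernels package for this SoC.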