ascend裸机运行vllm失败，transdata算子报错

![输入图片说明](https://foruda.gitee.com/images/1765958349036311333/449d8448_11596285.png "BD411F44-A82D-45E6-B819-53DBF8E22424.png")
root@674f5f297e81:/workspace# vllm serve Qwen/Qwen3-VL-30B-A3B-Instruct --host 0.0.0.0 --port 8000 --tensor-parallel-size 4 --enable-expert-parallel --max-num-seqs 16 --max-num-batched-tokens 4096 --trust-remote-code --no-enable-prefix-caching --gpu-memory-utilization 0.8 --max-model-len 32768 --served-model-name qwen3-vl
INFO 12-17 02:49:07 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:49:07 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:49:07 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:49:07 [__init__.py:207] Platform plugin ascend is activated
WARNING 12-17 02:49:13 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
(APIServer pid=106356) INFO 12-17 02:49:15 [api_server.py:1839] vLLM API server version 0.11.0rc3
(APIServer pid=106356) INFO 12-17 02:49:15 [utils.py:233] non-default args: {'model_tag': 'Qwen/Qwen3-VL-30B-A3B-Instruct', 'host': '0.0.0.0', 'model': 'Qwen/Qwen3-VL-30B-A3B-Instruct', 'trust_remote_code': True, 'max_model_len': 32768, 'served_model_name': ['qwen3-vl'], 'tensor_parallel_size': 4, 'enable_expert_parallel': True, 'gpu_memory_utilization': 0.8, 'enable_prefix_caching': False, 'max_num_batched_tokens': 4096, 'max_num_seqs': 16}
(APIServer pid=106356) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
(APIServer pid=106356) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=106356) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
(APIServer pid=106356) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
(APIServer pid=106356) INFO 12-17 02:49:36 [model.py:547] Resolved architecture: Qwen3VLMoeForConditionalGeneration
(APIServer pid=106356) `torch_dtype` is deprecated! Use `dtype` instead!
(APIServer pid=106356) INFO 12-17 02:49:36 [model.py:1510] Using max model len 32768
(APIServer pid=106356) INFO 12-17 02:49:36 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=4096.
(APIServer pid=106356) INFO 12-17 02:49:36 [platform.py:141] Non-MLA LLMs forcibly disable the chunked prefill feature,as the performance of operators supporting this feature functionality is currently suboptimal.
(APIServer pid=106356) INFO 12-17 02:49:36 [platform.py:227] PIECEWISE compilation enabled on NPU. use_inductor not supported - using only ACL Graph mode
(APIServer pid=106356) INFO 12-17 02:49:36 [utils.py:357] Calculated maximum supported batch sizes for ACL graph: 11
(APIServer pid=106356) WARNING 12-17 02:49:36 [utils.py:360] Currently, communication is performed using FFTS+ method, which reduces the number of available streams and, as a result, limits the range of runtime shapes that can be handled. To both improve communication performance and increase the number of supported shapes, set HCCL_OP_EXPANSION_MODE=AIV.
(APIServer pid=106356) INFO 12-17 02:49:36 [utils.py:390] No adjustment needed for ACL graph batch sizes: Qwen3VLMoeForConditionalGeneration model (layers: 48) with 7 sizes
(APIServer pid=106356) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
(APIServer pid=106356) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
(APIServer pid=106356) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
INFO 12-17 02:49:48 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:49:48 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:49:48 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:49:48 [__init__.py:207] Platform plugin ascend is activated
WARNING 12-17 02:49:54 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
(EngineCore_DP0 pid=106762) INFO 12-17 02:49:54 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
(EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
(EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
(EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
(EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
(EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
(EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
(EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
(EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
(EngineCore_DP0 pid=106762) INFO 12-17 02:49:55 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc3) with config: model='Qwen/Qwen3-VL-30B-A3B-Instruct', speculative_config=None, tokenizer='Qwen/Qwen3-VL-30B-A3B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=4, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=qwen3-vl, enable_prefix_caching=False, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.unified_ascend_attention_with_output","vllm.mla_forward"],"use_inductor":false,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":32,"local_cache_dir":null}
(EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [multiproc_executor.py:720] Reducing Torch parallelism from 96 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(EngineCore_DP0 pid=106762) INFO 12-17 02:49:55 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1, 2, 3], buffer_handle=(4, 16777216, 10, 'psm_fd6690f9'), local_subscribe_addr='ipc:///tmp/527bc7ed-8e43-4e18-a8b6-5ac75098f56f', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-17 02:50:04 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:04 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:04 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:04 [__init__.py:207] Platform plugin ascend is activated
INFO 12-17 02:50:04 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:04 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:04 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:04 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:04 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:04 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:04 [__init__.py:207] Platform plugin ascend is activated
INFO 12-17 02:50:04 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:04 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:04 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:04 [__init__.py:207] Platform plugin ascend is activated
INFO 12-17 02:50:04 [__init__.py:207] Platform plugin ascend is activated
WARNING 12-17 02:50:10 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:50:10 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:50:10 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:50:10 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
INFO 12-17 02:50:28 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:28 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:28 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:28 [__init__.py:207] Platform plugin ascend is activated
INFO 12-17 02:50:28 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:28 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:28 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:28 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:28 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:28 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:28 [__init__.py:207] Platform plugin ascend is activated
INFO 12-17 02:50:28 [__init__.py:207] Platform plugin ascend is activated
INFO 12-17 02:50:29 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:29 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:29 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:29 [__init__.py:207] Platform plugin ascend is activated
WARNING 12-17 02:50:35 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:50:35 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:50:35 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:50:35 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 12-17 02:50:37 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_595d3aa5'), local_subscribe_addr='ipc:///tmp/da85f15a-38da-4e46-b555-bcf4e415f5e5', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-17 02:50:38 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_1cda08d7'), local_subscribe_addr='ipc:///tmp/04d255b3-7ac4-4290-adcb-fb946ec7c851', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-17 02:50:38 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_615a67c3'), local_subscribe_addr='ipc:///tmp/5d47fbea-d8c2-4a52-97b2-0d8ee955071b', remote_subscribe_addr=None, remote_addr_ipv6=False)
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
INFO 12-17 02:50:38 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_3d4d7667'), local_subscribe_addr='ipc:///tmp/f4456559-0ee5-4548-a7b7-3d4bacd040ca', remote_subscribe_addr=None, remote_addr_ipv6=False)
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
INFO 12-17 02:50:45 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1, 2, 3], buffer_handle=(3, 4194304, 6, 'psm_55ee9d5d'), local_subscribe_addr='ipc:///tmp/dd3f793d-ebfe-4eb4-bc9e-64b091bb80cd', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-17 02:50:45 [parallel_state.py:1208] rank 0 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 12-17 02:50:45 [parallel_state.py:1208] rank 1 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
INFO 12-17 02:50:45 [parallel_state.py:1208] rank 3 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 3, EP rank 3
INFO 12-17 02:50:45 [parallel_state.py:1208] rank 2 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 2, EP rank 2
.(Worker_TP2_EP2 pid=106902) INFO 12-17 02:51:09 [model_runner_v1.py:2627] Starting to load model Qwen/Qwen3-VL-30B-A3B-Instruct...
.(Worker_TP1_EP1 pid=106901) INFO 12-17 02:51:12 [model_runner_v1.py:2627] Starting to load model Qwen/Qwen3-VL-30B-A3B-Instruct...
..(Worker_TP3_EP3 pid=106903) INFO 12-17 02:51:16 [model_runner_v1.py:2627] Starting to load model Qwen/Qwen3-VL-30B-A3B-Instruct...
.(Worker_TP0_EP0 pid=106900) INFO 12-17 02:51:18 [model_runner_v1.py:2627] Starting to load model Qwen/Qwen3-VL-30B-A3B-Instruct...
(Worker_TP2_EP2 pid=106902) INFO 12-17 02:51:20 [layer.py:1052] [EP Rank 2/4] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 32/128. Experts local to global index map: 0->64, 1->65, 2->66, 3->67, 4->68, 5->69, 6->70, 7->71, 8->72, 9->73, 10->74, 11->75, 12->76, 13->77, 14->78, 15->79, 16->80, 17->81, 18->82, 19->83, 20->84, 21->85, 22->86, 23->87, 24->88, 25->89, 26->90, 27->91, 28->92, 29->93, 30->94, 31->95.
(Worker_TP2_EP2 pid=106902) INFO 12-17 02:51:20 [layer.py:327] FlashInfer CUTLASS MoE is available for EP but not enabled, consider setting VLLM_USE_FLASHINFER_MOE_FP16=1 to enable it.
..(Worker_TP1_EP1 pid=106901) INFO 12-17 02:51:24 [layer.py:1052] [EP Rank 1/4] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 32/128. Experts local to global index map: 0->32, 1->33, 2->34, 3->35, 4->36, 5->37, 6->38, 7->39, 8->40, 9->41, 10->42, 11->43, 12->44, 13->45, 14->46, 15->47, 16->48, 17->49, 18->50, 19->51, 20->52, 21->53, 22->54, 23->55, 24->56, 25->57, 26->58, 27->59, 28->60, 29->61, 30->62, 31->63.
(Worker_TP1_EP1 pid=106901) INFO 12-17 02:51:24 [layer.py:327] FlashInfer CUTLASS MoE is available for EP but not enabled, consider setting VLLM_USE_FLASHINFER_MOE_FP16=1 to enable it.
.(Worker_TP3_EP3 pid=106903) INFO 12-17 02:51:27 [layer.py:1052] [EP Rank 3/4] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 32/128. Experts local to global index map: 0->96, 1->97, 2->98, 3->99, 4->100, 5->101, 6->102, 7->103, 8->104, 9->105, 10->106, 11->107, 12->108, 13->109, 14->110, 15->111, 16->112, 17->113, 18->114, 19->115, 20->116, 21->117, 22->118, 23->119, 24->120, 25->121, 26->122, 27->123, 28->124, 29->125, 30->126, 31->127.
(Worker_TP3_EP3 pid=106903) INFO 12-17 02:51:27 [layer.py:327] FlashInfer CUTLASS MoE is available for EP but not enabled, consider setting VLLM_USE_FLASHINFER_MOE_FP16=1 to enable it.
(Worker_TP0_EP0 pid=106900) INFO 12-17 02:51:32 [layer.py:1052] [EP Rank 0/4] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 32/128. Experts local to global index map: 0->0, 1->1, 2->2, 3->3, 4->4, 5->5, 6->6, 7->7, 8->8, 9->9, 10->10, 11->11, 12->12, 13->13, 14->14, 15->15, 16->16, 17->17, 18->18, 19->19, 20->20, 21->21, 22->22, 23->23, 24->24, 25->25, 26->26, 27->27, 28->28, 29->29, 30->30, 31->31.
(Worker_TP0_EP0 pid=106900) INFO 12-17 02:51:32 [layer.py:327] FlashInfer CUTLASS MoE is available for EP but not enabled, consider setting VLLM_USE_FLASHINFER_MOE_FP16=1 to enable it.
...(Worker_TP0_EP0 pid=106900) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
Loading safetensors checkpoint shards:   0% Completed | 0/13 [00:00<?, ?it/s]
(Worker_TP1_EP1 pid=106901) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
(Worker_TP2_EP2 pid=106902) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
(Worker_TP3_EP3 pid=106903) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
Loading safetensors checkpoint shards:   8% Completed | 1/13 [00:06<01:12,  6.03s/it]
Loading safetensors checkpoint shards:  15% Completed | 2/13 [00:07<00:38,  3.49s/it]
Loading safetensors checkpoint shards:  23% Completed | 3/13 [00:13<00:46,  4.69s/it]
Loading safetensors checkpoint shards:  31% Completed | 4/13 [00:19<00:46,  5.18s/it]
Loading safetensors checkpoint shards:  38% Completed | 5/13 [00:24<00:39,  4.96s/it]
Loading safetensors checkpoint shards:  46% Completed | 6/13 [00:30<00:36,  5.27s/it]
Loading safetensors checkpoint shards:  54% Completed | 7/13 [00:36<00:32,  5.46s/it]
Loading safetensors checkpoint shards:  62% Completed | 8/13 [00:42<00:28,  5.62s/it]
Loading safetensors checkpoint shards:  69% Completed | 9/13 [00:47<00:22,  5.72s/it]
Loading safetensors checkpoint shards:  77% Completed | 10/13 [00:53<00:17,  5.77s/it]
Loading safetensors checkpoint shards:  85% Completed | 11/13 [00:59<00:11,  5.81s/it]
Loading safetensors checkpoint shards:  92% Completed | 12/13 [01:05<00:05,  5.90s/it]
Loading safetensors checkpoint shards: 100% Completed | 13/13 [01:11<00:00,  5.91s/it]
Loading safetensors checkpoint shards: 100% Completed | 13/13 [01:11<00:00,  5.52s/it]
(Worker_TP0_EP0 pid=106900) 
(Worker_TP0_EP0 pid=106900) INFO 12-17 02:52:47 [default_loader.py:267] Loading weights took 71.87 seconds
.(Worker_TP2_EP2 pid=106902) INFO 12-17 02:52:48 [default_loader.py:267] Loading weights took 71.21 seconds
(Worker_TP1_EP1 pid=106901) INFO 12-17 02:52:48 [default_loader.py:267] Loading weights took 72.10 seconds
..(Worker_TP3_EP3 pid=106903) INFO 12-17 02:52:49 [default_loader.py:267] Loading weights took 71.74 seconds
.(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     self.worker.load_model()
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 291, in load_model
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     self.model_runner.load_model()
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2630, in load_model
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     self.model = get_model(vllm_config=self.vllm_config)
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/__init__.py", line 119, in get_model
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     return loader.load_model(vllm_config=vllm_config,
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/base_loader.py", line 51, in load_model
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     process_weights_after_loading(model, model_config, target_device)
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/utils.py", line 112, in process_weights_after_loading
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     quant_method.process_weights_after_loading(module)
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/ops/common_fused_moe.py", line 129, in process_weights_after_loading
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     layer.w13_weight.data = torch_npu.npu_format_cast(
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/_ops.py", line 1158, in __call__
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     return self._op(*args, **(kwargs or {}))
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] RuntimeError: InnerRun:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:234 OPS function error: Identity, error code is 500002
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] [ERROR] 2025-12-17-02:52:49 (PID:106900, Device:0, RankID:-1) ERR01100 OPS call acl api failed
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] [Error]: A GE error occurs in the system.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         Rectify the fault based on the error information in the ascend log.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] EZ3002: [PID: 106900] 2025-12-17-02:52:49.499.381 Optype [TransData] of Ops kernel [AIcoreEngine] is unsupported. Reason: [tbe-custom]:op type TransData is not found in this op store.[tbe-custom]:op type TransData is not found in this op store.[Dynamic shape check]: The format and dtype is not precisely equivalent to format and dtype in op information library[Static shape check]:The format and dtype is not precisely equivalent to format and dtype in op information library.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         Possible Cause: The operator type is unsupported in the operator information library due to specification mismatch.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         Solution: Submit an issue to request for support at https://gitee.com/ascend, or remove this type of operators from your model.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         TraceBack (most recent call last):
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         Optype [TransData] of Ops kernel [aicpu_ascend_kernel] is unsupported. Reason: Transdata op, groups should be greater than 1, but now is 1.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         No supported Ops kernel and engine are found for [trans_TransData_42], optype [TransData].
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         Assert ((SelectEngine(node_ptr, exclude_engines, is_check_support_success, op_info)) == ge::SUCCESS) failed[FUNC:operator()][FILE:engine_place.cc][LINE:148]
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         RunAllSubgraphs failed, graph=online.[FUNC:RunAllSubgraphs][FILE:engine_place.cc][LINE:122]
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         build graph failed, graph id:49, ret:4294967295[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1624]
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] 
(Worker_TP3_EP3 pid=106903) INFO 12-17 02:52:50 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP0_EP0 pid=106900) INFO 12-17 02:52:50 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP1_EP1 pid=106901) INFO 12-17 02:52:51 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP1_EP1 pid=106901) Exception ignored in: <finalize object at 0xfffe61e2f680; dead>
(Worker_TP1_EP1 pid=106901) Traceback (most recent call last):
(Worker_TP1_EP1 pid=106901)   File "/usr/local/python3.11.13/lib/python3.11/weakref.py", line 585, in __call__
(Worker_TP1_EP1 pid=106901)     def __call__(self, _=None):
(Worker_TP1_EP1 pid=106901) 
(Worker_TP1_EP1 pid=106901)   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 538, in signal_handler
(Worker_TP1_EP1 pid=106901)     raise SystemExit()
(Worker_TP1_EP1 pid=106901) SystemExit: 
(Worker_TP2_EP2 pid=106902) INFO 12-17 02:52:51 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     self.worker.load_model()
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 291, in load_model
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     self.model_runner.load_model()
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2630, in load_model
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     self.model = get_model(vllm_config=self.vllm_config)
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/__init__.py", line 119, in get_model
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     return loader.load_model(vllm_config=vllm_config,
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/base_loader.py", line 51, in load_model
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     process_weights_after_loading(model, model_config, target_device)
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/utils.py", line 112, in process_weights_after_loading
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     quant_method.process_weights_after_loading(module)
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/ops/common_fused_moe.py", line 129, in process_weights_after_loading
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     layer.w13_weight.data = torch_npu.npu_format_cast(
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/_ops.py", line 1158, in __call__
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     return self._op(*args, **(kwargs or {}))
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] RuntimeError: InnerRun:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:234 OPS function error: Identity, error code is 500002
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] [ERROR] 2025-12-17-02:52:50 (PID:106901, Device:1, RankID:-1) ERR01100 OPS call acl api failed
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] [Error]: A GE error occurs in the system.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]         Rectify the fault based on the error information in the ascend log.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] EZ3002: [PID: 106901] 2025-12-17-02:52:50.062.241 Optype [TransData] of Ops kernel [AIcoreEngine] is unsupported. Reason: [tbe-custom]:op type TransData is not found in this op store.[tbe-custom]:op type TransData is not found in this op store.[Dynamic shape check]: The format and dtype is not precisely equivalent to format and dtype in op information library[Static shape check]:The format and dtype is not precisely equivalent to format and dtype in op information library.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]         Possible Cause: The operator type is unsupported in the operator information library due to specification mismatch.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]         Solution: Submit an issue to request for support at https://gitee.com/ascend, or remove this type of operators from your model.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]         TraceBack (most recent call last):
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]         Optype [TransData] of Ops kernel [aicpu_ascend_kernel] is unsupported. Reason: Transdata op, groups should be greater than 1, but now is 1.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]         No supported Ops kernel and engine are found for [trans_TransData_42], optype [TransData].
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]         Assert ((SelectEngine(node_ptr, exclude_engines, is_check_support_success, op_info)) == ge::SUCCESS) failed[FUNC:operator()][FILE:engine_place.cc][LINE:148]
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]         RunAllSubgraphs failed, graph=online.[FUNC:RunAllSubgraphs][FILE:engine_place.cc][LINE:122]
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]         build graph failed, graph id:49, ret:4294967295[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1624]
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]         [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]         [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]         build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] 
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]   File "/vllm-workspace/vllm/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]     self._init_executor()
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]     raise e from None
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=106762) Process EngineCore_DP0:
(EngineCore_DP0 pid=106762) Traceback (most recent call last):
(EngineCore_DP0 pid=106762)   File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=106762)     self.run()
(EngineCore_DP0 pid=106762)   File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=106762)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=106762)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=106762)     raise e
(EngineCore_DP0 pid=106762)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=106762)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=106762)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=106762)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=106762)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=106762)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=106762)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=106762)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=106762)   File "/vllm-workspace/vllm/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=106762)     self._init_executor()
(EngineCore_DP0 pid=106762)   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=106762)     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=106762)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=106762)   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=106762)     raise e from None
(EngineCore_DP0 pid=106762) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=106356) Traceback (most recent call last):
(APIServer pid=106356)   File "/usr/local/python3.11.13/bin/vllm", line 8, in <module>
(APIServer pid=106356)     sys.exit(main())
(APIServer pid=106356)              ^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=106356)     args.dispatch_function(args)
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=106356)     uvloop.run(run_server(args))
(APIServer pid=106356)   File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
(APIServer pid=106356)     return runner.run(wrapper())
(APIServer pid=106356)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "/usr/local/python3.11.13/lib/python3.11/asyncio/runners.py", line 118, in run
(APIServer pid=106356)     return self._loop.run_until_complete(task)
(APIServer pid=106356)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=106356)   File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=106356)     return await main
(APIServer pid=106356)            ^^^^^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=106356)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=106356)     async with build_async_engine_client(
(APIServer pid=106356)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=106356)     return await anext(self.gen)
(APIServer pid=106356)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=106356)     async with build_async_engine_client_from_engine_args(
(APIServer pid=106356)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=106356)     return await anext(self.gen)
(APIServer pid=106356)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=106356)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=106356)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 1571, in inner
(APIServer pid=106356)     return fn(*args, **kwargs)
(APIServer pid=106356)            ^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=106356)     return cls(
(APIServer pid=106356)            ^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=106356)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=106356)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=106356)     return AsyncMPClient(*client_args)
(APIServer pid=106356)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=106356)     super().__init__(
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=106356)     with launch_core_engines(vllm_config, executor_class,
(APIServer pid=106356)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 144, in __exit__
(APIServer pid=106356)     next(self.gen)
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=106356)     wait_for_engine_startup(
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=106356)     raise RuntimeError("Engine core initialization failed. "
(APIServer pid=106356) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
(APIServer pid=106356) [ERROR] 2025-12-17-02:52:57 (PID:106356, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
root@674f5f297e81:/workspace# /usr/local/python3.11.13/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

报错关键行：
EZ3002: [PID: 106900] 2025-12-17-02:52:49.499.381 Optype [TransData] of Ops kernel [AIcoreEngine] is unsupported. Reason: [tbe-custom]:op type TransData is not found in this op store.[tbe-custom]:op type TransData is not found in this op store.[Dynamic shape check]: The format and dtype is not precisely equivalent to format and dtype in op information library[Static shape check]:The format and dtype is not precisely equivalent to format and dtype in op information library.

应该是Transdata这个算子没有在镜像中。以下是我的启动步骤：
export IMAGE=m.daocloud.io/quay.io/ascend/vllm-ascend:v0.11.0rc0
docker run --rm \
--name qwenvl3_30b \
--privileged=true \
--ipc=host \
--device /dev/davinci0 \
--device /dev/davinci1 \
--device /dev/davinci2 \
--device /dev/davinci3 \
--device /dev/davinci4 \
--device /dev/davinci5 \
--device /dev/davinci6 \
--device /dev/davinci7 \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-v /var/lib/docker/lkx/:/data/ \
-p 8000:8000 \
-it $IMAGE bash

# Install Bisheng compiler
wget https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/Ascend-BiSheng-toolkit_aarch64.run
chmod a+x Ascend-BiSheng-toolkit_aarch64.run
./Ascend-BiSheng-toolkit_aarch64.run --install
source /usr/local/Ascend/8.3.RC1/bisheng_toolkit/set_env.sh
 
# Install Triton Ascend
wget https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/triton_ascend-3.2.0.dev20250914-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
pip install triton_ascend-3.2.0.dev20250914-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl

pip install transformers==4.57.0

export ASCEND_LAUNCH_BLOCKING=1
# 使用 VLLM_USE_MODELSCOPE 提高模型下载速度
export VLLM_USE_MODELSCOPE=true

vllm serve Qwen/Qwen3-VL-30B-A3B-Instruct \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 4 \
--enable-expert-parallel \
--max-num-seqs 16 \
--max-num-batched-tokens 4096 \
--trust-remote-code \
--no-enable-prefix-caching \
--gpu-memory-utilization 0.8 \
--max-model-len 32768 \
--served-model-name qwen3-vl

然后就报错了，我现在是在下载镜像时遇到了速度很慢的情况
[root@os-node-created-nfhbp ~]# docker pull m.daocloud.io/quay.io/ascend/vllm-ascend:v0.12.0rc1
v0.12.0rc1: Pulling from quay.io/ascend/vllm-ascend
0ec3d8645767: Already exists 
bfcc5d0fd602: Downloading [==>                                                ]   7.55MB/146.5MB
16b66f2e497f: Download complete 
7feddad41f7b: Downloading [==========>                                        ]  945.2MB/4.552GB
428f9ade9a77: Download complete 
85c139bd38da: Download complete 
5a102323ed43: Download complete 
50475bbba3e3: Downloading [============>                                      ]  5.494MB/21.92MB
13bb24a6968b: Waiting 
8583ddc168a4: Waiting 
1c39e8a9df5f: Waiting 
b6463d241740: Waiting 
a6069c6ed6b0: Waiting 
846fbee6b18f: Waiting
基本要几个小时，然后出现
docker: unexpected EOF.
请问如何解决这个问题？

Ascend/samples

内容风险标识

评论 (0)

Ascend/samples .gitee-modal { width: 500px !important; }

内容风险标识