Ascend / samples
Running vLLM on an Ascend bare-metal host fails: TransData operator error
TODO
#IDDLNW · Task
Created by 木子木木日生 on 2025-12-17 16:06
```
root@674f5f297e81:/workspace# vllm serve Qwen/Qwen3-VL-30B-A3B-Instruct --host 0.0.0.0 --port 8000 --tensor-parallel-size 4 --enable-expert-parallel --max-num-seqs 16 --max-num-batched-tokens 4096 --trust-remote-code --no-enable-prefix-caching --gpu-memory-utilization 0.8 --max-model-len 32768 --served-model-name qwen3-vl
INFO 12-17 02:49:07 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:49:07 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:49:07 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:49:07 [__init__.py:207] Platform plugin ascend is activated
WARNING 12-17 02:49:13 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
(APIServer pid=106356) INFO 12-17 02:49:15 [api_server.py:1839] vLLM API server version 0.11.0rc3
(APIServer pid=106356) INFO 12-17 02:49:15 [utils.py:233] non-default args: {'model_tag': 'Qwen/Qwen3-VL-30B-A3B-Instruct', 'host': '0.0.0.0', 'model': 'Qwen/Qwen3-VL-30B-A3B-Instruct', 'trust_remote_code': True, 'max_model_len': 32768, 'served_model_name': ['qwen3-vl'], 'tensor_parallel_size': 4, 'enable_expert_parallel': True, 'gpu_memory_utilization': 0.8, 'enable_prefix_caching': False, 'max_num_batched_tokens': 4096, 'max_num_seqs': 16}
(APIServer pid=106356) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
(APIServer pid=106356) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=106356) INFO 12-17 02:49:36 [model.py:547] Resolved architecture: Qwen3VLMoeForConditionalGeneration
(APIServer pid=106356) `torch_dtype` is deprecated! Use `dtype` instead!
(APIServer pid=106356) INFO 12-17 02:49:36 [model.py:1510] Using max model len 32768
(APIServer pid=106356) INFO 12-17 02:49:36 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=4096.
(APIServer pid=106356) INFO 12-17 02:49:36 [platform.py:141] Non-MLA LLMs forcibly disable the chunked prefill feature, as the performance of operators supporting this feature functionality is currently suboptimal.
(APIServer pid=106356) INFO 12-17 02:49:36 [platform.py:227] PIECEWISE compilation enabled on NPU. use_inductor not supported - using only ACL Graph mode
(APIServer pid=106356) INFO 12-17 02:49:36 [utils.py:357] Calculated maximum supported batch sizes for ACL graph: 11
(APIServer pid=106356) WARNING 12-17 02:49:36 [utils.py:360] Currently, communication is performed using FFTS+ method, which reduces the number of available streams and, as a result, limits the range of runtime shapes that can be handled. To both improve communication performance and increase the number of supported shapes, set HCCL_OP_EXPANSION_MODE=AIV.
(APIServer pid=106356) INFO 12-17 02:49:36 [utils.py:390] No adjustment needed for ACL graph batch sizes: Qwen3VLMoeForConditionalGeneration model (layers: 48) with 7 sizes
INFO 12-17 02:49:48 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:49:48 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:49:48 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:49:48 [__init__.py:207] Platform plugin ascend is activated
WARNING 12-17 02:49:54 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
(EngineCore_DP0 pid=106762) INFO 12-17 02:49:54 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
[... the same nine registry.py:581 "already registered" warnings repeated by EngineCore_DP0 and again by each of the four worker processes at 02:50:11; identical text omitted ...]
(EngineCore_DP0 pid=106762) INFO 12-17 02:49:55 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc3) with config: model='Qwen/Qwen3-VL-30B-A3B-Instruct', speculative_config=None, tokenizer='Qwen/Qwen3-VL-30B-A3B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=4, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=qwen3-vl, enable_prefix_caching=False, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.unified_ascend_attention_with_output","vllm.mla_forward"],"use_inductor":false,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":32,"local_cache_dir":null}
(EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [multiproc_executor.py:720] Reducing Torch parallelism from 96 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(EngineCore_DP0 pid=106762) INFO 12-17 02:49:55 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1, 2, 3], buffer_handle=(4, 16777216, 10, 'psm_fd6690f9'), local_subscribe_addr='ipc:///tmp/527bc7ed-8e43-4e18-a8b6-5ac75098f56f', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-17 02:50:04 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:04 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:04 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:04 [__init__.py:207] Platform plugin ascend is activated
[... identical plugin-activation blocks from the other three worker processes at 02:50:04 and again at 02:50:28-02:50:29 omitted ...]
WARNING 12-17 02:50:10 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
[... the same _custom_ops warning repeated by the other workers at 02:50:10 and 02:50:35 omitted ...]
INFO 12-17 02:50:37 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_595d3aa5'), local_subscribe_addr='ipc:///tmp/da85f15a-38da-4e46-b555-bcf4e415f5e5', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-17 02:50:38 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_1cda08d7'), local_subscribe_addr='ipc:///tmp/04d255b3-7ac4-4290-adcb-fb946ec7c851', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-17 02:50:38 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_615a67c3'), local_subscribe_addr='ipc:///tmp/5d47fbea-d8c2-4a52-97b2-0d8ee955071b', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-17 02:50:38 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_3d4d7667'), local_subscribe_addr='ipc:///tmp/f4456559-0ee5-4548-a7b7-3d4bacd040ca', remote_subscribe_addr=None, remote_addr_ipv6=False)
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
INFO 12-17 02:50:45 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1, 2, 3], buffer_handle=(3, 4194304, 6, 'psm_55ee9d5d'), local_subscribe_addr='ipc:///tmp/dd3f793d-ebfe-4eb4-bc9e-64b091bb80cd', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-17 02:50:45 [parallel_state.py:1208] rank 0 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 12-17 02:50:45 [parallel_state.py:1208] rank 1 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
INFO 12-17 02:50:45 [parallel_state.py:1208] rank 3 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 3, EP rank 3
INFO 12-17 02:50:45 [parallel_state.py:1208] rank 2 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 2, EP rank 2
(Worker_TP2_EP2 pid=106902) INFO 12-17 02:51:09 [model_runner_v1.py:2627] Starting to load model Qwen/Qwen3-VL-30B-A3B-Instruct...
(Worker_TP1_EP1 pid=106901) INFO 12-17 02:51:12 [model_runner_v1.py:2627] Starting to load model Qwen/Qwen3-VL-30B-A3B-Instruct...
(Worker_TP3_EP3 pid=106903) INFO 12-17 02:51:16 [model_runner_v1.py:2627] Starting to load model Qwen/Qwen3-VL-30B-A3B-Instruct...
(Worker_TP0_EP0 pid=106900) INFO 12-17 02:51:18 [model_runner_v1.py:2627] Starting to load model Qwen/Qwen3-VL-30B-A3B-Instruct...
(Worker_TP2_EP2 pid=106902) INFO 12-17 02:51:20 [layer.py:1052] [EP Rank 2/4] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 32/128. Experts local to global index map: 0->64, 1->65, 2->66, 3->67, 4->68, 5->69, 6->70, 7->71, 8->72, 9->73, 10->74, 11->75, 12->76, 13->77, 14->78, 15->79, 16->80, 17->81, 18->82, 19->83, 20->84, 21->85, 22->86, 23->87, 24->88, 25->89, 26->90, 27->91, 28->92, 29->93, 30->94, 31->95.
(Worker_TP1_EP1 pid=106901) INFO 12-17 02:51:24 [layer.py:1052] [EP Rank 1/4] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 32/128. Experts local to global index map: 0->32, 1->33, 2->34, 3->35, 4->36, 5->37, 6->38, 7->39, 8->40, 9->41, 10->42, 11->43, 12->44, 13->45, 14->46, 15->47, 16->48, 17->49, 18->50, 19->51, 20->52, 21->53, 22->54, 23->55, 24->56, 25->57, 26->58, 27->59, 28->60, 29->61, 30->62, 31->63.
(Worker_TP3_EP3 pid=106903) INFO 12-17 02:51:27 [layer.py:1052] [EP Rank 3/4] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 32/128. Experts local to global index map: 0->96, 1->97, 2->98, 3->99, 4->100, 5->101, 6->102, 7->103, 8->104, 9->105, 10->106, 11->107, 12->108, 13->109, 14->110, 15->111, 16->112, 17->113, 18->114, 19->115, 20->116, 21->117, 22->118, 23->119, 24->120, 25->121, 26->122, 27->123, 28->124, 29->125, 30->126, 31->127.
(Worker_TP0_EP0 pid=106900) INFO 12-17 02:51:32 [layer.py:1052] [EP Rank 0/4] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 32/128. Experts local to global index map: 0->0, 1->1, 2->2, 3->3, 4->4, 5->5, 6->6, 7->7, 8->8, 9->9, 10->10, 11->11, 12->12, 13->13, 14->14, 15->15, 16->16, 17->17, 18->18, 19->19, 20->20, 21->21, 22->22, 23->23, 24->24, 25->25, 26->26, 27->27, 28->28, 29->29, 30->30, 31->31.
(Worker_TP0_EP0 pid=106900) INFO 12-17 02:51:32 [layer.py:327] FlashInfer CUTLASS MoE is available for EP but not enabled, consider setting VLLM_USE_FLASHINFER_MOE_FP16=1 to enable it.
[... the same layer.py:327 FlashInfer notice from the other three workers omitted ...]
Loading safetensors checkpoint shards:   0% Completed | 0/13 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:   8% Completed | 1/13 [00:06<01:12,  6.03s/it]
Loading safetensors checkpoint shards:  15% Completed | 2/13 [00:07<00:38,  3.49s/it]
Loading safetensors checkpoint shards:  23% Completed | 3/13 [00:13<00:46,  4.69s/it]
Loading safetensors checkpoint shards:  31% Completed | 4/13 [00:19<00:46,  5.18s/it]
Loading safetensors checkpoint shards:  38% Completed | 5/13 [00:24<00:39,  4.96s/it]
Loading safetensors checkpoint shards:  46% Completed | 6/13 [00:30<00:36,  5.27s/it]
Loading safetensors checkpoint shards:  54% Completed | 7/13 [00:36<00:32,  5.46s/it]
Loading safetensors checkpoint shards:  62% Completed | 8/13 [00:42<00:28,  5.62s/it]
Loading safetensors checkpoint shards:  69% Completed | 9/13 [00:47<00:22,  5.72s/it]
Loading safetensors checkpoint shards:  77% Completed | 10/13 [00:53<00:17,  5.77s/it]
Loading safetensors checkpoint shards:  85% Completed | 11/13 [00:59<00:11,  5.81s/it]
Loading safetensors checkpoint shards:  92% Completed | 12/13 [01:05<00:05,  5.90s/it]
Loading safetensors checkpoint shards: 100% Completed | 13/13 [01:11<00:00,  5.91s/it]
Loading safetensors checkpoint shards: 100% Completed | 13/13 [01:11<00:00,  5.52s/it]
(Worker_TP0_EP0 pid=106900) INFO 12-17 02:52:47 [default_loader.py:267] Loading weights took 71.87 seconds
(Worker_TP2_EP2 pid=106902) INFO 12-17 02:52:48 [default_loader.py:267] Loading weights took 71.21 seconds
(Worker_TP1_EP1 pid=106901) INFO 12-17 02:52:48 [default_loader.py:267] Loading weights took 72.10 seconds
(Worker_TP3_EP3 pid=106903) INFO 12-17 02:52:49 [default_loader.py:267] Loading weights took 71.74 seconds
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     self.worker.load_model()
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 291, in load_model
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     self.model_runner.load_model()
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2630, in load_model
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     self.model = get_model(vllm_config=self.vllm_config)
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/__init__.py", line 119, in get_model
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     return loader.load_model(vllm_config=vllm_config,
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/base_loader.py", line 51, in load_model
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     process_weights_after_loading(model, model_config, target_device)
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/utils.py", line 112, in process_weights_after_loading
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     quant_method.process_weights_after_loading(module)
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/ops/common_fused_moe.py", line 129, in process_weights_after_loading
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     layer.w13_weight.data = torch_npu.npu_format_cast(
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/_ops.py", line 1158, in __call__
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     return self._op(*args, **(kwargs or {}))
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] RuntimeError: InnerRun:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:234 OPS function error: Identity, error code is 500002
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] [ERROR] 2025-12-17-02:52:49 (PID:106900, Device:0, RankID:-1) ERR01100 OPS call acl api failed
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] [Error]: A GE error occurs in the system.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         Rectify the fault based on the error information in the ascend log.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         EZ3002: [PID: 106900] 2025-12-17-02:52:49.499.381 Optype [TransData] of Ops kernel [AIcoreEngine] is unsupported. Reason: [tbe-custom]:op type TransData is not found in this op store.[tbe-custom]:op type TransData is not found in this op store.[Dynamic shape check]: The format and dtype is not precisely equivalent to format and dtype in op information library[Static shape check]:The format and dtype is not precisely equivalent to format and dtype in op information library.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         Possible Cause: The operator type is unsupported in the operator information library due to specification mismatch.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         Solution: Submit an issue to request for support at https://gitee.com/ascend, or remove this type of operators from your model.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         TraceBack (most recent call last):
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         Optype [TransData] of Ops kernel [aicpu_ascend_kernel] is unsupported. Reason: Transdata op, groups should be greater than 1, but now is 1.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]         No supported Ops kernel and engine are found for [trans_TransData_42], optype [TransData].
```
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] Assert ((SelectEngine(node_ptr, exclude_engines, is_check_support_success, op_info)) == ge::SUCCESS) failed[FUNC:operator()][FILE:engine_place.cc][LINE:148] (Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] RunAllSubgraphs failed, graph=online.[FUNC:RunAllSubgraphs][FILE:engine_place.cc][LINE:122] (Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] build graph failed, graph id:49, ret:4294967295[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1624] (Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] (Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145] (Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145] (Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] (Worker_TP3_EP3 pid=106903) INFO 12-17 02:52:50 [multiproc_executor.py:558] Parent process exited, terminating worker (Worker_TP0_EP0 pid=106900) INFO 12-17 02:52:50 [multiproc_executor.py:558] Parent process exited, terminating worker (Worker_TP1_EP1 pid=106901) INFO 12-17 02:52:51 [multiproc_executor.py:558] Parent process exited, terminating worker (Worker_TP1_EP1 pid=106901) Exception ignored in: <finalize object at 0xfffe61e2f680; dead> (Worker_TP1_EP1 pid=106901) Traceback (most recent call last): (Worker_TP1_EP1 pid=106901) File "/usr/local/python3.11.13/lib/python3.11/weakref.py", line 585, in __call__ (Worker_TP1_EP1 pid=106901) def __call__(self, _=None): (Worker_TP1_EP1 pid=106901) (Worker_TP1_EP1 pid=106901) File 
"/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 538, in signal_handler
(Worker_TP1_EP1 pid=106901)     raise SystemExit()
(Worker_TP1_EP1 pid=106901) SystemExit:
(Worker_TP2_EP2 pid=106902) INFO 12-17 02:52:51 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     self.worker.load_model()
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 291, in load_model
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     self.model_runner.load_model()
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2630, in load_model
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     self.model = get_model(vllm_config=self.vllm_config)
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/__init__.py", line 119, in get_model
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     return loader.load_model(vllm_config=vllm_config,
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/base_loader.py", line 51, in load_model
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     process_weights_after_loading(model, model_config, target_device)
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/utils.py", line 112, in process_weights_after_loading
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     quant_method.process_weights_after_loading(module)
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/ops/common_fused_moe.py", line 129, in process_weights_after_loading
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     layer.w13_weight.data = torch_npu.npu_format_cast(
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/_ops.py", line 1158, in __call__
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     return self._op(*args, **(kwargs or {}))
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] RuntimeError: InnerRun:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:234 OPS function error: Identity, error code is 500002
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] [ERROR] 2025-12-17-02:52:50 (PID:106901, Device:1, RankID:-1) ERR01100 OPS call acl api failed
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] [Error]: A GE error occurs in the system.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] Rectify the fault based on the error information in the ascend log.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] EZ3002: [PID: 106901] 2025-12-17-02:52:50.062.241 Optype [TransData] of Ops kernel [AIcoreEngine] is unsupported. Reason: [tbe-custom]:op type TransData is not found in this op store.[tbe-custom]:op type TransData is not found in this op store.[Dynamic shape check]: The format and dtype is not precisely equivalent to format and dtype in op information library[Static shape check]:The format and dtype is not precisely equivalent to format and dtype in op information library.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] Possible Cause: The operator type is unsupported in the operator information library due to specification mismatch.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] Solution: Submit an issue to request for support at https://gitee.com/ascend, or remove this type of operators from your model.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] TraceBack (most recent call last):
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] Optype [TransData] of Ops kernel [aicpu_ascend_kernel] is unsupported. Reason: Transdata op, groups should be greater than 1, but now is 1.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] No supported Ops kernel and engine are found for [trans_TransData_42], optype [TransData].
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] Assert ((SelectEngine(node_ptr, exclude_engines, is_check_support_success, op_info)) == ge::SUCCESS) failed[FUNC:operator()][FILE:engine_place.cc][LINE:148]
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] RunAllSubgraphs failed, graph=online.[FUNC:RunAllSubgraphs][FILE:engine_place.cc][LINE:122]
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] build graph failed, graph id:49, ret:4294967295[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1624]
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]   File "/vllm-workspace/vllm/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]     self._init_executor()
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708]     raise e from None
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=106762) Process EngineCore_DP0:
(EngineCore_DP0 pid=106762) Traceback (most recent call last):
(EngineCore_DP0 pid=106762)   File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=106762)     self.run()
(EngineCore_DP0 pid=106762)   File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=106762)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=106762)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=106762)     raise e
(EngineCore_DP0 pid=106762)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=106762)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=106762)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=106762)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=106762)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=106762)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=106762)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=106762)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=106762)   File "/vllm-workspace/vllm/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=106762)     self._init_executor()
(EngineCore_DP0 pid=106762)   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=106762)     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=106762)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=106762)   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=106762)     raise e from None
(EngineCore_DP0 pid=106762) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=106356) Traceback (most recent call last):
(APIServer pid=106356)   File "/usr/local/python3.11.13/bin/vllm", line 8, in <module>
(APIServer pid=106356)     sys.exit(main())
(APIServer pid=106356)     ^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=106356)     args.dispatch_function(args)
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=106356)     uvloop.run(run_server(args))
(APIServer pid=106356)   File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
(APIServer pid=106356)     return runner.run(wrapper())
(APIServer pid=106356)     ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "/usr/local/python3.11.13/lib/python3.11/asyncio/runners.py", line 118, in run
(APIServer pid=106356)     return self._loop.run_until_complete(task)
(APIServer pid=106356)     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=106356)   File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=106356)     return await main
(APIServer pid=106356)     ^^^^^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=106356)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=106356)     async with build_async_engine_client(
(APIServer pid=106356)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=106356)     return await anext(self.gen)
(APIServer pid=106356)     ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=106356)     async with build_async_engine_client_from_engine_args(
(APIServer pid=106356)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=106356)     return await anext(self.gen)
(APIServer pid=106356)     ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=106356)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=106356)     ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 1571, in inner
(APIServer pid=106356)     return fn(*args, **kwargs)
(APIServer pid=106356)     ^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=106356)     return cls(
(APIServer pid=106356)     ^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=106356)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=106356)     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=106356)     return AsyncMPClient(*client_args)
(APIServer pid=106356)     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=106356)     super().__init__(
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=106356)     with launch_core_engines(vllm_config, executor_class,
(APIServer pid=106356)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 144, in __exit__
(APIServer pid=106356)     next(self.gen)
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=106356)     wait_for_engine_startup(
(APIServer pid=106356)   File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=106356)     raise RuntimeError("Engine core initialization failed. "
(APIServer pid=106356) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
(APIServer pid=106356) [ERROR] 2025-12-17-02:52:57 (PID:106356, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
root@674f5f297e81:/workspace# /usr/local/python3.11.13/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

The key error line is:

EZ3002: [PID: 106900] 2025-12-17-02:52:49.499.381 Optype [TransData] of Ops kernel [AIcoreEngine] is unsupported. Reason: [tbe-custom]:op type TransData is not found in this op store.[tbe-custom]:op type TransData is not found in this op store.[Dynamic shape check]: The format and dtype is not precisely equivalent to format and dtype in op information library[Static shape check]:The format and dtype is not precisely equivalent to format and dtype in op information library.
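The traceback shows the crash at line 129 of vllm_ascend/ops/common_fused_moe.py, where the MoE weights are cast to an NPU-private layout via `torch_npu.npu_format_cast` and the underlying TransData kernel is rejected by the op library (often a sign that the CANN kernel packages in the image do not match the host driver/SoC). Purely as an illustrative local stopgap, not the official fix, one could guard that cast and keep the original ND layout when it fails; `safe_format_cast` is a hypothetical helper name, not a vllm-ascend API:

```python
def safe_format_cast(tensor, acl_format):
    """Attempt an NPU format cast; fall back to the original tensor.

    Illustrative sketch only: if torch_npu is absent, or the cast fails at
    runtime (as with the unsupported TransData kernel in the EZ3002 error
    above), the weight keeps its original layout instead of crashing the
    worker during weight loading.
    """
    try:
        import torch_npu  # only available inside the Ascend image
        return torch_npu.npu_format_cast(tensor, acl_format)
    except (ImportError, RuntimeError):
        return tensor  # keep the original (ND) layout


# Example: on a machine without torch_npu, the input is returned unchanged.
w13 = [[1.0, 2.0], [3.0, 4.0]]
assert safe_format_cast(w13, 29) is w13
```

If the cast succeeds on a correctly matched CANN installation, behaviour is unchanged; on a mismatched one the weights simply stay in ND format, likely at some performance cost. The real fix is still aligning the image's CANN version with the host driver.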
It looks like the TransData operator is missing from the image. Here are my startup steps:

export IMAGE=m.daocloud.io/quay.io/ascend/vllm-ascend:v0.11.0rc0
docker run --rm \
  --name qwenvl3_30b \
  --privileged=true \
  --ipc=host \
  --device /dev/davinci0 \
  --device /dev/davinci1 \
  --device /dev/davinci2 \
  --device /dev/davinci3 \
  --device /dev/davinci4 \
  --device /dev/davinci5 \
  --device /dev/davinci6 \
  --device /dev/davinci7 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
  -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /root/.cache:/root/.cache \
  -v /var/lib/docker/lkx/:/data/ \
  -p 8000:8000 \
  -it $IMAGE bash

# Install Bisheng compiler
wget https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/Ascend-BiSheng-toolkit_aarch64.run
chmod a+x Ascend-BiSheng-toolkit_aarch64.run
./Ascend-BiSheng-toolkit_aarch64.run --install
source /usr/local/Ascend/8.3.RC1/bisheng_toolkit/set_env.sh

# Install Triton Ascend
wget https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/triton_ascend-3.2.0.dev20250914-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
pip install triton_ascend-3.2.0.dev20250914-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl

pip install transformers==4.57.0
export ASCEND_LAUNCH_BLOCKING=1
# Use VLLM_USE_MODELSCOPE to speed up the model download
export VLLM_USE_MODELSCOPE=true

vllm serve Qwen/Qwen3-VL-30B-A3B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 4 \
  --enable-expert-parallel \
  --max-num-seqs 16 \
  --max-num-batched-tokens 4096 \
  --trust-remote-code \
  --no-enable-prefix-caching \
  --gpu-memory-utilization 0.8 \
  --max-model-len 32768 \
  --served-model-name qwen3-vl

That is when the error occurred. On top of that, I am now seeing very slow speeds when pulling the newer image:

[root@os-node-created-nfhbp ~]# docker pull m.daocloud.io/quay.io/ascend/vllm-ascend:v0.12.0rc1
v0.12.0rc1: Pulling from quay.io/ascend/vllm-ascend
0ec3d8645767: Already exists
bfcc5d0fd602: Downloading [==>        ]  7.55MB/146.5MB
16b66f2e497f: Download complete
7feddad41f7b: Downloading [==========>]  945.2MB/4.552GB
428f9ade9a77: Download complete
85c139bd38da: Download complete
5a102323ed43: Download complete
50475bbba3e3: Downloading [============>]  5.494MB/21.92MB
13bb24a6968b: Waiting
8583ddc168a4: Waiting
1c39e8a9df5f: Waiting
b6463d241740: Waiting
a6069c6ed6b0: Waiting
846fbee6b18f: Waiting

The pull takes several hours and then fails with "docker: unexpected EOF". How can I solve this?
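On the download side: Docker keeps layers that have already completed, so a pull that dies with "unexpected EOF" can simply be retried and will resume from the finished layers. A small retry wrapper might look like this (`retry_pull` is just an illustrative helper, not a docker feature):

```shell
# retry_pull MAX CMD...: re-run CMD until it succeeds or MAX attempts are used.
# Docker caches completed layers between attempts, so each retry of
# `docker pull` makes progress instead of starting from scratch.
retry_pull() {
    max="$1"; shift
    n=0
    until "$@"; do
        n=$((n + 1))
        if [ "$n" -ge "$max" ]; then
            echo "giving up after $n attempts" >&2
            return 1
        fi
        echo "attempt $n failed, retrying..." >&2
        sleep 2
    done
}

# On the host, for example:
#   retry_pull 30 docker pull m.daocloud.io/quay.io/ascend/vllm-ascend:v0.12.0rc1
```

Alternatively, pull the image once on a machine with a stable, fast connection, export it with `docker save -o vllm-ascend.tar IMAGE`, copy the tarball over, and import it with `docker load -i vllm-ascend.tar`.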
 root@674f5f297e81:/workspace# vllm serve Qwen/Qwen3-VL-30B-A3B-Instruct --host 0.0.0.0 --port 8000 --tensor-parallel-size 4 --enable-expert-parallel --max-num-seqs 16 --max-num-batched-tokens 4096 --trust-remote-code --no-enable-prefix-caching --gpu-memory-utilization 0.8 --max-model-len 32768 --served-model-name qwen3-vl INFO 12-17 02:49:07 [__init__.py:36] Available plugins for group vllm.platform_plugins: INFO 12-17 02:49:07 [__init__.py:38] - ascend -> vllm_ascend:register INFO 12-17 02:49:07 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 12-17 02:49:07 [__init__.py:207] Platform plugin ascend is activated WARNING 12-17 02:49:13 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'") WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration. WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration. WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration. WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration. WARNING 12-17 02:49:14 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM. 
WARNING 12-17 02:49:14 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM. WARNING 12-17 02:49:14 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP. WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM. WARNING 12-17 02:49:14 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM. (APIServer pid=106356) INFO 12-17 02:49:15 [api_server.py:1839] vLLM API server version 0.11.0rc3 (APIServer pid=106356) INFO 12-17 02:49:15 [utils.py:233] non-default args: {'model_tag': 'Qwen/Qwen3-VL-30B-A3B-Instruct', 'host': '0.0.0.0', 'model': 'Qwen/Qwen3-VL-30B-A3B-Instruct', 'trust_remote_code': True, 'max_model_len': 32768, 'served_model_name': ['qwen3-vl'], 'tensor_parallel_size': 4, 'enable_expert_parallel': True, 'gpu_memory_utilization': 0.8, 'enable_prefix_caching': False, 'max_num_batched_tokens': 4096, 'max_num_seqs': 16} (APIServer pid=106356) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct (APIServer pid=106356) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored. 
(APIServer pid=106356) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct (APIServer pid=106356) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct (APIServer pid=106356) INFO 12-17 02:49:36 [model.py:547] Resolved architecture: Qwen3VLMoeForConditionalGeneration (APIServer pid=106356) `torch_dtype` is deprecated! Use `dtype` instead! (APIServer pid=106356) INFO 12-17 02:49:36 [model.py:1510] Using max model len 32768 (APIServer pid=106356) INFO 12-17 02:49:36 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=4096. (APIServer pid=106356) INFO 12-17 02:49:36 [platform.py:141] Non-MLA LLMs forcibly disable the chunked prefill feature,as the performance of operators supporting this feature functionality is currently suboptimal. (APIServer pid=106356) INFO 12-17 02:49:36 [platform.py:227] PIECEWISE compilation enabled on NPU. use_inductor not supported - using only ACL Graph mode (APIServer pid=106356) INFO 12-17 02:49:36 [utils.py:357] Calculated maximum supported batch sizes for ACL graph: 11 (APIServer pid=106356) WARNING 12-17 02:49:36 [utils.py:360] Currently, communication is performed using FFTS+ method, which reduces the number of available streams and, as a result, limits the range of runtime shapes that can be handled. To both improve communication performance and increase the number of supported shapes, set HCCL_OP_EXPANSION_MODE=AIV. 
(APIServer pid=106356) INFO 12-17 02:49:36 [utils.py:390] No adjustment needed for ACL graph batch sizes: Qwen3VLMoeForConditionalGeneration model (layers: 48) with 7 sizes (APIServer pid=106356) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct (APIServer pid=106356) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct (APIServer pid=106356) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct INFO 12-17 02:49:48 [__init__.py:36] Available plugins for group vllm.platform_plugins: INFO 12-17 02:49:48 [__init__.py:38] - ascend -> vllm_ascend:register INFO 12-17 02:49:48 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 12-17 02:49:48 [__init__.py:207] Platform plugin ascend is activated WARNING 12-17 02:49:54 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'") (EngineCore_DP0 pid=106762) INFO 12-17 02:49:54 [core.py:644] Waiting for init message from front-end. (EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration. (EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration. 
(EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration. (EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration. (EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM. (EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM. (EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP. (EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM. (EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM. 
(EngineCore_DP0 pid=106762) INFO 12-17 02:49:55 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc3) with config: model='Qwen/Qwen3-VL-30B-A3B-Instruct', speculative_config=None, tokenizer='Qwen/Qwen3-VL-30B-A3B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=4, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=qwen3-vl, enable_prefix_caching=False, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.unified_ascend_attention_with_output","vllm.mla_forward"],"use_inductor":false,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":32,"local_cache_dir":null}
(EngineCore_DP0 pid=106762) WARNING 12-17 02:49:55 [multiproc_executor.py:720] Reducing Torch parallelism from 96 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(EngineCore_DP0 pid=106762) INFO 12-17 02:49:55 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1, 2, 3], buffer_handle=(4, 16777216, 10, 'psm_fd6690f9'), local_subscribe_addr='ipc:///tmp/527bc7ed-8e43-4e18-a8b6-5ac75098f56f', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-17 02:50:04 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:04 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:04 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:04 [__init__.py:207] Platform plugin ascend is activated
INFO 12-17 02:50:04 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:04 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:04 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:04 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:04 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:04 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:04 [__init__.py:207] Platform plugin ascend is activated
INFO 12-17 02:50:04 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:04 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:04 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:04 [__init__.py:207] Platform plugin ascend is activated
INFO 12-17 02:50:04 [__init__.py:207] Platform plugin ascend is activated
WARNING 12-17 02:50:10 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:50:10 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:50:10 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:50:10 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
WARNING 12-17 02:50:11 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
INFO 12-17 02:50:28 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:28 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:28 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:28 [__init__.py:207] Platform plugin ascend is activated
INFO 12-17 02:50:28 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:28 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:28 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:28 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:28 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:28 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:28 [__init__.py:207] Platform plugin ascend is activated
INFO 12-17 02:50:28 [__init__.py:207] Platform plugin ascend is activated
INFO 12-17 02:50:29 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 12-17 02:50:29 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 12-17 02:50:29 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 12-17 02:50:29 [__init__.py:207] Platform plugin ascend is activated
WARNING 12-17 02:50:35 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:50:35 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:50:35 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 12-17 02:50:35 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 12-17 02:50:37 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_595d3aa5'), local_subscribe_addr='ipc:///tmp/da85f15a-38da-4e46-b555-bcf4e415f5e5', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-17 02:50:38 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_1cda08d7'), local_subscribe_addr='ipc:///tmp/04d255b3-7ac4-4290-adcb-fb946ec7c851', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-17 02:50:38 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_615a67c3'), local_subscribe_addr='ipc:///tmp/5d47fbea-d8c2-4a52-97b2-0d8ee955071b', remote_subscribe_addr=None, remote_addr_ipv6=False)
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
INFO 12-17 02:50:38 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_3d4d7667'), local_subscribe_addr='ipc:///tmp/f4456559-0ee5-4548-a7b7-3d4bacd040ca', remote_subscribe_addr=None, remote_addr_ipv6=False)
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
INFO 12-17 02:50:45 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1, 2, 3], buffer_handle=(3, 4194304, 6, 'psm_55ee9d5d'), local_subscribe_addr='ipc:///tmp/dd3f793d-ebfe-4eb4-bc9e-64b091bb80cd', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 12-17 02:50:45 [parallel_state.py:1208] rank 0 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 12-17 02:50:45 [parallel_state.py:1208] rank 1 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
INFO 12-17 02:50:45 [parallel_state.py:1208] rank 3 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 3, EP rank 3
INFO 12-17 02:50:45 [parallel_state.py:1208] rank 2 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 2, EP rank 2
.(Worker_TP2_EP2 pid=106902) INFO 12-17 02:51:09 [model_runner_v1.py:2627] Starting to load model Qwen/Qwen3-VL-30B-A3B-Instruct...
.(Worker_TP1_EP1 pid=106901) INFO 12-17 02:51:12 [model_runner_v1.py:2627] Starting to load model Qwen/Qwen3-VL-30B-A3B-Instruct...
..(Worker_TP3_EP3 pid=106903) INFO 12-17 02:51:16 [model_runner_v1.py:2627] Starting to load model Qwen/Qwen3-VL-30B-A3B-Instruct...
.(Worker_TP0_EP0 pid=106900) INFO 12-17 02:51:18 [model_runner_v1.py:2627] Starting to load model Qwen/Qwen3-VL-30B-A3B-Instruct...
(Worker_TP2_EP2 pid=106902) INFO 12-17 02:51:20 [layer.py:1052] [EP Rank 2/4] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 32/128. Experts local to global index map: 0->64, 1->65, 2->66, 3->67, 4->68, 5->69, 6->70, 7->71, 8->72, 9->73, 10->74, 11->75, 12->76, 13->77, 14->78, 15->79, 16->80, 17->81, 18->82, 19->83, 20->84, 21->85, 22->86, 23->87, 24->88, 25->89, 26->90, 27->91, 28->92, 29->93, 30->94, 31->95.
(Worker_TP2_EP2 pid=106902) INFO 12-17 02:51:20 [layer.py:327] FlashInfer CUTLASS MoE is available for EP but not enabled, consider setting VLLM_USE_FLASHINFER_MOE_FP16=1 to enable it.
..(Worker_TP1_EP1 pid=106901) INFO 12-17 02:51:24 [layer.py:1052] [EP Rank 1/4] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 32/128. Experts local to global index map: 0->32, 1->33, 2->34, 3->35, 4->36, 5->37, 6->38, 7->39, 8->40, 9->41, 10->42, 11->43, 12->44, 13->45, 14->46, 15->47, 16->48, 17->49, 18->50, 19->51, 20->52, 21->53, 22->54, 23->55, 24->56, 25->57, 26->58, 27->59, 28->60, 29->61, 30->62, 31->63.
(Worker_TP1_EP1 pid=106901) INFO 12-17 02:51:24 [layer.py:327] FlashInfer CUTLASS MoE is available for EP but not enabled, consider setting VLLM_USE_FLASHINFER_MOE_FP16=1 to enable it.
.(Worker_TP3_EP3 pid=106903) INFO 12-17 02:51:27 [layer.py:1052] [EP Rank 3/4] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 32/128. Experts local to global index map: 0->96, 1->97, 2->98, 3->99, 4->100, 5->101, 6->102, 7->103, 8->104, 9->105, 10->106, 11->107, 12->108, 13->109, 14->110, 15->111, 16->112, 17->113, 18->114, 19->115, 20->116, 21->117, 22->118, 23->119, 24->120, 25->121, 26->122, 27->123, 28->124, 29->125, 30->126, 31->127.
(Worker_TP3_EP3 pid=106903) INFO 12-17 02:51:27 [layer.py:327] FlashInfer CUTLASS MoE is available for EP but not enabled, consider setting VLLM_USE_FLASHINFER_MOE_FP16=1 to enable it.
(Worker_TP0_EP0 pid=106900) INFO 12-17 02:51:32 [layer.py:1052] [EP Rank 0/4] Expert parallelism is enabled. Expert placement strategy: linear. Local/global number of experts: 32/128. Experts local to global index map: 0->0, 1->1, 2->2, 3->3, 4->4, 5->5, 6->6, 7->7, 8->8, 9->9, 10->10, 11->11, 12->12, 13->13, 14->14, 15->15, 16->16, 17->17, 18->18, 19->19, 20->20, 21->21, 22->22, 23->23, 24->24, 25->25, 26->26, 27->27, 28->28, 29->29, 30->30, 31->31.
(Worker_TP0_EP0 pid=106900) INFO 12-17 02:51:32 [layer.py:327] FlashInfer CUTLASS MoE is available for EP but not enabled, consider setting VLLM_USE_FLASHINFER_MOE_FP16=1 to enable it.
...(Worker_TP0_EP0 pid=106900) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
Loading safetensors checkpoint shards: 0% Completed | 0/13 [00:00<?, ?it/s]
(Worker_TP1_EP1 pid=106901) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
(Worker_TP2_EP2 pid=106902) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
(Worker_TP3_EP3 pid=106903) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-30B-A3B-Instruct
Loading safetensors checkpoint shards: 8% Completed | 1/13 [00:06<01:12, 6.03s/it]
Loading safetensors checkpoint shards: 15% Completed | 2/13 [00:07<00:38, 3.49s/it]
Loading safetensors checkpoint shards: 23% Completed | 3/13 [00:13<00:46, 4.69s/it]
Loading safetensors checkpoint shards: 31% Completed | 4/13 [00:19<00:46, 5.18s/it]
Loading safetensors checkpoint shards: 38% Completed | 5/13 [00:24<00:39, 4.96s/it]
Loading safetensors checkpoint shards: 46% Completed | 6/13 [00:30<00:36, 5.27s/it]
Loading safetensors checkpoint shards: 54% Completed | 7/13 [00:36<00:32, 5.46s/it]
Loading safetensors checkpoint shards: 62% Completed | 8/13 [00:42<00:28, 5.62s/it]
Loading safetensors checkpoint shards: 69% Completed | 9/13 [00:47<00:22, 5.72s/it]
Loading safetensors checkpoint shards: 77% Completed | 10/13 [00:53<00:17, 5.77s/it]
Loading safetensors checkpoint shards: 85% Completed | 11/13 [00:59<00:11, 5.81s/it]
Loading safetensors checkpoint shards: 92% Completed | 12/13 [01:05<00:05, 5.90s/it]
Loading safetensors checkpoint shards: 100% Completed | 13/13 [01:11<00:00, 5.91s/it]
Loading safetensors checkpoint shards: 100% Completed | 13/13 [01:11<00:00, 5.52s/it]
(Worker_TP0_EP0 pid=106900)
(Worker_TP0_EP0 pid=106900) INFO 12-17 02:52:47 [default_loader.py:267] Loading weights took 71.87 seconds
.(Worker_TP2_EP2 pid=106902) INFO 12-17 02:52:48 [default_loader.py:267] Loading weights took 71.21 seconds
(Worker_TP1_EP1 pid=106901) INFO 12-17 02:52:48 [default_loader.py:267] Loading weights took 72.10 seconds
..(Worker_TP3_EP3 pid=106903) INFO 12-17 02:52:49 [default_loader.py:267] Loading weights took 71.74 seconds
.(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     self.worker.load_model()
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 291, in load_model
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     self.model_runner.load_model()
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2630, in load_model
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     self.model = get_model(vllm_config=self.vllm_config)
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/__init__.py", line 119, in get_model
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     return loader.load_model(vllm_config=vllm_config,
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/base_loader.py", line 51, in load_model
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     process_weights_after_loading(model, model_config, target_device)
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/utils.py", line 112, in process_weights_after_loading
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     quant_method.process_weights_after_loading(module)
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/ops/common_fused_moe.py", line 129, in process_weights_after_loading
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     layer.w13_weight.data = torch_npu.npu_format_cast(
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/_ops.py", line 1158, in __call__
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]     return self._op(*args, **(kwargs or {}))
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] RuntimeError: InnerRun:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:234 OPS function error: Identity, error code is 500002
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] [ERROR] 2025-12-17-02:52:49 (PID:106900, Device:0, RankID:-1) ERR01100 OPS call acl api failed
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] [Error]: A GE error occurs in the system.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] Rectify the fault based on the error information in the ascend log.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] EZ3002: [PID: 106900] 2025-12-17-02:52:49.499.381 Optype [TransData] of Ops kernel [AIcoreEngine] is unsupported. Reason: [tbe-custom]:op type TransData is not found in this op store.[tbe-custom]:op type TransData is not found in this op store.[Dynamic shape check]: The format and dtype is not precisely equivalent to format and dtype in op information library[Static shape check]:The format and dtype is not precisely equivalent to format and dtype in op information library.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] Possible Cause: The operator type is unsupported in the operator information library due to specification mismatch.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] Solution: Submit an issue to request for support at https://gitee.com/ascend, or remove this type of operators from your model.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] TraceBack (most recent call last):
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] Optype [TransData] of Ops kernel [aicpu_ascend_kernel] is unsupported. Reason: Transdata op, groups should be greater than 1, but now is 1.
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] No supported Ops kernel and engine are found for [trans_TransData_42], optype [TransData].
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] Assert ((SelectEngine(node_ptr, exclude_engines, is_check_support_success, op_info)) == ge::SUCCESS) failed[FUNC:operator()][FILE:engine_place.cc][LINE:148]
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] RunAllSubgraphs failed, graph=online.[FUNC:RunAllSubgraphs][FILE:engine_place.cc][LINE:122]
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] build graph failed, graph id:49, ret:4294967295[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1624]
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597] build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
(Worker_TP0_EP0 pid=106900) ERROR 12-17 02:52:50 [multiproc_executor.py:597]
(Worker_TP3_EP3 pid=106903) INFO 12-17 02:52:50 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP0_EP0 pid=106900) INFO 12-17 02:52:50 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP1_EP1 pid=106901) INFO 12-17 02:52:51 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP1_EP1 pid=106901) Exception ignored in: <finalize object at 0xfffe61e2f680; dead>
(Worker_TP1_EP1 pid=106901) Traceback (most recent call last):
(Worker_TP1_EP1 pid=106901)   File "/usr/local/python3.11.13/lib/python3.11/weakref.py", line 585, in __call__
(Worker_TP1_EP1 pid=106901)     def __call__(self, _=None):
(Worker_TP1_EP1 pid=106901)
(Worker_TP1_EP1 pid=106901)   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 538, in signal_handler
(Worker_TP1_EP1 pid=106901)     raise SystemExit()
(Worker_TP1_EP1 pid=106901) SystemExit:
(Worker_TP2_EP2 pid=106902) INFO 12-17 02:52:51 [multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     self.worker.load_model()
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 291, in load_model
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     self.model_runner.load_model()
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2630, in load_model
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     self.model = get_model(vllm_config=self.vllm_config)
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/__init__.py", line 119, in get_model
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     return loader.load_model(vllm_config=vllm_config,
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/base_loader.py", line 51, in load_model
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     process_weights_after_loading(model, model_config, target_device)
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm/vllm/model_executor/model_loader/utils.py", line 112, in process_weights_after_loading
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     quant_method.process_weights_after_loading(module)
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/vllm-workspace/vllm-ascend/vllm_ascend/ops/common_fused_moe.py", line 129, in process_weights_after_loading
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     layer.w13_weight.data = torch_npu.npu_format_cast(
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/_ops.py", line 1158, in __call__
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]     return self._op(*args, **(kwargs or {}))
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] RuntimeError: InnerRun:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:234 OPS function error: Identity, error code is 500002
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] [ERROR] 2025-12-17-02:52:50 (PID:106901, Device:1, RankID:-1) ERR01100 OPS call acl api failed
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] [Error]: A GE error occurs in the system.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] Rectify the fault based on the error information in the ascend log.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] EZ3002: [PID: 106901] 2025-12-17-02:52:50.062.241 Optype [TransData] of Ops kernel [AIcoreEngine] is unsupported. Reason: [tbe-custom]:op type TransData is not found in this op store.[tbe-custom]:op type TransData is not found in this op store.[Dynamic shape check]: The format and dtype is not precisely equivalent to format and dtype in op information library[Static shape check]:The format and dtype is not precisely equivalent to format and dtype in op information library.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] Possible Cause: The operator type is unsupported in the operator information library due to specification mismatch.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] Solution: Submit an issue to request for support at https://gitee.com/ascend, or remove this type of operators from your model.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] TraceBack (most recent call last):
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] Optype [TransData] of Ops kernel [aicpu_ascend_kernel] is unsupported. Reason: Transdata op, groups should be greater than 1, but now is 1.
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] No supported Ops kernel and engine are found for [trans_TransData_42], optype [TransData].
(Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] Assert ((SelectEngine(node_ptr, exclude_engines, is_check_support_success, op_info)) == ge::SUCCESS) failed[FUNC:operator()][FILE:engine_place.cc][LINE:148] (Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] RunAllSubgraphs failed, graph=online.[FUNC:RunAllSubgraphs][FILE:engine_place.cc][LINE:122] (Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] build graph failed, graph id:49, ret:4294967295[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1624] (Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] (Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145] (Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145] (Worker_TP1_EP1 pid=106901) ERROR 12-17 02:52:51 [multiproc_executor.py:597] (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] EngineCore failed to start. 
(EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] Traceback (most recent call last): (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs) (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 498, in __init__ (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 83, in __init__ (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] self.model_executor = executor_class(vllm_config) (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] File "/vllm-workspace/vllm/vllm/executor/executor_base.py", line 54, in __init__ (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] self._init_executor() (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] self.workers = WorkerProc.wait_for_ready(unready_workers) (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] raise e from None (EngineCore_DP0 pid=106762) ERROR 12-17 02:52:54 [core.py:708] Exception: WorkerProc initialization 
failed due to an exception in a background process. See stack trace for root cause. (EngineCore_DP0 pid=106762) Process EngineCore_DP0: (EngineCore_DP0 pid=106762) Traceback (most recent call last): (EngineCore_DP0 pid=106762) File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap (EngineCore_DP0 pid=106762) self.run() (EngineCore_DP0 pid=106762) File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 108, in run (EngineCore_DP0 pid=106762) self._target(*self._args, **self._kwargs) (EngineCore_DP0 pid=106762) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 712, in run_engine_core (EngineCore_DP0 pid=106762) raise e (EngineCore_DP0 pid=106762) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core (EngineCore_DP0 pid=106762) engine_core = EngineCoreProc(*args, **kwargs) (EngineCore_DP0 pid=106762) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=106762) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 498, in __init__ (EngineCore_DP0 pid=106762) super().__init__(vllm_config, executor_class, log_stats, (EngineCore_DP0 pid=106762) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 83, in __init__ (EngineCore_DP0 pid=106762) self.model_executor = executor_class(vllm_config) (EngineCore_DP0 pid=106762) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=106762) File "/vllm-workspace/vllm/vllm/executor/executor_base.py", line 54, in __init__ (EngineCore_DP0 pid=106762) self._init_executor() (EngineCore_DP0 pid=106762) File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor (EngineCore_DP0 pid=106762) self.workers = WorkerProc.wait_for_ready(unready_workers) (EngineCore_DP0 pid=106762) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=106762) File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready (EngineCore_DP0 pid=106762) raise e from None (EngineCore_DP0 
pid=106762) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause. (APIServer pid=106356) Traceback (most recent call last): (APIServer pid=106356) File "/usr/local/python3.11.13/bin/vllm", line 8, in <module> (APIServer pid=106356) sys.exit(main()) (APIServer pid=106356) ^^^^^^ (APIServer pid=106356) File "/vllm-workspace/vllm/vllm/entrypoints/cli/main.py", line 54, in main (APIServer pid=106356) args.dispatch_function(args) (APIServer pid=106356) File "/vllm-workspace/vllm/vllm/entrypoints/cli/serve.py", line 57, in cmd (APIServer pid=106356) uvloop.run(run_server(args)) (APIServer pid=106356) File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run (APIServer pid=106356) return runner.run(wrapper()) (APIServer pid=106356) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=106356) File "/usr/local/python3.11.13/lib/python3.11/asyncio/runners.py", line 118, in run (APIServer pid=106356) return self._loop.run_until_complete(task) (APIServer pid=106356) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=106356) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete (APIServer pid=106356) File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper (APIServer pid=106356) return await main (APIServer pid=106356) ^^^^^^^^^^ (APIServer pid=106356) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1884, in run_server (APIServer pid=106356) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=106356) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker (APIServer pid=106356) async with build_async_engine_client( (APIServer pid=106356) File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__ (APIServer pid=106356) return await anext(self.gen) (APIServer pid=106356) ^^^^^^^^^^^^^^^^^^^^^ (APIServer 
pid=106356) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client (APIServer pid=106356) async with build_async_engine_client_from_engine_args( (APIServer pid=106356) File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__ (APIServer pid=106356) return await anext(self.gen) (APIServer pid=106356) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=106356) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args (APIServer pid=106356) async_llm = AsyncLLM.from_vllm_config( (APIServer pid=106356) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=106356) File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 1571, in inner (APIServer pid=106356) return fn(*args, **kwargs) (APIServer pid=106356) ^^^^^^^^^^^^^^^^^^^ (APIServer pid=106356) File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config (APIServer pid=106356) return cls( (APIServer pid=106356) ^^^^ (APIServer pid=106356) File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 134, in __init__ (APIServer pid=106356) self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=106356) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=106356) File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client (APIServer pid=106356) return AsyncMPClient(*client_args) (APIServer pid=106356) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=106356) File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 769, in __init__ (APIServer pid=106356) super().__init__( (APIServer pid=106356) File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 448, in __init__ (APIServer pid=106356) with launch_core_engines(vllm_config, executor_class, (APIServer pid=106356) File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 144, in __exit__ (APIServer pid=106356) next(self.gen) (APIServer pid=106356) File 
"/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 732, in launch_core_engines (APIServer pid=106356) wait_for_engine_startup( (APIServer pid=106356) File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup (APIServer pid=106356) raise RuntimeError("Engine core initialization failed. " (APIServer pid=106356) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} (APIServer pid=106356) [ERROR] 2025-12-17-02:52:57 (PID:106356, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception root@674f5f297e81:/workspace# /usr/local/python3.11.13/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d ' 报错关键行: EZ3002: [PID: 106900] 2025-12-17-02:52:49.499.381 Optype [TransData] of Ops kernel [AIcoreEngine] is unsupported. Reason: [tbe-custom]:op type TransData is not found in this op store.[tbe-custom]:op type TransData is not found in this op store.[Dynamic shape check]: The format and dtype is not precisely equivalent to format and dtype in op information library[Static shape check]:The format and dtype is not precisely equivalent to format and dtype in op information library. 
It looks like the TransData operator is missing from the image. Here are my startup steps:

export IMAGE=m.daocloud.io/quay.io/ascend/vllm-ascend:v0.11.0rc0
docker run --rm \
  --name qwenvl3_30b \
  --privileged=true \
  --ipc=host \
  --device /dev/davinci0 \
  --device /dev/davinci1 \
  --device /dev/davinci2 \
  --device /dev/davinci3 \
  --device /dev/davinci4 \
  --device /dev/davinci5 \
  --device /dev/davinci6 \
  --device /dev/davinci7 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
  -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /root/.cache:/root/.cache \
  -v /var/lib/docker/lkx/:/data/ \
  -p 8000:8000 \
  -it $IMAGE bash

# Install Bisheng compiler
wget https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/Ascend-BiSheng-toolkit_aarch64.run
chmod a+x Ascend-BiSheng-toolkit_aarch64.run
./Ascend-BiSheng-toolkit_aarch64.run --install
source /usr/local/Ascend/8.3.RC1/bisheng_toolkit/set_env.sh

# Install Triton Ascend
wget https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/triton_ascend-3.2.0.dev20250914-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
pip install triton_ascend-3.2.0.dev20250914-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
pip install transformers==4.57.0

export ASCEND_LAUNCH_BLOCKING=1
# Use VLLM_USE_MODELSCOPE to speed up model download
export VLLM_USE_MODELSCOPE=true
vllm serve Qwen/Qwen3-VL-30B-A3B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 4 \
  --enable-expert-parallel \
  --max-num-seqs 16 \
  --max-num-batched-tokens 4096 \
  --trust-remote-code \
  --no-enable-prefix-caching \
  --gpu-memory-utilization 0.8 \
  --max-model-len 32768 \
  --served-model-name qwen3-vl

Then the error above occurred. I am also hitting very slow download speeds when pulling a newer image:

[root@os-node-created-nfhbp ~]# docker pull m.daocloud.io/quay.io/ascend/vllm-ascend:v0.12.0rc1
v0.12.0rc1: Pulling from quay.io/ascend/vllm-ascend
0ec3d8645767: Already exists
bfcc5d0fd602: Downloading [==>                  ]  7.55MB/146.5MB
16b66f2e497f: Download complete
7feddad41f7b: Downloading [==========>          ]  945.2MB/4.552GB
428f9ade9a77: Download complete
85c139bd38da: Download complete
5a102323ed43: Download complete
50475bbba3e3: Downloading [============>        ]  5.494MB/21.92MB
13bb24a6968b: Waiting
8583ddc168a4: Waiting
1c39e8a9df5f: Waiting
b6463d241740: Waiting
a6069c6ed6b0: Waiting
846fbee6b18f: Waiting

The pull takes several hours and then fails with "docker: unexpected EOF". How can I resolve this?
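On the aborted pull: `docker pull` keeps layers that finished downloading, so after an "unexpected EOF" a re-run resumes from the completed layers rather than starting over. One simple workaround for a flaky mirror is therefore to retry the pull in a loop. The `retry` helper below is my own sketch, not a Docker feature:

```shell
# Retry a flaky command until it succeeds (up to MAX_TRIES attempts).
# Completed layers stay cached, so each docker pull retry makes progress.
retry() {
  local tries=0 max="${MAX_TRIES:-10}"
  until "$@"; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max" ]; then
      echo "giving up after $tries attempts" >&2
      return 1
    fi
    echo "attempt $tries failed; retrying..." >&2
    sleep 10
  done
}

# Usage (uncomment to run):
# retry docker pull m.daocloud.io/quay.io/ascend/vllm-ascend:v0.12.0rc1
```

Configuring a closer registry mirror in `/etc/docker/daemon.json` (the `registry-mirrors` key) may also help more than retrying alone.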