vLLM fails with an error when running inference on a quantized Qwen3 model

(python-3.9.10) [ma-user vllm-ascend]$python /home/ma-user/work/Quant/competition_model_GPTQ.py
INFO 08-07 16:53:31 [__init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 08-07 16:53:31 [__init__.py:32] name=ascend, value=vllm_ascend:register
INFO 08-07 16:53:31 [__init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 08-07 16:53:31 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 08-07 16:53:31 [__init__.py:44] plugin ascend loaded.
INFO 08-07 16:53:31 [__init__.py:230] Platform plugin ascend is activated
WARNING:root:Warning: Failed to register custom ops, all custom ops will be disabled
INFO 08-07 16:53:33 [__init__.py:30] Available plugins for group vllm.general_plugins:
INFO 08-07 16:53:33 [__init__.py:32] name=ascend_enhanced_model, value=vllm_ascend:register_model
INFO 08-07 16:53:33 [__init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 08-07 16:53:33 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 08-07 16:53:33 [__init__.py:44] plugin ascend_enhanced_model loaded.
INFO 08-07 16:53:33 [patch_tritonplaceholder.py:33] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 08-07 16:53:33 [patch_tritonplaceholder.py:46] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 08-07 16:53:33 [patch_tritonplaceholder.py:71] Triton module has been replaced with a placeholder.
WARNING 08-07 16:53:34 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 08-07 16:53:35 [registry.py:380] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 08-07 16:53:35 [registry.py:380] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:CustomQwen2VLForConditionalGeneration.
WARNING 08-07 16:53:35 [registry.py:380] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 08-07 16:53:35 [registry.py:380] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
INFO 08-07 16:53:49 [config.py:689] This model supports multiple tasks: {'generate', 'score', 'classify', 'reward', 'embed'}. Defaulting to 'generate'.
WARNING 08-07 16:53:49 [config.py:768] ascend quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO 08-07 16:53:49 [arg_utils.py:1742] npu is experimental on VLLM_USE_V1=1. Falling back to V0 Engine.
INFO 08-07 16:53:49 [config.py:1747] Disabled the custom all-reduce kernel because it is not supported on current platform.
Traceback (most recent call last):
  File "/home/ma-user/work/Quant/competition_model_GPTQ.py", line 242, in <module>
    comp = Competition()
  File "/home/ma-user/work/Quant/competition_model_GPTQ.py", line 35, in __init__
    self.llm = LLM(
  File "/home/ma-user/work/requirements/vllm/vllm/utils.py", line 1099, in inner
    return fn(*args, **kwargs)
  File "/home/ma-user/work/requirements/vllm/vllm/entrypoints/llm.py", line 248, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/home/ma-user/work/requirements/vllm/vllm/engine/llm_engine.py", line 515, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
  File "/home/ma-user/work/requirements/vllm/vllm/engine/arg_utils.py", line 1335, in create_engine_config
    config = VllmConfig(
  File "<string>", line 19, in __init__
  File "/home/ma-user/work/requirements/vllm/vllm/config.py", line 3709, in __post_init__
    self.quant_config = VllmConfig._get_quantization_config(
  File "/home/ma-user/work/requirements/vllm/vllm/config.py", line 3651, in _get_quantization_config
    quant_config = get_quant_config(model_config, load_config)
  File "/home/ma-user/work/requirements/vllm/vllm/model_executor/model_loader/weight_utils.py", line 195, in get_quant_config
    return quant_cls()
TypeError: __init__() missing 1 required positional argument: 'quant_config'
[ERROR] 2025-08-07-16:53:50 (PID:299456, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception

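Compared with the officially provided deepseek-w8a8 files, the files generated by my program are missing a configuration_deepseek.py. I am not sure whether that is the cause.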
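To narrow this down, the two model directories can be compared file by file. The sketch below is only a diagnostic aid, not a fix. The function names `compare_model_dirs` and `has_quantization_config` are made up for illustration. Checking config.json for a `quantization_config` key is an assumption about what matters here, based on the traceback ending in `get_quant_config`, where the quantization config class is constructed with no arguments. Weight shard names will usually differ between two exports, so those differences can be ignored. Judging by its name, configuration_deepseek.py looks specific to DeepSeek models, so its absence from a Qwen3 export may be unrelated.

```python
import json
from pathlib import Path


def compare_model_dirs(mine, reference):
    """Return (missing, extra) file names of `mine` relative to `reference`.

    Only top-level regular files are compared, by name.
    """
    mine_files = {p.name for p in Path(mine).iterdir() if p.is_file()}
    ref_files = {p.name for p in Path(reference).iterdir() if p.is_file()}
    missing = sorted(ref_files - mine_files)  # present only in the reference
    extra = sorted(mine_files - ref_files)    # present only in my output
    return missing, extra


def has_quantization_config(model_dir):
    """Report whether config.json in `model_dir` has a `quantization_config` key.

    Assumption: this key is relevant to the failure; the traceback alone
    does not confirm it.
    """
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        return False
    with config_path.open(encoding="utf-8") as f:
        return "quantization_config" in json.load(f)
```

Calling `compare_model_dirs(my_output_dir, official_w8a8_dir)` with the two local paths lists every file that exists on only one side, and `has_quantization_config(my_output_dir)` shows whether the generated config.json carries any quantization metadata at all.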
