Out-of-memory error when running QwQ-32B inference on 4 cards of a 910A server

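1. On an Atlas 800 9000 server, running QwQ-32B inference on 4 cards with the 910A-ascend_24.1.rc3-cann_8.0.t63-py_3.10-ubuntu_20.04-aarch64-mindie_1.0.T71.02 image fails with an NPU out-of-memory error. The server information and the error log are as follows:
![Screenshot](https://foruda.gitee.com/images/1741741131604268081/d9cdccbf_14652503.png "Screenshot")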
(Python310) root@0002:~/mindie/log/debug# cat  mindie-server_14019_202503111813.log
[2025-03-11 18:13:15.138+08:00] [14019] [14019] [mindie-server] ===init log===
[2025-03-11 18:13:15.138+08:00] [14019] [14019] [mindie-server] [INFO] [slave_IPC_communicator.cpp:823] : [model_backend] Slave communicator launch threads success.
[2025-03-11 18:13:20.011+0800] [14019] [281473265430944] [mindie-server] [INFO] [model.py:57] : [python thread: infer] launch do_inference thread.
[2025-03-11 18:13:20.011+0800] [14019] [281473265430944] [mindie-server] [INFO] [config.py:30] : model_config {'backend_bin_path': '/usr/local/Ascend/mindie/1.0.T71/mindie-llm/bin/', 'backend_log_file': '/usr/local/Ascend/mindie/1.0.T71/mindie-service/logs/mindie-server.log', 'backend_modelInstance_id': '0', 'backend_type': 'atb', 'block_size': '128', 'cpu_mem': '5', 'deploy_type': 'INTER_PROCESS', 'executor_type': 'LLM_EXECUTOR_PYTHON', 'globalRankIds': '', 'globalWorldSize': '0', 'interNodeKmcKsfMaster': 'tools/pmt/master/ksfa', 'interNodeKmcKsfStandby': 'tools/pmt/standby/ksfb', 'interNodeTLSEnabled': '1', 'interNodeTlsCaFiles': 'ca.pem,', 'interNodeTlsCaPath': 'security/grpc/ca/', 'interNodeTlsCert': 'security/grpc/certs/server.pem', 'interNodeTlsCrlFiles': 'server_crl.pem,', 'interNodeTlsCrlPath': 'security/grpc/certs/', 'interNodeTlsPk': 'security/grpc/keys/server.key.pem', 'interNodeTlsPkPwd': 'security/grpc/pass/mindie_server_key_pwd.txt', 'isMaster': '0', 'localIP': '', 'local_rank': '1', 'log_error': '1', 'log_file_num': '20', 'log_file_size': '20', 'log_info': '1', 'log_verbose': '0', 'log_warning': '1', 'masterIP': '', 'max_input_len': '2048', 'max_iter_times': '512', 'max_prefill_tokens': '8192', 'max_seq_len': '2560', 'model_id': '/data/QwQ-32B/', 'model_instance_number': '1', 'model_instance_type': 'Standard', 'model_name': 'QwQ-32B', 'multiNodesInferEnabled': '0', 'multiNodesInferPort': '1120', 'npu_device_id': '1', 'npu_device_ids': '0,1,2,3', 'npu_mem': '-1', 'rank': '1', 'slaveIPs': '', 'speculation_gamma': '0', 'trust_remote_code': '0', 'world_size': '4'}
[2025-03-11 18:13:20.011+0800] [14019] [281473265430944] [mindie-server] [INFO] [config.py:35] : do not init dmi config
[2025-03-11 18:13:20.011+0800] [14019] [281473265430944] [mindie-server] [INFO] [metrics.py:56] : profiling is disenabled.
[2025-03-11 18:13:20.011+0800] [14019] [281473265430944] [mindie-server] [INFO] [standard_model.py:146] : global rank id 1 get model config: {'backend_bin_path': '/usr/local/Ascend/mindie/1.0.T71/mindie-llm/bin/', 'backend_log_file': '/usr/local/Ascend/mindie/1.0.T71/mindie-service/logs/mindie-server.log', 'backend_modelInstance_id': '0', 'backend_type': 'atb', 'block_size': '128', 'cpu_mem': '5', 'deploy_type': 'INTER_PROCESS', 'executor_type': 'LLM_EXECUTOR_PYTHON', 'globalRankIds': '', 'globalWorldSize': '0', 'interNodeKmcKsfMaster': 'tools/pmt/master/ksfa', 'interNodeKmcKsfStandby': 'tools/pmt/standby/ksfb', 'interNodeTLSEnabled': '1', 'interNodeTlsCaFiles': 'ca.pem,', 'interNodeTlsCaPath': 'security/grpc/ca/', 'interNodeTlsCert': 'security/grpc/certs/server.pem', 'interNodeTlsCrlFiles': 'server_crl.pem,', 'interNodeTlsCrlPath': 'security/grpc/certs/', 'interNodeTlsPk': 'security/grpc/keys/server.key.pem', 'interNodeTlsPkPwd': 'security/grpc/pass/mindie_server_key_pwd.txt', 'isMaster': '0', 'localIP': '', 'local_rank': '1', 'log_error': '1', 'log_file_num': '20', 'log_file_size': '20', 'log_info': '1', 'log_verbose': '0', 'log_warning': '1', 'masterIP': '', 'max_input_len': '2048', 'max_iter_times': '512', 'max_prefill_tokens': '8192', 'max_seq_len': '2560', 'model_id': '/data/QwQ-32B/', 'model_instance_number': '1', 'model_instance_type': 'Standard', 'model_name': 'QwQ-32B', 'multiNodesInferEnabled': '0', 'multiNodesInferPort': '1120', 'npu_device_id': '1', 'npu_device_ids': '0,1,2,3', 'npu_mem': '-1', 'rank': '1', 'slaveIPs': '', 'speculation_gamma': '0', 'trust_remote_code': '0', 'world_size': '4'}
[2025-03-11 18:15:07.011+0800] [14019] [281473265430944] [mindie-server] [ERROR] [model.py:40] : [Model]        >>> Exception:NPU out of memory. Tried to allocate 136.00 MiB (NPU 1; 32.00 GiB total capacity; 9.00 GiB already allocated; 9.00 GiB current active; 22.12 MiB free; 9.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
Traceback (most recent call last):
  File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/model_wrapper/model.py", line 38, in initialize
    return self.python_model.initialize(config)
  File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/model_wrapper/standard_model.py", line 147, in initialize
    self.generator = Generator(
  File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/mindie_llm/text_generator/generator.py", line 80, in __init__
    self.generator_backend = get_generator_backend(model_config)
  File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/mindie_llm/text_generator/adapter/__init__.py", line 26, in get_generator_backend
    return generator_cls(model_config)
  File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 97, in __init__
    super().__init__(model_config)
  File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/mindie_llm/text_generator/adapter/generator_backend.py", line 113, in __init__
    self.model_wrapper = get_model_wrapper(model_config, backend_type)
  File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/mindie_llm/modeling/model_wrapper/__init__.py", line 15, in get_model_wrapper
    return wrapper_cls(**model_config)
  File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 52, in __init__
    self.model_runner.load_weights()
  File "/usr/local/Ascend/atb-models/atb_llm/runner/model_runner.py", line 175, in load_weights
    self.model.to(weights.device)
  File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch_npu/utils/_module.py", line 78, in to
    return self._apply(convert)
  File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/root/miniconda3/envs/Python310/lib/python3.10/site-packages/torch_npu/utils/_module.py", line 76, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: NPU out of memory. Tried to allocate 136.00 MiB (NPU 1; 32.00 GiB total capacity; 9.00 GiB already allocated; 9.00 GiB current active; 22.12 MiB free; 9.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
[2025-03-11 18:15:07.011+0800] [14019] [281473265430944] [mindie-server] [ERROR] [model.py:43] : [MIE04E13030A] [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
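
The figures in the OOM message above are worth a quick sanity check: the card reports 32.00 GiB total, yet this process has only reserved 9.06 GiB while just 22.12 MiB is free. The following is a minimal sketch that parses the message and computes how much memory on NPU 1 is not accounted for by this process. The helper name `parse_oom` and the regular expressions are illustrative, not part of MindIE or torch_npu, and the reading that the remainder is held by something outside this process (for example another process on the same card) is an assumption that the log alone does not confirm.

```python
import re

# OOM message copied from the log above (the trailing fragmentation hint is omitted).
OOM_MESSAGE = (
    "NPU out of memory. Tried to allocate 136.00 MiB (NPU 1; 32.00 GiB total capacity; "
    "9.00 GiB already allocated; 9.00 GiB current active; 22.12 MiB free; "
    "9.06 GiB reserved in total by PyTorch)"
)

# Conversion factors to GiB for the two units that appear in the message.
UNIT_TO_GIB = {"MiB": 1.0 / 1024.0, "GiB": 1.0}


def parse_oom(message):
    """Hypothetical helper: extract the memory figures, in GiB, from the OOM text."""

    def grab(pattern):
        match = re.search(pattern, message)
        if match is None:
            raise ValueError("field not found: " + pattern)
        return float(match.group(1)) * UNIT_TO_GIB[match.group(2)]

    return {
        "requested": grab(r"Tried to allocate ([\d.]+) (MiB|GiB)"),
        "total": grab(r"([\d.]+) (MiB|GiB) total capacity"),
        "allocated": grab(r"([\d.]+) (MiB|GiB) already allocated"),
        "free": grab(r"([\d.]+) (MiB|GiB) free"),
        "reserved": grab(r"([\d.]+) (MiB|GiB) reserved in total"),
    }


figures = parse_oom(OOM_MESSAGE)

# Memory that is neither free nor reserved by this PyTorch process.
unaccounted = figures["total"] - figures["reserved"] - figures["free"]

print("reserved by this process: %.2f GiB" % figures["reserved"])
print("free on the card:         %.2f GiB" % figures["free"])
print("not accounted for:        %.2f GiB" % unaccounted)
```

By this arithmetic roughly 22.9 GiB of NPU 1 is in use outside the failing process, so it may be worth running `npu-smi info` before starting the service to see whether the cards are already occupied.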
