1. Problem (with error log context):
Running Qwen-72B inference on a single 910B machine with eight cards. During the model preparation stage, NPU 0 goes OOM while the other cards carry no load.
/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/torch_npu/dynamo/init.py:18: UserWarning: Register eager implementation for the 'npu' backend of dynamo, as torch_npu was not compiled with torchair.
warnings.warn(
/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/torch_npu/utils/path_manager.py:77: UserWarning: Warning: The /usr/local/Ascend/ascend-toolkit/latest owner does not match the current user.
warnings.warn(f"Warning: The {path} owner does not match the current user.")
/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/torch_npu/utils/path_manager.py:77: UserWarning: Warning: The /usr/local/Ascend/ascend-toolkit/7.0.0/aarch64-linux/ascend_toolkit_install.info owner does not match the current user.
warnings.warn(f"Warning: The {path} owner does not match the current user.")
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards:  22%|████████████████████████▌ | 18/82 [00:19<01:08, 1.07s/it]
Traceback (most recent call last):
File "/home/ljl/Qwen-main/predict.py", line 19, in <module>
model = AutoModelForCausalLM.from_pretrained("/home/ljl/qwen-72b",device_map="auto",max_memory=max_memory, trust_remote_code=True,bf16=True).eval()
File "/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
return model_class.from_pretrained(
File "/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3852, in from_pretrained
) = cls._load_pretrained_model(
File "/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/transformers/modeling_utils.py", line 4286, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/transformers/modeling_utils.py", line 807, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 369, in set_module_tensor_to_device
new_value = param_cls(new_value, requires_grad=old_value.requires_grad).to(device)
RuntimeError: NPU out of memory. Tried to allocate 386.00 MiB (NPU 0; 32.00 GiB total capacity; 31.57 GiB already allocated; 31.57 GiB current active; 265.07 MiB free; 31.72 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
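The traceback shows `device_map="auto"` filling NPU 0 completely before the load finishes. One common mitigation (a sketch, not a confirmed fix for this exact report) is to pass a `max_memory` dict that caps every card below its 32 GiB capacity, so accelerate's auto device map spreads the checkpoint shards across all eight NPUs. The `"27GiB"` per-card budget and the `"120GiB"` CPU offload budget below are illustrative assumptions, not measured values:

```python
# Build a max_memory map that leaves headroom on every NPU so that
# device_map="auto" distributes the 72B weights across all 8 cards
# instead of packing card 0 first. Per-card budget is an assumption.
def build_max_memory(num_npus=8, per_card="27GiB", cpu="120GiB"):
    mem = {i: per_card for i in range(num_npus)}  # integer keys = device indices
    mem["cpu"] = cpu  # optional CPU offload budget
    return mem

max_memory = build_max_memory()
# Then, as in the failing script:
# AutoModelForCausalLM.from_pretrained("/home/ljl/qwen-72b",
#     device_map="auto", max_memory=max_memory,
#     trust_remote_code=True, bf16=True).eval()
```

Leaving a few GiB of headroom per card also absorbs allocator fragmentation, which the error message itself hints at (`max_split_size_mb`).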
2. Software versions:
-- CANN version (e.g., CANN 3.0.x, 5.x.x): 7.0.0
-- TensorFlow/PyTorch/MindSpore version: PyTorch 2.1.0, transformers 4.37.1, accelerate 0.26.1
-- Python version (e.g., Python 3.7.5): 3.9
-- MindStudio version (e.g., MindStudio 2.0.0 (beta3)):
-- OS version (e.g., Ubuntu 18.04): CentOS 7
3. Test steps:
Ran the official Qwen inference script directly.
4. Log information:
xxxx
Please collect logs as appropriate for your environment using the methods below. For issues involving operator development, please also provide the UT/ST test logs and single-operator integration test logs.
How to provide logs:
Package the logs and upload them as an attachment. If the logs exceed the attachment size limit, upload them to an external drive and share the link.
See the wiki for how to collect them:
https://gitee.com/ascend/modelzoo/wikis/如何获取日志和计算图?sort_id=4097825
Hello, are you using a ModelZoo model? If it is not a ModelZoo model, please consult your FAE or PAE first.
Hi, I'd also like to try this model. Has the problem been solved?
Multi-card inference doesn't work; the cause seems to be that the process cannot switch between devices. Is there a solution?
Since CANN 6.0.rc1, torch_npu 2.1.0 and later already support single-process multi-card.
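With single-process multi-card support in place, an explicit `device_map` can also be built by hand if the automatic map still piles weights onto card 0. The sketch below assumes the original Qwen architecture's module names (`transformer.wte`, `transformer.h.<i>`, `transformer.ln_f`, `lm_head`) and an 80-block Qwen-72B; both the names and the layer count are assumptions, not verified against this exact checkpoint:

```python
# Hand-built device_map: split 80 transformer blocks evenly over 8 cards.
# Module names follow the original Qwen architecture (assumed).
def build_device_map(num_layers=80, num_devices=8):
    per_dev = (num_layers + num_devices - 1) // num_devices  # ceil split
    dmap = {"transformer.wte": 0, "transformer.rotary_emb": 0}
    for i in range(num_layers):
        dmap[f"transformer.h.{i}"] = min(i // per_dev, num_devices - 1)
    # Keep the final norm and lm_head with the last block to avoid an
    # extra cross-device transfer at the end of the forward pass.
    dmap["transformer.ln_f"] = num_devices - 1
    dmap["lm_head"] = num_devices - 1
    return dmap

device_map = build_device_map()
# AutoModelForCausalLM.from_pretrained(..., device_map=device_map, ...)
```

Each dict value is a device index, so card i holds blocks i*10 through i*10+9; transformers/accelerate place every named submodule on the card its entry points at.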