
Ascend/pytorch

OOM when running Qwen-72B inference on a single 910B node with eight cards

DONE
Inference issue
Created 2024-02-06 12:38

1. Problem description (with error log context):
Running Qwen-72B inference on a single 910B node with eight cards: during model preparation, NPU 0 goes OOM while the other cards carry no load.
/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/torch_npu/dynamo/init.py:18: UserWarning: Register eager implementation for the 'npu' backend of dynamo, as torch_npu was not compiled with torchair.
warnings.warn(
/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/torch_npu/utils/path_manager.py:77: UserWarning: Warning: The /usr/local/Ascend/ascend-toolkit/latest owner does not match the current user.
warnings.warn(f"Warning: The {path} owner does not match the current user.")
/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/torch_npu/utils/path_manager.py:77: UserWarning: Warning: The /usr/local/Ascend/ascend-toolkit/7.0.0/aarch64-linux/ascend_toolkit_install.info owner does not match the current user.
warnings.warn(f"Warning: The {path} owner does not match the current user.")
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards:  22%|████████████████████████▌             | 18/82 [00:19<01:08,  1.07s/it]
Traceback (most recent call last):
File "/home/ljl/Qwen-main/predict.py", line 19, in <module>
model = AutoModelForCausalLM.from_pretrained("/home/ljl/qwen-72b",device_map="auto",max_memory=max_memory, trust_remote_code=True,bf16=True).eval()
File "/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
return model_class.from_pretrained(
File "/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3852, in from_pretrained
) = cls._load_pretrained_model(
File "/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/transformers/modeling_utils.py", line 4286, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/transformers/modeling_utils.py", line 807, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/home/ljl/miniconda3/envs/Qwen/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 369, in set_module_tensor_to_device
new_value = param_cls(new_value, requires_grad=old_value.requires_grad).to(device)
RuntimeError: NPU out of memory. Tried to allocate 386.00 MiB (NPU 0; 32.00 GiB total capacity; 31.57 GiB already allocated; 31.57 GiB current active; 265.07 MiB free; 31.72 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
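The traceback shows `from_pretrained` staging checkpoint shards onto NPU 0 until the 32 GiB card fills. A minimal sketch of one common mitigation, capping per-device memory so `device_map="auto"` spreads the weights across all eight cards (the helper name and the 28GiB/64GiB budgets are illustrative assumptions, not from this issue):

```python
# Hedged sketch: build an explicit max_memory map so accelerate's "auto"
# device map distributes the 72B weights across all eight NPUs instead of
# packing them onto NPU 0. Budgets are assumptions; leave headroom under
# each card's 32 GiB for activations and KV cache.
def build_max_memory(num_devices=8, per_device="28GiB", cpu="64GiB"):
    # accelerate's convention: integer device indices plus a "cpu" key.
    max_memory = {i: per_device for i in range(num_devices)}
    max_memory["cpu"] = cpu
    return max_memory

max_memory = build_max_memory()
# Passed exactly where the issue's predict.py already does:
# model = AutoModelForCausalLM.from_pretrained(
#     "/home/ljl/qwen-72b", device_map="auto", max_memory=max_memory,
#     trust_remote_code=True, bf16=True).eval()
```

If accelerate still places everything on one card, printing `model.hf_device_map` after loading shows where each module actually landed.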

2. Software versions:
-- CANN version (e.g., CANN 3.0.x, 5.x.x): 7.0.0
-- Tensorflow/PyTorch/MindSpore version: PyTorch 2.1.0, transformers 4.37.1, accelerate 0.26.1
-- Python version (e.g., Python 3.7.5): 3.9
-- MindStudio version (e.g., MindStudio 2.0.0 (beta3)):
-- OS version (e.g., Ubuntu 18.04): CentOS 7

3. Test steps:
Run the official Qwen inference script directly.
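The OOM message also suggests tuning the allocator (`max_split_size_mb`). One hedged option, assuming torch_npu honors `PYTORCH_NPU_ALLOC_CONF` the way stock PyTorch honors `PYTORCH_CUDA_ALLOC_CONF` (verify against your torch_npu version's documentation before relying on it):

```shell
# Assumption: torch_npu reads PYTORCH_NPU_ALLOC_CONF analogously to
# PYTORCH_CUDA_ALLOC_CONF; check your torch_npu release notes.
export PYTORCH_NPU_ALLOC_CONF="max_split_size_mb:128"
python predict.py   # the issue's Qwen inference script
```

This only helps when reserved memory far exceeds allocated memory (fragmentation); here reserved and allocated are both ~31.6 GiB, so redistributing the weights across cards is the primary fix.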

4. Log information:
xxxx
Please collect logs according to your runtime environment as described below. For operator-development issues, it is recommended to also provide the UT/ST test logs and single-operator integration test logs.

How to provide logs:
Package the logs and upload them as an attachment. If they exceed the attachment size limit, upload them to external storage and share a link.

See the wiki for how to collect them:
https://gitee.com/ascend/modelzoo/wikis/如何获取日志和计算图?sort_id=4097825

Comments (5)

XiaFeng created the inference issue 1 year ago

Hello, are you using a modelzoo model? If it is not a modelzoo model, please consult your FAE or PAE first.

Hi, I couldn't find the Qwen model in modelzoo. Where is it located?

Destiny changed task status from TODO to Analysing 1 year ago

Hi, I'd also like to try this model. Has the issue been resolved?

Multi-card inference doesn't work; the cause seems to be that it cannot switch devices. Is there a workaround?

Since 6.0.rc1, single-process multi-card is supported with PyTorch 2.1.0 and above.
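With single-process multi-card support, tensors are addressed by per-device strings, mirroring the CUDA idiom. A small sketch of even layer-to-card assignment (the helper and the 80-layer count are illustrative assumptions; the actual `.to()` call needs torch_npu and 910B hardware, so it is shown commented out):

```python
# Hedged sketch, assuming torch_npu >= 6.0.rc1 single-process multi-card:
# devices are addressed as "npu:<index>" strings, like "cuda:<index>".
def shard_device(layer_idx, num_layers=80, num_devices=8):
    # Evenly map transformer layers onto NPUs 0..num_devices-1.
    return f"npu:{layer_idx * num_devices // num_layers}"

# weight = weight.to(shard_device(i))  # requires torch_npu + NPU hardware
```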

huangyunlong changed task status from Analysing to DONE 10 months ago
