
Ascend/pytorch

pip install flash_attn fails with an error on NPU

DONE
Bug-Report
Created 2024-05-13 15:52

[screenshot]
[screenshot]
My demo code is as follows:

import torch
from modelscope import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "/root/clark/DeepSeek-V2-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

max_memory = {i: "75GB" for i in range(8)}
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16, max_memory=max_memory)
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)

Comments (33)

282583553 created the Bug-Report 1 year ago

I don't see torch_npu being used here. If the problem lies in another package, please ask that package's developers for help.

My understanding is that the DeepSeek-V2-Chat model itself may use it. In the code above, running model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16, max_memory=max_memory) prompts that flash_attn must be installed.

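Same problem here. Does Ascend support flash-attn2? Right now pip install flash-attn fails, so it cannot be used.
There is a flash-attention package on the official Python package index that appears to have been uploaded by Ascend, but it currently has no documentation at all.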

Did any of you solve this?

Has this been solved?

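The error occurs because flash_attn requires a CUDA environment. On NPU you do not need to install this package; use the corresponding operator directly.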
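To make the reply above concrete, here is a minimal sketch of checking which attention-related packages are importable before loading a model. It uses only the Python standard library; the function names and the selection policy (`pick_attention_backend`) are hypothetical and are not part of torch_npu, flash_attn, or transformers.

```python
import importlib.util


def available_packages(names):
    """Return the subset of `names` that can be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is not None]


def pick_attention_backend(found):
    """Hypothetical policy: prefer the NPU operator, then flash_attn, else eager."""
    if "torch_npu" in found:
        return "npu_fusion_attention"
    if "flash_attn" in found:
        return "flash_attn"
    return "eager"


found = available_packages(["torch_npu", "flash_attn"])
print("importable:", found, "-> backend:", pick_attention_backend(found))
```

On a machine without CUDA, flash_attn would normally be absent from the list, which matches the point that it is not needed on NPU.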

Could you explain in more detail or give an example?

For questions about how the model itself is used, you can ask in the model_zoo repository:
https://gitee.com/ascend/modelzoo

Is this newly added? The documentation shows it was updated on May 17. What should I do? Do I need to reinstall torch_npu? [screenshot]

The documentation explains this: replace the flash_attn calls you use with this operator.

[screenshot] What is query in the replacement operator?

It corresponds to q.
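As a rough sketch of the substitution discussed above, the helper below translates flash_attn_func-style arguments into keyword arguments for torch_npu.npu_fusion_attention. Only the q to query correspondence comes from this thread; the other parameter names (`key`, `value`, `head_num`, `input_layout`, `scale`, `keep_prob`) and the "BSND" layout reflect my reading of the operator documentation and should be verified against your torch_npu version. The helper name `flash_to_npu_kwargs` is hypothetical, and a stand-in object is used instead of a real tensor so the sketch runs without an NPU.

```python
from types import SimpleNamespace


def flash_to_npu_kwargs(q, k, v, dropout_p=0.0, softmax_scale=None):
    """Build keyword arguments for torch_npu.npu_fusion_attention.

    Assumes flash_attn's (batch, seq, heads, head_dim) layout, which is
    taken to correspond to input_layout="BSND" (an assumption to verify).
    Causal masking is not handled here; it would need an atten_mask.
    """
    head_dim = q.shape[-1]
    return {
        "query": q,                        # flash_attn's q
        "key": k,                          # flash_attn's k
        "value": v,                        # flash_attn's v
        "head_num": q.shape[2],            # number of attention heads
        "input_layout": "BSND",
        "scale": softmax_scale if softmax_scale is not None else head_dim ** -0.5,
        "keep_prob": 1.0 - dropout_p,      # keep probability, not drop probability
    }


# Stand-in for a tensor of shape (batch=2, seq=16, heads=8, head_dim=64).
q = k = v = SimpleNamespace(shape=(2, 16, 8, 64))
kwargs = flash_to_npu_kwargs(q, k, v, dropout_p=0.1)
print(kwargs["head_num"], kwargs["input_layout"], kwargs["scale"], kwargs["keep_prob"])

# On an NPU machine the call would then look roughly like:
#   out = torch_npu.npu_fusion_attention(**kwargs)[0]
```

The operator is understood to return a tuple whose first element is the attention output, which is why the commented call indexes `[0]`.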

[screenshot] What is this inner_precise?

Why is there no npu_fusion_attention method? [screenshot]

Also, where can I find replacements for these methods? [screenshot]

That one can simply be removed.

Please use a supported version.

These can be implemented by referring to the source code; they all use existing operators.

[screenshot]
[screenshot] According to the documentation, torch_npu 2.1.0 is sufficient. Does mine not work?

I set up torch_npu again this morning. Why does it still not work, and what went wrong? There is still no npu_fusion_attention method. [screenshot]
[screenshot]
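When an operator appears to be missing, it can help to confirm which version is actually installed and whether the module exposes the attribute. The following is a minimal standard-library sketch; `has_operator` and `installed_version` are hypothetical helper names, and the distribution name "torch_npu" passed to the version lookup is an assumption that may differ from the name your package was installed under.

```python
import importlib
import importlib.util
from importlib import metadata


def has_operator(module_name, attr):
    """True if `module_name` can be imported and exposes attribute `attr`."""
    if importlib.util.find_spec(module_name) is None:
        return False
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package exists but failed to load (for example, missing runtime libraries).
        return False
    return hasattr(module, attr)


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("torch_npu version:", installed_version("torch_npu"))
print("npu_fusion_attention present:", has_operator("torch_npu", "npu_fusion_attention"))
```

If the version printed is not the one you expected, a stale install in the active environment is one possible explanation.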

Hello, where can I find replacements for these operators? The project is fairly urgent. [screenshot]

Please reply as soon as you see this.

Which project is it? You can contact PAE or FAE directly.

What causes this error? The following is reported during model inference:
WARNING:root:Some parameters are on the meta device device because they were offloaded to the disk.
E19999: Inner Error!
E19999 The node If_ScatterElements2_380 input 0 does not have peer[FUNC:ConstructIoEquivalent][FILE:equivalent_data_anchors.cc][LINE:37]
TraceBack (most recent call last):
Assert ((eq_data_anchors.ConstructEquivalentAnchors(exe_graph_)) == ge::GRAPH_SUCCESS) failed[FUNC:Build][FILE:model_v2_executor_builder.cc][LINE:111]
Assert ((executor) != nullptr) failed[FUNC:CreateAndLoad][FILE:stream_executor.cc][LINE:38]

--3333--
Traceback (most recent call last):
File "demo2.py", line 17, in <module>
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/transformers/generation/utils.py", line 1575, in generate
result = self._sample(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/transformers/generation/utils.py", line 2697, in _sample
outputs = self(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 1691, in forward
outputs = self.model(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 1560, in forward
layer_outputs = decoder_layer(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 1277, in forward
hidden_states = self.mlp(hidden_states)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 586, in forward
y = self.moe_infer(hidden_states, topk_idx, topk_weight).view(*orig_shape)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 631, in moe_infer
tokens_per_expert = tokens_per_expert.cpu().numpy()
RuntimeError: The Inner error is reported as above.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2024-05-25-23:21:14 (PID:2307838, Device:1, RankID:-1) ERR00100 PTA call acl api failed
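The log above suggests setting ASCEND_LAUNCH_BLOCKING=1 to get an accurate stack trace. One way to do that from inside the script is sketched below. Placing the assignment before any torch or torch_npu import is a conservative choice on my part rather than a documented requirement, and `blocking_enabled` is a hypothetical helper.

```python
import os

# Set the variable before importing torch / torch_npu so the runtime
# can see it when it initializes (conservative ordering, an assumption).
os.environ["ASCEND_LAUNCH_BLOCKING"] = "1"


def blocking_enabled(env=None):
    """Report whether synchronous operator launch has been requested."""
    env = os.environ if env is None else env
    return env.get("ASCEND_LAUNCH_BLOCKING") == "1"


print("ASCEND_LAUNCH_BLOCKING enabled:", blocking_enabled())

# import torch        # imports would follow here in the real script
# import torch_npu
```

Equivalently, the variable can be set in the shell for a single run, for example `ASCEND_LAUNCH_BLOCKING=1 python demo2.py`.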

After setting ASCEND_LAUNCH_BLOCKING=1 and running again, I get the following error:
Loading checkpoint shards: 100%|██████████| 55/55 [03:33<00:00, 3.87s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the disk.
--3333--
Traceback (most recent call last):
File "demo2.py", line 17, in <module>
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/transformers/generation/utils.py", line 1575, in generate
result = self._sample(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/transformers/generation/utils.py", line 2697, in _sample
outputs = self(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 1691, in forward
outputs = self.model(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 1560, in forward
layer_outputs = decoder_layer(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 1277, in forward
hidden_states = self.mlp(hidden_states)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 572, in forward
topk_idx, topk_weight, aux_loss = self.gate(hidden_states)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 452, in forward
group_mask.scatter_(1, group_idx, 1) # [n, n_group]
RuntimeError: InnerRun:torch_npu/csrc/framework/OpParamMaker.cpp:197 NPU error, error code is 100000
[ERROR] 2024-05-26-02:10:15 (PID:2873200, Device:1, RankID:-1) ERR01100 OPS call acl api failed
[Error]: Parameter verification failed. Check whether the input parameter value of the interface is correct.
E19999: Inner Error!
E19999 The node If_ScatterElements2_380 input 0 does not have peer[FUNC:ConstructIoEquivalent][FILE:equivalent_data_anchors.cc][LINE:37]
TraceBack (most recent call last):
Assert ((eq_data_anchors.ConstructEquivalentAnchors(exe_graph_)) == ge::GRAPH_SUCCESS) failed[FUNC:Build][FILE:model_v2_executor_builder.cc][LINE:111]
Assert ((executor) != nullptr) failed[FUNC:CreateAndLoad][FILE:stream_executor.cc][LINE:38]
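With blocking enabled, the trace points at group_mask.scatter_(1, group_idx, 1). For readers unfamiliar with that call, here is a plain-Python sketch of what it computes on a 2-D zero mask. This only illustrates the semantics; it is not a fix for the NPU error and it is not the model's code. The helper `scatter_rows` and the sample scores are made up for the example.

```python
def scatter_rows(mask, index, value):
    """Plain-Python equivalent of mask.scatter_(1, index, value) for 2-D lists:
    for every row i and every j, set mask[i][index[i][j]] = value."""
    for row, cols in zip(mask, index):
        for c in cols:
            row[c] = value
    return mask


# Made-up group scores for two tokens and four expert groups.
group_scores = [[0.1, 0.7, 0.2, 0.9],
                [0.8, 0.3, 0.6, 0.1]]

# Indices of the two highest-scoring groups per token (a stand-in for topk).
group_idx = [sorted(range(len(r)), key=r.__getitem__, reverse=True)[:2]
             for r in group_scores]

# Start from zeros and mark the selected groups with 1.
group_mask = scatter_rows([[0] * 4 for _ in group_scores], group_idx, 1)
print(group_mask)
```

The result is a 0/1 mask with one row per token that marks which groups were selected.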

Please reply when you see this.

This is an operator error.

Looking at the code, this is where the error is raised. How should I change it? [screenshot]

If this is a model issue, please consult the people responsible for the model.

Ascend's official documentation says this operator is supported, so why does it raise an error? [screenshot]

huangyunlong changed the task status from TODO to Analysing 12 months ago
huangyunlong changed the task status from Analysing to DONE 6 months ago
