My demo code is as follows:
import torch
from modelscope import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
model_name = "/root/clark/DeepSeek-V2-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
max_memory = {i: "75GB" for i in range(8)}
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16, max_memory=max_memory)
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id
messages = [
{"role": "user", "content": "Write a piece of quicksort code in C++"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)
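I don't see torch_npu being used here. If the problem is in another package, please ask that package's developers.
My understanding is that the DeepSeek-V2-Chat model itself may use it. When the line model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16, max_memory=max_memory) from the code above runs, it prompts me to install flash_attn.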
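Same problem here. Does Ascend support flash-attn2? Right now pip install flash-attn fails, so I can't use it.
There is a flash-attention package on PyPI that appears to have been uploaded by Ascend, but it currently has no documentation at all.
This error occurs because flash_attn requires a CUDA environment. On NPU you don't need to install this package; use the corresponding operator directly.
Could you explain in more detail or give an example?
For questions about using the model itself, please ask in the model_zoo repository:
https://gitee.com/ascend/modelzoo
Was this added recently? The documentation says it was updated on May 17. What should I do? Do I need to reinstall torch_npu?
It is explained in the documentation: replace the places where you use flash_attn with this operator.
What is query in the replacement operator?
It corresponds to q.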
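To make the q-to-query mapping above a bit more concrete, here is a rough sketch of how flash_attn_func-style settings might translate into keyword arguments for npu_fusion_attention. The helper name fusion_attention_kwargs is made up for illustration. The keyword names (head_num, input_layout, scale, keep_prob) are taken from my reading of the torch_npu documentation, so please verify them against your installed version. Causal masking (atten_mask and related parameters) is deliberately left out.

```python
import math


def fusion_attention_kwargs(q_shape, softmax_scale=None, dropout_p=0.0):
    """Hypothetical helper: map flash_attn_func-style settings to
    npu_fusion_attention keyword arguments.

    q_shape is the shape of q in flash_attn layout:
    (batch, seq_len, num_heads, head_dim).
    """
    _batch, _seq_len, num_heads, head_dim = q_shape
    if softmax_scale is None:
        # flash_attn_func defaults to 1/sqrt(head_dim); the NPU operator is
        # documented with scale=1.0 as default, so pass it explicitly.
        softmax_scale = 1.0 / math.sqrt(head_dim)
    return {
        "head_num": num_heads,         # number of attention heads
        "input_layout": "BSND",        # batch, seq, heads, head_dim
        "scale": softmax_scale,
        "keep_prob": 1.0 - dropout_p,  # keep probability, not drop probability
    }


kwargs = fusion_attention_kwargs((1, 128, 16, 64))
print(kwargs)

# On an Ascend machine the call would then look roughly like this
# (assumption, not run here):
#   out = torch_npu.npu_fusion_attention(q, k, v, **kwargs)[0]
```

The operator returns a tuple, and as far as I can tell the first element is the attention output, which is why the commented call indexes [0].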
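What is this inner_precise?
Why is there no npu_fusion_attention method?
Also, where can I find the replacements for these methods?
This one can simply be removed.
Please use a supported version.
For these you can refer to the source implementation; they all use existing operators.
The documentation says torch_npu 2.1.0 is enough. Doesn't mine work?
I installed torch_npu again this morning. Why does it still not work? What went wrong? There is still no npu_fusion_attention method.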
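When a reinstall does not seem to take effect, it can help to confirm which torch_npu the interpreter actually picks up and whether it exposes the operator. A minimal sketch follows. The function name check_fusion_attention is made up, and reading __version__ via getattr is an assumption about the package.

```python
import importlib
import importlib.util


def check_fusion_attention():
    """Report whether torch_npu is importable and has npu_fusion_attention."""
    report = {"installed": False, "version": None,
              "has_npu_fusion_attention": False}
    try:
        spec = importlib.util.find_spec("torch_npu")
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        # torch_npu is not visible to this Python environment at all.
        return report
    report["installed"] = True
    try:
        torch_npu = importlib.import_module("torch_npu")
    except Exception as exc:  # e.g. missing CANN libraries or torch mismatch
        report["import_error"] = repr(exc)
        return report
    report["version"] = getattr(torch_npu, "__version__", None)
    report["has_npu_fusion_attention"] = hasattr(torch_npu, "npu_fusion_attention")
    return report


report = check_fusion_attention()
print(report)
```

Running this inside the same conda environment as the demo shows whether the script is still importing an older installation.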
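Hello, where can I find the replacements for these operators? The project is rather urgent.
Please reply as soon as you see this.
Which project? You can contact PAE or FAE directly.
What causes this error? It is reported during model inference:
WARNING:root:Some parameters are on the meta device device because they were offloaded to the disk.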
E19999: Inner Error!
E19999 The node If_ScatterElements2_380 input 0 does not have peer[FUNC:ConstructIoEquivalent][FILE:equivalent_data_anchors.cc][LINE:37]
TraceBack (most recent call last):
Assert ((eq_data_anchors.ConstructEquivalentAnchors(exe_graph_)) == ge::GRAPH_SUCCESS) failed[FUNC:Build][FILE:model_v2_executor_builder.cc][LINE:111]
Assert ((executor) != nullptr) failed[FUNC:CreateAndLoad][FILE:stream_executor.cc][LINE:38]
--3333--
Traceback (most recent call last):
File "demo2.py", line 17, in <module>
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/transformers/generation/utils.py", line 1575, in generate
result = self._sample(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/transformers/generation/utils.py", line 2697, in _sample
outputs = self(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 1691, in forward
outputs = self.model(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 1560, in forward
layer_outputs = decoder_layer(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 1277, in forward
hidden_states = self.mlp(hidden_states)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 586, in forward
y = self.moe_infer(hidden_states, topk_idx, topk_weight).view(*orig_shape)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 631, in moe_infer
tokens_per_expert = tokens_per_expert.cpu().numpy()
RuntimeError: The Inner error is reported as above.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, pleace set the environment variable ASCEND_LAUNCH_BLOCKING=1.
[ERROR] 2024-05-25-23:21:14 (PID:2307838, Device:1, RankID:-1) ERR00100 PTA call acl api failed
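The log above suggests setting ASCEND_LAUNCH_BLOCKING=1 to get an accurate stack trace. One way to do that from inside the script is sketched below. Placing it before the torch imports is an assumption on my part, made to be safe about when the variable is read. Exporting it in the shell before running python demo2.py should have the same effect.

```python
import os

# Set this before importing torch / torch_npu so that it is already in the
# process environment when the NPU runtime starts (assumed to be the safe order).
os.environ["ASCEND_LAUNCH_BLOCKING"] = "1"

# import torch
# import torch_npu
# ... rest of demo2.py ...

print(os.environ["ASCEND_LAUNCH_BLOCKING"])
```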
After setting ASCEND_LAUNCH_BLOCKING=1 and running again, I get the following error:
Loading checkpoint shards: 100%|██████████| 55/55 [03:33<00:00, 3.87s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the disk.
--3333--
Traceback (most recent call last):
File "demo2.py", line 17, in <module>
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/transformers/generation/utils.py", line 1575, in generate
result = self._sample(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/transformers/generation/utils.py", line 2697, in _sample
outputs = self(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 1691, in forward
outputs = self.model(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 1560, in forward
layer_outputs = decoder_layer(
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 1277, in forward
hidden_states = self.mlp(hidden_states)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 572, in forward
topk_idx, topk_weight, aux_loss = self.gate(hidden_states)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/luke/lib/python3.8/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/DeepSeek-V2-Chat/modeling_deepseek.py", line 452, in forward
group_mask.scatter_(1, group_idx, 1) # [n, n_group]
RuntimeError: InnerRun:torch_npu/csrc/framework/OpParamMaker.cpp:197 NPU error, error code is 100000
[ERROR] 2024-05-26-02:10:15 (PID:2873200, Device:1, RankID:-1) ERR01100 OPS call acl api failed
[Error]: Parameter verification failed. Check whether the input parameter value of the interface is correct.
E19999: Inner Error!
E19999 The node If_ScatterElements2_380 input 0 does not have peer[FUNC:ConstructIoEquivalent][FILE:equivalent_data_anchors.cc][LINE:37]
TraceBack (most recent call last):
Assert ((eq_data_anchors.ConstructEquivalentAnchors(exe_graph_)) == ge::GRAPH_SUCCESS) failed[FUNC:Build][FILE:model_v2_executor_builder.cc][LINE:111]
Assert ((executor) != nullptr) failed[FUNC:CreateAndLoad][FILE:stream_executor.cc][LINE:38]
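Please reply when you see this.
It is an operator error.
Looking at the code, this is where the error is raised. How should I modify it?
If it is a model issue, please consult the people responsible for the model.
The official Ascend documentation says this operator is supported, so why does it fail?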