
Ascend/modelzoo


ACL stream synchronize failed, error code:507018

DONE
Inference issue
Created on 2024-01-16 17:37

1. Problem description (with error-log context):
Exception in callback _raise_exception_on_finish(request_tracker=<vllm.engine....xfffd3420f910>)(<Task finishe...ode:507018'))>) at /efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py:22
handle: <Handle _raise_exception_on_finish(request_tracker=<vllm.engine....xfffd3420f910>)(<Task finishe...ode:507018'))>) at /efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py:22>
Traceback (most recent call last):
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py", line 28, in _raise_exception_on_finish
task.result()
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py", line 349, in run_engine_loop
has_requests_in_progress = await self.engine_step()
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py", line 328, in engine_step
request_outputs = await self.engine.step_async()
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py", line 191, in step_async
output = await self._run_workers_async(
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py", line 220, in _run_workers_async
all_outputs = await asyncio.gather(*all_outputs)
File "/home/ma-user/miniconda3/lib/python3.9/asyncio/tasks.py", line 688, in _wrap_awaitable
return (yield from awaitable.__await__())
ray.exceptions.RayTaskError(RuntimeError): ray::RayWorker.execute_method() (pid=320730, ip=172.17.0.32, actor_id=cc8005be49463c57423330cd01000000, repr=<vllm.engine.ray_utils.RayWorker object at 0xffee5c066eb0>)
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/ray_utils.py", line 35, in execute_method
return executor(*args, **kwargs)
File "/home/ma-user/miniconda3/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/worker/atb_worker.py", line 326, in execute_model
output = self.model(
File "/home/ma-user/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/model_executor/backend_warpper/atb.py", line 115, in forward
output = self.sampler(logits, input_metadata)
File "/home/ma-user/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/model_executor/layers/sampler.py", line 158, in forward
return _sample(probs, logprobs, input_metadata, prob_indexes=slice_indexes)
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/model_executor/layers/sampler.py", line 578, in _sample
sample_results = _random_sample(seq_groups, is_prompts, category_probs, category_indexes)
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/model_executor/layers/sampler.py", line 467, in _random_sample
random_samples = random_samples.cpu()
RuntimeError: ACL stream synchronize failed, error code:507018

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/ma-user/miniconda3/lib/python3.9/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py", line 37, in _raise_exception_on_finish
raise exc
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py", line 32, in _raise_exception_on_finish
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.

2. Software versions:
-- CANN version (e.g., CANN 3.0.x, 5.x.x): 7.x
-- TensorFlow/PyTorch/MindSpore version: PyTorch 2.0.1
-- Python version (e.g., Python 3.7.5): 3.9.18
-- MindStudio version (e.g., MindStudio 2.0.0 (beta3)):
-- OS version (e.g., Ubuntu 18.04): EulerOS 2.0 (SP10)

Comments (8)

JuntongMa created this inference issue 1 year ago
JuntongMa edited the description 1 year ago
洪飞 changed the task status from TODO to Analysing 1 year ago
洪飞 added 潘晗煜 as a collaborator 1 year ago

In what scenario does the error occur — online inference or offline inference?

On my side I deployed an inference service with an 8K sequence length using the ATB framework; with 20 concurrent requests this error is raised.

Does the error also show up with a small number of requests, or only once the load reaches a certain level?

With a small number of requests the problem does not occur; my requirement is 20-30 concurrent requests. Also, with a sequence length of 2K or 4K the problem does not occur even at 60 concurrent requests, which is strange.
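
For reference, a minimal load sketch that mirrors the failing conditions described above (long prompts at roughly 20 concurrent requests). Everything deployment-specific in it is an assumption: the endpoint URL, the /generate route, the payload fields, and the prompt length are placeholders, since the actual entry point of the service is not shown in this issue.

# load_repro.py -- hypothetical reproduction sketch, not the actual service client
import asyncio
import aiohttp

SERVICE_URL = "http://127.0.0.1:8000/generate"   # placeholder endpoint, adjust to the real deployment
CONCURRENCY = 20                                  # concurrency level reported to trigger error 507018
PROMPT = "test " * 1500                           # stand-in for a long (near-8K) prompt

async def one_request(session, idx):
    # assumed request schema; adjust field names to the deployed API
    payload = {"prompt": PROMPT, "max_tokens": 512}
    async with session.post(SERVICE_URL, json=payload) as resp:
        body = await resp.text()
        print(f"request {idx}: HTTP {resp.status}, {len(body)} bytes")

async def main():
    timeout = aiohttp.ClientTimeout(total=600)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        await asyncio.gather(*(one_request(session, i) for i in range(CONCURRENCY)))

if __name__ == "__main__":
    asyncio.run(main())

Per the report above, lowering CONCURRENCY to single digits or shortening PROMPT to a 2K/4K-equivalent length should make the failure disappear.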

Which Ascend product model are you using, and which CANN version?

I see this problem on both the 910B2 and the 910B4.
The CANN version is 7.0.0.

Please upload the runtime debug log. You can capture it by setting the following environment variables and redirecting the output to a txt file, so we can see the specific cause of the error:
export ASCEND_GLOBAL_LOG_LEVEL=0
export ASCEND_SLOG_PRINT_TO_STDOUT=1
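
A minimal sketch of wiring the suggestion above up from Python: set both ASCEND_* variables for the child process and redirect its combined output into one txt file for upload. The launch command is a placeholder; substitute the actual command used to start the inference service.

# collect_debug_log.py -- hedged sketch; launch_cmd is a placeholder
import os
import subprocess

# Debug-level Ascend logging, printed to stdout (values from the comment above)
env = dict(os.environ,
           ASCEND_GLOBAL_LOG_LEVEL="0",
           ASCEND_SLOG_PRINT_TO_STDOUT="1")

# Placeholder: replace with the real command that starts the service
launch_cmd = ["python", "-m", "vllm.entrypoints.api_server"]

# Capture stdout and stderr together in one txt file
with open("ascend_debug.txt", "w") as log_file:
    subprocess.run(launch_cmd, env=env, stdout=log_file, stderr=subprocess.STDOUT)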

潘晗煜 changed the task status from Analysing to DONE 1 year ago

Since there has been no reply for a long time, this issue will be closed for now. Please open a new one if the problem persists. Thanks.
