1. Problem symptoms (with the surrounding error log):
Exception in callback _raise_exception_on_finish(request_tracker=<vllm.engine....xfffd3420f910>)(<Task finishe...ode:507018'))>) at /efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py:22
handle: <Handle _raise_exception_on_finish(request_tracker=<vllm.engine....xfffd3420f910>)(<Task finishe...ode:507018'))>) at /efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py:22>
Traceback (most recent call last):
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py", line 28, in _raise_exception_on_finish
task.result()
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py", line 349, in run_engine_loop
has_requests_in_progress = await self.engine_step()
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py", line 328, in engine_step
request_outputs = await self.engine.step_async()
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py", line 191, in step_async
output = await self._run_workers_async(
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py", line 220, in _run_workers_async
all_outputs = await asyncio.gather(*all_outputs)
File "/home/ma-user/miniconda3/lib/python3.9/asyncio/tasks.py", line 688, in _wrap_awaitable
return (yield from awaitable.__await__())
ray.exceptions.RayTaskError(RuntimeError): ray::RayWorker.execute_method() (pid=320730, ip=172.17.0.32, actor_id=cc8005be49463c57423330cd01000000, repr=<vllm.engine.ray_utils.RayWorker object at 0xffee5c066eb0>)
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/ray_utils.py", line 35, in execute_method
return executor(*args, **kwargs)
File "/home/ma-user/miniconda3/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/worker/atb_worker.py", line 326, in execute_model
output = self.model(
File "/home/ma-user/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/model_executor/backend_warpper/atb.py", line 115, in forward
output = self.sampler(logits, input_metadata)
File "/home/ma-user/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/model_executor/layers/sampler.py", line 158, in forward
return _sample(probs, logprobs, input_metadata, prob_indexes=slice_indexes)
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/model_executor/layers/sampler.py", line 578, in _sample
sample_results = _random_sample(seq_groups, is_prompts, category_probs, category_indexes)
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/model_executor/layers/sampler.py", line 467, in _random_sample
random_samples = random_samples.cpu()
RuntimeError: ACL stream synchronize failed, error code:507018
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ma-user/miniconda3/lib/python3.9/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py", line 37, in _raise_exception_on_finish
raise exc
File "/efs_guiyang/majt/atb/ascend-vllm/vllm/engine/async_llm_engine.py", line 32, in _raise_exception_on_finish
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
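In the log above, the `AsyncEngineDeadError` is only a wrapper: the root failure is the `RuntimeError` ("ACL stream synchronize failed, error code:507018") raised at `random_samples.cpu()` inside the Ray worker. The minimal sketch below is modeled on the frame names in the traceback (`_raise_exception_on_finish`, `task.result()`), not copied from vLLM's source. It shows how a done callback on the engine-loop task can re-raise the task's exception with `raise ... from exc`, which produces the chained "direct cause" message. `EngineDeadError`, `engine_loop`, and `captured` are hypothetical names.

```python
import asyncio


class EngineDeadError(RuntimeError):
    """Hypothetical stand-in for vLLM's AsyncEngineDeadError."""


captured = []  # wrapped errors recorded by the callback, for inspection


def raise_exception_on_finish(task: asyncio.Task) -> None:
    """Done callback: re-raise whatever ended the background task."""
    try:
        task.result()  # re-raises the task's exception, if any
    except asyncio.CancelledError:
        return  # a deliberate shutdown is not treated as an error
    except Exception as exc:
        wrapped = EngineDeadError(
            "Task finished unexpectedly. "
            "See stack trace above for the actual cause.")
        captured.append(wrapped)
        # `from exc` yields "The above exception was the direct cause..."
        raise wrapped from exc


async def engine_loop() -> None:
    # Simulates the device error surfacing during sampling.
    await asyncio.sleep(0)
    raise RuntimeError("ACL stream synchronize failed, error code:507018")


async def main() -> None:
    loop = asyncio.get_running_loop()
    # Silence the loop's "Exception in callback ..." report for this demo.
    loop.set_exception_handler(lambda loop, context: None)
    task = asyncio.create_task(engine_loop())
    task.add_done_callback(raise_exception_on_finish)
    await asyncio.sleep(0.1)  # let the task fail and the callback run


asyncio.run(main())
root = captured[0].__cause__
print(type(root).__name__, root)
# prints: RuntimeError ACL stream synchronize failed, error code:507018
```

When reading logs like this one, follow the first (cause) traceback to find the real error; the `AsyncEngineDeadError` message itself carries no diagnostic detail.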
2. Software versions:
-- CANN version (e.g., CANN 3.0.x, 5.x.x): 7.x
-- TensorFlow/PyTorch/MindSpore version: PyTorch 2.0.1
-- Python version (e.g., Python 3.7.5): 3.9.18
-- MindStudio version (e.g., MindStudio 2.0.0 (beta3)):
-- OS version (e.g., Ubuntu 18.04): EulerOS 2.0 (SP10)
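Does this error also occur with only a few requests, or does it appear only once the load reaches a certain level?
Please upload the debug log from the run. To capture it, set the following environment variables, redirect the output to a .txt file, and then check the log for the specific cause of the error: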
export ASCEND_GLOBAL_LOG_LEVEL=0
export ASCEND_SLOG_PRINT_TO_STDOUT=1
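The steps above (set the two variables, then redirect output to a text file) can be wrapped in a small helper. This is a minimal sketch, assuming the service is started as a subprocess; the function name `run_with_ascend_debug_log` is hypothetical. Level `0` of `ASCEND_GLOBAL_LOG_LEVEL` is the most verbose (DEBUG) level.

```python
import os
import subprocess


def run_with_ascend_debug_log(cmd: list, log_path: str) -> int:
    """Run `cmd` with verbose Ascend logging and save all output to `log_path`."""
    env = os.environ.copy()
    env["ASCEND_GLOBAL_LOG_LEVEL"] = "0"      # 0 = DEBUG, the most verbose level
    env["ASCEND_SLOG_PRINT_TO_STDOUT"] = "1"  # print device logs to stdout
    with open(log_path, "w", encoding="utf-8") as log_file:
        # stderr is merged into stdout so Python tracebacks land in the same file.
        result = subprocess.run(cmd, env=env, stdout=log_file,
                                stderr=subprocess.STDOUT, check=False)
    return result.returncode
```

For example, `run_with_ascend_debug_log([sys.executable, "launch_server.py"], "debug_log.txt")`, where `launch_server.py` is a placeholder for whatever script starts the vLLM service. Merging stderr into the same file keeps the Ascend device log lines and the Python traceback in chronological order, which makes it easier to see which device-side error precedes the 507018 failure.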
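Since there has been no reply for a long time, this issue is being closed for now. If the problem recurs, please open a new issue. Thanks.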