name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
/device gpu
The same program runs normally on a T4 GPU, but it fails on a V100.
log:
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/local/python-3.7.5/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/local/python-3.7.5/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/python-3.7.5/lib/python3.7/multiprocessing/pool.py", line 470, in _handle_results
task = get()
File "/usr/local/python-3.7.5/lib/python3.7/multiprocessing/connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
ModuleNotFoundError: No module named 'tvm'
Traceback (most recent call last):
File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/_extends/parallel_compile/akg_compiler/akg_process.py", line 128, in compile
res.get(timeout=self.wait_time)
File "/usr/local/python-3.7.5/lib/python3.7/multiprocessing/pool.py", line 653, in get
raise TimeoutError
multiprocessing.context.TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/_extends/remote/kernel_build_server_gpu.py", line 87, in <module>
messager.run()
File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/_extends/remote/kernel_build_server.py", line 119, in run
self.loop()
File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/_extends/remote/kernel_build_server.py", line 116, in loop
self.handle()
File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/_extends/remote/kernel_build_server_gpu.py", line 56, in handle
res = self.akg_builder.compile()
File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/_extends/remote/kernel_build_server.py", line 33, in compile
return self.akg_builder.compile()
File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/_extends/parallel_compile/akg_compiler/akg_process.py", line 128, in compile
res.get(timeout=self.wait_time)
File "/usr/local/python-3.7.5/lib/python3.7/multiprocessing/pool.py", line 623, in __exit__
self.terminate()
File "/usr/local/python-3.7.5/lib/python3.7/multiprocessing/pool.py", line 548, in terminate
self._terminate()
File "/usr/local/python-3.7.5/lib/python3.7/multiprocessing/util.py", line 201, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/local/python-3.7.5/lib/python3.7/multiprocessing/pool.py", line 585, in _terminate_pool
"Cannot have cache with result_hander not alive")
AssertionError: Cannot have cache with result_hander not alive
[ERROR] SESSION(3269,python):2021-05-12-09:48:57.778.803 [mindspore/ccsrc/backend/session/kernel_build_client.h:110] Response] Response is empty
Traceback (most recent call last):
File "integrate_checkpoint.py", line 217, in <module>
integrate_ckpt_file()
File "integrate_checkpoint.py", line 141, in integrate_ckpt_file
output_ids = generate(model_predict, input_ids, config.seq_length, 9)
File "/userhome/pclproject/gpt/transformModelToGPU/generate.py", line 46, in generate
logits = model.predict(ms.Tensor(input_ids, ms.int32)).asnumpy().reshape(1, seq_length, -1)
File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 791, in predict
result = self._predict_network(*predict_data)
File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/nn/cell.py", line 341, in __call__
out = self.compile_and_run(*inputs)
File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/nn/cell.py", line 608, in compile_and_run
self.compile(*inputs)
File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/nn/cell.py", line 595, in compile
_executor.compile(self, *inputs, phase=self.phase, auto_parallel_mode=self._auto_parallel_mode)
File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/common/api.py", line 494, in compile
result = self._executor.compile(obj, args_list, phase, use_vm)
RuntimeError: mindspore/ccsrc/backend/session/kernel_build_client.h:110 Response] Response is empty
Hey yan-dasen, Welcome to MindSpore Community.
All of the projects in MindSpore Community are maintained by @mindspore-ci-bot.
That means the developers can comment below every pull request or issue to trigger Bot Commands.
Please follow instructions at https://gitee.com/mindspore/community/blob/master/command.md to find the details.
Please add labels (comp or sig) so that the issue gets a faster response; labeled issues are routed directly to the responsible owner. More labels are listed at https://gitee.com/mindspore/community/blob/master/sigs/dx/docs/labels.md.
Taking a component problem as an example: if you find that the issue is caused by the data component, you can comment:
//comp/data
You can also ask the data SIG for help by writing:
//comp/data
//sig/data
If it is a simple problem, you can leave it for newcomers to the community to answer by writing:
//good-first-issue
Congratulations, you now know how to add labels with commands. Go ahead and add a label in the comments below!
I ran into the same problem. The checkpoint I am running is Pangu-alpha_2.6B.ckpt, and the model is 2B6.
Machine: A100
mindspore: 1.2.0, GPU version
Ubuntu 18.04
python: 3.8
I have also already installed tvm1.0.
//device/gpu
//sig/akg
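I also ran into the same problem. My guess is that MindSpore may not support RTX-series GPUs. Could you please check with the kernel developers as soon as possible? Thanks!
+1, same problem.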
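It depends on whether you installed from a package or built from source. For a package install, confirm that the installation finished without errors; for a source build, check the PYTHONPATH setting.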
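A rough sketch of how one might check this, run with the same interpreter that launches the failing script. It only reports whether the `tvm` module named in the `ModuleNotFoundError` above (and `mindspore` itself) can be located, and prints the current PYTHONPATH. The helper name `check_modules` is made up for illustration and is not part of MindSpore; a module being found here does not by itself guarantee that the kernel build subprocess sees the same environment.

```python
import importlib.util
import os
import sys


def check_modules(names):
    """Return {module_name: True/False} for whether each module can be located.

    Hypothetical helper for illustration; it uses only the standard library
    and does not import the modules it checks.
    """
    found = {}
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for broken parent packages or modules without a spec.
            spec = None
        found[name] = spec is not None
    return found


if __name__ == "__main__":
    print("python:", sys.executable)
    print("PYTHONPATH:", os.environ.get("PYTHONPATH", "<unset>"))
    # 'tvm' is the module the traceback reports as missing.
    for name, ok in check_modules(["mindspore", "tvm"]).items():
        print(f"{name}: {'found' if ok else 'NOT FOUND'}")
```

If `tvm` shows up as NOT FOUND, that would be consistent with the log above and points back at the install step or the PYTHONPATH setting.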
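Since there has been no feedback for a long time, this issue is being closed for now. If the problem persists, please provide the specific details and change the issue status to WIP; we will follow up further. Thanks.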