Ascend/pytorch

torch._foreach_add is not supported

DONE
Bug
Created on 2024-06-25 16:28

The log is as follows:
Traceback (most recent call last):
File "/root/anaconda3/envs/sjj/bin/nnUNetv2_train", line 8, in <module>
sys.exit(run_training_entry())
File "/home/openlab/nnUNet-master/nnunetv2/run/run_training.py", line 271, in run_training_entry
run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
File "/home/openlab/nnUNet-master/nnunetv2/run/run_training.py", line 207, in run_training
nnunet_trainer.run_training()
File "/home/openlab/nnUNet-master/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1288, in run_training
train_outputs.append(self.train_step(next(self.dataloader_train)))
File "/home/openlab/nnUNet-master/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 923, in train_step
self.grad_scaler.step(self.optimizer)
File "/root/anaconda3/envs/sjj/lib/python3.9/site-packages/torch_npu/npu/amp/grad_scaler.py", line 402, in step
retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
File "/root/anaconda3/envs/sjj/lib/python3.9/site-packages/torch_npu/npu/amp/grad_scaler.py", line [illegible], in _maybe_opt_step
retval = optimizer.step(*args, **kwargs)
File "/root/anaconda3/envs/sjj/lib/python3.9/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
return wrapped(*args, **kwargs)
File "/root/anaconda3/envs/sjj/lib/python3.9/site-packages/torch/optim/optimizer.py", line 280, in wrapper
out = func(*args, **kwargs)
File "/root/anaconda3/envs/sjj/lib/python3.9/site-packages/torch/optim/optimizer.py", line 33, in _use_grad
ret = func(self, *args, **kwargs)
File "/root/anaconda3/envs/sjj/lib/python3.9/site-packages/torch/optim/sgd.py", line 76, in step
sgd(params_with_grad,
File "/root/anaconda3/envs/sjj/lib/python3.9/site-packages/torch/optim/sgd.py", line 222, in sgd
func(params,
File "/root/anaconda3/envs/sjj/lib/python3.9/site-packages/torch/optim/sgd.py", line 291, in _multi_tensor_sgd
device_grads = torch._foreach_add(device_grads, device_params, alpha=weight_decay)
RuntimeError: call aclnnForeachAddList failed, detail:EZ1001: Get path and read config failed.
TraceBack (most recent call last):
Check NnopbaseCollecterWork(binCollecter.get()) failed
Assert ((NnopbaseInit()) == 0) failed
Check NnopbaseCreateExecutorSpace(&executorSpace) failed
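
The traceback fails inside `_multi_tensor_sgd`, at the weight-decay step that calls `torch._foreach_add` on whole lists of tensors. The following is a minimal CPU sketch of what that call computes, together with the `foreach=False` argument of `torch.optim.SGD`, which selects the per-tensor code path. The tensors, the `Linear` model, and the hyperparameters are hypothetical. Whether `foreach=False` avoids the `aclnnForeachAddList` error on an affected NPU setup is an assumption that this thread does not confirm.

```python
import torch

# Hypothetical parameters and gradients; the real ones come from the nnUNet model.
params = [torch.ones(3), torch.full((2,), 2.0)]
grads = [torch.zeros(3), torch.ones(2)]
weight_decay = 0.1

# Multi-tensor form, as in the failing line of sgd.py:
# every gradient gets weight_decay * param added to it in one call.
fused = torch._foreach_add(grads, params, alpha=weight_decay)

# Equivalent per-tensor form.
looped = [g.add(p, alpha=weight_decay) for g, p in zip(grads, params)]

# Asking SGD for the single-tensor implementation (assumed workaround, untested on NPU).
model = torch.nn.Linear(4, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=weight_decay, foreach=False
)
model(torch.randn(8, 4)).sum().backward()
optimizer.step()
```

With `foreach=False` the optimizer updates each parameter separately, so the list-based `_foreach_*` operators are not invoked by the SGD step.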

Comments (2)

sjj0305 created the bug 12 months ago

Please provide the CANN, torch, and torch_npu version information, along with the detailed plog logs.
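
One way to gather the Python-side versions requested above is sketched here. It reads the installed package metadata with the standard library only. The helper name `collect_versions` is made up for illustration. The CANN version is not a Python package and has to be read from the CANN toolkit installation separately, and this sketch does not collect plog files.

```python
from importlib import metadata


def collect_versions(packages=("torch", "torch_npu")):
    """Return a dict mapping each package name to its installed version."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of raising.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```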

Resolved by updating to the latest 2.1 version.

huangyunlong changed the task status from TODO to DONE 12 months ago
