Error when running the official PyTorch demo

**env:**

- hardware: 300I Pro
- Ascend-cann-toolkit_7.0.0.alpha002
- pytorch==1.11.0

Following the official guide, the example below runs successfully:
```
import torch
import torch_npu

x = torch.randn(2, 2).npu()
y = torch.randn(2, 2).npu()
z = x.mm(y)

print(z)
```
But running the following code fails:
```
# Import modules
import time
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import torchvision

import torch_npu
from torch_npu.npu import amp  # import the AMP module
from torch_npu.contrib import transfer_to_npu    # enable automatic CUDA-to-NPU migration

# Initialize the run device ('cuda:0' is redirected to the NPU by transfer_to_npu)
device = torch.device('cuda:0')

# Define the model network
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.net = nn.Sequential(
            # Convolution layer
            nn.Conv2d(in_channels=1, out_channels=16,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1),
            # Pooling layer
            nn.MaxPool2d(kernel_size=2),
            # Convolution layer
            nn.Conv2d(16, 32, 3, 1, 1),
            # Pooling layer
            nn.MaxPool2d(2),
            # Flatten the multi-dimensional input to one dimension
            nn.Flatten(),
            nn.Linear(32*7*7, 16),
            # Activation function
            nn.ReLU(),
            nn.Linear(16, 10)
        )

    def forward(self, x):
        return self.net(x)

# Download the dataset
train_data = torchvision.datasets.MNIST(
    root='mnist',
    download=True,
    train=True,
    transform=torchvision.transforms.ToTensor()
)

# Define the training parameters
batch_size = 64
model = CNN().to(device)  # define the model
train_dataloader = DataLoader(train_data, batch_size=batch_size)    # define the DataLoader
loss_func = nn.CrossEntropyLoss().to(device)    # define the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)    # define the optimizer
scaler = amp.GradScaler()    # define the GradScaler after the model and optimizer
epochs = 10  # number of epochs

# Training loop
for epoch in range(epochs):
    for imgs, labels in train_dataloader:
        start_time = time.time()    # record the start time of this step
        imgs = imgs.to(device)    # move the image batch to the target NPU
        labels = labels.to(device)    # move the labels to the target NPU
        with amp.autocast():
            outputs = model(imgs)    # forward pass
            loss = loss_func(outputs, labels)    # compute the loss
        optimizer.zero_grad()
        scaler.scale(loss).backward()    # scale the loss and backpropagate
        scaler.step(optimizer)    # update parameters (gradients are unscaled automatically)
        scaler.update()    # update the loss-scaling factor (dynamic loss scaling)

# Save the model checkpoint
torch.save({
    'epoch': 10,
    'arch': CNN,
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),
}, 'checkpoint.pth.tar')
```
The error log is:
```
EZ3003: No supported Ops kernel and engine are found for [MaxPoolGradWithArgmaxV144], optype [MaxPoolGradWithArgmaxV1].
        Possible Cause: The operator is not supported by the system. Therefore, no hit is found in any operator information library.
        Solution: 1. Check that the OPP component is installed properly. 2. Submit an issue to request for the support of this operator type.
        TraceBack (most recent call last):
        build graph failed, graph id:43, ret:-1[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1615]
        [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
        build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]

terminate called after throwing an instance of 'std::runtime_error'
  what():  ASCEND kernel errors might be asynchronously reported at some other API call, so the stacktrace below is not the root cause of the problem.
For getting the stacktrace of OP in PyTorch, consider passing ASCEND_LAUNCH_BLOCKING=1.
```
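The last line of the log suggests passing `ASCEND_LAUNCH_BLOCKING=1` to get a stack trace that points at the failing operator. A minimal sketch of one way to do that from inside the script is shown here. Setting the variable before `torch` and `torch_npu` are imported is an assumption made to be safe; the log does not say when the variable is read.

```
import os


def enable_blocking_launch() -> None:
    """Ask the Ascend runtime to report kernel errors synchronously.

    The variable name comes from the error log above. Calling this before
    importing torch / torch_npu is an assumption, not documented behavior.
    """
    os.environ["ASCEND_LAUNCH_BLOCKING"] = "1"


enable_blocking_launch()

# Import torch and torch_npu only after this point, then run the training
# code unchanged so the Python traceback lines up with the failing call.
```

Equivalently, the variable can be set in the shell when launching the script, for example `ASCEND_LAUNCH_BLOCKING=1 python train.py`, where `train.py` is a placeholder name for the script above.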
What is the cause of this error? The code is taken from https://www.hiascend.com/document/detail/zh/canncommercial/70RC1/modeldevpt/ptmigr/AImpug_0003.html
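To narrow the problem down, the failing step can probably be isolated from the rest of the training script. The operator name in the log, `MaxPoolGradWithArgmaxV1`, suggests the backward pass of `nn.MaxPool2d`, so the sketch below runs only that layer forward and backward. This is an inference from the op name, not something the log confirms. The helper name `maxpool_backward_check` and the tensor shape are illustrative choices.

```
import torch
import torch.nn as nn


def maxpool_backward_check(device: str = "cpu") -> torch.Size:
    """Run MaxPool2d forward and backward once on `device`; return the gradient shape."""
    pool = nn.MaxPool2d(kernel_size=2)
    # Same layout as the first pooling layer's input in the CNN above: (N, 16, 28, 28).
    x = torch.randn(4, 16, 28, 28, device=device, requires_grad=True)
    out = pool(x)
    out.sum().backward()  # the max-pool gradient operator runs during this call
    return x.grad.shape


if __name__ == "__main__":
    # On the Ascend machine: import torch_npu first and pass "npu:0"
    # (assumption: the 300I Pro is visible as device 0).
    print(maxpool_backward_check("cpu"))
```

If this small script fails on the NPU with the same EZ3003 message while succeeding on the CPU, the issue would appear to lie in operator support for the max-pool gradient on this hardware and CANN version rather than in the AMP or data-loading code.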
