MindSpore / models
[Bug]: RuntimeError: Output device address's size 2 is not equal output_size_list's size 4
Status: DONE · #I8R51I
hust_cciip_rjLuo created this issue on 2023-12-25 16:14
### Problem Description

With the context configured as `context.set_context(mode=mindspore.PYNATIVE_MODE, device_target="GPU", enable_graph_kernel=True)`, training runs for a short while and then fails. Terminal output:

```
/home/lx/miniconda3/envs/ms2.2/bin/python /media/lx/Ubuntu_SSD/LRJ/vsepp-python3/train.py
Namespace(batch_size=4, cnn_type='vgg19', crop_size=224, data_name='f30k', data_path='data/archive', embed_size=1024, finetune=False, grad_clip=2.0, img_dim=4096, learning_rate=0.0002, log_step=10, logger_name='runs/runX', lr_update=15, margin=0.2, max_violation=False, measure='cosine', no_imgnorm=False, num_epochs=30, num_layers=1, reset_train=False, resume='', use_abs=False, use_restval=False, val_step=500, vocab_path='./vocab/', word_dim=300, workers=2)
train=======================
val=======================
=> using pre-trained model 'vgg19'
iter = 0 of 36250 completed, loss = 4.8170185, lr=0.0002, time=0.6785878209921066seconds
iter = 1 of 36250 completed, loss = 4.792597, lr=0.0002, time=0.1610727850056719seconds
iter = 2 of 36250 completed, loss = 4.8085637, lr=0.0002, time=0.22413303097710013seconds
[ERROR] KERNEL(130610,7f964a1ed740,python):2023-12-25-16:01:55.102.912 [mindspore/ccsrc/plugin/device/gpu/kernel/gpu_kernel.h:151] ResetResource] kernel must override the `ResetResource()` method when dynamic shape
[ERROR] PYNATIVE(130610,7f964a1ed740,python):2023-12-25-16:01:55.107.628 [mindspore/ccsrc/pipeline/pynative/pynative_execute.cc:60] operator()]
Traceback (most recent call last):
  File "/media/lx/Ubuntu_SSD/LRJ/vsepp-python3/train.py", line 261, in <module>
    main()
  File "/media/lx/Ubuntu_SSD/LRJ/vsepp-python3/train.py", line 118, in main
    iter_loss = train_net(image, target, index, img_id)
  File "/home/lx/miniconda3/envs/ms2.2/lib/python3.7/site-packages/mindspore/nn/cell.py", line 705, in __call__
    raise err
  File "/home/lx/miniconda3/envs/ms2.2/lib/python3.7/site-packages/mindspore/nn/cell.py", line 701, in __call__
    output = self._run_construct(args, kwargs)
  File "/home/lx/miniconda3/envs/ms2.2/lib/python3.7/site-packages/mindspore/nn/cell.py", line 482, in _run_construct
    output = self.construct(*cast_inputs, **kwargs)
  File "/home/lx/miniconda3/envs/ms2.2/lib/python3.7/site-packages/mindspore/nn/wrap/cell_wrapper.py", line 418, in construct
    return self._no_sens_impl(*inputs)
  File "/home/lx/miniconda3/envs/ms2.2/lib/python3.7/site-packages/mindspore/nn/wrap/cell_wrapper.py", line 434, in _no_sens_impl
    grads = self.grad_no_sens(self.network, self.weights)(*inputs)
  File "/home/lx/miniconda3/envs/ms2.2/lib/python3.7/site-packages/mindspore/ops/composite/base.py", line 388, in after_grad
    return grad_(fn, weights)(*args, **kwargs)
  File "/home/lx/miniconda3/envs/ms2.2/lib/python3.7/site-packages/mindspore/common/api.py", line 121, in wrapper
    results = fn(*arg, **kwargs)
  File "/home/lx/miniconda3/envs/ms2.2/lib/python3.7/site-packages/mindspore/ops/composite/base.py", line 378, in after_grad
    out = _pynative_executor()
  File "/home/lx/miniconda3/envs/ms2.2/lib/python3.7/site-packages/mindspore/common/api.py", line 1147, in __call__
    return self._executor()
  File "/home/lx/miniconda3/envs/ms2.2/lib/python3.7/site-packages/mindspore/ops/_grad_experimental/grad_nn_ops.py", line 241, in bprop
    dx, dhx = gru_grad_data(y, dy, dhy, w, hx, reserve, state)
  File "/home/lx/miniconda3/envs/ms2.2/lib/python3.7/site-packages/mindspore/ops/primitive.py", line 314, in __call__
    return _run_op(self, self.name, args)
  File "/home/lx/miniconda3/envs/ms2.2/lib/python3.7/site-packages/mindspore/ops/primitive.py", line 913, in _run_op
    stub = _pynative_executor.run_op_async(obj, op_name, args)
  File "/home/lx/miniconda3/envs/ms2.2/lib/python3.7/site-packages/mindspore/common/api.py", line 1186, in run_op_async
    return self._executor.run_op_async(*args)
RuntimeError: Output device address's size 2 is not equal output_size_list's size 4

----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/runtime/pynative/run_op_helper.cc:644 UpdateOutputDeviceInfo

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
```

### Environment

- **Hardware Environment (`Ascend`/`GPU`/`CPU`)**: GPU (`/device GPU`)
- **Software Environment**:
  - MindSpore version: 2.2
  - Python version: 3.7
  - OS platform and distribution: Linux Ubuntu 20
  - GCC/Compiler version (if compiled from source):
- **Execute Mode (`PyNative`/`Graph`)**: PyNative (`/mode pynative`)

### Related Test Cases

None.

### Steps to Reproduce

Run `python train.py` directly.

### Expected Result

The script is expected to run normally.

### Logs / Screenshots

Same terminal output as shown in the Problem Description above.

### Remarks

This is model.py:

```
import mindspore
import mindspore.nn as nn
import numpy as np
from mindspore import Tensor
from mindspore import ops, load_param_into_net
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
from models import model_vgg19


def l2norm(X):
    # pdb.set_trace()
    norm = ops.pow(X, 2).sum(axis=1).sqrt()
    X = ops.div(X, norm.unsqueeze(1))
    return X


def EncoderImage(data_name, img_dim, embed_size, finetune=False,
                 cnn_type='vgg19', use_abs=False, no_imgnorm=False):
    """A wrapper to image encoders.

    Chooses between an encoder that uses precomputed image features,
    `EncoderImagePrecomp`, or an encoder that computes image features
    on the fly `EncoderImageFull`.
    """
    img_enc = EncoderImageFull(
        embed_size, finetune, cnn_type, use_abs, no_imgnorm)
    return img_enc


# tutorials/09 - Image Captioning
class EncoderImageFull(nn.Cell):

    def __init__(self, embed_size, finetune=False, cnn_type='vgg19',
                 use_abs=False, no_imgnorm=False):
        """Load pretrained VGG19 and replace top fc layer."""
        super(EncoderImageFull, self).__init__()
        self.embed_size = embed_size
        self.no_imgnorm = no_imgnorm
        self.use_abs = use_abs

        # Load a pre-trained model
        self.cnn = self.get_cnn(cnn_type, True)

        # For efficient memory usage.
        for param in self.cnn.get_parameters():
            param.requires_grad = finetune

        # Replace the last fully connected layer of CNN with a new one
        if cnn_type.startswith('vgg'):
            self.cnn.classifier
            self.fc = nn.Dense(4096, embed_size)
            self.cnn.classifier = nn.SequentialCell(
                *list(self.cnn.classifier)[:-1])
        elif cnn_type.startswith('resnet'):
            self.fc = nn.Dense(self.cnn.module.fc.in_features, embed_size)
            self.cnn.module.fc = nn.SequentialCell()

        self.init_weights()

    def get_cnn(self, arch, pretrained):
        """Load a pretrained CNN and parallelize over GPUs"""
        if pretrained:
            print("=> using pre-trained model '{}'".format(arch))
            model = model_vgg19.vgg19(pretrained=True)
        else:
            print("=> creating model '{}'".format(arch))
            model = model_vgg19.vgg19(pretrained=False)
        # if arch.startswith('alexnet') or arch.startswith('vgg'):
        #     model.features = nn.DataParallel(model.features)
        # else:
        #     model = nn.DataParallel(model)
        # if torch.cuda.is_available():
        #     model.cuda()
        return model

    def load_state_dict(self, state_dict):
        """Handle the models saved before commit pytorch/vision@989d52a"""
        if 'cnn.classifier.1.weight' in state_dict:
            state_dict['cnn.classifier.0.weight'] = state_dict[
                'cnn.classifier.1.weight']
            del state_dict['cnn.classifier.1.weight']
            state_dict['cnn.classifier.0.bias'] = state_dict[
                'cnn.classifier.1.bias']
            del state_dict['cnn.classifier.1.bias']
            state_dict['cnn.classifier.3.weight'] = state_dict[
                'cnn.classifier.4.weight']
            del state_dict['cnn.classifier.4.weight']
            state_dict['cnn.classifier.3.bias'] = state_dict[
                'cnn.classifier.4.bias']
            del state_dict['cnn.classifier.4.bias']

        # super(EncoderImageFull, self).load_state_dict(state_dict)
        load_param_into_net(super(EncoderImageFull, self), state_dict[0])

    def init_weights(self):
        """Xavier initialization for the fully connected layer"""
        r = np.sqrt(6.) / np.sqrt(self.fc.in_channels + self.fc.out_channels)
        self.fc.weight.data.set_data(ops.uniform(self.fc.weight.data.shape,
                                                 Tensor(-r, mindspore.float32),
                                                 Tensor(r, mindspore.float32),
                                                 dtype=mindspore.float32))
        # self.fc.weight.data.uniform_(-r, r)
        # self.fc.bias.data.fill_(0)
        self.fc.bias.data.set_data(ops.full(self.fc.bias.data.shape,
                                            Tensor(0, mindspore.float32)))

    def construct(self, images):
        """Extract image feature vectors."""
        features = self.cnn(images)

        # normalization in the image embedding space
        features = l2norm(features)

        # linear projection to the joint embedding space
        features = self.fc(features)

        # normalization in the joint embedding space
        if not self.no_imgnorm:
            features = l2norm(features)

        # take the absolute value of the embedding (used in order embeddings)
        if self.use_abs:
            features = ops.abs(features)

        return features


# tutorials/08 - Language Model
# RNN Based Language Model
class EncoderText(nn.Cell):

    def __init__(self, vocab_size, word_dim, embed_size, num_layers,
                 use_abs=False):
        super(EncoderText, self).__init__()
        self.use_abs = use_abs
        self.embed_size = embed_size

        # word embedding
        self.embed = nn.Embedding(vocab_size, word_dim)

        # caption embedding
        self.rnn = nn.GRU(word_dim, embed_size, num_layers, batch_first=True)

        self.init_weights()

    def init_weights(self):
        # self.embed.weight.data.uniform_(-0.1, 0.1)
        self.embed.init_tensor = ops.uniform(self.embed.init_tensor.shape,
                                             Tensor(-0.1, mindspore.float32),
                                             Tensor(0.1, mindspore.float32),
                                             dtype=mindspore.float32)

    def construct(self, x, lengths):
        """Handles variable size captions"""
        # Embed word ids to vectors
        x = self.embed(x)
        # packed = pack_padded_sequence(x, lengths, batch_first=True)

        # Forward propagate RNN
        out, _ = self.rnn(x)

        # Reshape *final* output to (batch_size, hidden_size)
        # padded = pad_packed_sequence(out, batch_first=True)
        I = Tensor(lengths, dtype=mindspore.int64).view(-1, 1, 1)
        I = I.broadcast_to((x.shape[0], 1, self.embed_size)) - 1
        # if torch.cuda.is_available():
        #     I = I.cuda()
        # I = I.cuda()
        out = ops.gather_elements(out, 1, I).squeeze(1)

        # normalization in the joint embedding space
        out = l2norm(out)

        # take absolute value, used by order embeddings
        if self.use_abs:
            out = ops.abs(out)

        return out


def cosine_sim(im, s):
    """Cosine similarity between all the image and sentence pairs"""
    return im.mm(s.t())


def order_sim(im, s):
    """Order embeddings similarity measure $max(0, s-im)$"""
    YmX = (s.unsqueeze(1).broadcast_to(s.size(0), im.size(0), s.size(1))
           - im.unsqueeze(0).broadcast_to(s.size(0), im.size(0), s.size(1)))
    score = -YmX.clamp(min=0).pow(2).sum(2).sqrt().t()
    return score


class ContrastiveLoss(nn.Cell):
    """Compute contrastive loss"""

    def __init__(self, margin=0, measure=False, max_violation=False):
        super(ContrastiveLoss, self).__init__()
        self.margin = margin
        if measure == 'order':
            self.sim = order_sim
        else:
            self.sim = cosine_sim

        self.max_violation = max_violation

    def construct(self, im, s):
        # compute image-sentence score matrix
        scores = self.sim(im, s)
        diagonal = mindspore.numpy.diag(scores).view(im.shape[0], 1)
        d1 = diagonal.expand_as(scores)
        d2 = diagonal.t().expand_as(scores)

        # compare every diagonal score to scores in its column
        # caption retrieval
        cost_s = (self.margin + scores - d1).clamp(min=0)
        # compare every diagonal score to scores in its row
        # image retrieval
        cost_im = (self.margin + scores - d2).clamp(min=0)

        # clear diagonals
        mask = ops.eye(scores.shape[0]) > .5
        I = Tensor(mask)
        # if torch.cuda.is_available():
        #     I = I.cuda()
        # I = I.cuda()
        cost_s = cost_s.masked_fill(I, 0)
        cost_im = cost_im.masked_fill(I, 0)

        # keep the maximum violating negative for each query
        if self.max_violation:
            cost_s = cost_s.max(1)[0]
            cost_im = cost_im.max(0)[0]

        return cost_s.sum() + cost_im.sum()


class VSE(nn.Cell):

    def __init__(self, opt):
        super(VSE, self).__init__()
        self.grad_clip = opt.grad_clip
        self.img_enc = EncoderImage(opt.data_name, opt.img_dim, opt.embed_size,
                                    opt.finetune, opt.cnn_type,
                                    use_abs=opt.use_abs,
                                    no_imgnorm=opt.no_imgnorm)
        self.txt_enc = EncoderText(opt.vocab_size, opt.word_dim,
                                   opt.embed_size, opt.num_layers,
                                   use_abs=opt.use_abs)
        # if torch.cuda.is_available():
        #     self.img_enc.cuda()
        #     self.txt_enc.cuda()
        #     cudnn.benchmark = True

        # Loss and Optimizer
        self.criterion = ContrastiveLoss(margin=opt.margin,
                                         measure=opt.measure,
                                         max_violation=opt.max_violation)
        # params = list(self.txt_enc.trainable_params())
        # params += list(self.img_enc.fc.trainable_params())
        # if opt.finetune:
        #     params += list(self.img_enc.cnn.trainable_params())
        # self.params = params
        # milestone = [30, 60, 90]
        # learning_rates = [opt.learning_rate, 0.1 * opt.learning_rate, 0.01 * opt.learning_rate]
        # self.lr = nn.piecewise_constant_lr(milestone, learning_rates)
        # self.optimizer = nn.Adam(params, learning_rate=self.lr)

        self.Eiters = 0

    def state_dict(self):
        state_dict = [self.img_enc.state_dict(), self.txt_enc.state_dict()]
        return state_dict

    def load_state_dict(self, state_dict):
        self.img_enc.load_state_dict(state_dict[0])
        # self.txt_enc.load_state_dict(state_dict[1])
        # load_param_into_net(self.img_enc, state_dict[0])
        load_param_into_net(self.txt_enc, state_dict[1])

    def train_start(self):
        """switch to train mode"""
        self.img_enc.train()
        self.txt_enc.train()

    def val_start(self):
        """switch to evaluate mode"""
        self.img_enc.eval()
        self.txt_enc.eval()

    def forward_emb(self, images, captions, lengths):
        """Compute the image and caption embeddings"""
        # Set mini-batch dataset
        images = mindspore.Tensor(images)
        captions = mindspore.Tensor(captions, dtype=mindspore.int64)
        # if torch.cuda.is_available():
        #     images = images.cuda()
        #     captions = captions.cuda()

        # Forward
        img_emb = self.img_enc(images)
        cap_emb = self.txt_enc(captions, lengths)
        return img_emb, cap_emb

    def forward_loss(self, img_emb, cap_emb, **kwargs):
        """Compute the loss given pairs of image and caption embeddings"""
        loss = self.criterion(img_emb, cap_emb)
        # self.logger.update('Le', loss.item(), img_emb.size(0))
        return loss

    def construct(self, images, captions, lengths, ids=None):
        """One training step given images and captions."""
        self.Eiters += 1
        # self.logger.update('Eit', self.Eiters)
        # self.logger.update('lr', self.optimizer.param_groups[0]['lr'])

        # compute the embeddings
        img_emb, cap_emb = self.forward_emb(images, captions, lengths)

        # measure accuracy and record loss
        loss = self.forward_loss(img_emb, cap_emb)

        # if self.grad_clip > 0:
        #     grad_fn = mindspore.grad(self.forward_loss, grad_position=None, weights=self.params, has_aux=True)
        #     grads, out2 = grad_fn(mindspore.ops.unsqueeze(Tensor(img_emb), dim=0), Tensor(cap_emb))
        #     ops.clip_by_norm(grads, self.grad_clip)

        return loss
```
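For anyone trying to isolate this outside the full training script: the traceback ends in the GPU GRU backward (`gru_grad_data` in `grad_nn_ops.py`), and the first `[ERROR] KERNEL` line points at dynamic shapes, which this model produces because caption batches vary in sequence length. Below is a minimal, self-contained sketch along those lines; it mirrors the reporter's settings (`word_dim=300`, `embed_size=1024`, `num_layers=1`, `batch_size=4`) and context configuration, but the specific sequence lengths are illustrative assumptions and it is not confirmed to reproduce the crash.

```
# Minimal sketch (assumptions noted above): drive nn.GRU forward + backward in
# PyNative mode on GPU with graph kernel enabled, using a different sequence
# length on each iteration so the GRU grad kernel sees dynamic shapes.
import numpy as np
import mindspore
import mindspore.nn as nn
from mindspore import Tensor, context

context.set_context(mode=mindspore.PYNATIVE_MODE, device_target="GPU",
                    enable_graph_kernel=True)

rnn = nn.GRU(300, 1024, 1, batch_first=True)  # word_dim, embed_size, num_layers

def net(x):
    out, _ = rnn(x)
    return out.sum()

# mindspore.grad builds the backward pass that ends in gru_grad_data.
grad_fn = mindspore.grad(net, grad_position=0)

for seq_len in (17, 23, 31):  # illustrative lengths; the variation is what matters
    x = Tensor(np.random.randn(4, seq_len, 300).astype(np.float32))
    dx = grad_fn(x)
    print(seq_len, dx.shape)
```

If this sketch trips the same `ResetResource()` / `Output device address` errors, that would narrow the bug to the GPU GRU grad kernel under `enable_graph_kernel=True` rather than anything specific to model.py.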
Comments (4)
Assignee: yxx (yangxixin)
Label: pynative
Participants (4)