stop_gradient operator has no effect; gradients are still computed

DONE
Bug-Report
Opened this issue  
2021-06-22 11:19

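As shown in the figure below, this is a multi-task training scenario and the expected behavior is as follows; in PyTorch this is implemented with detach:
[image]

Code snippet: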
mlm_loss = generate_tensor()
mlm_start = 0
mlm_end = self.mlm_batch_size * seq_length
if self.mlm_batch_size > 0:
    per_mlm_loss = self.slice(per_example_loss, (mlm_start,), (mlm_end,), (1,))
    mlm_weights = self.slice(label_weights, (mlm_start,), (mlm_end,), (1,))
    mlm_loss = self.reduce_sum_1dim(per_mlm_loss, ()) / self.reduce_sum_1dim(mlm_weights, ())
    F.stop_gradient(per_mlm_loss)

The actual behavior is shown below: even with the stop_gradient operator added, the backward pass is still computed:
[image]
[image]

Comments (12)

steve created Bug-Report
steve set related repository to MindSpore/mindspore

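Please add labels (comp or sig); you can also visit https://gitee.com/mindspore/community/blob/master/sigs/dx/docs/labels.md to find more.
To get a faster response, please add a component (comp) or special interest group (sig) label to this issue; labeled issues are pushed directly to the responsible owner.
Taking a component problem as an example: if you find that the problem is caused by the data component, you can comment:
//comp/data
You can also ask the data SIG for help by writing:
//comp/data
//sig/data
If it is a simple problem, you can leave it for newcomers to the community to answer by writing:
//good-first-issue
Congratulations, you have now learned how to add labels with commands. Go ahead and add a label in the comments below!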

mindspore-dx-bot added label kind/bug
steve changed description
mindspore-dx-bot added label sig/ops

//comp/operator

@steve Thank you for providing the information in detail.

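What detailed information do you need? What I need is functionality similar to PyTorch's detach. I implemented it with stop_gradient, but stop_gradient does not appear to take effect.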

steve changed description

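@steve I meant that the information you provided in the ISSUE above is already quite detailed, and I wanted to thank you for it!

@张清华 @liangchenghui Please help take a look at this issue.

/assign @YuJianfeng Please post the conclusion of the earlier analysis, then pass this on to the owner responsible for parallel-related issues.

@steve In the graph, stop_gradient is an operator and must not be left dangling: its output has to be used by a later computation. Please refer to the following usage: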

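import mindspore.nn as nn
import mindspore.ops as ops
from mindspore.ops.functional import stop_gradient

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.matmul = ops.MatMul()

    def construct(self, x, y):
        out1 = self.matmul(x, y)
        out2 = self.matmul(x, y)
        # Use the value returned by stop_gradient; gradients flow only through out1.
        out2 = stop_gradient(out2)
        out = out1 + out2
        return out

YuJianfeng changed issue state from TODO to VALIDATION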
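
To illustrate why a dangling stop_gradient call changes nothing, here is a minimal sketch of a toy scalar reverse-mode autodiff in plain Python. It is not MindSpore code: the Value class, the toy stop_gradient function, and the two loss functions are all hypothetical names introduced only for this illustration. It assumes, as the reply above states, that stop_gradient behaves like a node that returns a new value, so the original tensor stays connected to the graph unless the returned value is the one used downstream.

```python
# Toy scalar reverse-mode autodiff. This is NOT MindSpore code; every name
# below is hypothetical and exists only to illustrate the point.


class Value:
    """A scalar node that remembers its parents and the local gradients."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        # Sequence of (parent_node, local_gradient) pairs.
        self.parents = tuple(parents)

    def __add__(self, other):
        return Value(self.data + other.data, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Value(self.data * other.data,
                     [(self, other.data), (other, self.data)])

    def backward(self, upstream=1.0):
        # Accumulate the incoming gradient, then push it on to the parents.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)


def stop_gradient(v):
    """Return a NEW node with the same value but no parents."""
    return Value(v.data)


def loss_with_dangling_call(w, x):
    h = w * x
    stop_gradient(h)  # result discarded: h is still connected to w
    return h + h


def loss_with_result_used(w, x):
    h = w * x
    h_detached = stop_gradient(h)  # result consumed downstream
    return h + h_detached


def grad_wrt_w(loss_fn, w_val=3.0, x_val=2.0):
    w, x = Value(w_val), Value(x_val)
    loss_fn(w, x).backward()
    return w.grad


print(grad_wrt_w(loss_with_dangling_call))  # 4.0: both branches reach w
print(grad_wrt_w(loss_with_result_used))    # 2.0: only the first branch does
```

Applied to the snippet in this issue, the same reasoning suggests assigning the result, for example per_mlm_loss = F.stop_gradient(per_mlm_loss), before per_mlm_loss is used to compute mlm_loss, rather than calling F.stop_gradient(per_mlm_loss) afterwards and discarding the return value.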

@chenfei52: Gitee didn't allow you to assign to: steve9.

Choose one of following members as assignee.

  • youui
  • yingjy
  • yuximiao
  • c_34
  • jonyguo
  • kingxian
  • cristoval
  • stsuteng
  • mikef
  • liucunwei
  • guoqi1024
  • kisnwang
  • heleiwang
  • pandoublefeng
  • nsyca
  • limingqi107
  • robingrosman
  • yehenrytian
  • liangchenghui
  • zhang_xue_tong
  • zhaizhiqiang
  • gaoxiong1
  • gaocongli_hw
  • jjfeing
  • sunnybeike
  • zichun_ye
  • HilbertDavid
  • ddwsky
  • zhanghaibo5
  • wang_zi_dong
  • hangangqiang
  • jianfeichen
  • wangyue01
  • zh_qh
  • chujinjin
  • tom__chen
  • john_tzanakakis
  • wuxuejian
  • ouwenchang
  • wenkai_dist
  • lilongfei15
  • linqingke
  • hwhewei
  • ginfung
  • xu-yfei
  • zhoufeng54
  • chenyijie6
  • dylangeng
  • anyrenwei
  • lixiaohui33
  • yelihua
  • dechin
  • physicist01
  • yzotov
  • dnguyen
  • wangchengyuan
  • jpc_chenjianping
  • wilfchen
  • yuchaojie
  • zlq2020
  • oacjiewen
  • zhunaipan
  • majorzhang
  • mindspore_ci
  • xsmq
  • test-bot

In response to this:

/assign @steve

After verifying the modification, the following error is reported:
[ERROR] PARALLEL(167860,python3):2021-06-22-10:07:15.340.501 [mindspore/ccsrc/frontend/parallel/step_parallel.cc:2457] GetLossNodeGradOutputLayout] : The pointer[prim] is null.

chenfei_mindspore changed issue state from VALIDATION to DONE
