
[CT][MS][Applyftrl] Intermittent precision issue in the 1-D scenario in graph mode on GPU

Status: DONE
Type: Bug-Report
Created: 2022-12-07 17:48

Describe the current behavior (Mandatory)

On GPU in graph mode, the 1-D scenario intermittently shows a precision issue for the float16 type.

Environment (Mandatory)

  • Hardware Environment (Ascend/GPU/CPU):

Please delete the backend not involved:
GPU

  • Software Environment (Mandatory):
    -- MindSpore version (e.g., 1.7.0.Bxxx) :
    -- Python version (e.g., Python 3.7.5) :
    -- OS platform and distribution (e.g., Linux Ubuntu 16.04):
    -- GCC/Compiler version (if compiled from source):

  • Execute Mode (Mandatory) (PyNative/Graph):

Please delete the mode not involved:

/mode graph

Related testcase (Mandatory)

test_p_applyftrl_1d_float16 __________________________________________________________________________________________
    @Level2
    @SKIP_ENV_DAVINCI_EXECUTOR(reason='issue=I5O1EN, the operator fails consistently on Ascend; the error reports the benchmark expected value as 0, although it actually exists')
    def test_p_applyftrl_1d_float16():
        var = Parameter(Tensor(np.random.randn(8, ).astype(np.float16)), name="var")
        accum = Parameter(Tensor(np.random.randn(8, ).astype(np.float16)), name="accum")
        linear = Parameter(Tensor(np.random.randn(8, ).astype(np.float16)), name="linear")
        grad = Tensor(np.random.randn(8, ).astype(np.float16))
        lr = Tensor(0.001).astype(np.float16)
        l1 = Tensor(0.0).astype(np.float16)
        l2 = Tensor(0.0).astype(np.float16)
        lr_power = Tensor(-0.5).astype(np.float16)
        fact = ApplyFtrlMock(attributes={'use_locking': False},
                             inputs=[var, accum, linear, grad, lr, l1, l2, lr_power])
        if os.environ['CONTEXT_DEVICE_TARGET'] == 'CPU':
            fact.forward_cmp()
        else:
>           fact.forward_cmp_1()
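
For readers unfamiliar with the operator under test, the following is a minimal NumPy sketch of one FTRL-proximal step. It assumes the update documented for TensorFlow's ApplyFtrl; it is not taken from the MindSpore GPU kernel or from `ApplyFtrlMock`, and the helper name `ftrl_step` is hypothetical. Every intermediate is rounded to the chosen dtype, so running it in float16 and in float64 gives a rough feel for how much rounding alone can move the result. Unlike the test, `accum` is kept positive here so that the fractional power stays real.

```python
import numpy as np

def ftrl_step(var, accum, linear, grad, lr, l1, l2, lr_power, dtype=np.float16):
    """One FTRL-proximal step with all intermediates held in `dtype`.

    Assumption: formula as documented for TensorFlow's ApplyFtrl.
    """
    var, accum, linear, grad = (np.asarray(x, dtype=dtype)
                                for x in (var, accum, linear, grad))
    lr, l1, l2, lr_power = (dtype(x) for x in (lr, l1, l2, lr_power))

    accum_new = accum + grad * grad
    linear_new = linear + grad - (accum_new ** -lr_power - accum ** -lr_power) / lr * var
    quadratic = dtype(1.0) / (accum_new ** lr_power * lr) + dtype(2.0) * l2
    # Entries with |linear| <= l1 are set to zero; the rest are rescaled.
    var_new = np.where(np.abs(linear_new) > l1,
                       (np.sign(linear_new) * l1 - linear_new) / quadratic,
                       dtype(0.0))
    return var_new.astype(dtype), accum_new, linear_new

rng = np.random.default_rng(0)
var = rng.standard_normal(8)
accum = np.abs(rng.standard_normal(8)) + 0.1  # kept positive (assumption, differs from the test)
linear = rng.standard_normal(8)
grad = rng.standard_normal(8)
hp = dict(lr=0.001, l1=0.0, l2=0.0, lr_power=-0.5)  # same hyperparameters as the test

var_fp16, _, _ = ftrl_step(var, accum, linear, grad, dtype=np.float16, **hp)
var_fp64, _, _ = ftrl_step(var, accum, linear, grad, dtype=np.float64, **hp)
# Per-element gap between the float16 and float64 evaluations.
print(np.abs(var_fp16.astype(np.float64) - var_fp64))
```

With `lr = 0.001` the intermediate `linear` values reach the hundreds or thousands, where adjacent float16 values are 0.25 to 2 apart, so differences of a few units in the last place in `var` are plausible from rounding order alone.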

Steps to reproduce the issue (Mandatory)

  1. Run the test case.

Describe the expected behavior (Mandatory)

The test case passes.

Related log / screenshot (Mandatory)

../share/ops/primitive/applyftrl_ops.py:160: in forward_cmp_1
    allclose_nparray(py_real, ms_real, self.loss, self.loss)
../share/utils.py:31: in allclose_nparray
    _count_unequal_element(data_expected, data_me, rtol, atol)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

data_expected = array([ 0.1814  , -0.006058,  0.      ,  0.02211 , -0.1737  , -0.10864 ,
       -0.597   ,  1.598   ], dtype=float16)
data_me = array([ 0.1816  , -0.006058,  0.      ,  0.02211 , -0.1736  , -0.10864 ,
       -0.597   ,  1.601   ], dtype=float16), rtol = 0.001, atol = 0.001

    def _count_unequal_element(data_expected, data_me, rtol, atol):
        assert data_expected.shape == data_me.shape
        total_count = len(data_expected.flatten())
        error = np.abs(data_expected - data_me)
        greater = np.greater(error, atol + np.abs(data_me) * rtol)
        loss_count = np.count_nonzero(greater)
        assert (loss_count / total_count) < rtol, \
            "\ndata_expected_std:{0}\ndata_me_error:{1}\nloss:{2}". \
>               format(data_expected[greater], data_me[greater], error[greater])
E       AssertionError:
E       data_expected_std:[1.598]
E       data_me_error:[1.601]
E       loss:[0.00293]

../share/utils.py:24: AssertionError
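
The failing comparison can be replayed offline from the values printed in the log. The sketch below is a minimal reproduction, assuming the printed float16 values round-trip exactly when parsed back into `np.float16`, and hard-coding the float16 spacing of 2**-10 for values in [1, 2). It repeats the arithmetic of `_count_unequal_element` and shows that only the last element exceeds `atol + |data_me| * rtol`, that this element is off by 3 float16 units in the last place, and that with 8 elements a single mismatch already gives a ratio of 0.125.

```python
import numpy as np

# Values copied from the log above.
data_expected = np.array([0.1814, -0.006058, 0.0, 0.02211, -0.1737, -0.10864,
                          -0.597, 1.598], dtype=np.float16)
data_me = np.array([0.1816, -0.006058, 0.0, 0.02211, -0.1736, -0.10864,
                    -0.597, 1.601], dtype=np.float16)
rtol = atol = 0.001

# Same arithmetic as _count_unequal_element in the traceback.
error = np.abs(data_expected - data_me)
greater = np.greater(error, atol + np.abs(data_me) * rtol)
loss_count = int(np.count_nonzero(greater))
ratio = loss_count / data_expected.size

# float16 has 10 mantissa bits, so adjacent values in [1, 2) are 2**-10 apart.
ulp_near_1p6 = 2.0 ** -10
error_in_ulps = float(error[-1]) / ulp_near_1p6

print(np.flatnonzero(greater))  # index of the failing element: [7]
print(error_in_ulps)            # 3.0
print(ratio)                    # 0.125, which is not < rtol
```

Because the pass condition is `loss_count / total_count < rtol`, an 8-element input fails as soon as one element is out of tolerance, so any occasional 3-ULP deviation near 1.6 is enough to fail this test.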

Special notes for this issue (Optional)

Comments (6)

张辉 created the Bug-Report
张辉 added the kind/bug label
张辉 added the sig/ops label

Please assign a maintainer to check this issue.
@张辉

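Please add labels (comp or sig). More labels are listed at https://gitee.com/mindspore/community/blob/master/sigs/dx/docs/labels.md.
To get your code reviewed as soon as possible, add a component (comp) or special interest group (sig) label to your Pull Request; labeled PRs are pushed directly to the responsible owner for review.
For example, if you are submitting code for the data component, you can comment:
//comp/data
You can also invite the data SIG to review the code by commenting:
//sig/data
You can additionally mark the type of the PR, for example a bugfix or a feature request:
//kind/bug or //kind/feature
Congratulations, you now know how to add labels with commands. Go ahead and add them in the comments below!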

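The ApplyFtrl operator already supported the float16 type before the refactoring, and testing shows that the precision issue already existed before the refactoring. It occurs at a rate of about 3 precision errors per 1000 runs.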
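
An intermittent rate such as 3 in 1000 is easiest to confirm or rule out by rerunning the comparison over many seeded inputs and counting failures. The sketch below is one way to do that. `count_failures` and `toy_check` are hypothetical names, and `toy_check` is only a stand-in (a float16 product checked against a float64 reference), not the ApplyFtrl test itself.

```python
import numpy as np

def count_failures(check, trials=1000, seed=0):
    """Call `check(rng)` `trials` times and count runs that raise AssertionError.

    Each trial gets its own seeded generator, so a failing trial can be
    replayed later with np.random.default_rng(seed + i).
    """
    failures = 0
    for i in range(trials):
        rng = np.random.default_rng(seed + i)
        try:
            check(rng)
        except AssertionError:
            failures += 1
    return failures

def toy_check(rng):
    # Hypothetical stand-in for the real comparison: the same product is
    # computed in float64 (then rounded) and directly in float16.
    x = rng.standard_normal(8)
    y = rng.standard_normal(8)
    expected = (x * y).astype(np.float16)
    actual = x.astype(np.float16) * y.astype(np.float16)
    np.testing.assert_allclose(actual, expected, rtol=1e-3, atol=1e-3)

print(count_failures(toy_check, trials=1000))
```

Seeding each trial separately is a deliberate choice: once a failing trial index is known, the exact inputs can be regenerated and compared before and after a change.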

This is a historical issue with the ApplyFtrl operator on GPU.

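liangchenghui changed the assignee from liangchenghui to fangwenyi
liangchenghui changed the assignee from fangwenyi to changzherui
liangchenghui added fangwenyi as a collaborator
liangchenghui set the milestone to B-SIG-FrontEnd
changzherui added changzherui as a collaborator
changzherui changed the assignee from changzherui to 冯一航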

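The issue does not reproduce with the package built from the latest master. Reproduction is still pending; if it still does not reproduce under continued observation, the issue will go through regression per the normal process.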

The issue in this scenario was not reproduced; all test cases currently pass.

张辉 changed the task status from TODO to DONE
张辉 edited the description
