From GitHub user kuke:
We ran the mkl_dnn benchmark test in a Docker container on a Dell XPS 15 laptop, and found that:
The batch size of training samples is limited by the laptop's memory (8 GB) to at most 48, which is smaller than the minimum batch size used in the benchmark test on the server.
When the batch size is too small (<= 8), the training cost yields NaN. The test script may need to be modified to avoid such NaN costs.
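One way the test script could avoid reporting NaN costs is a simple guard before recording results. This is a hypothetical sketch (the function and variable names are my own, not from the benchmark repo):

```python
import math

def cost_is_usable(cost):
    """Return True if a training cost is a finite number.

    Hypothetical guard: very small batch sizes (<= 8 in kuke's runs)
    were observed to diverge, so the script could skip or flag those
    configurations instead of reporting NaN.
    """
    return not (math.isnan(cost) or math.isinf(cost))

# Example: costs keyed by batch size; batch 8 diverged to NaN.
results = {48: 1.92, 8: float("nan")}
usable = {bs: c for bs, c in results.items() if cost_is_usable(c)}
# Only the batch-48 run survives the filter.
```

A guard like this keeps the benchmark output clean; separately lowering the learning rate (as discussed below in the thread) addresses the divergence itself.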
From GitHub user tensor-tang:
Thanks kuke.
I highly recommend expanding the memory for the benchmark, since 8 GB is even smaller than some GPUs (12 GB memory).
And for some topologies that are very deep, like ResNet, we can only choose a very small batch size, which cannot show the best performance of MKL-DNN or MKLML.
When reducing the batch size, we should reduce the learning rate as well; since VGG does not have a batch norm layer, it is very easy to get NaN.
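The advice above, shrinking the learning rate together with the batch size, can be sketched with the linear scaling rule. This is an assumption on my part (the thread does not specify which scaling policy was used), with hypothetical names:

```python
def scale_learning_rate(base_lr, base_batch, new_batch):
    """Linear scaling rule (assumed, not confirmed by the thread):
    scale the learning rate proportionally with the batch size.

    For networks without batch norm, such as VGG, keeping the base
    learning rate at a much smaller batch size easily drives the
    training cost to NaN.
    """
    return base_lr * (new_batch / base_batch)

# e.g. a base lr of 0.01 tuned for batch size 128, run at batch size 8:
lr = scale_learning_rate(0.01, 128, 8)  # 0.01 * 8/128 = 0.000625
```

Whether linear scaling is the right rule here is a design choice; the point of the thread is simply that the learning rate must come down when the batch size does.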
since 8G is even smaller than some GPU(12G memory)
We chose laptops and desktops with modest memory for performance testing mainly because laptops and desktops belong to the consumer market. For most learning scenarios a GPU is overkill, and using a GPU for coursework is also a considerable upgrade cost for students. If MKL-DNN can run most models with the resources at hand, it could be a great boon for entry-level users.
since vgg do not have batch norm layer, it's very easy to nan
If VGG easily yields NaN at small batch sizes, could we consider testing other networks instead?
From GitHub user tensor-tang:
After reducing the learning rate, NaN no longer occurs.