From GitHub user kuke:
We ran the mkl_dnn benchmark test in a Docker container on a Dell XPS 15 laptop, and found that:
The batch size of training samples is limited by the laptop's memory (8 GB) to at most 48, which is smaller than the minimum batch size used in the benchmark test on the server.
When the batch size is too small (<= 8), the training cost yields NaN. The test script may need to be modified to avoid such NaN costs.
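One way the test script could avoid reporting NaN costs is a simple guard before recording results. This is a hypothetical sketch (the function and variable names are my own, not from the benchmark repo):

```python
import math

def cost_is_usable(cost):
    """Return True if a training cost is a finite number.

    Hypothetical guard: very small batch sizes (<= 8 in kuke's runs)
    were observed to diverge, so the script could skip or flag those
    configurations instead of reporting NaN.
    """
    return not (math.isnan(cost) or math.isinf(cost))

# Example: costs keyed by batch size; batch 8 diverged to NaN.
results = {48: 1.92, 8: float("nan")}
usable = {bs: c for bs, c in results.items() if cost_is_usable(c)}
# Only the batch-48 run survives the filter.
```

A guard like this keeps the benchmark output clean; separately lowering the learning rate (as discussed below in the thread) addresses the divergence itself.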
From GitHub user tensor-tang:
Thanks kuke.
I highly recommend expanding the memory for the benchmark, since 8 GB is even smaller than some GPUs (12 GB memory).
And for some topologies that are very deep, like ResNet, we can only choose a very small batch size, which cannot show the best performance of MKL-DNN or MKLML.
When reducing the batch size, we should reduce the learning rate as well; since VGG does not have a batch norm layer, it is very easy to get NaN.
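The advice above, shrinking the learning rate together with the batch size, can be sketched with the linear scaling rule. This is an assumption on my part (the thread does not specify which scaling policy was used), with hypothetical names:

```python
def scale_learning_rate(base_lr, base_batch, new_batch):
    """Linear scaling rule (assumed, not confirmed by the thread):
    scale the learning rate proportionally with the batch size.

    For networks without batch norm, such as VGG, keeping the base
    learning rate at a much smaller batch size easily drives the
    training cost to NaN.
    """
    return base_lr * (new_batch / base_batch)

# e.g. a base lr of 0.01 tuned for batch size 128, run at batch size 8:
lr = scale_learning_rate(0.01, 128, 8)  # 0.01 * 8/128 = 0.000625
```

Whether linear scaling is the right rule here is a design choice; the point of the thread is simply that the learning rate must come down when the batch size does.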
since 8G is even smaller than some GPU(12G memory)
We chose laptops and desktops with modest memory for performance testing mainly because laptops and desktops belong to the consumer market. For most learning scenarios a GPU is overkill, and using a GPU for coursework is also a considerable upgrade cost for students. If MKL-DNN can run most models with the resources at hand, it could be a great boon for entry-level users.
since vgg do not have batch norm layer, it's very easy to nan
If VGG easily yields NaN at small batch sizes, could we consider testing other networks instead?
From GitHub user tensor-tang:
After reducing the learning rate, NaN no longer occurs.