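I. Symptom (with error log context):
The class-B performance specification is guarded by per step time: 0.135779; this converts to batch_size/time = 5000/0.135779 ≈ 36825
During training in the arm+ubuntu environment the model does not meet the performance target; the converted result is batch_size×12/Average(cost) = 5000×12/1.9 ≈ 31579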
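The two conversions above can be checked with a short sketch. It assumes that "iter 12" in the log means 12 steps per epoch and that `cost` is the wall-clock time of one epoch in seconds, which is what the report's own formula implies; the unit "samples/s" and all names below are illustrative, not taken from the model's code.

```python
# Figures copied from the report; names are hypothetical.
BATCH_SIZE = 5000          # batch size used in both formulas
SPEC_STEP_TIME = 0.135779  # per step time from the class-B specification
STEPS_PER_EPOCH = 12       # assumed from "iter 12" in the log
AVG_EPOCH_COST = 1.9       # approximate Average(cost) quoted in the report


def throughput(samples: int, seconds: float) -> float:
    """Return samples processed per unit of time."""
    return samples / seconds


spec = throughput(BATCH_SIZE, SPEC_STEP_TIME)
measured = throughput(BATCH_SIZE * STEPS_PER_EPOCH, AVG_EPOCH_COST)
ratio = measured / spec

print(f"spec:     {spec:.0f} samples/s")
print(f"measured: {measured:.0f} samples/s")
print(f"measured/spec: {ratio:.1%}")
```

Under these assumptions the measured figure comes out at roughly 86% of the specification, which is equivalent to a per-step time of about 1.9/12 ≈ 0.158 against the specified 0.135779.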
The training log is as follows:
{'dataset': 'choose which dataset', 'datapath': 'minddata path', 'Ks': 'top K', 'workers': 'number of process to generate data', 'ckptpath': 'checkpoint path', 'epsilon': 'optimizer parameter', 'learning_rate': 'learning rate', 'l2': 'l2 coefficient', 'activation': "activation function, choices in ['relu', 'tanh'].", 'neighbor_dropout': 'dropout ratio for different aggregation layer', 'log_name': 'log name', 'num_epoch': 'epoch sizes for training', 'input_dim': 'user and item embedding dimension, choices in [64, 128]', 'batch_pairs': 'batch size', 'eval_interval': 'evaluation interval', 'num_neg': 'negative sampling rate ', 'raw_neighs': 'num of sampling neighbors in raw graph', 'gnew_neighs': 'num of sampling neighbors in sample graph', 'embedded_dimension': 'output embedding dim', 'dist_reg': 'distance loss coefficient', 'device_target': "device target, choices in ['Ascend', GPU]", 'device_id': 'Device id', 'ckpt_file': 'Checkpoint file path.', 'file_name': 'output file name.', 'file_format': "file format, choices in ['AIR', 'ONNX', 'MINDIR']", 'row_neighs': 'num of sampling neighbors in raw graph'}
[WARNING] ME(109451:281473040572432,_GeneratorWorkerMp-1):2022-01-12-06:24:46.311.233 [mindspore/dataset/engine/queue.py:106] Using shared memory queue, but rowsize is larger than allocated memory max_rowsize 6291456 current rowwize 13200000
[WARNING] ME(109455:281473040572432,_GeneratorWorkerMp-5):2022-01-12-06:24:46.312.081 [mindspore/dataset/engine/queue.py:106] Using shared memory queue, but rowsize is larger than allocated memory max_rowsize 6291456 current rowwize 13200000
[WARNING] ME(109456:281473040572432,_GeneratorWorkerMp-6):2022-01-12-06:24:46.315.739 [mindspore/dataset/engine/queue.py:106] Using shared memory queue, but rowsize is larger than allocated memory max_rowsize 6291456 current rowwize 13200000
[WARNING] ME(109452:281473040572432,_GeneratorWorkerMp-2):2022-01-12-06:24:46.315.900 [mindspore/dataset/engine/queue.py:106] Using shared memory queue, but rowsize is larger than allocated memory max_rowsize 6291456 current rowwize 13200000
[WARNING] ME(109454:281473040572432,_GeneratorWorkerMp-4):2022-01-12-06:24:46.316.132 [mindspore/dataset/engine/queue.py:106] Using shared memory queue, but rowsize is larger than allocated memory max_rowsize 6291456 current rowwize 13200000
[WARNING] ME(109458:281473040572432,_GeneratorWorkerMp-8):2022-01-12-06:24:46.317.834 [mindspore/dataset/engine/queue.py:106] Using shared memory queue, but rowsize is larger than allocated memory max_rowsize 6291456 current rowwize 13200000
[WARNING] ME(109457:281473040572432,_GeneratorWorkerMp-7):2022-01-12-06:24:46.318.209 [mindspore/dataset/engine/queue.py:106] Using shared memory queue, but rowsize is larger than allocated memory max_rowsize 6291456 current rowwize 13200000
[WARNING] ME(109453:281473040572432,_GeneratorWorkerMp-3):2022-01-12-06:24:46.318.207 [mindspore/dataset/engine/queue.py:106] Using shared memory queue, but rowsize is larger than allocated memory max_rowsize 6291456 current rowwize 13200000
[WARNING] DEVICE(109290,fffe377fe1f0,python):2022-01-12-06:24:59.617.830 [mindspore/ccsrc/runtime/device/ascend/kernel_select_ascend.cc:284] TagRaiseReduce] node:[DropoutGenMask]reduce precision from int64 to int32
[WARNING] DEVICE(109290,fffe377fe1f0,python):2022-01-12-06:24:59.618.135 [mindspore/ccsrc/runtime/device/ascend/kernel_select_ascend.cc:284] TagRaiseReduce] node:[DropoutGenMask]reduce precision from int64 to int32
[WARNING] DEVICE(109290,fffe377fe1f0,python):2022-01-12-06:24:59.618.267 [mindspore/ccsrc/runtime/device/ascend/kernel_select_ascend.cc:284] TagRaiseReduce] node:[DropoutGenMask]reduce precision from int64 to int32
[WARNING] DEVICE(109290,fffe377fe1f0,python):2022-01-12-06:24:59.618.401 [mindspore/ccsrc/runtime/device/ascend/kernel_select_ascend.cc:284] TagRaiseReduce] node:[DropoutGenMask]reduce precision from int64 to int32
[WARNING] DEVICE(109290,fffe377fe1f0,python):2022-01-12-06:24:59.618.518 [mindspore/ccsrc/runtime/device/ascend/kernel_select_ascend.cc:284] TagRaiseReduce] node:[DropoutGenMask]reduce precision from int64 to int32
[WARNING] DEVICE(109290,fffe377fe1f0,python):2022-01-12-06:24:59.618.663 [mindspore/ccsrc/runtime/device/ascend/kernel_select_ascend.cc:284] TagRaiseReduce] node:[DropoutGenMask]reduce precision from int64 to int32
[WARNING] SESSION(109290,fffe377fe1f0,python):2022-01-12-06:25:02.078.296 [mindspore/ccsrc/backend/session/ascend_session.cc:1377] SelectKernel] There are 2 node/nodes used raise precision to selected the kernel!
[WARNING] SESSION(109290,fffe377fe1f0,python):2022-01-12-06:25:02.078.394 [mindspore/ccsrc/backend/session/ascend_session.cc:1381] SelectKernel] There are 6 node/nodes used reduce precision to selected the kernel!
Epoch 001 iter 12 loss 34696.863, cost:52.4288
Epoch 002 iter 12 loss 34288.637, cost:1.3063
Epoch 003 iter 12 loss 30985.635, cost:1.2516
Epoch 004 iter 12 loss 22491.91, cost:1.2746
Epoch 005 iter 12 loss 21087.371, cost:1.9760
Epoch 006 iter 12 loss 19150.377, cost:1.9044
Epoch 007 iter 12 loss 18561.326, cost:2.2398
Epoch 008 iter 12 loss 18068.207, cost:1.9947
Epoch 009 iter 12 loss 16396.041, cost:2.0078
Epoch 010 iter 12 loss 15766.55, cost:1.7990
Epoch 011 iter 12 loss 14308.345, cost:1.9249
...
Epoch 595 iter 12 loss 3645.4429, cost:1.9038
Epoch 596 iter 12 loss 3667.8376, cost:1.9782
Epoch 597 iter 12 loss 3667.6663, cost:1.9115
Epoch 598 iter 12 loss 3664.8555, cost:1.9984
Epoch 599 iter 12 loss 3681.8513, cost:1.9173
Epoch 600 iter 12 loss 3674.1487, cost:2.0402
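The Average(cost) value can be recomputed from log lines of this shape. The following is a minimal sketch, not the project's own tooling: the regular expression is inferred from the lines shown above, the helper names are hypothetical, and the excerpt contains only lines visible in this report, so its result is not the full 600-epoch average. The first epoch is skipped on the assumption that its 52.4 s cost is dominated by one-off start-up work.

```python
import re
from statistics import mean

# Only lines that appear verbatim in the log above; the elided range is not filled in.
LOG_EXCERPT = """\
Epoch 001 iter 12 loss 34696.863, cost:52.4288
Epoch 595 iter 12 loss 3645.4429, cost:1.9038
Epoch 596 iter 12 loss 3667.8376, cost:1.9782
Epoch 597 iter 12 loss 3667.6663, cost:1.9115
Epoch 598 iter 12 loss 3664.8555, cost:1.9984
Epoch 599 iter 12 loss 3681.8513, cost:1.9173
Epoch 600 iter 12 loss 3674.1487, cost:2.0402
"""

# Pattern inferred from the log format; adjust it if the real format differs.
LINE_RE = re.compile(r"Epoch (\d+) iter (\d+) loss ([\d.]+), cost:([\d.]+)")


def parse_costs(text):
    """Return (epoch, cost) pairs for every matching log line."""
    return [(int(m.group(1)), float(m.group(4))) for m in LINE_RE.finditer(text)]


def steady_state_mean(costs, warmup_epochs=1):
    """Average the cost over epochs after the assumed warm-up period."""
    return mean(cost for epoch, cost in costs if epoch > warmup_epochs)


costs = parse_costs(LOG_EXCERPT)
avg_cost = steady_state_mean(costs)
print(f"parsed {len(costs)} lines, steady-state average cost: {avg_cost:.4f}")
print(f"throughput: {5000 * 12 / avg_cost:.0f}")
```

Running the same functions over the complete log file rather than this excerpt would give the figure to compare against the specification.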
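II. Software versions:
-- CANN version: (CANN 5.0.2 B058)
-- Python version: Python 3.7.5
-- OS version (e.g., Ubuntu 18.04): Ubuntu 18.04
III. Test steps:
1. Follow the instructions in readme.md to run 1p (single-device) training of the network model.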
r1.3 self-test @wangxingang