name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
[resnext50/vgg16][GPU] Excessive warning logs during network training cause test cases to fail
Model repository: https://gitee.com/mindspore/models/tree/master/official/cv/DeepLabV3P/scripts
Hardware Environment (Ascend/GPU/CPU):
Please delete the backend not involved:
/device GPU
Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx) :
-- Python version (e.g., Python 3.7.5) :
-- OS platform and distribution (e.g., Linux Ubuntu 16.04):
-- GCC/Compiler version (if compiled from source):
MindSpore version: 2.2.10.20231125, commit_id = '[sha1]:1e6bd3d7,[branch]:(HEAD,origin/r2.2.10,r2.2.10)'
run: Milan_C15/20231122
Execute Mode (Mandatory) (PyNative/Graph):
Please delete the mode not involved:
/mode graph
test_ms_alexnet_cifar10_train_infer_910_gpu_1p_0001
test_ms_asr_dynamic_shape_an4_32_train_check_loss_gpu_1p_0001
test_ms_bert_finetune_ner_softmax_cluener_train_infer_0002
test_ms_bert_large_cn_news_pynative_train_check_loss_gpu_8p_0001
test_ms_bert_large_cn_news_pynative_train_check_perf_gpu_1p_0001
test_ms_cyclegan_cityscapes_ascend_gpu_cpu_train_check_loss_0002
test_ms_dbnet_r18_icdar2015_gpu_check_loss_8p_0005
test_ms_dbnet_r50_icdar2015_gpu_check_loss_8p_0011
test_ms_deeplabv3_vocaug_cpu_train_check_loss_0001
test_ms_deeplabv3plus_s16_gpu_check_fps_1p_0001
test_ms_deeplabv3plus_s16_gpu_check_loss_8p_0003
test_ms_deeplabv3plus_s16_pynative_train_check_loss_gpu_8p_0003
test_ms_deeplabv3plus_s8_gpu_check_fps_1p_0002
test_ms_deeplabv3plus_s8_gpu_check_loss_8p_0004
test_ms_deeplabv3plus_s8_pynative_train_check_loss_gpu_8p_0004
test_ms_dqn_train_infer_0001
test_ms_efficientnet_cifar10_cpu_train_check_loss_daily_0001
test_ms_efficientnet_imagenet2012_pynative_train_check_loss_gpu_8p_0001
test_ms_efficientnet_imagenet2012_train_check_loss_gpu_8p_0001
test_ms_efficientnetb3_imagenet2012_gpu_check_fps_1p_0001
test_ms_inceptionv3_imagenet2012_pynative_train_check_loss_gpu_8p_0001
test_ms_inceptionv3_imagenet2012_train_check_fps_gpu_1p_0004
test_ms_inceptionv3_imagenet2012_train_check_loss_gpu_8p_0005
test_ms_lstm_aclimdb_train_infer_gpu_1p_0001
test_ms_mobilenetv1_cifar10_cpu_train_check_loss_0001
test_ms_mobilenetv1_imagenet_gpu_check_fps_1p_0004
test_ms_mobilenetv1_imagenet_gpu_train_check_loss_4p_0003
test_ms_mobilenetv2_garbage_cpu_train_infer_0003
test_ms_mobilenetv2_imagenet2012_train_check_loss_gpu_8p_0005
test_ms_mobilenetv3_cpu_check_fps_1p_0001
test_ms_mobilenetv3_imagenet2012_gpu_check_loss_8p_0003
test_ms_pangu_alpha_gpu_train_8p_0001
test_ms_ppo_train_infer_200_episode_gpu_cpu_1p_0001
test_ms_resnet101_imagenet_pynative_train_check_loss_gpu_8p_0001
test_ms_resnet18_cifar10_pynative_train_check_perf_gpu_1p_0001
test_ms_resnet18_cifar10_pynative_train_infer_gpu_8p_0001
test_ms_resnet18_cifar10_train_check_fps_gpu_1p_0002
test_ms_resnet18_cifar10_train_infer_gpu_8p_0001
test_ms_resnet18_imagenet_pynative_train_check_loss_gpu_8p_0001
test_ms_resnet18_imagenet_train_check_pfs_gpu_1p_0002
test_ms_resnet50_benchmark_imagenet_pynative_train_check_perf_gpu_8p_0001
test_ms_resnet50_cifar10_pynative_train_check_perf_gpu_1p_0001
test_ms_resnet50_cifar10_pynative_train_infer_gpu_8p_0001
test_ms_resnet50_cifar10_train_check_loss_cpu_0001
test_ms_resnet50_cifar10_train_check_loss_gpu_8p_0002
test_ms_resnet50_cifar10_train_check_perf_gpu_1p_0003
test_ms_resnet50_imagenet_pynative_train_check_loss_gpu_8p_0001
test_ms_resnet50_imagenet_train_check_loss_gpu_8p_0002
test_ms_resnet50_imagenet_train_check_perf_gpu_1p_0003
test_ms_resnext50_imagenet2012_pynative_train_check_loss_gpu_8p_0001
test_ms_resnext50_imagenet2012_train_check_loss_gpu_8p_0001
test_ms_retinaface_resnet50_widerface_pynative_train_check_loss_gpu_4p_0001
test_ms_retinaface_resnet50_widerface_train_check_loss_gpu_8p_0001
test_ms_retinaface_resnet50_widerface_train_check_perf_gpu_1p_0001
test_ms_retinaface_resnet50_widerface_train_no_resume_check_loss_gpu_8p_0001
test_ms_shufflenetv1_gpu_check_fps_1p_0001
test_ms_shufflenetv2_imagenet2012_pynative_train_check_loss_gpu_8p_0001
test_ms_shufflenetv2_imagenet2012_train_check_fps_gpu_1p_0004
test_ms_shufflenetv2_imagenet2012_train_check_loss_gpu_8p_0005
test_ms_ssd_helmet_cpu_train_check_loss_0001
test_ms_ssd_mobilenetv1_fpn_coco2017_train_check_fps_gpu_0002
test_ms_ssd_mobilenetv1_fpn_coco2017_train_check_loss_gpu_8p_0003
test_ms_ssd_resnet50_fpn_coco2017_pynative_train_check_loss_gpu_8p_0001
test_ms_ssd_resnet50_fpn_coco2017_train_check_fps_gpu_0002
test_ms_ssd_resnet50_fpn_coco2017_train_check_loss_gpu_8p_0003
test_ms_ssd_vgg16_coco2017_pynative_train_check_loss_gpu_8p_0001
test_ms_ssd_vgg16_coco2017_train_check_fps_gpu_0002
test_ms_ssd_vgg16_coco2017_train_check_loss_gpu_8p_0003
test_ms_unet_plus_gpu_train_infer_1p_0001
test_ms_unet_plus_gpu_train_infer_8p_0002
test_ms_usability_benchmark_graph_cpu_cyclegan_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_cpu_deeplabv3_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_cpu_efficientnet_b0_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_cpu_inceptionv3_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_cpu_inceptionv4_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_cpu_mobilenetv1_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_cpu_mobilenetv3_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_cpu_resnet_50_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_cpu_ssd_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_alexnet_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_ctpn_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_deeplabv3_plus_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_dqn_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_efficientnet_b0_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_efficientnet_b3_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_inceptionv3_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_lstm_sentimentnet_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_mobilenetv1_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_mobilenetv2_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_mobilenetv3_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_ppo_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_resnet_18_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_resnet_50_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_resnext50_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_retinaface_resnet50_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_shufflenetv1_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_shufflenetv2_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_ssd_mobilenetv1_fpn_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_ssd_resnet50_fpn_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_ssd_vgg16_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_vgg16_time_perf_loss_1p_0001
test_ms_usability_benchmark_graph_gpu_yolov5_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_cpu_cyclegan_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_cpu_deeplabv3_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_cpu_efficientnet_b0_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_cpu_inceptionv3_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_cpu_inceptionv4_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_cpu_lstm_sentimentnet_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_cpu_mobilenetv1_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_cpu_mobilenetv3_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_cpu_resnet_50_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_cpu_ssd_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_alexnet_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_ctpn_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_cyclegan_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_deeplabv3_plus_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_dqn_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_efficientnet_b0_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_efficientnet_b3_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_facerecognition_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_fasterrcnn_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_inceptionv3_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_lstm_sentimentnet_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_mobilenetv1_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_mobilenetv2_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_mobilenetv3_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_ppo_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_resnet_101_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_resnet_18_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_resnet_50_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_resnext50_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_retinaface_resnet50_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_shufflenetv1_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_shufflenetv2_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_ssd_mobilenetv1_fpn_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_ssd_resnet50_fpn_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_ssd_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_ssd_vgg16_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_transformer_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_vgg16_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_yolov3_darknet53_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_gpu_yolov5_time_perf_loss_1p_0001
test_ms_vgg16_imagenet_check_loss_gpu_8p_0001
test_ms_vgg16_imagenet_perf_gpu_1p_0001
test_ms_vgg16_imagenet_pynative_train_check_loss_gpu_8p_0001
test_ms_wide_deep_criteo_host_device_mix_train_infer_gpu_8p_0001
test_ms_wide_deep_criteo_ps_train_check_perf_910_gpu_1p_0001
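1. Get the code from the models repository.
2. cd models/official/cv/DeepLabV3P/scripts
3. bash run_distribute_train_s16_r1_gpu.sh /PATH/TO/MINDRECORD_NAME /PATH/TO/PRETRAIN_MODEL
4. Verify that the network trains successfully, performance meets the target, and there are no excessive warnings.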
Training succeeds, performance meets the target, and there are no excessive warnings.
Assign to 项敏珊.
The current code passes an empty string directly, and the function `Format GetFormatFromStrToEnum(const std::string &format_str)` in mindspore/ccsrc/kernel/format_utils.cc has no special handling for an empty string.
The call stack that currently produces the warning is:
0. mindspore/ccsrc/kernel/format_utils.cc `Format GetFormatFromStrToEnum(const std::string &format_str)`
1. mindspore/ccsrc/kernel/kernel.cc `KernelTensor::KernelTensor(void *device_ptr, size_t size, const std::string &format, TypeId dtype_id, ...`
2. mindspore/ccsrc/runtime/device/device_address_utils.cc `void DeviceAddressUtils::CreateKernelWorkspaceDeviceAddress(const DeviceContext *device_context, const KernelGraphPtr &graph)`, which calls `auto kernel_tensor = std::make_shared<kernel::KernelTensor>(nullptr, workspace_sizes[i], "", kTypeUnknown, ShapeVector(), ...`
The cause is as described above; the log level has been changed to debug.
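To illustrate why one empty format string can flood the log, here is a minimal sketch. It assumes the lookup reports every unrecognized string and that one lookup happens per workspace tensor. All names (`Logger`, `FormatFromString`, `CountLogsForEmptyFormat`, the enum values and the table contents) are hypothetical stand-ins and are not MindSpore's actual code. It only shows that the same miss is printed at warning level and dropped at debug level under a default warning threshold.

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

// All names below are hypothetical stand-ins, not MindSpore's real definitions.
enum class Format { kDefault, kNCHW, kNHWC };
enum class LogLevel { kDebug = 0, kInfo = 1, kWarning = 2 };

// Minimal logger: messages below the threshold are dropped.
struct Logger {
  LogLevel threshold = LogLevel::kWarning;
  int emitted = 0;  // number of messages actually printed
  void Log(LogLevel level, const std::string &msg) {
    if (level < threshold) {
      return;
    }
    ++emitted;
    std::cerr << msg << '\n';
  }
};

// Maps a format string to an enum value. Unknown strings (including "")
// fall back to kDefault and are reported at `miss_level`.
Format FormatFromString(const std::string &format_str, Logger *logger, LogLevel miss_level) {
  static const std::unordered_map<std::string, Format> kTable = {
      {"DefaultFormat", Format::kDefault}, {"NCHW", Format::kNCHW}, {"NHWC", Format::kNHWC}};
  const auto it = kTable.find(format_str);
  if (it != kTable.end()) {
    return it->second;
  }
  logger->Log(miss_level, "unknown format '" + format_str + "', falling back to default");
  return Format::kDefault;
}

// Simulates creating `count` workspace tensors that all pass "" as the format
// and returns how many log lines were emitted.
int CountLogsForEmptyFormat(int count, LogLevel miss_level) {
  Logger logger;
  for (int i = 0; i < count; ++i) {
    (void)FormatFromString("", &logger, miss_level);
  }
  return logger.emitted;
}
```

In this sketch, `CountLogsForEmptyFormat(n, LogLevel::kWarning)` emits one line per simulated tensor, while `CountLogsForEmptyFormat(n, LogLevel::kDebug)` emits none. Demoting the message leaves the fallback behavior unchanged; special-casing the empty string before the lookup would be another option, but it is not what this issue describes.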
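!62616: Fix excessive format logs
Regression date: 2023/12/26
Regression steps: see the steps in this issue
Regression version: 2.3
Regression result:
Regression conclusion: passed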