name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
transformer-dynamic-shape network, pynative mode, 8p training on the 910 environment: the training log contains error entries
Hardware Environment (Ascend/GPU/CPU):
/device Ascend910A
Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx) :
-- Python version (e.g., Python 3.7.5) :
-- OS platform and distribution (e.g., Linux Ubuntu 16.04):
-- GCC/Compiler version (if compiled from source):
Run package: Milan_C13/20231024/
MindSpore version: r2.3_20231108021522_753c9f688
Execute Mode (Mandatory) (PyNative/Graph):
/mode pynative
Test case repository path: solution_test/cases/02network/02nlp/transformer_dynamic/pynative
Test case: test_ms_transformer_dynamic_shape_wmt_englisth_german_pynative_train_check_func_910_8p_0001.py
1. Get the code from modelinternal
2. cd modelinternal/internal_case/transformer_dynamic
3. Set pynative mode in train.py and set epoch_size=3 in default_config_large.yaml
4. sh scripts/run_distribute_train_ascend.sh WMT-English-German-Dynamic/database ./default_config_large.yaml
5. Verify that training succeeded: check that ./run_distribute_train/ckpt_0/transformer-3_17573.ckpt was generated, then set model_file to it
6. python eval.py --device_target=Ascend --config_path=./default_config_large.yaml
7. sh scripts/process_output.sh ./WMT-English-German/data/newstest2014.tok.de ./output_eval.txt ./WMT-English-German/data/vocab.bpe.32000
8. perl multi-bleu.perl ./WMT-English-German/data/newstest2014.tok.de.forbleu < ./output_eval.txt.forbleu
9. Verify that the inference accuracy reaches 23
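Steps 5 and 9 can be automated with a small helper. The following is a minimal sketch, not part of the test case itself. It assumes that the threshold of 23 in step 9 is a BLEU score and that multi-bleu.perl prints a summary line beginning with `BLEU = <score>,`. The function names are hypothetical.

```python
import re
from pathlib import Path

# Threshold from step 9; assumed here to be a BLEU score.
BLEU_THRESHOLD = 23.0


def checkpoint_exists(path):
    """Return True if the checkpoint file named in step 5 is present."""
    return Path(path).is_file()


def parse_bleu(report):
    """Extract the score from text containing 'BLEU = <score>'.

    Returns None when no such pattern is found.
    """
    match = re.search(r"BLEU\s*=\s*([0-9]+(?:\.[0-9]+)?)", report)
    return float(match.group(1)) if match else None


def accuracy_ok(report, threshold=BLEU_THRESHOLD):
    """Return True if a BLEU score was found and it meets the threshold."""
    score = parse_bleu(report)
    return score is not None and score >= threshold


if __name__ == "__main__":
    ckpt = "./run_distribute_train/ckpt_0/transformer-3_17573.ckpt"
    print("checkpoint found:", checkpoint_exists(ckpt))
```

To check step 9, the captured standard output of the multi-bleu.perl command in step 8 would be passed to `accuracy_ok`.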
The network trains successfully.
[ERROR] FE(127810,python):2023-11-09-01:32:05.521.100 [get_attr_by_type.cc:73]135155 GetIntAttrValue:"[SubGraphOpt][PreCompileOp] Op [Cast] get int attr [dst_type] value failed."
[ERROR] FE(127810,python):2023-11-09-01:32:05.521.143 [tbe_info_assembler.cc:1578]135155 AssembleTbeInfo:"[SubGraphOpt][PreCompileOp][AssembleInput][Op Cast, type Cast]: failed to feedAttrsToTbeOpInfo."
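Because training still completes, these entries are easy to miss in a large log. The sketch below shows one way to scan log lines for FE errors and pull out the operator and attribute named in the `get int attr` failure. The regular expressions are derived only from the two lines quoted above, so other FE messages may not match, and the function names are hypothetical.

```python
import re

# Layout inferred from the two log lines in this issue:
# [ERROR] FE(pid,python):timestamp [file.cc:line]tid Function:"message"
FE_ERROR = re.compile(
    r'^\[ERROR\] FE\((?P<pid>\d+),\w+\):(?P<time>\S+) '
    r'\[(?P<file>[^:\]]+):(?P<line>\d+)\]\d+ '
    r'(?P<func>\w+):"(?P<msg>.*)"\s*$'
)

# Message emitted when an integer attribute cannot be read from an operator.
ATTR_FAILURE = re.compile(
    r"Op \[(?P<op>\w+)\] get int attr \[(?P<attr>\w+)\] value failed"
)


def parse_fe_errors(lines):
    """Return one dict per line that matches the FE error layout."""
    records = []
    for line in lines:
        match = FE_ERROR.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


def failed_int_attrs(records):
    """Return sorted, de-duplicated (operator, attribute) pairs."""
    found = set()
    for record in records:
        match = ATTR_FAILURE.search(record["msg"])
        if match:
            found.add((match.group("op"), match.group("attr")))
    return sorted(found)
```

Applied to the two lines above, `failed_int_attrs` reports the pair `("Cast", "dst_type")`, which is the attribute the root-cause analysis later in this issue refers to.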
Assign to 蔡福壁
Please assign maintainer to check this issue.
@zhongjicheng
This issue affects 50 test cases; recommend escalating it to a critical-severity ticket.
test_ms_bert_large_boost_en_wiki_pynative_train_check_func_910_1p_0001
test_ms_bert_large_cn_news_pynative_train_check_perf_910_1p_0001
test_ms_dbnet_mobilenetv3_icdar2015_pynative_train_check_perf_910_gpu_1p_0001
test_ms_deeplabv3_vocaug_s16_pynative_train_check_loss_910_8p_0001
test_ms_inceptionv3_imagenet_pynative_train_check_loss_910_8p_0001
test_ms_inceptionv4_imagenet_pynative_train_check_loss_910_8p_0001
test_ms_pynative_dbnet_r18_icdar2015_ascend_check_fps_1p_0001
test_ms_pynative_dbnet_r50_icdar2015_ascend_check_fps_1p_0003
test_ms_resnet101_imagenet_pynative_train_check_loss_910_8p_0001
test_ms_resnet18_cifar10_pynative_train_check_perf_910_1p_0001
test_ms_resnet18_cifar10_pynative_train_infer_910_8p_0001
test_ms_resnet18_imagenet_pynative_train_check_loss_910_8p_0001
test_ms_resnet50_cifar10_pynative_train_check_perf_910_1p_0001
test_ms_resnet50_imagenet_pynative_train_check_loss_910_8p_0001
test_ms_resnext50_imagenet2012_pynative_train_check_loss_910_8p_0001
test_ms_retinanet_coco2017_pynative_train_check_loss_910_8p_0001
test_ms_se_resnet50_imagenet_pynative_train_check_loss_910_8p_0001
test_ms_ssd_resnet50_fpn_coco2017_pynative_train_check_loss_910_8p_0001
test_ms_ssd_vgg16_coco2017_pynative_train_check_loss_910_8p_0001
test_ms_unet2d_pengcheng_luna_ascend_train_infer_1p_0003
test_ms_usability_benchmark_pynative_ascend_alexnet_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_ctpn_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_deepfm_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_deeplabv3_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_deeptext_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_dqn_time_perf_1p_0001
test_ms_usability_benchmark_pynative_ascend_facerecognition_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_fasterrcnn_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_inceptionv4_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_lstm_sentimentnet_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_mobilenetv1_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_mobilenetv2_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_resnet_101_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_resnet_18_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_resnet_50_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_resnext50_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_retinanet_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_se_resnet50_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_shufflenetv1_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_ssd_mobilenetv1_fpn_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_ssd_resnet50_fpn_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_ssd_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_ssd_vgg16_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_transformer_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_u_net3d_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_unet2d_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_vgg16_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_wide_deep_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_ascend_yolov4_time_perf_loss_1p_0001
test_ms_usability_benchmark_pynative_unet_plus_time_perf_loss_1p_0001
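The issue affects a large number of test cases, so its priority is raised to critical.
Root cause: the HiSilicon Cast operator now requires its attribute to be of type int, and MindSpore needs to adapt to this.
Solution: cast the attribute type to int in the GE adapter.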
#Appearance & Root Cause
The HiSilicon Cast operator now requires its attribute to be of type int, and MindSpore needs to adapt to this.
#Fix Solution
Cast the attribute type to int in the GE adapter.
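The actual change lives in MindSpore's GE adapter and is not reproduced here. As a rough illustration of the idea only, the sketch below coerces a `dst_type` attribute to a plain int before it is handed to a consumer that reads it as an integer attribute. The enum, its values, and the function name are all hypothetical and do not reflect the real GE or MindSpore type codes.

```python
from enum import IntEnum


class HypotheticalDType(IntEnum):
    """Illustrative type codes only; NOT the real GE or MindSpore values."""
    FLOAT32 = 0
    FLOAT16 = 1
    INT32 = 3


def adapt_cast_attrs(attrs):
    """Return a copy of attrs in which 'dst_type' is a plain int.

    A consumer that reads the attribute strictly as an integer would
    reject an enum-like or otherwise typed value, so the adapter
    converts it before passing the attributes on.
    """
    adapted = dict(attrs)
    value = adapted.get("dst_type")
    if value is not None and type(value) is not int:
        adapted["dst_type"] = int(value)
    return adapted


if __name__ == "__main__":
    before = {"dst_type": HypotheticalDType.FLOAT16}
    after = adapt_cast_attrs(before)
    print(type(before["dst_type"]).__name__, "->", type(after["dst_type"]).__name__)
```

The input dictionary is copied rather than modified in place, so the caller's view of the operator attributes stays unchanged.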
relevant pr:
https://e.gitee.com/mind_spore/repos/mindspore/mindspore/pulls/61849
Self-test Report & DT Review
Additional ST/UT needed: No
Regression version:
runpkg_version:Milan_C17/20240124
mindspore:r2.3_20240124223721_b30b274f4143
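Regression steps: follow the reproduction steps in this issue
Basic functionality: the network fails to run in pynative mode
Test conclusion: the problem is tracked in issue #I8Z8LW: [ST][doc][tutorial][910A][pynative] tutorials such as ssd/generative_diffusion/vision_transformer fail in pynative mode on 910 with the error ValueNode<ValueTuple> ()anf_node is not CNode
Regression tester: zhongjicheng
Regression date: 2024-01-27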