MindSpore / models
Unet3d fails to train with the LUNA16 dataset
DONE
#I78U29
Bug-Report
AmazingU
Created on 2023-05-27 17:09
Environment: CANN 6.3.RC3, Python 3.9.2, EulerOS 2.8
Code: https://gitee.com/mindspore/models/tree/master/official/cv/Unet3d

Test procedure (dataset: LUNA16):

Extract the dataset:

```
root@f8482f2c30ce:/data/data# 7za e subset9.zip

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,192 CPUs LE)

Scanning the drive for archives:
1 file, 6699650017 bytes (6390 MiB)

Extracting archive: subset9.zip

ERRORS:
Headers Error

--
Path = subset9.zip
Type = zip
ERRORS: Headers Error
Physical Size = 6699650017
64-bit = +

Archives with Errors: 1
Open Errors: 1
```

Conversion script:

```
python ./src/convert_nifti.py --data_path=/data/data/list/ --output_path=.//data/data/LUNA16/train/image
```

Dataset directory: (screenshot not captured)

Run training:

```
bash run_distribute_train.sh /home/wukong/hccl_8p_01234567.json /data/data/LUNA16/train
```

Log:

```
home/wukong/hccl_8p_01234567.json /data/data/LUNA16/train/
start training for rank 0, device 0
start training for rank 1, device 1
start training for rank 2, device 2
start training for rank 3, device 3
start training for rank 4, device 4
start training for rank 5, device 5
start training for rank 6, device 6
start training for rank 7, device 7
[root@bms-306f scripts]# tail -f train_parallel0/log.txt
{'enable_modelarts': 'Whether training on modelarts, default: False', 'enable_fp16_gpu': 'Whether training on gpu with fp16, default: False', 'data_url': 'Dataset url for obs', 'train_url': 'Training output url for obs', 'checkpoint_url': 'The location of checkpoint for obs', 'data_path': 'Dataset path for local', 'output_path': 'Training output path for local', 'load_path': 'The location of checkpoint for obs', 'device_target': 'Target device type, available: [Ascend, GPU, CPU]', 'enable_profiling': 'Whether enable profiling while training, default: False', 'num_classes': 'Class for dataset', 'batch_size': 'Batch size for training and evaluation', 'epoch_size': 'Total training epochs.', 'keep_checkpoint_max': 'keep the last keep_checkpoint_max checkpoint', 'checkpoint_path': 'The location of the checkpoint file.', 'checkpoint_file_path': 'The location of the checkpoint file.'}
{'batch_size': 1, 'checkpoint_file_path': 'Unet3d-10-110.ckpt', 'checkpoint_path': './checkpoint/', 'checkpoint_url': '', 'ckpt_file': './checkpoint/Unet3d-10-110.ckpt', 'config_path': '/home/wukong/models/official/cv/Unet3d/scripts/train_parallel0/src/model_utils/../../default_config.yaml', 'data_path': '/data/data/LUNA16/train/', 'data_url': '', 'device_id': 0, 'device_target': 'Ascend', 'enable_fp16_gpu': False, 'enable_modelarts': False, 'enable_profiling': False, 'epoch_size': 10, 'file_format': 'MINDIR', 'file_name': 'unet3d', 'in_channels': 1, 'keep_checkpoint_max': 1, 'load_path': '/cache/checkpoint_path/', 'loss_scale': 256.0, 'lower_limit': 3, 'lr': 0.0005, 'max_val': 1000, 'min_val': -500, 'num_classes': 4, 'output_path': './output', 'overlap': 0.25, 'post_result_path': './result_Files', 'pre_result_path': './preprocess_Result', 'roi_size': [224, 224, 96], 'run_distribute': True, 'train_url': '', 'upper_limit': 5, 'warmup_ratio': 0.3, 'warmup_step': 120}
Please check the above information for the configurations
[WARNING] HCCL_ADPT(129357,ffff95a6d010,python):2023-05-27-17:07:41.340.275 [mindspore/ccsrc/plugin/device/ascend/hal/hccl_adapter/hccl_adapter.cc:47] GenHcclOptions] The environment variable DEPLOY_MODE is not set. Now set to default value 0
[WARNING] ME(129357:281473192480784,MainProcess):2023-05-27-17:07:41.568.165 [mindspore/dataset/engine/datasets_user_defined.py:805] GeneratorDataset's num_parallel_workers: 4 is too large which may cause a lot of memory occupation (>85%) or out of memory(OOM) during multiprocessing. Therefore, it is recommended to reduce num_parallel_workers to 1 or smaller.
[ERROR] MD(129357,ffff95a6d010,python):2023-05-27-17:07:41.967.817 [mindspore/ccsrc/minddata/dataset/engine/ir/datasetops/source/generator_node.cc:113] ValidateParams] GeneratorNode: data row of input source must not be 0, got: 0
Traceback (most recent call last):
  File "/home/wukong/models/official/cv/Unet3d/scripts/train_parallel0/train.py", line 96, in <module>
    train_net(data_path=config.data_path,
  File "/home/wukong/models/official/cv/Unet3d/scripts/train_parallel0/src/model_utils/moxing_adapter.py", line 104, in wrapped_func
    run_func(*args, **kwargs)
  File "/home/wukong/models/official/cv/Unet3d/scripts/train_parallel0/train.py", line 63, in train_net
    train_data_size = train_dataset.get_dataset_size()
  File "/usr/local/python3.9.2/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 1673, in get_dataset_size
    runtime_getter = self.__init_size_getter()
  File "/usr/local/python3.9.2/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 1562, in __init_size_getter
    ir_tree, api_tree = self.create_ir_tree()
  File "/usr/local/python3.9.2/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 398, in create_ir_tree
    ir_tree = dataset.parse_tree()
  File "/usr/local/python3.9.2/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 412, in parse_tree
    ir_children = [d.parse_tree() for d in self.children]
  File "/usr/local/python3.9.2/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 412, in <listcomp>
    ir_children = [d.parse_tree() for d in self.children]
  File "/usr/local/python3.9.2/lib/python3.9/site-packages/mindspore/dataset/engine/datasets.py", line 416, in parse_tree
    ir_node = self.parse(ir_children)
  File "/usr/local/python3.9.2/lib/python3.9/site-packages/mindspore/dataset/engine/datasets_user_defined.py", line 771, in parse
    return cde.GeneratorNode(self.prepared_source, self.column_names, self.column_types, self.source_len,
RuntimeError: Syntax error.
------------------------------------------------------------------
- Dataset Pipeline Error Message:
------------------------------------------------------------------
[ERROR] GeneratorNode: data row of input source must not be 0, got: 0.
------------------------------------------------------------------
- C++ Call Stack: (For framework developers)
------------------------------------------------------------------
mindspore/ccsrc/minddata/dataset/engine/ir/datasetops/source/generator_node.cc(113).
```
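Note that the 7za output above ends with "Headers Error" and "Open Errors: 1", so subset9.zip may be truncated or corrupt, in which case convert_nifti.py would have had little or nothing to convert. A minimal sketch for verifying the archive before converting; the archive path is taken from the log above, everything else is illustrative:

```python
import zipfile

ARCHIVE = "/data/data/subset9.zip"  # path taken from the report above

# Reject obviously truncated or mis-downloaded files up front.
if not zipfile.is_zipfile(ARCHIVE):
    raise SystemExit(f"{ARCHIVE} is not a readable zip archive; re-download it.")

with zipfile.ZipFile(ARCHIVE) as zf:
    # testzip() reads every member and returns the first corrupt one, or None.
    bad = zf.testzip()
    if bad is not None:
        raise SystemExit(f"Corrupt member detected: {bad}; re-download the archive.")
    print(f"{len(zf.namelist())} members, archive looks intact.")
```

If this reports a corrupt member, re-downloading the subset (and comparing it against the checksum published with the dataset, if one is available) is probably the first thing to try before re-running the conversion.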
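The training failure itself ("data row of input source must not be 0, got: 0") means the GeneratorDataset found no samples under data_path. Two things worth checking: the conversion command above writes to --output_path=.//data/data/..., a relative path, so the converted files may have landed under the current working directory rather than /data/data/LUNA16/train/image; and the segmentation files need to exist alongside the images. A rough pre-flight check, assuming the image/seg layout used by the Unet3d scripts (the subdirectory names here are assumptions):

```python
import os

# Root path comes from the commands in the report; the "image"/"seg"
# subdirectory names follow the Unet3d data layout and are assumed here.
TRAIN_ROOT = "/data/data/LUNA16/train"

for sub in ("image", "seg"):
    d = os.path.join(TRAIN_ROOT, sub)
    if not os.path.isdir(d):
        print(f"{d}: directory missing")
        continue
    files = [f for f in os.listdir(d) if f.endswith((".nii", ".nii.gz"))]
    print(f"{d}: {len(files)} NIfTI file(s)")
    if not files:
        print("  -> empty: GeneratorDataset will fail with 'data row ... must not be 0'")
```

If either directory is missing or empty, re-running convert_nifti.py with absolute --output_path values (for both the image and the label conversion) should populate them before run_distribute_train.sh is retried.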
Comments (6)
Labels: modelzoo