Upsampling is an essential stage in most dense prediction tasks based on deep convolutional neural networks (CNNs). Frequently used upsampling operators include transposed convolution, unpooling, periodic shuffling (also known as depth-to-space), and naive interpolation followed by convolution. These operators, however, are not general-purpose designs and often behave differently in different tasks. Instead of using max pooling and unpooling, this model is built on two novel operations, indexed pooling and indexed upsampling, in which downsampling and upsampling are guided by learned indices. The indices are generated dynamically, conditioned on the feature map, and are learned without supervision by a fully convolutional network termed IndexNet.
Paper: Indices Matter: Learning to Index for Deep Image Matting. Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu.
IndexNet is based on the UNet architecture and uses MobileNetV2 as the backbone. MobileNetV2 was chosen because it is lightweight and allows the use of higher-resolution images on the same GPU as high-capacity backbones. All stride-2 convolutions were replaced with stride-1 convolutions, and stride-2 2x2 max pooling layers were added after each encoding stage for downsampling, which allows the extraction of indices. When applying the IndexNet idea, max pooling and unpooling layers can be replaced with IndexedPooling and IndexedUnpooling, respectively.
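A minimal sketch of the two operations, assuming 2x2 windows and a single "holistic" index map shared by the pooling and unpooling sides (class names are illustrative; the repository's actual layers live in src/layers.py and src/modules.py, and the paper additionally normalizes the encoder indices):

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn
import mindspore.ops as ops

def nearest_upsample_2x(x):
    """Nearest-neighbor 2x upsampling via element repetition."""
    x = ops.repeat_elements(x, rep=2, axis=2)
    return ops.repeat_elements(x, rep=2, axis=3)

class HolisticIndexBlock(nn.Cell):
    """Predicts an index map dynamically, conditioned on the feature map."""
    def __init__(self, channels):
        super().__init__()
        # one index logit per 2x2 pooling window
        self.conv = nn.Conv2d(channels, 1, kernel_size=2, stride=2,
                              pad_mode='valid', has_bias=False)
        self.sigmoid = nn.Sigmoid()

    def construct(self, x):
        idx = self.sigmoid(self.conv(x))   # (N, 1, H/2, W/2)
        return nearest_upsample_2x(idx)    # one index per location: (N, 1, H, W)

class IndexedPooling(nn.Cell):
    """Downsampling guided by learned indices (replaces max pooling)."""
    def __init__(self):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def construct(self, x, idx):
        # index-weighted sum over each 2x2 window: 4 * avg(idx * x)
        return self.pool(idx * x) * 4

class IndexedUnpooling(nn.Cell):
    """Upsampling guided by the stored indices (replaces unpooling)."""
    def construct(self, x, idx):
        return idx * nearest_upsample_2x(x)

# usage: indices are computed once per encoding stage and reused in the decoder
x = ms.Tensor(np.random.rand(1, 32, 64, 64).astype(np.float32))
idx = HolisticIndexBlock(32)(x)       # (1, 1, 64, 64)
down = IndexedPooling()(x, idx)       # (1, 32, 32, 32)
up = IndexedUnpooling()(down, idx)    # (1, 32, 64, 64)
```

Because the same learned index map guides both downsampling and upsampling, the decoder can recover boundary detail that fixed max-pooling indices would lose.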
The paper uses the Adobe Image Matting dataset, but its access is restricted. We therefore use the AIM-500 (Automatic Image Matting - 500) dataset, which is openly available, so anyone can download it.
Every image from the AIM-500 dataset is cut out by its mask and composited N times (96 for the train part, 20 for the test part) as a foreground, each time over a unique image from the COCO-2014 train set used as the background (see the compositing sketch after the table below).
Datasets used: AIM-500, COCO-2014 (train).
| | AIM-500 | COCO-2014 | Merged (after processing) |
|---|---|---|---|
| Dataset size | ~0.35 GB | ~13.0 GB | ~86.0 GB |
| Train | 0.35 GB, 3 * 500 images (mask, original, trimap) | 13.0 GB, 82783 images | 84 GB, 43200 images |
| Test | - | - | 2 GB, 1000 images |
| Data format | .png, .jpg images | .jpg images | .png images |
Note: We manually split AIM-500 into train/test parts (450/50 images).
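For reference, a minimal sketch of the compositing step (the actual logic lives in data/process_dataset.py; the function below is illustrative and assumes the background is already resized/cropped to the foreground's shape):

```python
import numpy as np

def composite(fg, bg, alpha):
    """Alpha-blend a foreground over a background.

    fg, bg: HxWx3 uint8 arrays (bg already resized to fg's shape)
    alpha:  HxW uint8 mask from AIM-500, 0 = background, 255 = foreground
    """
    a = (alpha.astype(np.float32) / 255.0)[..., None]  # HxWx1 for broadcasting
    merged = a * fg.astype(np.float32) + (1.0 - a) * bg.astype(np.float32)
    return merged.astype(np.uint8)
```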
Download the AIM-500 dataset (3 archives: original, mask, trimap), unzip them, and move the folders from the unzipped archives into one folder named AIM-500. Download the COCO-2014 train set and unzip it.
The structure of the datasets will be as follows:
.
└─AIM-500 <- data_dir
├─mask
│ └─***.png
├─original
│ └─***.jpg
└─trimap
└─***.png
.
└─train2014 <- bg_dir
└─***.jpg
Here *** is the image file name.
To process the dataset, use the command below.
python -m data.process_dataset --data_dir /path/to/AIM-500 --bg_dir /path/to/coco/train2014
Note: The requirements will be installed before data processing starts. Make sure that you have ~100 GB of free space on the disk that holds the --data_dir path. Preparing the dataset can take about 20 hours, depending on hardware.
During processing, the data_dir structure is changed automatically, and the merged images are saved into data_dir/train/merged and data_dir/validation/merged. The bg_dir remains unchanged. The processed dataset will have the following structure:
.
└─AIM-500 <- data_dir
├─train
│ ├─data.txt
│ ├─mask
│ ├─merged
│ └─original
└─validation
├─data.txt
├─mask
├─merged
├─original
└─trimap
.
└─train2014 <- bg_dir
Note: We use MindSpore 1.6.1 for GPU, so make sure that you install version 1.6.1 or later.
After installing MindSpore via the official website, you can follow the steps below for training and evaluation. In particular, before training, install the requirements with pip install -r requirements.txt and download the MobileNetV2 backbone pre-trained on ImageNet.
# Run standalone training example
bash scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [MOBILENET_CKPT] [DATA_DIR] [BG_DIR]
# Run distribute training example
bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [MOBILENET_CKPT] [DATA_DIR] [BG_DIR]
.
└─IndexNet
├─README.md
├─requirements.txt
├─data
│ └─process_dataset.py # data preparation script
├─scripts
│ ├─run_distribute_train_gpu.sh # launch distribute train on GPU
│ ├─run_eval_gpu.sh # launch evaluation on GPU
│ └─run_standalone_train_gpu.sh # launch standalone train on GPU
├─src
│ ├─cfg
│ │ ├─__init__.py
│ │ └─config.py # parameter parser
│ ├─dataset.py # dataset script and utils
│ ├─layers.py # model layers
│ ├─model.py # model script
│ ├─modules.py # model modules
│ └─utils.py # utilities used in other scripts
├─default_config.yaml # default configs
├─eval.py # evaluation script
├─export.py # export to MINDIR script
└─train.py # training script
# Main arguments:
# training params
batch_size: 16 # Batch size for training
epochs: 30 # Number of training epochs
learning_rate: 0.01 # Initial learning rate
backbone_lr_mult: 100 # Divisor applied to the learning rate for backbone params
lr_decay: 0.1 # Learning rate scaling at milestone
milestones: [20, 26] # Milestones for learning rate scheduler
input_size: 320 # Input crop size for training
Note: All training runs require the pretrained MobileNetV2 backbone.
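For clarity, the milestone schedule above amounts to the following multi-step decay (a sketch; the actual schedule is built inside train.py):

```python
def lr_at_epoch(epoch, base_lr=0.01, milestones=(20, 26), decay=0.1):
    """Multi-step decay: multiply the LR by `decay` at each milestone."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= decay
    return lr

# epochs 0-19 -> 0.01, epochs 20-25 -> 0.001, epochs 26-29 -> 0.0001
```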
bash scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [MOBILENET_CKPT] [DATA_DIR] [BG_DIR]
The above command will run in the background; you can view the results in the generated standalone_train.log file. After training, the training loss and time logs appear in the chosen logs directory. The model checkpoints will be saved in the [LOGS_CKPT_DIR] directory.
bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [MOBILENET_CKPT] [DATA_DIR] [BG_DIR]
The above command will run in the background; you can view the results in the generated distribute_train.log file. After training, the training loss and time logs appear in the chosen logs directory. The model checkpoints will be saved in the [LOGS_CKPT_DIR] directory.
To start evaluation, run the command below.
bash scripts/run_eval_gpu.sh [DEVICE_ID] [CKPT_URL] [DATA_DIR] [LOGS_DIR]
The above command will run in the background. Predicted masks (.png) will be stored in the chosen [LOGS_DIR], where you can also view the results in the eval.log file.
To export the model to the MINDIR format, run the following command:
python export.py --ckpt_url [CKPT_URL]
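A sketch of what the export step amounts to (the 1x4x320x320 dummy input shape is an assumption: an RGB crop concatenated with the trimap channel, matching the 320 training crop size; see export.py for the exact logic):

```python
import numpy as np
import mindspore as ms

def export_mindir(net, ckpt_url, out_name='indexnet'):
    """Load a checkpoint into `net` and serialize the model as MINDIR."""
    ms.load_param_into_net(net, ms.load_checkpoint(ckpt_url))
    # assumed input: batch of one 4-channel (RGB + trimap) 320x320 crop
    dummy = ms.Tensor(np.zeros((1, 4, 320, 320), np.float32))
    ms.export(net, dummy, file_name=out_name, file_format='MINDIR')
```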
Parameters | GPU (1p) | GPU (8p) |
---|---|---|
Model | IndexNet | IndexNet |
Hardware | 1 Nvidia Tesla V100-PCIE, CPU @ 3.40GHz | 8 Nvidia RTX 3090, Intel Xeon Gold 6226R CPU @ 2.90GHz |
Upload Date | 07/04/2022 (day/month/year) | 07/04/2022 (day/month/year) |
MindSpore Version | 1.6.1 | 1.6.1 |
Dataset | AIM-500, COCO-2014 (composition of datasets) | AIM-500, COCO-2014 (composition of datasets) |
Training Parameters | epochs=30, lr=0.01, batch_size=16, num_workers=12 | epochs=30, lr=0.01, batch_size=16 (each device), num_workers=4 |
Optimizer | Adam, beta1=0.9, beta2=0.999, eps=1e-8 | Adam, beta1=0.9, beta2=0.999, eps=1e-8 |
Loss Function | Weighted loss (alpha prediction loss and composition loss) | Weighted loss (alpha prediction loss and composition loss) |
Speed | ~ 516 ms/step | ~ 2670 ms/step |
Total time | ~ 11.6 hours | ~ 7.5 hours |
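The "Loss Function" row refers to the weighted matting loss: an alpha prediction term plus a composition term. A sketch of its usual Charbonnier-style form (the weight w and epsilon are assumptions, and the restriction of both terms to the unknown trimap region used in the paper is omitted here; see train.py for the exact implementation):

```python
import mindspore.ops as ops

def matting_loss(alpha_pred, alpha_gt, fg, bg, image, w=0.5, eps=1e-6):
    """Weighted sum of alpha prediction loss and composition loss."""
    sqrt = ops.Sqrt()
    # alpha prediction loss: Charbonnier distance between alphas
    l_alpha = sqrt((alpha_pred - alpha_gt) ** 2 + eps).mean()
    # composition loss: re-composite with the predicted alpha, compare to image
    comp = alpha_pred * fg + (1.0 - alpha_pred) * bg
    l_comp = sqrt((comp - image) ** 2 + eps).mean()
    return w * l_alpha + (1.0 - w) * l_comp
```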
Parameters | GPU (1p) | GPU (8p) |
---|---|---|
Model | IndexNet | IndexNet |
Resource | 1 Nvidia RTX 3090, Intel Xeon Gold 6226R CPU @ 2.90GHz | 1 Nvidia RTX 3090, Intel Xeon Gold 6226R CPU @ 2.90GHz |
Upload Date | 07/04/2022 (day/month/year) | 07/04/2022 (day/month/year) |
MindSpore Version | 1.6.1 | 1.6.1 |
Dataset | AIM-500, COCO-2014 (composition of datasets) | AIM-500, COCO-2014 (composition of datasets) |
Batch_size | 1 | 1 |
Outputs | .png images of alpha masks | .png images of alpha masks |
Metrics | 21.51 SAD, 0.0096 MSE, 13.43 Grad, 20.43 Conn | 22.06 SAD, 0.0134 MSE, 12.84 Grad, 21.32 Conn |
Metrics expected range | < 24.00 SAD, < 0.0120 MSE, < 13.70 Grad, < 23.20 Conn | < 24.20 SAD, < 0.0145 MSE, < 13.40 Grad, < 22.70 Conn |
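For reference, sketches of the two simplest metrics reported above, SAD and MSE (conventions assumed: alphas in [0, 1], SAD reported in thousands, MSE averaged over the unknown trimap region; the exact reductions in eval.py may differ, and Grad/Conn are omitted here):

```python
import numpy as np

def sad(pred, gt):
    """Sum of absolute differences between alphas, reported in thousands."""
    return np.abs(pred - gt).sum() / 1000.0

def mse(pred, gt, trimap):
    """Mean squared error over the unknown (gray, 128) trimap region."""
    unknown = trimap == 128
    return ((pred - gt) ** 2)[unknown].mean()
```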
Please check the official homepage.