Upsampling is an essential stage in most dense prediction tasks based on deep convolutional neural networks (CNNs). Frequently used upsampling operators include transposed convolution, unpooling, periodic shuffling (also known as depth-to-space), and naive interpolation followed by convolution. These operators, however, are not general-purpose designs and often behave differently in different tasks. Instead of using max pooling and unpooling, this model is built on two novel operations, indexed pooling and indexed upsampling, in which downsampling and upsampling are guided by learned indices. The indices are generated dynamically, conditioned on the feature map, and are learned without supervision by a fully convolutional network termed IndexNet.
Paper: Indices Matter: Learning to Index for Deep Image Matting. Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu.
IndexNet is based on the UNet architecture and uses MobileNetV2 as the backbone. MobileNetV2 was chosen because it is lightweight and allows the use of higher-resolution images on the same GPU as high-capacity backbones. All stride-2 convolutions were replaced with stride-1 convolutions, and stride-2 2x2 max pooling layers were added after each encoding stage for downsampling, which allows the extraction of indices. When applying the IndexNet idea, max pooling and unpooling layers can be replaced with IndexedPooling and IndexedUnpooling, respectively.
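A minimal sketch of the two operations, assuming 2x2 windows and a single "holistic" index map shared by the pooling and unpooling sides (class names are illustrative; the repository's actual layers live in src/layers.py and src/modules.py, and the paper additionally normalizes the encoder indices):

```python
import numpy as np
import mindspore as ms
import mindspore.nn as nn
import mindspore.ops as ops

def nearest_upsample_2x(x):
    """Nearest-neighbor 2x upsampling via element repetition."""
    x = ops.repeat_elements(x, rep=2, axis=2)
    return ops.repeat_elements(x, rep=2, axis=3)

class HolisticIndexBlock(nn.Cell):
    """Predicts an index map dynamically, conditioned on the feature map."""
    def __init__(self, channels):
        super().__init__()
        # one index logit per 2x2 pooling window
        self.conv = nn.Conv2d(channels, 1, kernel_size=2, stride=2,
                              pad_mode='valid', has_bias=False)
        self.sigmoid = nn.Sigmoid()

    def construct(self, x):
        idx = self.sigmoid(self.conv(x))   # (N, 1, H/2, W/2)
        return nearest_upsample_2x(idx)    # one index per location: (N, 1, H, W)

class IndexedPooling(nn.Cell):
    """Downsampling guided by learned indices (replaces max pooling)."""
    def __init__(self):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def construct(self, x, idx):
        # index-weighted sum over each 2x2 window: 4 * avg(idx * x)
        return self.pool(idx * x) * 4

class IndexedUnpooling(nn.Cell):
    """Upsampling guided by the stored indices (replaces unpooling)."""
    def construct(self, x, idx):
        return idx * nearest_upsample_2x(x)

# usage: indices are computed once per encoding stage and reused in the decoder
x = ms.Tensor(np.random.rand(1, 32, 64, 64).astype(np.float32))
idx = HolisticIndexBlock(32)(x)       # (1, 1, 64, 64)
down = IndexedPooling()(x, idx)       # (1, 32, 32, 32)
up = IndexedUnpooling()(down, idx)    # (1, 32, 64, 64)
```

Because the same learned index map guides both downsampling and upsampling, the decoder can recover boundary detail that fixed max-pooling indices would lose.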
The paper uses the Adobe Image Matting dataset, but its access is restricted. We therefore use the AIM-500 (Automatic Image Matting - 500) dataset, which is openly available, so anyone can download it.
Every image from the AIM-500 dataset is cut out by its mask and composited N times (96 for the train part, 20 for the test part) as a foreground, each time over a unique image from the COCO-2014 train set used as the background (see the compositing sketch after the table below).
Datasets used: AIM-500, COCO-2014 (train).
| | AIM-500 | COCO-2014 | Merged (after processing) |
|---|---|---|---|
| Dataset size | ~0.35 GB | ~13.0 GB | ~86.0 GB |
| Train | 0.35 GB, 3 * 500 images (mask, original, trimap) | 13.0 GB, 82783 images | 84 GB, 43200 images |
| Test | - | - | 2 GB, 1000 images |
| Data format | .png, .jpg images | .jpg images | .png images |
Note: We manually split AIM-500 into train/test parts (450/50 images).
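For reference, a minimal sketch of the compositing step (the actual logic lives in data/process_dataset.py; the function below is illustrative and assumes the background is already resized/cropped to the foreground's shape):

```python
import numpy as np

def composite(fg, bg, alpha):
    """Alpha-blend a foreground over a background.

    fg, bg: HxWx3 uint8 arrays (bg already resized to fg's shape)
    alpha:  HxW uint8 mask from AIM-500, 0 = background, 255 = foreground
    """
    a = (alpha.astype(np.float32) / 255.0)[..., None]  # HxWx1 for broadcasting
    merged = a * fg.astype(np.float32) + (1.0 - a) * bg.astype(np.float32)
    return merged.astype(np.uint8)
```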
Download the AIM-500 dataset (3 archives: original, mask, trimap), unzip them, and move the folders from the unzipped archives into one folder named AIM-500. Download the COCO-2014 train set and unzip it.
The structure of the datasets will be as follows:
.
└─AIM-500 <- data_dir
├─mask
│ └─***.png
├─original
│ └─***.jpg
└─trimap
└─***.png
.
└─train2014 <- bg_dir
└─***.jpg
Here *** is the image file name.
To process the dataset, use the command below.
python -m data.process_dataset --data_dir /path/to/AIM-500 --bg_dir /path/to/coco/train2014
Note: The requirements will be installed before data processing starts. Make sure that you have ~100 GB of free space on the disk that holds the --data_dir path. Preparing the dataset can take about 20 hours, depending on hardware.
During processing, the data_dir structure is changed automatically, and the merged images are saved into data_dir/train/merged and data_dir/validation/merged. The bg_dir remains unchanged. The processed dataset will have the following structure:
.
└─AIM-500 <- data_dir
├─train
│ ├─data.txt
│ ├─mask
│ ├─merged
│ └─original
└─validation
├─data.txt
├─mask
├─merged
├─original
└─trimap
.
└─train2014 <- bg_dir
Note: We use MindSpore 1.6.1 for GPU, so make sure that you install version 1.6.1 or later.
After installing MindSpore via the official website, you can follow the steps below for training and evaluation. In particular, before training, install the requirements with pip install -r requirements.txt and download the MobileNetV2 backbone pre-trained on ImageNet.
# Run standalone training example
bash scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [MOBILENET_CKPT] [DATA_DIR] [BG_DIR]
# Run distribute training example
bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [MOBILENET_CKPT] [DATA_DIR] [BG_DIR]
.
└─IndexNet
├─README.md
├─requirements.txt
├─data
│ └─process_dataset.py # data preparation script
├─scripts
│ ├─run_distribute_train_gpu.sh # launch distribute train on GPU
│ ├─run_eval_gpu.sh # launch evaluation on GPU
│ └─run_standalone_train_gpu.sh # launch standalone train on GPU
├─src
│ ├─cfg
│ │ ├─__init__.py
│ │ └─config.py # parameter parser
│ ├─dataset.py # dataset script and utils
│ ├─layers.py # model layers
│ ├─model.py # model script
│ ├─modules.py # model modules
│ └─utils.py # utilities used in other scripts
├─default_config.yaml # default configs
├─eval.py # evaluation script
├─export.py # export to MINDIR script
└─train.py # training script
# Main arguments:
# training params
batch_size: 16 # Batch size for training
epochs: 30 # Number of training epochs
learning_rate: 0.01 # Initial learning rate
backbone_lr_mult: 100 # Divisor applied to the learning rate for backbone params
lr_decay: 0.1 # Learning rate scaling at milestone
milestones: [20, 26] # Milestones for learning rate scheduler
input_size: 320 # Input crop size for training
Note: All training runs require the pretrained MobileNetV2 backbone.
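For clarity, the milestone schedule above amounts to the following multi-step decay (a sketch; the actual schedule is built inside train.py):

```python
def lr_at_epoch(epoch, base_lr=0.01, milestones=(20, 26), decay=0.1):
    """Multi-step decay: multiply the LR by `decay` at each milestone."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= decay
    return lr

# epochs 0-19 -> 0.01, epochs 20-25 -> 0.001, epochs 26-29 -> 0.0001
```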
bash scripts/run_standalone_train_gpu.sh [DEVICE_ID] [LOGS_CKPT_DIR] [MOBILENET_CKPT] [DATA_DIR] [BG_DIR]
The above command will run in the background; you can view the results in the generated standalone_train.log file. After training, the training loss and time logs appear in the chosen logs directory. The model checkpoints will be saved in the [LOGS_CKPT_DIR] directory.
bash scripts/run_distribute_train_gpu.sh [DEVICE_NUM] [LOGS_CKPT_DIR] [MOBILENET_CKPT] [DATA_DIR] [BG_DIR]
The above command will run in the background; you can view the results in the generated distribute_train.log file. After training, the training loss and time logs appear in the chosen logs directory. The model checkpoints will be saved in the [LOGS_CKPT_DIR] directory.
To start evaluation, run the command below.
bash scripts/run_eval_gpu.sh [DEVICE_ID] [CKPT_URL] [DATA_DIR] [LOGS_DIR]
The above command will run in the background. Predicted masks (.png) will be stored in the chosen [LOGS_DIR], where you can also view the results in the eval.log file.
To export the model to the MINDIR format, run the following command:
python export.py --ckpt_url [CKPT_URL]
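A sketch of what the export step amounts to (the 1x4x320x320 dummy input shape is an assumption: an RGB crop concatenated with the trimap channel, matching the 320 training crop size; see export.py for the exact logic):

```python
import numpy as np
import mindspore as ms

def export_mindir(net, ckpt_url, out_name='indexnet'):
    """Load a checkpoint into `net` and serialize the model as MINDIR."""
    ms.load_param_into_net(net, ms.load_checkpoint(ckpt_url))
    # assumed input: batch of one 4-channel (RGB + trimap) 320x320 crop
    dummy = ms.Tensor(np.zeros((1, 4, 320, 320), np.float32))
    ms.export(net, dummy, file_name=out_name, file_format='MINDIR')
```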
Parameters | GPU (1p) | GPU (8p) |
---|---|---|
Model | IndexNet | IndexNet |
Hardware | 1 Nvidia Tesla V100-PCIE, CPU @ 3.40GHz | 8 Nvidia RTX 3090, Intel Xeon Gold 6226R CPU @ 2.90GHz |
Upload Date | 07/04/2022 (day/month/year) | 07/04/2022 (day/month/year) |
MindSpore Version | 1.6.1 | 1.6.1 |
Dataset | AIM-500, COCO-2014 (composition of datasets) | AIM-500, COCO-2014 (composition of datasets) |
Training Parameters | epochs=30, lr=0.01, batch_size=16, num_workers=12 | epochs=30, lr=0.01, batch_size=16 (each device), num_workers=4 |
Optimizer | Adam, beta1=0.9, beta2=0.999, eps=1e-8 | Adam, beta1=0.9, beta2=0.999, eps=1e-8 |
Loss Function | Weighted loss (alpha prediction loss and composition loss) | Weighted loss (alpha prediction loss and composition loss) |
Speed | ~ 516 ms/step | ~ 2670 ms/step |
Total time | ~ 11.6 hours | ~ 7.5 hours |
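The "Loss Function" row refers to the weighted matting loss: an alpha prediction term plus a composition term. A sketch of its usual Charbonnier-style form (the weight w and epsilon are assumptions, and the restriction of both terms to the unknown trimap region used in the paper is omitted here; see train.py for the exact implementation):

```python
import mindspore.ops as ops

def matting_loss(alpha_pred, alpha_gt, fg, bg, image, w=0.5, eps=1e-6):
    """Weighted sum of alpha prediction loss and composition loss."""
    sqrt = ops.Sqrt()
    # alpha prediction loss: Charbonnier distance between alphas
    l_alpha = sqrt((alpha_pred - alpha_gt) ** 2 + eps).mean()
    # composition loss: re-composite with the predicted alpha, compare to image
    comp = alpha_pred * fg + (1.0 - alpha_pred) * bg
    l_comp = sqrt((comp - image) ** 2 + eps).mean()
    return w * l_alpha + (1.0 - w) * l_comp
```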
Parameters | GPU (1p) | GPU (8p) |
---|---|---|
Model | IndexNet | IndexNet |
Resource | 1 Nvidia RTX 3090, Intel Xeon Gold 6226R CPU @ 2.90GHz | 1 Nvidia RTX 3090, Intel Xeon Gold 6226R CPU @ 2.90GHz |
Upload Date | 07/04/2022 (day/month/year) | 07/04/2022 (day/month/year) |
MindSpore Version | 1.6.1 | 1.6.1 |
Dataset | AIM-500, COCO-2014 (composition of datasets) | AIM-500, COCO-2014 (composition of datasets) |
Batch_size | 1 | 1 |
Outputs | .png images of alpha masks | .png images of alpha masks |
Metrics | 21.51 SAD, 0.0096 MSE, 13.43 Grad, 20.43 Conn | 22.06 SAD, 0.0134 MSE, 12.84 Grad, 21.32 Conn |
Metrics expected range | < 24.00 SAD, < 0.0120 MSE, < 13.70 Grad, < 23.20 Conn | < 24.20 SAD, < 0.0145 MSE, < 13.40 Grad, < 22.70 Conn |
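For reference, sketches of the two simplest metrics reported above, SAD and MSE (conventions assumed: alphas in [0, 1], SAD reported in thousands, MSE averaged over the unknown trimap region; the exact reductions in eval.py may differ, and Grad/Conn are omitted here):

```python
import numpy as np

def sad(pred, gt):
    """Sum of absolute differences between alphas, reported in thousands."""
    return np.abs(pred - gt).sum() / 1000.0

def mse(pred, gt, trimap):
    """Mean squared error over the unknown (gray, 128) trimap region."""
    unknown = trimap == 128
    return ((pred - gt) ** 2)[unknown].mean()
```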
Please check the official homepage.