# deeplab-pytorch
**Repository Path**: wei_qiang_zhou/deeplab-pytorch
## Basic Information
- **Project Name**: deeplab-pytorch
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-11-15
- **Last Updated**: 2020-12-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# DeepLab with PyTorch
This is an unofficial **PyTorch** implementation of **DeepLab v2** [[1](#references)] with a **ResNet-101** backbone.
* **COCO-Stuff** dataset [[2](#references)] and **PASCAL VOC** dataset [[3](#references)] are supported.
* The official Caffe weights provided by the authors can be used without building the Caffe APIs.
* DeepLab v3/v3+ models with the same backbone are also included (not tested).
* [```torch.hub``` is supported](#torchhub).
## Performance
### COCO-Stuff
| Train set   | Eval set  | Code         | Weight     | CRF? | Pixel Accuracy | Mean Accuracy | Mean IoU | FreqW IoU |
| :---------- | :-------- | :----------- | :--------- | :--: | :------------- | :------------ | :------- | :-------- |
| 10k train † | 10k val † | Official [2] |            |      | 65.1           | 45.5          | 34.4     | 50.4      |
| 10k train † | 10k val † | This repo    | Download   |      | 65.8           | 45.7          | 34.8     | 51.2      |
| 10k train † | 10k val † | This repo    | Download   | ✓    | 67.1           | 46.4          | 35.6     | 52.5      |
| 164k train  | 164k val  | This repo    | Download ‡ |      | 66.8           | 51.2          | 39.1     | 51.5      |
| 164k train  | 164k val  | This repo    | Download ‡ | ✓    | 67.6           | 51.5          | 39.7     | 52.3      |
† Images and labels are pre-warped to a square shape of 513x513
‡ Note for [SPADE](https://nvlabs.github.io/SPADE/) followers: The provided COCO-Stuff 164k weight has been kept intact since 2019/02/23.
### PASCAL VOC 2012
| Train set | Eval set | Code         | Weight   | CRF? | Pixel Accuracy | Mean Accuracy | Mean IoU | FreqW IoU |
| :-------- | :------- | :----------- | :------- | :--: | :------------- | :------------ | :------- | :-------- |
| trainaug  | val      | Official [3] |          |      | -              | -             | 76.35    | -         |
| trainaug  | val      | Official [3] |          | ✓    | -              | -             | 77.69    | -         |
| trainaug  | val      | This repo    | Download |      | 94.64          | 86.50         | 76.65    | 90.41     |
| trainaug  | val      | This repo    | Download | ✓    | 95.04          | 86.64         | 77.93    | 91.06     |
## Setup
### Requirements
Required Python packages are listed in the Anaconda configuration file `configs/conda_env.yaml`.
Please modify the listed `cudatoolkit=10.2` and `python=3.6` as needed and run the following commands.
```sh
# Set up with Anaconda
conda env create -f configs/conda_env.yaml
conda activate deeplab-pytorch
```
### Download datasets
* [COCO-Stuff 10k/164k](data/datasets/cocostuff/README.md)
* [PASCAL VOC 2012](data/datasets/voc12/README.md)
### Download pre-trained caffemodels
Caffemodels pre-trained on COCO and PASCAL VOC datasets are released by the DeepLab authors.
In accordance with the papers [[1](#references), [2](#references)], this repository uses the COCO-trained parameters as initial weights.
1. Run the following script to download the pre-trained caffemodels (1 GB+).
```sh
$ bash scripts/setup_caffemodels.sh
```
2. Convert the caffemodels to PyTorch-compatible weights. No need to build the Caffe API!
```sh
# Generate "deeplabv1_resnet101-coco.pth" from "init.caffemodel"
$ python convert.py --dataset coco
# Generate "deeplabv2_resnet101_msc-vocaug.pth" from "train2_iter_20000.caffemodel"
$ python convert.py --dataset voc12
```
## Training & Evaluation
To train DeepLab v2 on PASCAL VOC 2012:
```sh
python main.py train \
--config-path configs/voc12.yaml
```
To evaluate the performance on a validation set:
```sh
python main.py test \
--config-path configs/voc12.yaml \
--model-path data/models/voc12/deeplabv2_resnet101_msc/train_aug/checkpoint_final.pth
```
Note: This command saves the predicted logit maps (`.npy`) and the scores (`.json`).
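To inspect these files, something along the following lines works (the paths below are placeholders; the actual save directory is derived from the config file):
```python
import json
import numpy as np

# Placeholder paths; actual locations depend on the config's save directory
logit = np.load("path/to/logits/2007_000033.npy")  # shape: (n_classes, H, W)
label = logit.argmax(axis=0)                       # per-pixel class indices

with open("path/to/scores.json") as f:
    print(json.load(f))  # Pixel Accuracy, Mean Accuracy, Mean IoU, FreqW IoU
```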
To re-evaluate with a CRF post-processing:
```sh
python main.py crf \
--config-path configs/voc12.yaml
```
Running the above three commands in sequence is equivalent to `bash scripts/train_eval.sh`.
To monitor the loss, run the following command in a separate terminal.
```sh
tensorboard --logdir data/logs
```
Please specify the appropriate configuration files for the other datasets.
| Dataset | Config file | #Iterations | Classes |
| :-------------- | :--------------------------- | :---------- | :--------------------------- |
| PASCAL VOC 2012 | `configs/voc12.yaml` | 20,000 | 20 foreground + 1 background |
| COCO-Stuff 10k | `configs/cocostuff10k.yaml` | 20,000 | 182 thing/stuff |
| COCO-Stuff 164k | `configs/cocostuff164k.yaml` | 100,000 | 182 thing/stuff |
Note: Although the label indices range from 0 to 181 in COCO-Stuff 10k/164k, only [171 classes](https://github.com/nightrome/cocostuff/blob/master/labels.md) are supervised.
Common settings:
- **Model**: DeepLab v2 with a ResNet-101 backbone. ASPP dilation rates are (6, 12, 18, 24); output stride is 8 (a minimal ASPP sketch follows this list).
- **GPU**: All GPUs visible to the process are used. Restrict them with ```CUDA_VISIBLE_DEVICES=```.
- **Multi-scale loss**: Loss is the sum of cross-entropy terms computed on each scaled output (1x, 0.75x, 0.5x) and on their element-wise max across the scales. The *unlabeled* class is ignored in the loss computation (see the training sketch after this list).
- **Gradient accumulation**: A mini-batch of 10 samples is not processed at once due to high GPU memory occupancy. Instead, gradients of small batches of 5 samples are accumulated over 2 iterations, and the weights are updated at the end (```batch_size * iter_size = 10```). GPU memory usage is approx. 11.2 GB with the default setting (tested on a single Titan X). You can reduce it with a smaller ```batch_size```.
- **Learning rate**: Stochastic gradient descent (SGD) is used with a momentum of 0.9 and an initial learning rate of 2.5e-4. Polynomial learning rate decay is employed: the learning rate is multiplied by ```(1 - iter/iter_max)**power``` every 10 iterations.
- **Monitoring**: Moving average loss (```average_loss``` in Caffe) can be monitored in TensorBoard.
- **Preprocessing**: Input images are randomly re-scaled by factors ranging from 0.5 to 1.5, padded if needed, and randomly cropped to 321x321.
Processed images and labels in COCO-Stuff 164k:

## Inference Demo
You can use [the pre-trained models](#performance), [the converted models](#download-pre-trained-caffemodels), or your own models.
To process a single image:
```bash
python demo.py single \
--config-path configs/voc12.yaml \
--model-path deeplabv2_resnet101_msc-vocaug-20000.pth \
--image-path image.jpg
```
To run on a webcam:
```bash
python demo.py live \
--config-path configs/voc12.yaml \
--model-path deeplabv2_resnet101_msc-vocaug-20000.pth
```
To run a CRF post-processing, add `--crf`. To run on a CPU, add `--cpu`.
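The CRF step uses [pydensecrf](https://github.com/lucasb-eyer/pydensecrf) under the hood. A minimal sketch of refining a softmax map looks roughly as follows; the kernel parameters here are illustrative assumptions, not the values this repository actually uses (those come from the config files):
```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def refine(image, probmap, n_iters=10):
    """image: HxWx3 uint8 RGB array; probmap: CxHxW float32 softmax output."""
    C, H, W = probmap.shape
    d = dcrf.DenseCRF2D(W, H, C)
    d.setUnaryEnergy(unary_from_softmax(probmap))  # negative log-probabilities
    # Kernel parameters below are illustrative only
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=67, srgb=3, rgbim=np.ascontiguousarray(image), compat=4)
    Q = d.inference(n_iters)
    return np.array(Q).reshape(C, H, W)  # refined probabilities
```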
## Misc
### torch.hub
Model setup in two lines:
```python
import torch.hub
model = torch.hub.load("kazuto1011/deeplab-pytorch", "deeplabv2_resnet101", pretrained='cocostuff164k', n_classes=182)
```
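Assuming the returned module outputs a single logit map, inference then looks roughly like this (preprocessing is omitted; the converted Caffe weights expect BGR inputs with the dataset mean subtracted):
```python
import torch
import torch.nn.functional as F

model.eval()
image = torch.randn(1, 3, 513, 513)  # dummy input standing in for a real image
with torch.no_grad():
    logits = model(image)  # single-scale logit map at output stride 8
    logits = F.interpolate(
        logits, size=image.shape[2:], mode="bilinear", align_corners=False
    )
    labels = logits.argmax(dim=1)  # 1 x 513 x 513 class indices
```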
### Difference with Caffe version
* While the official code downsamples the label with 1/16 bilinear interpolation (```Interp``` layer) for the 0.5x input only, this codebase downsamples it for both the 0.5x and 0.75x inputs with nearest interpolation (```PIL.Image.resize```, [related issue](https://github.com/kazuto1011/deeplab-pytorch/issues/51)).
* Bilinear interpolation on images and logits is performed with ```align_corners=False```.
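To make the first point concrete, here is a tiny sketch of nearest-neighbor label downsampling with PIL (dummy data; a real label map would come from the dataset):
```python
import numpy as np
from PIL import Image

# Dummy 321x321 label map with VOC-like class indices
label = np.random.randint(0, 21, size=(321, 321), dtype=np.uint8)
# Nearest-neighbor keeps class indices intact; bilinear would blend
# neighboring indices into meaningless intermediate values
half = np.asarray(Image.fromarray(label).resize((161, 161), Image.NEAREST))
```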
### Training batch normalization
This codebase only supports DeepLab v2 training, which freezes the batch normalization layers, although the v3/v3+ protocols require training them. If you also need to train the batch normalization parameters across multiple GPUs in your own projects, please
install [the extra library](https://hangzhang.org/PyTorch-Encoding/) below.
```bash
pip install torch-encoding
```
Batch normalization layers in a model are automatically switched in ```libs/models/resnet.py```.
```python
# In libs/models/resnet.py: fall back to the standard BatchNorm2d
# when the encoding package is not installed
try:
    from encoding.nn import SyncBatchNorm
    _BATCH_NORM = SyncBatchNorm
except ImportError:
    _BATCH_NORM = nn.BatchNorm2d
```
## References
1. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille. DeepLab: Semantic Image
Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. *IEEE TPAMI*,
2018.
[Project](http://liangchiehchen.com/projects/DeepLab.html) /
[Code](https://bitbucket.org/aquariusjay/deeplab-public-ver2) / [arXiv
paper](https://arxiv.org/abs/1606.00915)
2. H. Caesar, J. Uijlings, V. Ferrari. COCO-Stuff: Thing and Stuff Classes in Context. In *CVPR*, 2018.
[Project](https://github.com/nightrome/cocostuff) / [arXiv paper](https://arxiv.org/abs/1612.03716)
3. M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman. The PASCAL Visual Object
Classes (VOC) Challenge. *IJCV*, 2010.
[Project](http://host.robots.ox.ac.uk/pascal/VOC) /
[Paper](http://host.robots.ox.ac.uk/pascal/VOC/pubs/everingham10.pdf)