# Bounding Box Regression with Uncertainty for Accurate Object Detection

**CVPR 2019** [[presentation (youtube)]](https://www.youtube.com/watch?v=bcGtNdTzdkc)

[Yihui He](http://yihui-he.github.io/), [Chenchen Zhu](https://sites.google.com/andrew.cmu.edu/zcckernel), [Jianren Wang](https://scholar.google.com/citations?user=NL8MDkwAAAAJ&hl=en), [Marios Savvides](http://www.cmu-biometrics.org), [Xiangyu Zhang](https://scholar.google.com/citations?user=yuB-cfoAAAAJ&hl=en&oi=ao), Carnegie Mellon University & Megvii Inc.

Large-scale object detection datasets (e.g., MS-COCO) try to define the ground truth bounding boxes as clearly as possible. However, we observe that ambiguities are still introduced when labeling the bounding boxes. In this paper, we propose a novel bounding box regression loss for learning bounding box transformation and localization variance together. Our loss greatly improves the localization accuracies of various architectures with nearly no additional computation. The learned localization variance allows us to merge neighboring bounding boxes during non-maximum suppression (NMS), which further improves the localization performance. On MS-COCO, we boost the Average Precision (AP) of VGG-16 Faster R-CNN from 23.6% to 29.1%. More importantly, for ResNet-50-FPN Mask R-CNN, our method improves the AP and AP90 by **1.8%** and **6.2%** respectively, which significantly outperforms previous state-of-the-art bounding box refinement methods.
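
To make the idea of learning the box transformation and its localization variance together more concrete, here is a minimal sketch of a per-coordinate loss of this kind. It assumes the network predicts a log-variance `alpha = log(sigma^2)` next to each box coordinate, and it uses a smooth-L1-style switch at an absolute error of 1. The function name `kl_loss` and its scalar signature are illustrative only; the Caffe2 implementation in this repository may differ in scaling and normalization.

```python
import math


def kl_loss(x_pred, x_gt, alpha):
    """Illustrative per-coordinate regression loss with predicted uncertainty.

    x_pred: predicted box coordinate (or regression offset)
    x_gt:   ground-truth coordinate
    alpha:  predicted log-variance, assumed to be log(sigma^2)
    """
    diff = abs(x_gt - x_pred)
    if diff <= 1.0:
        # Quadratic region, as in smooth L1.
        data_term = 0.5 * diff ** 2
    else:
        # Linear region, which limits the influence of large errors.
        data_term = diff - 0.5
    # exp(-alpha) down-weights the error when the predicted variance is high;
    # the alpha / 2 term penalizes predicting a high variance everywhere.
    return math.exp(-alpha) * data_term + 0.5 * alpha


if __name__ == "__main__":
    for alpha in (0.0, math.log(4.0)):
        print(alpha, kl_loss(0.0, 0.1, alpha), kl_loss(0.0, 3.0, alpha))
```

With `alpha = 0` this reduces to the ordinary smooth L1 loss. A larger `alpha` lowers the loss for a badly localized coordinate and raises it for a well localized one, which is what pushes the predicted variance to track the actual localization error.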

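
The merging of neighboring boxes during NMS (variance voting, enabled by `STD_NMS` in the commands in this README) can be sketched as a weighted average. The sketch assumes each box is `[x1, y1, x2, y2]` with one predicted variance per coordinate, that neighbors are weighted by `exp(-(1 - IoU)^2 / sigma_t)` divided by their variance, and that `sigma_t` is a tunable parameter. The helper names and the default `sigma_t=0.02` are hypothetical, and the sketch covers only the voting step, not the surrounding NMS loop.

```python
import math


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def var_vote(selected, boxes, variances, sigma_t=0.02):
    """Refine `selected` using overlapping boxes and their predicted variances.

    boxes:     candidate boxes (normally including `selected` itself)
    variances: per-coordinate variances, one list of 4 values per box
    sigma_t:   assumed tunable parameter controlling how fast weights decay
    """
    num = [0.0] * 4
    den = [0.0] * 4
    for box, var in zip(boxes, variances):
        overlap = iou(selected, box)
        if overlap <= 0.0:
            continue  # boxes that do not overlap take no part in the vote
        closeness = math.exp(-((1.0 - overlap) ** 2) / sigma_t)
        for k in range(4):
            weight = closeness / var[k]  # low variance means a stronger vote
            num[k] += weight * box[k]
            den[k] += weight
    # Fall back to the selected coordinate if nothing voted.
    return [n / d if d > 0 else s for n, d, s in zip(num, den, selected)]


if __name__ == "__main__":
    boxes = [[0.0, 0.0, 10.0, 10.0], [1.0, 0.0, 11.0, 10.0]]
    variances = [[1.0] * 4, [0.5] * 4]
    print(var_vote(boxes[0], boxes, variances))
```

Each coordinate is voted on independently, so a neighbor that is confident about one edge but not another can improve only the edge it localizes well.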
### Citation

If you find the code useful in your research, please consider citing:

    @InProceedings{klloss,
      author = {He, Yihui and Zhu, Chenchen and Wang, Jianren and Savvides, Marios and Zhang, Xiangyu},
      title = {Bounding Box Regression With Uncertainty for Accurate Object Detection},
      booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      month = {June},
      year = {2019}
    }

### Installation

Please find installation instructions for Caffe2 and Detectron in [`INSTALL.md`](INSTALL.md).

When installing cocoapi, please use [my fork](https://github.com/yihui-he/cocoapi) to get AP80 and AP90 scores.

### Testing

Inference without Var Voting (8 GPUs):

```
python2 tools/test_net.py -c configs/e2e_faster_rcnn_R-50-FPN_2x.yaml
```

You will get:

```
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.385
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.578
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.412
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.209
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.412
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.515
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.323
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.499
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.522
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.321
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.553
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.680
Average Precision  (AP) @[ IoU=0.60      | area=   all | maxDets=100 ] = 0.533
Average Precision  (AP) @[ IoU=0.70      | area=   all | maxDets=100 ] = 0.461
Average Precision  (AP) @[ IoU=0.80      | area=   all | maxDets=100 ] = 0.350
Average Precision  (AP) @[ IoU=0.85      | area=   all | maxDets=100 ] = 0.269
Average Precision  (AP) @[ IoU=0.90      | area=   all | maxDets=100 ] = 0.154
Average Precision  (AP) @[ IoU=0.95      | area=   all | maxDets=100 ] = 0.032
```

Inference with Var Voting:

```
python2 tools/test_net.py -c configs/e2e_faster_rcnn_R-50-FPN_2x.yaml STD_NMS True
```

You will get:

```
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.392
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.576
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.425
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.212
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.417
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.526
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.324
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.528
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.564
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.346
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.594
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.736
Average Precision  (AP) @[ IoU=0.60      | area=   all | maxDets=100 ] = 0.536
Average Precision  (AP) @[ IoU=0.70      | area=   all | maxDets=100 ] = 0.472
Average Precision  (AP) @[ IoU=0.80      | area=   all | maxDets=100 ] = 0.363
Average Precision  (AP) @[ IoU=0.85      | area=   all | maxDets=100 ] = 0.281
Average Precision  (AP) @[ IoU=0.90      | area=   all | maxDets=100 ] = 0.165
Average Precision  (AP) @[ IoU=0.95      | area=   all | maxDets=100 ] = 0.037
```

### Training

```
python2 tools/train_net.py -c configs/e2e_faster_rcnn_R-50-FPN_2x.yaml
```

### FAQ

Please create a [new issue](https://github.com/yihui-he/KL-Loss/issues/new).

-------------------------------------------

# Detectron

Detectron is Facebook AI Research's software system that implements state-of-the-art object detection algorithms, including [Mask R-CNN](https://arxiv.org/abs/1703.06870).
It is written in Python and powered by the [Caffe2](https://github.com/caffe2/caffe2) deep learning framework. At FAIR, Detectron has enabled numerous research projects, including: [Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144), [Mask R-CNN](https://arxiv.org/abs/1703.06870), [Detecting and Recognizing Human-Object Interactions](https://arxiv.org/abs/1704.07333), [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002), [Non-local Neural Networks](https://arxiv.org/abs/1711.07971), [Learning to Segment Every Thing](https://arxiv.org/abs/1711.10370), [Data Distillation: Towards Omni-Supervised Learning](https://arxiv.org/abs/1712.04440), [DensePose: Dense Human Pose Estimation In The Wild](https://arxiv.org/abs/1802.00434), and [Group Normalization](https://arxiv.org/abs/1803.08494).

Example Mask R-CNN output.

## Introduction

The goal of Detectron is to provide a high-quality, high-performance codebase for object detection *research*. It is designed to be flexible in order to support rapid implementation and evaluation of novel research. Detectron includes implementations of the following object detection algorithms:

- [Mask R-CNN](https://arxiv.org/abs/1703.06870) -- *Marr Prize at ICCV 2017*
- [RetinaNet](https://arxiv.org/abs/1708.02002) -- *Best Student Paper Award at ICCV 2017*
- [Faster R-CNN](https://arxiv.org/abs/1506.01497)
- [RPN](https://arxiv.org/abs/1506.01497)
- [Fast R-CNN](https://arxiv.org/abs/1504.08083)
- [R-FCN](https://arxiv.org/abs/1605.06409)

using the following backbone network architectures:

- [ResNeXt{50,101,152}](https://arxiv.org/abs/1611.05431)
- [ResNet{50,101,152}](https://arxiv.org/abs/1512.03385)
- [Feature Pyramid Networks](https://arxiv.org/abs/1612.03144) (with ResNet/ResNeXt)
- [VGG16](https://arxiv.org/abs/1409.1556)

Additional backbone architectures may be easily implemented. For more details about these models, please see [References](#references) below.

## Update

- 4/2018: Support Group Normalization - see [`GN/README.md`](./projects/GN/README.md)

## License

Detectron is released under the [Apache 2.0 license](https://github.com/facebookresearch/detectron/blob/master/LICENSE). See the [NOTICE](https://github.com/facebookresearch/detectron/blob/master/NOTICE) file for additional details.

## Citing Detectron

If you use Detectron in your research or wish to refer to the baseline results published in the [Model Zoo](MODEL_ZOO.md), please use the following BibTeX entry.
```
@misc{Detectron2018,
  author =       {Ross Girshick and Ilija Radosavovic and Georgia Gkioxari and
                  Piotr Doll\'{a}r and Kaiming He},
  title =        {Detectron},
  howpublished = {\url{https://github.com/facebookresearch/detectron}},
  year =         {2018}
}
```

## Model Zoo and Baselines

We provide a large set of baseline results and trained models available for download in the [Detectron Model Zoo](MODEL_ZOO.md).

## Installation

Please find installation instructions for Caffe2 and Detectron in [`INSTALL.md`](INSTALL.md).

## Quick Start: Using Detectron

After installation, please see [`GETTING_STARTED.md`](GETTING_STARTED.md) for brief tutorials covering inference and training with Detectron.

## Getting Help

To start, please check the [troubleshooting](INSTALL.md#troubleshooting) section of our installation instructions as well as our [FAQ](FAQ.md). If you couldn't find help there, try searching our GitHub issues. We intend the issues page to be a forum in which the community collectively troubleshoots problems.

If bugs are found, **we appreciate pull requests** (including adding Q&A's to `FAQ.md` and improving our installation instructions and troubleshooting documents). Please see [CONTRIBUTING.md](CONTRIBUTING.md) for more information about contributing to Detectron.

## References

- [Data Distillation: Towards Omni-Supervised Learning](https://arxiv.org/abs/1712.04440). Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, and Kaiming He. Tech report, arXiv, Dec. 2017.
- [Learning to Segment Every Thing](https://arxiv.org/abs/1711.10370). Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, and Ross Girshick. Tech report, arXiv, Nov. 2017.
- [Non-Local Neural Networks](https://arxiv.org/abs/1711.07971). Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Tech report, arXiv, Nov. 2017.
- [Mask R-CNN](https://arxiv.org/abs/1703.06870). Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. IEEE International Conference on Computer Vision (ICCV), 2017.
- [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002). Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. IEEE International Conference on Computer Vision (ICCV), 2017.
- [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677). Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Tech report, arXiv, June 2017.
- [Detecting and Recognizing Human-Object Interactions](https://arxiv.org/abs/1704.07333). Georgia Gkioxari, Ross Girshick, Piotr Dollár, and Kaiming He. Tech report, arXiv, Apr. 2017.
- [Feature Pyramid Networks for Object Detection](https://arxiv.org/abs/1612.03144). Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431). Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- [R-FCN: Object Detection via Region-based Fully Convolutional Networks](http://arxiv.org/abs/1605.06409). Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. Conference on Neural Information Processing Systems (NIPS), 2016.
- [Deep Residual Learning for Image Recognition](http://arxiv.org/abs/1512.03385). Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](http://arxiv.org/abs/1506.01497). Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Conference on Neural Information Processing Systems (NIPS), 2015.
- [Fast R-CNN](http://arxiv.org/abs/1504.08083). Ross Girshick. IEEE International Conference on Computer Vision (ICCV), 2015.