
Frustratingly Simple Few-Shot Object Detection (ICML'2020)

Abstract

Detecting rare objects from a few examples is an emerging problem. Prior works show meta-learning is a promising approach. But, fine-tuning techniques have drawn scant attention. We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task. Such a simple approach outperforms the meta-learning methods by roughly 2~20 points on current benchmarks and sometimes even doubles the accuracy of the prior methods. However, the high variance in the few samples often leads to the unreliability of existing benchmarks. We revise the evaluation protocols by sampling multiple groups of training examples to obtain stable comparisons and build new benchmarks based on three datasets: PASCAL VOC, COCO and LVIS. Again, our fine-tuning approach establishes a new state of the art on the revised benchmarks. The code as well as the pretrained models are available at https://github.com/ucbdrive/few-shot-object-detection.

Citation

@inproceedings{wang2020few,
    title={Frustratingly Simple Few-Shot Object Detection},
    author={Wang, Xin and Huang, Thomas E. and Darrell, Trevor and Gonzalez, Joseph E and Yu, Fisher},
    booktitle={International Conference on Machine Learning (ICML)},
    year={2020}
}

Note: all the reported results use the data split released by the official TFA repo. Currently, each setting is evaluated with only one fixed few-shot dataset. Please refer to DATA Preparation for more details about the dataset and data preparation.

How to reproduce TFA

Following the original implementation, the procedure consists of three steps:

  • Step 1: Base training

    • Use all the images and annotations of the base classes to train a base model.
  • Step 2: Reshape the bbox head of the base model

    • Create a new bbox head for fine-tuning on all classes (base + novel) using the provided script.
    • The base-class weights in the new bbox head are initialized directly from the corresponding weights of the base model.
    • The novel-class weights in the new bbox head are randomly initialized (see the sketch after this list).
  • Step 3: Few-shot fine-tuning

    • Use the model from step 2 as initialization and further fine-tune the bbox head on the few-shot datasets.
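Conceptually, the reshaping in step 2 is weight surgery on the classification layer of the bbox head. The following is a minimal PyTorch sketch of the idea, not the actual tools.detection.misc.initialize_bbox_head implementation; the paths, checkpoint keys, and the assumption that base classes occupy the first rows of the combined class order are illustrative simplifications.

import torch

# Illustrative sketch of the step 2 weight surgery (NOT the real script).
NUM_BASE = 15          # e.g. number of VOC base classes
NUM_ALL = 20           # base + novel classes

ckpt = torch.load('latest.pth', map_location='cpu')   # step 1 checkpoint (illustrative path)
state = ckpt['state_dict']

# Classification layer of the bbox head: one row per class plus background.
old_w = state['roi_head.bbox_head.fc_cls.weight']     # assumed key name
feat_dim = old_w.size(1)

# Build a larger classifier: novel rows random ("randinit"), base rows copied over.
new_w = torch.randn(NUM_ALL + 1, feat_dim) * 0.01
new_w[:NUM_BASE] = old_w[:NUM_BASE]   # base classes keep their learned weights
new_w[-1] = old_w[-1]                 # background row is kept

state['roi_head.bbox_head.fc_cls.weight'] = new_w
# fc_cls.bias and the box regression layer are handled analogously.
torch.save(ckpt, 'base_model_random_init_bbox_head.pth')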

An example for the VOC split1 1-shot setting with 8 GPUs:

# step1: base training for voc split1
bash ./tools/detection/dist_train.sh \
    configs/detection/tfa/voc/split1/tfa_r101_fpn_voc-split1_base-training.py 8

# step2: reshape the bbox head of base model for few shot fine-tuning
python -m tools.detection.misc.initialize_bbox_head \
    --src1 work_dirs/tfa_r101_fpn_voc-split1_base-training/latest.pth \
    --method randinit \
    --save-dir work_dirs/tfa_r101_fpn_voc-split1_base-training

# step3: few shot fine-tuning
bash ./tools/detection/dist_train.sh \
    configs/detection/tfa/voc/split1/tfa_r101_fpn_voc-split1_1shot-fine-tuning.py 8
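After step 3 finishes, the fine-tuned checkpoint can be evaluated. The command below is a sketch that assumes a tools/detection/dist_test.sh analogous to other OpenMMLab repos and VOC-style mAP evaluation; verify the script name and flags against your checkout.

# evaluate the 1-shot fine-tuned model (script and flags assumed)
bash ./tools/detection/dist_test.sh \
    configs/detection/tfa/voc/split1/tfa_r101_fpn_voc-split1_1shot-fine-tuning.py \
    work_dirs/tfa_r101_fpn_voc-split1_1shot-fine-tuning/latest.pth 8 \
    --eval mAP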

Note:

  • The default output path of the reshaped base model in step 2 is work_dirs/{BASE TRAINING CONFIG}/base_model_random_init_bbox_head.pth. If the model is saved to a different path, please update the load_from argument in the step 3 few-shot fine-tuning configs instead of using resume_from.
  • To use a pre-trained checkpoint, set load_from to the path of the downloaded checkpoint, as in the snippet below.
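For example, to start step 3 from a checkpoint saved at a non-default location, the fine-tuning config can point load_from at it (a minimal sketch; the path is illustrative):

# in the step 3 fine-tuning config (MMDetection-style Python config)
load_from = 'path/to/base_model_random_init_bbox_head.pth'  # illustrative path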

Results on VOC dataset

Base Training

| Arch | Split | Base AP50 | ckpt(step1) | ckpt(step2) | log |
| :--- | :---: | :---: | :---: | :---: | :---: |
| r101_fpn | 1 | 80.9 | ckpt | ckpt | log |
| r101_fpn | 2 | 82.0 | ckpt | ckpt | log |
| r101_fpn | 3 | 82.1 | ckpt | ckpt | log |

Note:

  • The performance of the same few-shot setting can be dramatically unstable across different base training models (AP50 can fluctuate by 5.0 or more), even when their mAP on the base classes is very close.
  • For now, the workaround for getting a good base model is to train it with different random seeds (see the sketch after this list). The seed used in this codebase may not be optimal, and higher results may be achievable with other seeds. However, even the same seed cannot guarantee identical results across runs, because some CUDA operations are nondeterministic. We will continue to investigate and improve this.
  • To reproduce the reported few-shot results, it is highly recommended to use the released step 2 model for few-shot fine-tuning.
  • Difficult samples are used in base training but not in the few-shot settings.
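As a concrete example of the seed workaround above, base training can be re-run with a different seed. This sketch assumes train.py exposes the usual OpenMMLab --seed / --deterministic options:

# re-run base training with another random seed (flags assumed)
bash ./tools/detection/dist_train.sh \
    configs/detection/tfa/voc/split1/tfa_r101_fpn_voc-split1_base-training.py 8 \
    --seed 2021 --deterministic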

Few Shot Fine-tuning

| Arch | Split | Shot | Base AP50 | Novel AP50 | ckpt | log |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| r101_fpn | 1 | 1 | 79.2 | 41.9 | ckpt | log |
| r101_fpn | 1 | 2 | 79.2 | 49.0 | ckpt | log |
| r101_fpn | 1 | 3 | 79.6 | 49.9 | ckpt | log |
| r101_fpn | 1 | 5 | 79.6 | 58.0 | ckpt | log |
| r101_fpn | 1 | 10 | 79.7 | 58.4 | ckpt | log |
| r101_fpn | 2 | 1 | 80.3 | 26.6 | ckpt | log |
| r101_fpn | 2 | 2 | 78.1 | 30.7 | ckpt | log |
| r101_fpn | 2 | 3 | 79.4 | 39.0 | ckpt | log |
| r101_fpn | 2 | 5 | 79.4 | 35.7 | ckpt | log |
| r101_fpn | 2 | 10 | 79.7 | 40.5 | ckpt | log |
| r101_fpn | 3 | 1 | 80.5 | 34.0 | ckpt | log |
| r101_fpn | 3 | 2 | 80.6 | 39.3 | ckpt | log |
| r101_fpn | 3 | 3 | 81.1 | 42.8 | ckpt | log |
| r101_fpn | 3 | 5 | 80.8 | 51.4 | ckpt | log |
| r101_fpn | 3 | 10 | 80.7 | 50.6 | ckpt | log |

Results on COCO dataset

Base Training

| Arch | Base mAP | ckpt(step1) | ckpt(step2) | log |
| :--- | :---: | :---: | :---: | :---: |
| r101_fpn | 39.5 | ckpt | ckpt | log |

Few Shot Fine-tuning

| Arch | Shot | Base mAP | Novel mAP | ckpt | log |
| :--- | :---: | :---: | :---: | :---: | :---: |
| r101_fpn | 10 | 35.2 | 10.4 | ckpt | log |
| r101_fpn | 30 | 36.7 | 14.7 | ckpt | log |