A PyTorch implementation of EfficientDet.
It is based on the
There are other PyTorch implementations. Either their approach didn't fit my aim to correctly reproduce the Tensorflow models (but with a PyTorch feel and flexibility) or they cannot come close to replicating MS COCO training from scratch.
Aside from the default model configs, there is a lot of flexibility to facilitate experiments and rapid improvements here -- some options based on the official Tensorflow impl, some of my own:
Any model in the timm model collection that supports feature extraction (features_only arg) can be used as a backbone.
Latest results are in and the training goal has been achieved. Slightly bested the TF model mAP results for the D0 model. This model uses:
timm
My latest D0 run:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.336251
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.521584
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.356439
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.123988
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.395033
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.521695
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.287121
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.441450
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.467914
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.197697
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.552515
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.689297
TF ported D0 weights:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.335653
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.516253
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.353884
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.125278
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.386957
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.528071
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.288049
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.439918
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.466877
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.193482
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.549262
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.686037
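To compare two runs like the ones above without eyeballing the numbers, the summary lines can be parsed into a dictionary. This is a minimal sketch, assuming the text is in the pycocotools summary layout shown above; `parse_coco_summary` and `compare_runs` are hypothetical helper names, not part of this repo.

```python
import re

# Matches one summary line as printed above, tolerating variable spacing
# inside the brackets.
SUMMARY_RE = re.compile(
    r"Average (Precision|Recall)\s+\((AP|AR)\)\s+@\[\s*IoU=([\d.:]+)\s*\|"
    r"\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)\s*\]\s*=\s*(-?[\d.]+)"
)


def parse_coco_summary(text):
    """Return {(metric, iou, area, max_dets): value} for each summary line."""
    results = {}
    for line in text.splitlines():
        match = SUMMARY_RE.search(line)
        if match:
            _, metric, iou, area, max_dets, value = match.groups()
            results[(metric, iou, area, int(max_dets))] = float(value)
    return results


def compare_runs(run_a, run_b):
    """Per-metric difference (a - b) for keys present in both runs."""
    return {key: round(run_a[key] - run_b[key], 6) for key in run_a if key in run_b}


# Two lines from each of the D0 runs listed above.
mine = parse_coco_summary(
    "Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.336251\n"
    "Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.689297"
)
tf_port = parse_coco_summary(
    "Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.335653\n"
    "Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.686037"
)
deltas = compare_runs(mine, tf_port)
```

A positive entry in `deltas` means the first run scored higher on that metric.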
Pretrained weights added for this model: efficientdet_d0 (Tensorflow port is tf_efficientdet_d0).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.331
A bunch of changes:
Initial D1 training results are in -- close but not quite there. Definitely within reach and better than any other non-official EfficientDet impl I've seen.
Biggest missing element is proper per-epoch mAP validation for better checkpoint selection (than loss based). I was resisting doing full COCO eval because it's so slow, but may throw that in for now...
D1: Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.382
Previous D0 result: Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.324
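The point about checkpoint selection can be illustrated with a small tracker that picks the best epoch by a chosen metric. This is only a sketch: `BestCheckpointTracker` is a hypothetical helper, and the per-epoch numbers are made up to show that loss-based and mAP-based selection can disagree.

```python
class BestCheckpointTracker:
    """Track the best epoch by a validation metric."""

    def __init__(self, higher_is_better=True):
        self.higher_is_better = higher_is_better
        self.best_epoch = None
        self.best_value = None

    def update(self, epoch, value):
        """Record a metric for an epoch; return True if it is the new best."""
        if self.best_value is None:
            improved = True
        elif self.higher_is_better:
            improved = value > self.best_value
        else:
            improved = value < self.best_value
        if improved:
            self.best_epoch, self.best_value = epoch, value
        return improved


# Hypothetical (epoch, val_loss, val_map) values, for illustration only.
history = [
    (0, 0.92, 0.301),
    (1, 0.85, 0.318),
    (2, 0.86, 0.324),
]

by_loss = BestCheckpointTracker(higher_is_better=False)
by_map = BestCheckpointTracker(higher_is_better=True)
for epoch, val_loss, val_map in history:
    by_loss.update(epoch, val_loss)
    by_map.update(epoch, val_map)
```

With these made-up numbers, loss-based selection keeps epoch 1 while mAP-based selection keeps epoch 2.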
First decent MSCOCO training results (from scratch, w/ pretrained classification backbone weights as starting point). 32.4 mAP for D0. Working on improvements and D1 trials still running.
Taking a pause on training; some high-priority things came up. There are signs of life on the training branch: I was working on the basic augs before the priority switch, and the loss fn appeared to be doing something sane with distributed training working, but there is no proper eval yet and init is not correct yet. I will get to it, with SOTA training config and good performance as the end goal (as with my EfficientNet work).
Cleanup post-processing. Less code and a five-fold throughput increase on the smaller models. D0 running > 130 img/s on a single 2080Ti, D1 > 130 img/s on dual 2080Ti up to D7 @ 8.5 img/s.
Replace generate_detections with PyTorch impl using torchvision batched_nms. Significant performance increase with minor (+/-.001 mAP) score differences. Quite a bit faster than original TF impl on a GPU now.
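For readers unfamiliar with batched NMS: the idea is that non-maximum suppression runs per class, so a box is only suppressed by a higher-scoring box of the same class. The following is a slow pure-Python sketch of that idea for illustration; it is not torchvision's implementation nor this repo's post-processing code, and `class_aware_nms` is a hypothetical name.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_aware_nms(boxes, scores, classes, iou_threshold):
    """Return indices of kept boxes, ordered by descending score.

    A box is dropped only if an already kept box of the same class
    overlaps it with IoU greater than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        suppressed = any(
            classes[i] == classes[j] and iou(boxes[i], boxes[j]) > iou_threshold
            for j in keep
        )
        if not suppressed:
            keep.append(i)
    return keep


# Boxes 0 and 1 overlap heavily and share a class; box 2 overlaps too but
# belongs to another class; box 3 is far away.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.6]
classes = [0, 0, 1, 0]
kept = class_aware_nms(boxes, scores, classes, iou_threshold=0.5)
```

Here box 1 is suppressed by box 0, while box 2 survives because it has a different class label.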
Initial code with working validation posted. Yes, it's a little slow, but I think faster than the official impl on a GPU if you leave AMP enabled. Post processing needs some love.
If you are an organization interested in sponsoring any of this work, or if prioritization of the possible future directions interests you, feel free to contact me (issue, LinkedIn, Twitter, hello at rwightman dot com). I will set up a GitHub sponsor if there is any interest.
Variant | Download | mAP (val2017) | mAP (test-dev2017) | mAP (TF official val2017) | mAP (TF official test-dev2017) |
---|---|---|---|---|---|
D0 | efficientdet_d0.pth | 33.6 | TBD | 33.5 | 33.8 |
D0 | tf_efficientdet_d0.pth | 33.6 | TBD | 33.5 | 33.8 |
D1 | tf_efficientdet_d1.pth | 39.3 | TBD | 39.1 | 39.6 |
D2 | tf_efficientdet_d2.pth | 42.6 | 43.1 | 42.5 | 43 |
D3 | tf_efficientdet_d3.pth | 46.0 | TBD | 45.9 | 45.8 |
D4 | tf_efficientdet_d4.pth | 49.1 | TBD | 49.0 | 49.4 |
D5 | tf_efficientdet_d5.pth | 50.4 | TBD | 50.5 | 50.7 |
D6 | tf_efficientdet_d6.pth | 51.2 | TBD | 51.3 | 51.7 |
D7 | tf_efficientdet_d7.pth | 51.8 | 52.1 | 52.1 | 52.2 |
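As a quick check of how closely the ported weights track the official numbers, the val2017 columns of the table can be differenced. A small sketch, with the values copied from the table above and `val_deltas` as a hypothetical helper name:

```python
# (variant, checkpoint, mAP of this port on val2017, TF official mAP on val2017)
rows = [
    ("D0", "efficientdet_d0", 33.6, 33.5),
    ("D0", "tf_efficientdet_d0", 33.6, 33.5),
    ("D1", "tf_efficientdet_d1", 39.3, 39.1),
    ("D2", "tf_efficientdet_d2", 42.6, 42.5),
    ("D3", "tf_efficientdet_d3", 46.0, 45.9),
    ("D4", "tf_efficientdet_d4", 49.1, 49.0),
    ("D5", "tf_efficientdet_d5", 50.4, 50.5),
    ("D6", "tf_efficientdet_d6", 51.2, 51.3),
    ("D7", "tf_efficientdet_d7", 51.8, 52.1),
]


def val_deltas(table_rows):
    """Map checkpoint name to (port mAP - official mAP), rounded to 0.1."""
    return {ckpt: round(port - official, 1) for _, ckpt, port, official in table_rows}


deltas = val_deltas(rows)
# Checkpoint with the largest shortfall relative to the official number.
worst = min(deltas, key=deltas.get)
```

With the table as given, every checkpoint is within 0.3 mAP of the official val2017 figure.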
Tested in a Python 3.7 or 3.8 conda environment in Linux with:
pip install timm
or local install from https://github.com/rwightman/pytorch-image-models
NOTE - There is a conflict/bug with Numpy 1.18+ and pycocotools. Force install numpy <= 1.17.5 or the COCO eval will fail. The validation script will still save the output JSON, which can be run through eval again later.
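A guard like the following could be run before validation to catch the Numpy issue early. It is a sketch only: the 1.17.5 cutoff is taken from the note above, and `parse_version` and `numpy_ok_for_coco_eval` are hypothetical helpers, not part of this repo.

```python
def parse_version(version):
    """Turn '1.17.5' into (1, 17, 5), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def numpy_ok_for_coco_eval(numpy_version, max_version="1.17.5"):
    """True if numpy_version is at or below the cutoff from the note."""
    return parse_version(numpy_version) <= parse_version(max_version)
```

In practice one would pass `numpy.__version__` to `numpy_ok_for_coco_eval` and warn if it returns False.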
MSCOCO 2017 validation data:
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip val2017.zip
unzip annotations_trainval2017.zip
MSCOCO 2017 test-dev data:
wget http://images.cocodataset.org/zips/test2017.zip
unzip -q test2017.zip
wget http://images.cocodataset.org/annotations/image_info_test2017.zip
unzip image_info_test2017.zip
Run validation (val2017 by default) with D2 model: python validation.py /location/of/mscoco/ --model tf_efficientdet_d2 --checkpoint tf_efficientdet_d2.pth
Run test-dev2017: python validation.py /location/of/mscoco/ --model tf_efficientdet_d2 --checkpoint tf_efficientdet_d2.pth --anno test-dev2017
TODO: Need an inference script
./distributed_train.sh 2 /mscoco --model tf_efficientdet_d0 -b 16 --amp --lr .04 --warmup-epochs 5 --sync-bn --opt fusedmomentum --fill-color mean --model-ema
NOTE: The command above uses the dataset mean (--fill-color mean) as the background for crop/scale/aspect fill; the official repo uses black pixels (0) (--fill-color 0). Both likely work fine.
Latest training run with .336 for D0 (on 4x 1080ti):
./distributed_train.sh 4 --model efficientdet_d0 -b 22 --amp --lr .12 --sync-bn --opt fusedmomentum --warmup-epochs 5 --lr-noise 0.4 0.9 --model-ema --model-ema-decay 0.9999
These hparams above resulted in a good model, a few points:
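On the --model-ema-decay 0.9999 flag in the command above: assuming it follows the usual exponential moving average rule for weights, each update moves the averaged copy only slightly toward the current model. The sketch below shows that standard rule on a single scalar weight; it is not this repo's EMA code, and `ema_update` is a hypothetical name.

```python
def ema_update(ema_weights, model_weights, decay):
    """One EMA step: ema = decay * ema + (1 - decay) * current."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, model_weights)]


decay = 0.9999
ema = [0.0]    # averaged weight, starting far from the model weight
model = [1.0]  # model weight held constant to isolate the averaging effect

for _ in range(1000):
    ema = ema_update(ema, model, decay)

# Share of the starting value still present after 1000 updates.
initial_share = decay ** 1000
```

With this decay the averaged weight has covered only about a tenth of the gap after 1000 updates, which is why a high decay pairs with long training runs.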
VAL2017
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.336251
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.521584
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.356439
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.123988
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.395033
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.521695
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.287121
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.441450
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.467914
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.197697
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.552515
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.689297
NOTE: I've only tried submitting D2 and D7 to dev server for sanity check so far
tf_efficientdet_d2, test-dev2017:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.431
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.624
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.463
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.226
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.471
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.585
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.345
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.543
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.575
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.632
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.756
tf_efficientdet_d7, test-dev2017:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.521
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.714
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.563
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.345
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.555
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.646
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.390
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.631
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.670
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.497
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.704
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.808
tf_efficientdet_d0, val2017:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.336
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.516
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.354
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.125
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.387
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.528
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.288
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.440
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.467
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.194
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.549
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.686
tf_efficientdet_d1, val2017:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.393
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.583
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.419
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.187
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.447
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.572
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.323
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.501
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.532
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.295
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.599
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.734
tf_efficientdet_d2, val2017:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.426
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.618
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.452
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.237
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.481
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.590
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.537
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.569
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.348
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.633
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.748
tf_efficientdet_d3, val2017:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.460
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.651
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.493
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.283
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.503
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.618
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.360
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.570
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.605
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.409
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.655
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.768
tf_efficientdet_d4, val2017:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.491
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.685
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.531
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.334
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.539
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.641
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.375
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.598
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.635
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.468
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.683
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.780
tf_efficientdet_d5, val2017:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.504
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.700
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.543
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.337
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.549
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.646
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.381
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.617
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.654
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.485
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.696
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.791
tf_efficientdet_d6, val2017:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.512
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.706
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.551
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.348
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.555
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.654
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.386
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.623
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.661
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.701
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.794
tf_efficientdet_d7, val2017:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.518
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.711
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.558
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.368
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.564
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.655
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.386
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.627
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.665
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.505
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.704
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.801