
PyTorch Image Models

Sponsors

A big thank you to my GitHub Sponsors for their support!

In addition to the sponsors at the link above, I've received hardware and/or cloud resources from

I'm fortunate to be able to dedicate significant time and money of my own supporting this and other open source projects. However, as the projects increase in scope, outside support is needed to continue with the current trajectory of hardware, infrastructure, and electricity costs.

What's New

April 1, 2021

  • Add snazzy benchmark.py script for bulk timm model benchmarking of train and/or inference
  • Add Pooling-based Vision Transformer (PiT) models (from https://github.com/naver-ai/pit)
    • Merged distilled variant into main for torchscript compatibility
    • Some timm cleanup/style tweaks and weights have hub download support
  • Cleanup Vision Transformer (ViT) models
    • Merge distilled (DeiT) model into main so that torchscript can work
    • Support updated weight init (still defaults to the old init) that more closely matches the original JAX impl (possibly better training from scratch)
    • Separate hybrid model defs into different file and add several new model defs to fiddle with, support patch_size != 1 for hybrids
    • Fix fine-tuning num_class changes (PiT and ViT) and pos_embed resizing (ViT) with distilled variants
    • nn.Sequential for block stack (does not break downstream compat)
  • TnT (Transformer-in-Transformer) models contributed by author (from https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/TNT)
  • Add RegNetY-160 weights from DeiT teacher model
  • Add new NFNet-L0 w/ SE attn (rename nfnet_l0b->nfnet_l0) weights 82.75 top-1 @ 288x288
  • Some fixes/improvements for TFDS dataset wrapper

March 17, 2021

  • Add new ECA-NFNet-L0 (rename nfnet_l0c->eca_nfnet_l0) weights trained by myself.
    • 82.6 top-1 @ 288x288, 82.8 @ 320x320, trained at 224x224
    • Uses SiLU activation, approx 2x faster than dm_nfnet_f0 and 50% faster than nfnet_f0s w/ 1/3 param count
  • Integrate Hugging Face model hub into timm create_model and default_cfg handling for pretrained weight and config sharing (more on this soon!)
  • Merge HardCoRe NAS models contributed by https://github.com/yoniaflalo
  • Merge PyTorch trained EfficientNet-EL and pruned ES/EL variants contributed by DeGirum

March 7, 2021

  • First 0.4.x PyPi release w/ NFNets (& related), ByoB (GPU-Efficient, RepVGG, etc).
  • Change feature extraction for pre-activation nets (NFNets, ResNetV2) to return features before activation.
  • Tested with PyTorch 1.8 release. Updated CI to use 1.8.
  • Benchmarked several arch on RTX 3090, Titan RTX, and V100 across 1.7.1, 1.8, NGC 20.12, and 21.02. Some interesting performance variations to take note of https://gist.github.com/rwightman/bb59f9e245162cee0e38bd66bd8cd77f

Feb 18, 2021

  • Add pretrained weights and model variants for NFNet-F* models from DeepMind Haiku impl.
    • Models are prefixed with dm_. They require SAME padding conv, skipinit enabled, and activation gains applied in act fn.
    • These models are big, so expect to run out of GPU memory. With the GELU activation + other options, they are roughly 1/2 the inference speed of my SiLU PyTorch optimized s variants.
    • Original model results are based on pre-processing that is not the same as all other models so you'll see different results in the results csv (once updated).
    • Matching the original pre-processing as closely as possible I get these results:
      • dm_nfnet_f6 - 86.352
      • dm_nfnet_f5 - 86.100
      • dm_nfnet_f4 - 85.834
      • dm_nfnet_f3 - 85.676
      • dm_nfnet_f2 - 85.178
      • dm_nfnet_f1 - 84.696
      • dm_nfnet_f0 - 83.464
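
The SAME padding requirement mentioned above can be illustrated with a small calculation. This is a minimal sketch of the TensorFlow-style SAME padding arithmetic, not the code used by the dm_ models themselves; the function names `same_pad_total` and `same_pad_split` are hypothetical, chosen for illustration only.

```python
import math


def same_pad_total(size, kernel, stride=1, dilation=1):
    """Total padding along one dimension so that output size == ceil(size / stride)."""
    out_size = math.ceil(size / stride)
    needed = (out_size - 1) * stride + (kernel - 1) * dilation + 1
    return max(needed - size, 0)


def same_pad_split(size, kernel, stride=1, dilation=1):
    """Split the total padding into (before, after); any odd extra pixel goes after."""
    total = same_pad_total(size, kernel, stride, dilation)
    before = total // 2
    return before, total - before


if __name__ == "__main__":
    # 3x3 conv, stride 2, on a 224-wide input: asymmetric padding is needed.
    print(same_pad_split(224, kernel=3, stride=2))
    # Same conv at stride 1: symmetric padding of 1 on each side.
    print(same_pad_split(224, kernel=3, stride=1))
```

Note that the padding depends on the input size, which is why SAME padding cannot always be expressed as a fixed symmetric `padding` value.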

Feb 16, 2021

  • Add Adaptive Gradient Clipping (AGC) as per https://arxiv.org/abs/2102.06171. Integrated w/ PyTorch gradient clipping via mode arg that defaults to prev 'norm' mode. For backward arg compat, clip-grad arg must be specified to enable when using train.py.
    • AGC w/ default clipping factor --clip-grad .01 --clip-mode agc
    • PyTorch global norm of 1.0 (old behaviour, always norm), --clip-grad 1.0
    • PyTorch value clipping of 10, --clip-grad 10. --clip-mode value
    • AGC performance is definitely sensitive to the clipping factor. More experimentation needed to determine good values for smaller batch sizes and optimizers besides those in paper. So far I've found .001-.005 is necessary for stable RMSProp training w/ NFNet/NF-ResNet.
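
To make the clipping factor concrete, here is a minimal pure-Python sketch of the unit-wise clipping rule described in the AGC paper linked above. It is not the implementation in this repository: the function names, the row-per-output-unit layout, and the `eps` default of 1e-3 are assumptions for illustration.

```python
import math


def unitwise_norm(matrix):
    """L2 norm of each row, treating one row as one output unit."""
    return [math.sqrt(sum(v * v for v in row)) for row in matrix]


def adaptive_clip_grad(weights, grads, clip_factor=0.01, eps=1e-3):
    """Rescale each unit's gradient so that ||g|| <= clip_factor * max(||w||, eps)."""
    clipped = []
    w_norms = unitwise_norm(weights)
    g_norms = unitwise_norm(grads)
    for g_row, w_norm, g_norm in zip(grads, w_norms, g_norms):
        max_norm = max(w_norm, eps) * clip_factor
        if g_norm > max_norm:
            scale = max_norm / max(g_norm, 1e-6)
            g_row = [g * scale for g in g_row]
        clipped.append(list(g_row))
    return clipped


if __name__ == "__main__":
    weights = [[3.0, 4.0], [1.0, 0.0]]    # unit norms: 5.0 and 1.0
    grads = [[0.3, 0.4], [0.001, 0.0]]    # unit norms: 0.5 and 0.001
    # The first row is scaled down to norm 0.05; the second is left untouched.
    print(adaptive_clip_grad(weights, grads, clip_factor=0.01))
```

Because the threshold scales with the weight norm of each unit, a smaller clipping factor tightens the limit for every layer at once, which is consistent with the sensitivity noted above.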

Feb 12, 2021

Feb 10, 2021

  • First Normalization-Free model training experiments done:
    • nf_resnet50 - 80.68 top-1 @ 288x288, 80.31 @ 256x256
    • nf_regnet_b1 - 79.30 @ 288x288, 78.75 @ 256x256
  • More model archs, incl a flexible ByobNet backbone ('Bring-your-own-blocks')
  • Refinements to normalizer layer arg handling and normalizer+act layer handling in some models
  • Default AMP mode changed to native PyTorch AMP instead of APEX. Issues not being fixed with APEX. Native works with --channels-last and --torchscript model training, APEX does not.
  • Fix a few bugs introduced since last pypi release

Feb 8, 2021

  • Add several ResNet weights with ECA attention. 26t & 50t trained @ 256, test @ 320. 269d train @ 256, fine-tune @320, test @ 352.
    • ecaresnet26t - 79.88 top-1 @ 320x320, 79.08 @ 256x256
    • ecaresnet50t - 82.35 top-1 @ 320x320, 81.52 @ 256x256
    • ecaresnet269d - 84.93 top-1 @ 352x352, 84.87 @ 320x320
  • Remove separate tiered (t) vs tiered_narrow (tn) ResNet model defs, all tn changed to t and t models removed (seresnext26t_32x4d only model w/ weights that was removed).
  • Support model default_cfgs with separate train vs test resolution test_input_size and remove extra _320 suffix ResNet model defs that were just for test.

Jan 30, 2021

  • Add initial "Normalization Free" NF-RegNet-B* and NF-ResNet model definitions based on paper

Jan 25, 2021

  • Add ResNetV2 Big Transfer (BiT) models w/ ImageNet-1k and 21k weights from https://github.com/google-research/big_transfer
  • Add official R50+ViT-B/16 hybrid models + weights from https://github.com/google-research/vision_transformer
  • ImageNet-21k ViT weights are added w/ model defs and representation layer (pre logits) support
    • NOTE: ImageNet-21k classifier heads were zero'd in original weights, they are only useful for transfer learning
  • Add model defs and weights for DeiT Vision Transformer models from https://github.com/facebookresearch/deit
  • Refactor dataset classes into ImageDataset/IterableImageDataset + dataset specific parser classes
  • Add Tensorflow-Datasets (TFDS) wrapper to allow use of TFDS image classification sets with train script
    • Ex: train.py /data/tfds --dataset tfds/oxford_iiit_pet --val-split test --model resnet50 -b 256 --amp --num-classes 37 --opt adamw --lr 3e-4 --weight-decay .001 --pretrained -j 2
  • Add improved .tar dataset parser that reads images from .tar, folder of .tar files, or .tar within .tar
    • Run validation on full ImageNet-21k directly from tar w/ BiT model: validate.py /data/fall11_whole.tar --model resnetv2_50x1_bitm_in21k --amp
  • Models in this update should be stable w/ possible exception of ViT/BiT, possibility of some regressions with train/val scripts and dataset handling
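
As a rough illustration of reading images from a .tar or a .tar within a .tar, the sketch below walks a tar archive with the standard library and descends into nested archives. It is not the parser shipped in this repository; `list_tar_images`, `build_tar`, and the extension list are hypothetical names and assumptions made for the example.

```python
import io
import tarfile

# Assumed set of image extensions, for illustration only.
IMG_EXTENSIONS = (".png", ".jpg", ".jpeg")


def list_tar_images(fileobj, prefix=""):
    """Return sorted image member paths, descending into nested .tar members."""
    found = []
    with tarfile.open(fileobj=fileobj, mode="r") as tf:
        for member in tf.getmembers():
            if not member.isfile():
                continue
            lower = member.name.lower()
            if lower.endswith(".tar"):
                # Load the nested archive into memory and recurse into it.
                inner = io.BytesIO(tf.extractfile(member).read())
                found.extend(list_tar_images(inner, prefix + member.name + "/"))
            elif lower.endswith(IMG_EXTENSIONS):
                found.append(prefix + member.name)
    return sorted(found)


def build_tar(files):
    """Build an in-memory tar from a {name: bytes} mapping (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    inner = build_tar({"z.jpeg": b"fake"}).getvalue()
    outer = build_tar({"a/x.jpg": b"fake", "notes.txt": b"text", "inner.tar": inner})
    print(list_tar_images(outer))
```

Loading each nested archive fully into memory keeps the sketch short; a parser for a dataset the size of ImageNet-21k would more likely record member offsets and read images lazily.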

Jan 3, 2021

  • Add SE-ResNet-152D weights
    • 256x256 val, 0.94 crop top-1 - 83.75
    • 320x320 val, 1.0 crop - 84.36
  • Update results files

Dec 18, 2020

  • Add ResNet-101D, ResNet-152D, and ResNet-200D weights trained @ 256x256
    • 256x256 val, 0.94 crop (top-1) - 101D (82.33), 152D (83.08), 200D (83.25)
    • 288x288 val, 1.0 crop - 101D (82.64), 152D (83.48), 200D (83.76)
    • 320x320 val, 1.0 crop - 101D (83.00), 152D (83.66), 200D (84.01)
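
The "val size, crop" pairs above can be read as a resize followed by a center crop. The sketch below assumes the common convention of resizing the shorter side to floor(size / crop fraction) and then center cropping to the validation size; the function name `resize_and_crop_sizes` is hypothetical.

```python
import math


def resize_and_crop_sizes(img_size, crop_pct):
    """Return (resize_size, crop_size) for a given validation size and crop fraction."""
    # Assumption: the shorter side is resized to img_size / crop_pct, rounded down.
    resize_size = int(math.floor(img_size / crop_pct))
    return resize_size, img_size


if __name__ == "__main__":
    for size, pct in [(256, 0.94), (288, 1.0), (320, 1.0)]:
        print(size, pct, resize_and_crop_sizes(size, pct))
```

With a crop fraction of 1.0 the resize and crop sizes coincide, so the whole resized image is evaluated.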

Dec 7, 2020

  • Simplify EMA module (ModelEmaV2), compatible with fully torchscripted models
  • Misc fixes for SiLU ONNX export, default_cfg missing from Feature extraction models, Linear layer w/ AMP + torchscript
  • PyPi release @ 0.3.2 (needed by EfficientDet)
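
For readers unfamiliar with weight EMA, the update rule behind a module like ModelEmaV2 can be sketched in a few lines. This is a generic exponential moving average over a dict of scalar parameters, not the module's actual code; `ema_update` and the default decay are assumptions for illustration.

```python
def ema_update(ema_params, model_params, decay=0.9999):
    """Return new EMA values: decay * ema + (1 - decay) * current model value."""
    return {
        name: decay * ema_params[name] + (1.0 - decay) * model_params[name]
        for name in ema_params
    }


if __name__ == "__main__":
    ema = {"w": 1.0}
    model = {"w": 0.0}
    # With decay=0.9 the averaged weight moves 10% of the way toward the model each step.
    for _ in range(2):
        ema = ema_update(ema, model, decay=0.9)
    print(ema)
```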

Oct 30, 2020

  • Test with PyTorch 1.7 and fix a small top-n metric view vs reshape issue.
  • Convert newly added 224x224 Vision Transformer weights from official JAX repo. 81.8 top-1 for B/16, 83.1 L/16.
  • Support PyTorch 1.7 optimized, native SiLU (aka Swish) activation. Add mapping to 'silu' name, custom swish will eventually be deprecated.
  • Fix regression for loading pretrained classifier via direct model entrypoint functions. Didn't impact create_model() factory usage.
  • PyPi release @ 0.3.0 version!
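
For reference, SiLU (Swish) is simply x * sigmoid(x). The scalar sketch below uses only the standard library and is meant to show the function itself, not the optimized native PyTorch op mentioned above.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function for a Python float."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def silu(x):
    """SiLU / Swish activation: x * sigmoid(x)."""
    return x * sigmoid(x)


if __name__ == "__main__":
    for value in (-2.0, 0.0, 1.0, 4.0):
        print(value, silu(value))
```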

Oct 26, 2020

  • Update Vision Transformer models to be compatible with official code release at https://github.com/google-research/vision_transformer
  • Add Vision Transformer weights (ImageNet-21k pretrain) for 384x384 base and large models converted from official jax impl
    • ViT-B/16 - 84.2
    • ViT-B/32 - 81.7
    • ViT-L/16 - 85.2
    • ViT-L/32 - 81.5

Oct 21, 2020

  • Weights added for Vision Transformer (ViT) models. 77.86 top-1 for 'small' and 79.35 for 'base'. Thanks to Christof for training the base model w/ lots of GPUs.

Oct 13, 2020

  • Initial impl of Vision Transformer models. Both patch and hybrid (CNN backbone) variants. Currently trying to train...
  • Adafactor and AdaHessian (FP32 only, no AMP) optimizers
  • EdgeTPU-M (efficientnet_em) model trained in PyTorch, 79.3 top-1
  • Pip release, doc updates pending a few more changes...

Sept 18, 2020

  • New ResNet 'D' weights. 72.7 (top-1) ResNet-18-D, 77.1 ResNet-34-D, 80.5 ResNet-50-D
  • Added a few untrained defs for other ResNet models (66D, 101D, 152D, 200/200D)

Sept 3, 2020

  • New weights
    • Wide-ResNet50 - 81.5 top-1 (vs 78.5 torchvision)
    • SEResNeXt50-32x4d - 81.3 top-1 (vs 79.1 cadene)
  • Support for native Torch AMP and channels_last memory format added to train/validate scripts (--channels-last, --native-amp vs --apex-amp)
  • Models tested with channels_last on latest NGC 20.08 container. AdaptiveAvgPool in attn layers changed to mean((2,3)) to work around bug with NHWC kernel.

Introduction

PyTorch Image Models (timm) is a collection of image models, layers, utilities, optimizers, schedulers, data-loaders / augmentations, and reference training / validation scripts that aim to pull together a wide variety of SOTA models with ability to reproduce ImageNet training results.

The work of many others is present here. I've tried to make sure all source material is acknowledged via links to github, arxiv papers, etc in the README, documentation, and code docstrings. Please let me know if I missed anything.

Models

All model architecture families include variants with pretrained weights. Some specific model variants have no weights; this is NOT a bug. Help training new or better weights is always appreciated. Here are some example training hparams to get you started.

A full version of the list below with source links can be found in the documentation.

Features

Several (less common) features that I often utilize in my projects are included. Many of their additions are the reason why I maintain my own set of models, instead of using others' via PIP:

Results

Model validation results can be found in the documentation and in the results tables

Getting Started (Documentation)

My current documentation for timm covers the basics.

timmdocs is quickly becoming a much more comprehensive set of documentation for timm. A big thanks to Aman Arora for his efforts creating timmdocs.

paperswithcode is a good resource for browsing the models within timm.

Train, Validation, Inference Scripts

The root folder of the repository contains reference train, validation, and inference scripts that work with the included models and other features of this repository. They are adaptable for other datasets and use cases with a little hacking. See documentation for some basics and training hparams for some train examples that produce SOTA ImageNet results.

Awesome PyTorch Resources

One of the greatest assets of PyTorch is the community and their contributions. A few of my favourite resources that pair well with the models and components here are listed below.

Object Detection, Instance and Semantic Segmentation

Computer Vision / Image Augmentation

Knowledge Distillation

Metric Learning

Training / Frameworks

Licenses

Code

The code here is licensed Apache 2.0. I've taken care to make sure any third party code included or adapted has compatible (permissive) licenses such as MIT, BSD, etc. I've made an effort to avoid any GPL / LGPL conflicts. That said, it is your responsibility to ensure you comply with the license here and the conditions of any dependent licenses. Where applicable, I've linked the sources/references for various components in docstrings. If you think I've missed anything please create an issue.

Pretrained Weights

So far all of the pretrained weights available here are pretrained on ImageNet with a select few that have some additional pretraining (see extra note below). ImageNet was released for non-commercial research purposes only (http://www.image-net.org/download-faq). It's not clear what the implications of that are for the use of pretrained weights from that dataset. Any models I have trained with ImageNet are done for research purposes and one should assume that the original dataset license applies to the weights. It's best to seek legal advice if you intend to use the pretrained weights in a commercial product.

Pretrained on more than ImageNet

Several weights included or referenced here were pretrained with proprietary datasets that I do not have access to. These include the Facebook WSL, SSL, SWSL ResNe(Xt) and the Google Noisy Student EfficientNet models. The Facebook models have an explicit non-commercial license (CC-BY-NC 4.0, https://github.com/facebookresearch/semi-supervised-ImageNet1K-models, https://github.com/facebookresearch/WSL-Images). The Google models do not appear to have any restriction beyond the Apache 2.0 license (and ImageNet concerns). In either case, you should contact Facebook or Google with any questions.

Citing

BibTeX

@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
}

Latest DOI


