# CVNets: A library for training computer vision networks

CVNets is a computer vision toolkit that allows researchers and engineers to train standard and novel mobile- and non-mobile computer vision models for a variety of tasks, including object classification, object detection, semantic segmentation, and foundation models (e.g., CLIP).

## Table of contents

* [What's new?](#whats-new)
* [Installation](#installation)
* [Getting started](#getting-started)
* [Supported models and tasks](#supported-models-and-tasks)
* [Maintainers](#maintainers)
* [Research effort at Apple using CVNets](#research-effort-at-apple-using-cvnets)
* [Contributing to CVNets](#contributing-to-cvnets)
* [License](#license)
* [Citation](#citation)

## What's new?

* ***July 2023***: Version 0.4 of the CVNets library includes
  * [Bytes Are All You Need: Transformers Operating Directly On File Bytes](https://arxiv.org/abs/2306.00238)
  * [RangeAugment: Efficient online augmentation with Range Learning](https://arxiv.org/abs/2212.10553)
  * Training and evaluating foundation models (CLIP)
  * Mask R-CNN
  * EfficientNet, Swin Transformer, and ViT
  * Enhanced distillation support

## Installation

We recommend using Python 3.10+ and [PyTorch](https://pytorch.org) (version >= v1.12.0).

The instructions below use Conda. If you don't have Conda installed, see [How to Install Conda](https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links).
```bash
# Clone the repo
git clone git@github.com:apple/ml-cvnets.git
cd ml-cvnets

# Create a virtual environment. We use Conda
conda create -n cvnets python=3.10.8
conda activate cvnets

# Install requirements and the CVNets package
pip install -r requirements.txt -c constraints.txt
pip install --editable .
```

## Getting started

* General instructions for working with CVNets are given [here](docs/source/en/general).
* Examples for training and evaluating models are provided [here](docs/source/en/models) and [here](examples).
* Examples for converting a PyTorch model to CoreML are provided [here](docs/source/en/general/README-pytorch-to-coreml.md).

## Supported models and tasks

To see a list of available models and benchmarks, please refer to the [Model Zoo](docs/source/en/general/README-model-zoo.md) and the [examples](examples) folder.
### ImageNet classification models

* CNNs
  * [MobileNetv1](https://arxiv.org/abs/1704.04861)
  * [MobileNetv2](https://arxiv.org/abs/1801.04381)
  * [MobileNetv3](https://arxiv.org/abs/1905.02244)
  * [EfficientNet](https://arxiv.org/abs/1905.11946)
  * [ResNet](https://arxiv.org/abs/1512.03385)
  * [RegNet](https://arxiv.org/abs/2003.13678)
* Transformers
  * [Vision Transformer](https://arxiv.org/abs/2010.11929)
  * [MobileViTv1](https://arxiv.org/abs/2110.02178)
  * [MobileViTv2](https://arxiv.org/abs/2206.02680)
  * [SwinTransformer](https://arxiv.org/abs/2103.14030)
### Multimodal classification

* [ByteFormer](https://arxiv.org/abs/2306.00238)
### Object detection

* [SSD](https://arxiv.org/abs/1512.02325)
* [Mask R-CNN](https://arxiv.org/abs/1703.06870)
### Semantic segmentation

* [DeepLabv3](https://arxiv.org/abs/1706.05587)
* [PSPNet](https://arxiv.org/abs/1612.01105)
### Foundation models

* [CLIP](https://arxiv.org/abs/2103.00020)
### Automatic data augmentation

* [RangeAugment](https://arxiv.org/abs/2212.10553)
* [AutoAugment](https://arxiv.org/abs/1805.09501)
* [RandAugment](https://arxiv.org/abs/1909.13719)
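Of the policies above, RandAugment is the simplest to sketch: it reduces the augmentation search space to two scalars, the number of operations `n` applied per image and a shared magnitude `m`. Below is a minimal illustrative sketch of that sampling policy, not CVNets' implementation; the op set here is a toy stand-in operating on numbers rather than real image transforms.

```python
import random

# Toy stand-in op set. A real implementation would use image transforms
# (shear, rotate, solarize, color, etc.), each parameterized by the magnitude.
OPS = {
    "identity": lambda x, m: x,
    "shift": lambda x, m: x + m,
    "scale": lambda x, m: x * (1.0 + m / 10.0),
}

def rand_augment(x, n=2, m=9, rng=None):
    """Apply n ops drawn uniformly at random, all at the shared magnitude m."""
    rng = rng or random.Random()
    for name in rng.choices(list(OPS), k=n):
        x = OPS[name](x, m)
    return x
```

The key design point is that all sampled ops share one magnitude, so tuning the policy is a two-dimensional grid search over `(n, m)` instead of a learned search over per-op probabilities.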
### Distillation

* Soft distillation
* Hard distillation
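The two flavors differ in what the student matches: soft distillation minimizes the KL divergence between temperature-softened teacher and student distributions, while hard distillation treats the teacher's argmax as a pseudo-label for a standard cross-entropy. A minimal NumPy sketch of the two losses (illustrative only, not CVNets' implementation):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Numerically stable softmax with temperature T
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def soft_distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradients keep comparable magnitude across temperatures
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T)

def hard_distillation_loss(student_logits, teacher_logits):
    # Cross-entropy against the teacher's argmax, treated as a hard pseudo-label
    label = int(np.argmax(teacher_logits))
    q = softmax(student_logits)
    return float(-np.log(q[label]))
```

In practice either loss is combined with the ground-truth cross-entropy via a weighting coefficient, and the soft variant's temperature is a tunable hyperparameter.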
## Maintainers

This code was developed by Sachin, and is now maintained by Sachin, Maxwell Horton, Mohammad Sekhavat, and Yanzi Jin.

### Previous maintainers

* Farzad

## Research effort at Apple using CVNets

Below is a list of publications from Apple that use CVNets:

* [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer, ICLR'22](https://arxiv.org/abs/2110.02178)
* [CVNets: High performance library for Computer Vision, ACM MM'22](https://arxiv.org/abs/2206.02002)
* [Separable Self-attention for Mobile Vision Transformers (MobileViTv2)](https://arxiv.org/abs/2206.02680)
* [RangeAugment: Efficient Online Augmentation with Range Learning](https://arxiv.org/abs/2212.10553)
* [Bytes Are All You Need: Transformers Operating Directly on File Bytes](https://arxiv.org/abs/2306.00238)

## Contributing to CVNets

We welcome PRs from the community! You can find information about contributing to CVNets in our [contributing](CONTRIBUTING.md) document. Please remember to follow our [Code of Conduct](CODE_OF_CONDUCT.md).

## License

For license details, see [LICENSE](LICENSE).

## Citation

If you find our work useful, please cite the following papers:

```
@inproceedings{mehta2022mobilevit,
    title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
    author={Sachin Mehta and Mohammad Rastegari},
    booktitle={International Conference on Learning Representations},
    year={2022}
}

@inproceedings{mehta2022cvnets,
    author = {Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad},
    title = {CVNets: High Performance Library for Computer Vision},
    year = {2022},
    booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
    series = {MM '22}
}
```