# EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction ([paper](https://arxiv.org/abs/2205.14756), [poster](assets/files/efficientvit_poster.pdf))

## News

**If you are interested in getting updates, please join our mailing list [here](https://forms.gle/Z6DNkRidJ1ouxmUk9).**

- [2024/07/10] EfficientViT is used as the backbone in [Grounding DINO 1.5 Edge](https://arxiv.org/pdf/2405.10300) for efficient open-set object detection.
- [2024/07/10] EfficientViT-SAM is used in [MedficientSAM](https://github.com/hieplpvip/medficientsam), the 1st-place model in the [CVPR 2024 Segment Anything In Medical Images On Laptop Challenge](https://www.codabench.org/competitions/1847/).
- [2024/07/10] An FPGA-based accelerator for EfficientViT: [link](https://arxiv.org/abs/2403.20230).
- [2024/04/23] We released the training code of EfficientViT-SAM.
- [2024/04/06] EfficientViT-SAM is accepted by [eLVM@CVPR'24](https://sites.google.com/view/elvm/home?authuser=0).
- [2024/03/19] An online demo of EfficientViT-SAM is available: [https://evitsam.hanlab.ai/](https://evitsam.hanlab.ai/).
- [2024/02/07] We released [EfficientViT-SAM](https://arxiv.org/abs/2402.05008), the first accelerated SAM model that matches/outperforms SAM-ViT-H's zero-shot performance, delivering the SOTA performance-efficiency trade-off.
- [2023/11/20] EfficientViT is available in the [NVIDIA Jetson Generative AI Lab](https://www.jetson-ai-lab.com/tutorial_efficientvit.html).
- [2023/09/12] EfficientViT is highlighted on the [MIT home page](https://www.mit.edu/archive/spotlight/efficient-computer-vision/) and in [MIT News](https://news.mit.edu/2023/ai-model-high-resolution-computer-vision-0912).
- [2023/07/18] EfficientViT is accepted by ICCV 2023.

## About EfficientViT Models

EfficientViT is a new family of ViT models for efficient high-resolution dense prediction vision tasks. The core building block of EfficientViT is a lightweight multi-scale linear attention module that achieves a global receptive field and multi-scale learning using only hardware-efficient operations, making EfficientViT TensorRT-friendly and well suited for GPU deployment.

## Third-Party Implementation/Integration

- [NVIDIA Jetson Generative AI Lab](https://www.jetson-ai-lab.com/tutorial_efficientvit.html)
- [timm](https://github.com/huggingface/pytorch-image-models): [link](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/efficientvit_mit.py)
- [X-AnyLabeling](https://github.com/CVHub520/X-AnyLabeling): [link](https://github.com/CVHub520/X-AnyLabeling/blob/main/anylabeling/services/auto_labeling/efficientvit_sam.py)
- [Grounding DINO 1.5 Edge](https://github.com/IDEA-Research/Grounding-DINO-1.5-API): [link](https://arxiv.org/pdf/2405.10300)

## Getting Started

```bash
conda create -n efficientvit python=3.10
conda activate efficientvit
conda install -c conda-forge mpi4py openmpi
pip install -r requirements.txt
```

## EfficientViT Applications

### [Segment Anything](applications/sam.md)

- [Datasets](applications/sam.md#datasets)
- [Pretrained Models](applications/sam.md#pretrained-models)
- [Use in PyTorch](applications/sam.md#usage)
- [Evaluation](applications/sam.md#evaluation)
- [Visualization](applications/sam.md#visualization)
- [Web Demo](demo/sam/README.md)
- [Deployment using ONNX and TensorRT](applications/sam.md#deployment)
- [Training](applications/sam.md#training)

| Model | Resolution | COCO mAP | LVIS mAP | Params | MACs | Jetson Orin Latency (bs1) | A100 Throughput (bs16) | Checkpoint |
|----------------------|:----------:|:----------:|:---------:|:------------:|:---------:|:---------:|:------------:|:------------:|
| EfficientViT-SAM-L0 | 512x512 | 45.7 | 41.8 | 34.8M | 35G | 8.2ms | 762 images/s | [link](https://huggingface.co/han-cai/efficientvit-sam/resolve/main/l0.pt) |
| EfficientViT-SAM-L1 | 512x512 | 46.2 | 42.1 | 47.7M | 49G | 10.2ms | 638 images/s | [link](https://huggingface.co/han-cai/efficientvit-sam/resolve/main/l1.pt) |
| EfficientViT-SAM-L2 | 512x512 | 46.6 | 42.7 | 61.3M | 69G | 12.9ms | 538 images/s | [link](https://huggingface.co/han-cai/efficientvit-sam/resolve/main/l2.pt) |
| EfficientViT-SAM-XL0 | 1024x1024 | 47.5 | 43.9 | 117.0M | 185G | 22.5ms | 278 images/s | [link](https://huggingface.co/han-cai/efficientvit-sam/resolve/main/xl0.pt) |
| EfficientViT-SAM-XL1 | 1024x1024 | 47.8 | 44.4 | 203.3M | 322G | 37.2ms | 182 images/s | [link](https://huggingface.co/han-cai/efficientvit-sam/resolve/main/xl1.pt) |

Table 1: Summary of all EfficientViT-SAM variants. COCO mAP and LVIS mAP are measured using ViTDet's predicted bounding boxes as prompts. End-to-end Jetson Orin latency and A100 throughput are measured with TensorRT in fp16.
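The efficiency behind these numbers comes from EfficientViT's softmax-free attention: with a ReLU feature map in place of softmax, attention can be computed as `Q @ (K.T @ V)` instead of `(Q @ K.T) @ V`, turning quadratic cost in token count into linear cost. A minimal, dependency-free sketch of this associativity trick (toy shapes and values of our own choosing; the real module additionally normalizes and aggregates multi-scale tokens):

```python
# Toy demonstration: linear attention exploits matrix-product associativity.
#   quadratic ordering: (relu(Q) @ relu(K).T) @ V   -- O(N^2 * d)
#   linear ordering:    relu(Q) @ (relu(K).T @ V)   -- O(N * d^2)

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def relu(A):
    """ReLU feature map used instead of softmax in linear attention."""
    return [[max(0.0, x) for x in row] for row in A]

# Tiny example: N=3 tokens, d=2 channels.
Q = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
K = [[1.0, 0.0], [-0.5, 1.0], [0.0, 2.0]]
V = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]

qf, kf = relu(Q), relu(K)

# Quadratic ordering: build the N x N attention map, then aggregate values.
out_quadratic = matmul(matmul(qf, transpose(kf)), V)

# Linear ordering: build a d x d key/value summary, then project queries.
out_linear = matmul(qf, matmul(transpose(kf), V))

assert out_quadratic == out_linear  # identical outputs, very different cost
```

For high-resolution dense prediction, N (number of tokens) grows with image area while d stays fixed, which is why the linear ordering pays off.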

### [Image Classification](applications/cls.md)

- [Datasets](applications/cls.md#datasets)
- [Pretrained Models](applications/cls.md#pretrained-models)
- [Use in PyTorch](applications/cls.md#usage)
- [Evaluation](applications/cls.md#evaluation)
- [Deployment](applications/cls.md#export)
- [Training](applications/cls.md#training)
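The evaluation step linked above ultimately reduces to top-k accuracy over per-image logits. A dependency-free sketch of that metric (hypothetical logits and labels, not the repo's evaluation code):

```python
def topk_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for scores, label in zip(logits, labels):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)

# Hypothetical 4-class logits for 3 samples.
logits = [
    [0.1, 2.5, 0.3, 0.0],  # highest score at class 1
    [1.2, 0.4, 0.9, 0.1],  # highest score at class 0
    [0.2, 0.1, 0.3, 2.0],  # highest score at class 3
]
labels = [1, 2, 3]

print(topk_accuracy(logits, labels, k=1))  # sample 2 is missed at k=1
print(topk_accuracy(logits, labels, k=2))  # its label enters the top-2
```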

### [Semantic Segmentation](applications/seg.md)

- [Datasets](applications/seg.md#datasets)
- [Pretrained Models](applications/seg.md#pretrained-models)
- [Use in PyTorch](applications/seg.md#usage)
- [Evaluation](applications/seg.md#evaluation)
- [Visualization](applications/seg.md#visualization)
- [Deployment](applications/seg.md#export)

![demo](assets/demo/cityscapes_l1.gif)

## Demo

- GazeSAM: Combining EfficientViT-SAM with Gaze Estimation

![GazeSAM demo](demo/gazesam/assets/gazesam_demo.gif)

## Contact

Han Cai:

## TODO

- [x] ImageNet pretrained models
- [x] Segmentation pretrained models
- [x] ImageNet training code
- [x] EfficientViT L series, designed for cloud
- [x] EfficientViT for segment anything
- [ ] EfficientViT for image generation
- [ ] EfficientViT for CLIP
- [ ] EfficientViT for super-resolution
- [ ] Segmentation training code

## Citation

If EfficientViT is useful or relevant to your research, please kindly recognize our contributions by citing our paper:

```
@article{cai2022efficientvit,
  title={Efficientvit: Enhanced linear attention for high-resolution low-computation visual recognition},
  author={Cai, Han and Gan, Chuang and Han, Song},
  journal={arXiv preprint arXiv:2205.14756},
  year={2022}
}
```