From f41b9803b6bc0cf1eecf699588eb591564d17717 Mon Sep 17 00:00:00 2001 From: "mingjiang.li" Date: Fri, 14 Mar 2025 11:45:39 +0800 Subject: [PATCH 1/3] unify model readme format 1st part - cv/classification Signed-off-by: mingjiang.li --- .../tacotron2/pytorch/README.md | 14 ++-- cv/classification/acmix/pytorch/README.md | 51 ++++++++----- cv/classification/acnet/pytorch/README.md | 54 ++++++++------ cv/classification/alexnet/pytorch/README.md | 54 ++++++++------ .../alexnet/tensorflow/README.md | 45 +++++++----- cv/classification/byol/pytorch/README.md | 71 ++++++++++--------- cv/classification/cbam/pytorch/README.md | 54 ++++++++++---- cv/classification/convnext/pytorch/README.md | 13 ++-- .../cspdarknet53/pytorch/README.md | 6 +- .../densenet/paddlepaddle/README.md | 10 +-- cv/classification/densenet/pytorch/README.md | 6 +- cv/classification/dpn107/pytorch/README.md | 6 +- cv/classification/dpn92/pytorch/README.md | 4 +- .../eca_mobilenet_v2/pytorch/README.md | 6 +- .../eca_resnet152/pytorch/README.md | 6 +- .../efficientnet_b0/paddlepaddle/README.md | 10 +-- .../efficientnet_b4/pytorch/README.md | 6 +- cv/classification/fasternet/pytorch/README.md | 10 +-- .../googlenet/paddlepaddle/README.md | 2 +- cv/classification/googlenet/pytorch/README.md | 6 +- .../inceptionv3/mindspore/README.md | 10 +-- .../inceptionv3/pytorch/README.md | 8 ++- .../inceptionv3/tensorflow/README.md | 10 +-- .../inceptionv4/pytorch/README.md | 6 +- .../internimage/pytorch/README.md | 6 +- cv/classification/lenet/pytorch/README.md | 6 +- .../mobilenetv2/pytorch/README.md | 6 +- .../mobilenetv3/mindspore/README.md | 10 +-- .../mobilenetv3/paddlepaddle/README.md | 4 +- .../mobilenetv3/pytorch/README.md | 8 ++- .../paddlepaddle/README.md | 10 +-- cv/classification/mobileone/pytorch/README.md | 10 +-- cv/classification/mocov2/pytorch/README.md | 6 +- .../pp-lcnet/paddlepaddle/README.md | 10 +-- cv/classification/repmlp/pytorch/README.md | 10 +-- .../repvgg/paddlepaddle/README.md | 2 +- cv/classification/repvgg/pytorch/README.md | 2 +- cv/classification/repvit/pytorch/README.md | 10 +-- .../res2net50_14w_8s/paddlepaddle/README.md | 10 +-- .../resnest101/pytorch/README.md | 6 +- cv/classification/resnest14/pytorch/README.md | 6 +- .../resnest269/pytorch/README.md | 6 +- .../resnest50/paddlepaddle/README.md | 2 +- cv/classification/resnest50/pytorch/README.md | 6 +- cv/classification/resnet101/pytorch/README.md | 6 +- cv/classification/resnet152/pytorch/README.md | 6 +- cv/classification/resnet18/pytorch/README.md | 6 +- .../resnet50/paddlepaddle/README.md | 4 +- cv/classification/resnet50/pytorch/README.md | 8 +-- .../resnet50/tensorflow/README.md | 2 +- .../resnext101_32x8d/pytorch/README.md | 6 +- .../resnext50_32x4d/mindspore/README.md | 10 +-- .../resnext50_32x4d/pytorch/README.md | 6 +- .../se_resnet50_vd/paddlepaddle/README.md | 10 +-- cv/classification/seresnext/pytorch/README.md | 6 +- .../shufflenetv2/paddlepaddle/README.md | 10 +-- .../shufflenetv2/pytorch/README.md | 6 +- .../squeezenet/pytorch/README.md | 6 +- .../swin_transformer/paddlepaddle/README.md | 2 +- .../swin_transformer/pytorch/README.md | 7 +- cv/classification/vgg/paddlepaddle/README.md | 4 +- cv/classification/vgg/pytorch/README.md | 8 ++- cv/classification/vgg/tensorflow/README.md | 10 +-- cv/classification/wavemlp/pytorch/README.md | 9 +-- .../wide_resnet101_2/pytorch/README.md | 6 +- .../xception/paddlepaddle/README.md | 10 +-- cv/classification/xception/pytorch/README.md | 6 +- 67 files changed, 441 insertions(+), 317 deletions(-) diff 
--git a/audio/speech_synthesis/tacotron2/pytorch/README.md b/audio/speech_synthesis/tacotron2/pytorch/README.md index f7f113ce6..b5bc8ef8f 100644 --- a/audio/speech_synthesis/tacotron2/pytorch/README.md +++ b/audio/speech_synthesis/tacotron2/pytorch/README.md @@ -2,14 +2,12 @@ ## Model Description -This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is -composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale -spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize timedomain waveforms from those -spectrograms. Our model achieves a mean opinion score (MOS) of 4.53 comparable to a MOS of 4.58 for professionally -recorded speech. To validate our design choices, we present ablation studies of key components of our system and -evaluate the impact of using mel spectrograms as the input to WaveNet instead of linguistic, duration, and F_0 features. -We further demonstrate that using a compact acoustic intermediate representation enables significant simplification of -the WaveNet architecture. +Tacotron2 is an end-to-end neural text-to-speech synthesis system that directly converts text into natural-sounding +speech. It combines a sequence-to-sequence network that generates mel-spectrograms from text with a WaveNet-based +vocoder to produce high-quality audio. The model achieves near-human speech quality with a Mean Opinion Score (MOS) of +4.53, rivaling professional recordings. Its architecture simplifies traditional speech synthesis pipelines by using +learned acoustic representations, enabling more natural prosody and articulation while maintaining computational +efficiency. ## Model Preparation diff --git a/cv/classification/acmix/pytorch/README.md b/cv/classification/acmix/pytorch/README.md index a64b45247..724ed82fc 100644 --- a/cv/classification/acmix/pytorch/README.md +++ b/cv/classification/acmix/pytorch/README.md @@ -1,18 +1,20 @@ # ACmix -## Model description +## Model Description -Convolution and self-attention are two powerful techniques for representation learning, and they are usually considered as two peer approaches that are distinct from each other. In this paper, we show that there exists a strong underlying relation between them, in the sense that the bulk of computations of these two paradigms are in fact done with the same operation. Specifically, we first show that a traditional convolution with kernel size k x k can be decomposed into k^2 individual 1x1 convolutions, followed by shift and summation operations. Then, we interpret the projections of queries, keys, and values in self-attention module as multiple 1x1 convolutions, followed by the computation of attention weights and aggregation of the values. Therefore, the first stage of both two modules comprises the similar operation. More importantly, the first stage contributes a dominant computation complexity (square of the channel size) comparing to the second stage. This observation naturally leads to an elegant integration of these two seemingly distinct paradigms, i.e., a mixed model that enjoys the benefit of both self-Attention and Convolution (ACmix), while having minimum computational overhead compared to the pure convolution or self-attention counterpart. Extensive experiments show that our model achieves consistently improved results over competitive baselines on image recognition and downstream tasks. 
Code and pre-trained models will be released at https://github.com/LeapLabTHU/ACmix and https://gitee.com/mindspore/models. +ACmix is an innovative deep learning model that unifies convolution and self-attention mechanisms by revealing their +shared computational foundation. It demonstrates that both operations can be decomposed into 1x1 convolutions followed +by different aggregation strategies. This insight enables ACmix to efficiently combine the benefits of both paradigms - +the local feature extraction of convolutions and the global context modeling of self-attention. The model achieves +improved performance on image recognition tasks with minimal computational overhead compared to pure convolution or +attention-based approaches. -## Step 1: Installing packages -```bash -git clone https://github.com/LeapLabTHU/ACmix.git -pip install termcolor==1.1.0 yacs==0.1.8 timm==0.4.5 -cd ACmix/Swin-Transformer -git checkout 81dddb6dff98f5e238a7fb6ab174e256489c07fa -``` +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -30,21 +32,32 @@ imagenet └── val_list.txt ``` -## Step 2: Training +### Install Dependencies -### Swin-S + ACmix on ImageNet using 8 cards: ```bash -# fix --local-rank for torch 2.x +git clone https://github.com/LeapLabTHU/ACmix.git +pip install termcolor==1.1.0 yacs==0.1.8 timm==0.4.5 +cd ACmix/Swin-Transformer +git checkout 81dddb6dff98f5e238a7fb6ab174e256489c07fa +``` + +## Model Training + +```bash +# Swin-S + ACmix on ImageNet using 8 cards + +## fix --local-rank for torch 2.x sed -i 's/--local_rank/--local-rank/g' main.py + python3 -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --cfg configs/acmix_swin_small_patch4_window7_224.yaml --data-path /path/to/imagenet --batch-size 128 ``` -## Results on BI-V100 +## Model Results -| card | batch_size | Single Card | 8 Cards | -|:-----|------------|------------:|:-------:| -| BI | 128 | 63.59 | 502.22 | +| Model | GPU | batch_size | Single Card | 8 Cards | +|-------|---------|------------|-------------|---------| +| ACmix | BI-V100 | 128 | 63.59 | 502.22 | +## References -## Reference -[acmix](https://github.com/leaplabthu/acmix) \ No newline at end of file +- [acmix](https://github.com/leaplabthu/acmix) diff --git a/cv/classification/acnet/pytorch/README.md b/cv/classification/acnet/pytorch/README.md index 3a6b90535..65a5e8037 100755 --- a/cv/classification/acnet/pytorch/README.md +++ b/cv/classification/acnet/pytorch/README.md @@ -1,17 +1,20 @@ # ACNet -## Model description -As designing appropriate Convolutional Neural Network (CNN) architecture in the context of a given application usually involves heavy human works or numerous GPU hours, the research community is soliciting the architecture-neutral CNN structures, which can be easily plugged into multiple mature architectures to improve the performance on our real-world applications. 
We propose Asymmetric Convolution Block (ACB), an architecture-neutral structure as a CNN building block, which uses 1D asymmetric convolutions to strengthen the square convolution kernels. For an off-the-shelf architecture, we replace the standard square-kernel convolutional layers with ACBs to construct an Asymmetric Convolutional Network (ACNet), which can be trained to reach a higher level of accuracy. After training, we equivalently convert the ACNet into the same original architecture, thus requiring no extra computations anymore. We have observed that ACNet can improve the performance of various models on CIFAR and ImageNet by a clear margin. Through further experiments, we attribute the effectiveness of ACB to its capability of enhancing the model's robustness to rotational distortions and strengthening the central skeleton parts of square convolution kernels. +## Model Description -## Step 1: Installation +ACNet (Asymmetric Convolutional Network) is an innovative CNN architecture that enhances model performance through +Asymmetric Convolution Blocks (ACBs). These blocks use 1D asymmetric convolutions to strengthen standard square +convolution kernels, improving robustness to rotational distortions and reinforcing central kernel structures. ACNet can +be seamlessly integrated into existing architectures, boosting accuracy without additional inference costs. After +training, ACNet converts back to the original architecture, maintaining efficiency. It demonstrates consistent +performance improvements across various models on datasets like CIFAR and ImageNet. -```bash -git clone https://github.com/DingXiaoH/ACNet.git -cd ACNet -git checkout 748fb0c734b41c48eacaacf7fc5e851e33a63ce8 -``` +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -29,7 +32,15 @@ imagenet └── val_list.txt ``` -## Step 2: Training +### Install Dependencies + +```bash +git clone https://github.com/DingXiaoH/ACNet.git +cd ACNet/ +git checkout 748fb0c734b41c48eacaacf7fc5e851e33a63ce8 +``` + +## Model Training ```bash ln -s /path/to/imagenet imagenet_data @@ -37,27 +48,26 @@ rm -rf acnet/acb.py rm -rf utils/misc.py mv ../acb.py acnet/ mv ../misc.py utils/ + # fix --local-rank for torch 2.x sed -i 's/--local_rank/--local-rank/g' acnet/do_acnet.py export PYTHONPATH=$PYTHONPATH:. 
-``` -### One single GPU -```bash +# One single GPU export CUDA_VISIBLE_DEVICES=0 python3 -m torch.distributed.launch --nproc_per_node=1 acnet/do_acnet.py -a sres18 -b acb -``` -### 8 GPUs on one machine -```bash + +# 8 GPUs on one machine export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m torch.distributed.launch --nproc_per_node=8 acnet/do_acnet.py -a sres18 -b acb ``` -## Results +## Model Results + +| Model | GPU | ACC | FPS | +|-------|------------|-----------------------------|----------| +| ACNet | BI-V100 ×8 | top1=71.27000,top5=90.00800 | 5.78it/s | -| GPUS | ACC | FPS | -| ----------| ------------------------------|---------| -| BI V100×8 | top1=71.27000,top5=90.00800 | 5.78it/s| +## References -## Reference -- [ACNet](https://github.com/DingXiaoH/ACNet/tree/748fb0c734b41c48eacaacf7fc5e851e33a63ce8) \ No newline at end of file +- [ACNet](https://github.com/DingXiaoH/ACNet/tree/748fb0c734b41c48eacaacf7fc5e851e33a63ce8) diff --git a/cv/classification/alexnet/pytorch/README.md b/cv/classification/alexnet/pytorch/README.md index d6c665b09..699735310 100644 --- a/cv/classification/alexnet/pytorch/README.md +++ b/cv/classification/alexnet/pytorch/README.md @@ -1,14 +1,21 @@ # AlexNet -## Model description -AlexNet is a classic convolutional neural network architecture. It consists of convolutions, max pooling and dense layers as the basic building blocks. -## Step 1: Installing +## Model Description -```bash -pip3 install torch -pip3 install torchvision -``` -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +AlexNet is a groundbreaking deep convolutional neural network that revolutionized computer vision. It introduced key +innovations like ReLU activations, dropout regularization, and GPU acceleration. With its 8-layer architecture featuring +5 convolutional and 3 fully-connected layers, AlexNet achieved record-breaking performance on ImageNet in 2012. Its +success popularized deep learning and established CNNs as the dominant approach for image recognition. AlexNet's design +principles continue to influence modern neural network architectures in computer vision applications. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.
The ImageNet dataset path structure should look like: @@ -26,28 +33,33 @@ imagenet └── val_list.txt ``` -## Step 2: Training +### Install Dependencies ```bash -cd start_scripts +pip3 install torch +pip3 install torchvision ``` -### One single GPU +## Model Training + ```bash -bash train_alexnet_torch.sh --data-path /path/to/imagenet +cd start_scripts ``` -### One single GPU (AMP) + ```bash +# One single GPU +bash train_alexnet_torch.sh --data-path /path/to/imagenet + +# One single GPU (AMP) bash train_alexnet_amp_torch.sh --data-path /path/to/imagenet -``` -### 8 GPUs on one machine -```bash + +# 8 GPUs on one machine bash train_alexnet_dist_torch.sh --data-path /path/to/imagenet -``` -### 8 GPUs on one machine (AMP) -```bash + +# 8 GPUs on one machine (AMP) bash train_alexnet_dist_amp_torch.sh --data-path /path/to/imagenet ``` -## Reference -https://github.com/pytorch/vision/blob/main/torchvision +## References + +- [vision](https://github.com/pytorch/vision/blob/main/torchvision) diff --git a/cv/classification/alexnet/tensorflow/README.md b/cv/classification/alexnet/tensorflow/README.md index f462a643b..89d391d2b 100644 --- a/cv/classification/alexnet/tensorflow/README.md +++ b/cv/classification/alexnet/tensorflow/README.md @@ -1,18 +1,21 @@ # AlexNet -AlexNet is a groundbreaking convolutional neural network (CNN) introduced in 2012. It revolutionized computer vision by demonstrating the power of deep learning in image classification. With eight layers, including five convolutional and three fully connected layers, it achieved remarkable results on the ImageNet challenge with a top-1 accuracy of around 57.1%. AlexNet's success paved the way for widespread adoption of deep neural networks in computer vision tasks. +AlexNet is a groundbreaking deep convolutional neural network that revolutionized computer vision. It introduced key +innovations like ReLU activations, dropout regularization, and GPU acceleration. With its 8-layer architecture featuring +5 convolutional and 3 fully-connected layers, AlexNet achieved record-breaking performance on ImageNet in 2012. Its +success popularized deep learning and established CNNs as the dominant approach for image recognition. AlexNet's design +principles continue to influence modern neural network architectures in computer vision applications. 
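The 5-convolution plus 3-fully-connected layout described above can be written down in a few lines of PyTorch. The following is a minimal sketch for illustration only, not the training code shipped in this repository; the class name and channel sizes follow the widely used torchvision variant and should be treated as illustrative.

```python
# Illustrative sketch of the classic AlexNet layout (5 conv + 3 FC layers).
# Not part of this repository; channel sizes follow the torchvision variant.
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)        # (N, 256, 6, 6) for 224x224 inputs
        x = torch.flatten(x, 1)
        return self.classifier(x)

print(AlexNetSketch()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```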
-## Installation +## Model Preparation -```bash -pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger -``` - -## Preparing datasets +### Prepare Resources You can get ImageNet 1K TFrecords ILSVRC2012 dataset directly from below links: -- [ImageNet 1K TFrecords ILSVRC2012 - part 0](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) -- [ImageNet 1K TFrecords ILSVRC2012 - part 1](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-1) + +- [ImageNet 1K TFrecords ILSVRC2012 - part + 0](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) +- [ImageNet 1K TFrecords ILSVRC2012 - part + 1](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-1) The ImageNet TFrecords dataset path structure should look like: @@ -26,9 +29,15 @@ imagenet_tfrecord └── validation-00127-of-00128 ``` -## Training +### Install Dependencies + +```bash +pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger +``` + +## Model Training -**Put the TFrecords data in "./imagenet_tfrecord" directory or create a soft link.** +Put the TFrecords data in "./imagenet_tfrecord" directory or create a soft link. ```bash # 1 GPU @@ -38,10 +47,12 @@ bash run_train_alexnet_imagenet.sh bash run_train_alexnet_multigpu_imagenet.sh ``` -## Results -|GPUs|ACC|FPS| -|:---:|:---:|:---:| -|BI-v100 x8|Accuracy @1 = 0.5633 Accuracy @ 5 = 0.7964|1833.9 images/sec| +## Model Results + +| Model | GPU | ACC | FPS | +|---------|------------|--------------------------------------------|-------------------| +| AlexNet | BI-v100 x8 | Accuracy @1 = 0.5633 Accuracy @ 5 = 0.7964 | 1833.9 images/sec | + +## References -## Reference -- [TensorFlow Models](https://github.com/tensorflow/models) +- [TensorFlow Models](https://github.com/tensorflow/models) diff --git a/cv/classification/byol/pytorch/README.md b/cv/classification/byol/pytorch/README.md index c96405436..3b53a8d0b 100644 --- a/cv/classification/byol/pytorch/README.md +++ b/cv/classification/byol/pytorch/README.md @@ -1,13 +1,39 @@ # BYOL -> [Bootstrap your own latent: A new approach to self-supervised Learning](https://arxiv.org/abs/2006.07733) +## Model Description -## Model description +BYOL (Bootstrap Your Own Latent) is a self-supervised learning method that learns visual representations without +negative samples. It uses two neural networks - an online network and a target network - that learn from each other +through contrasting augmented views of the same image. BYOL's unique approach eliminates the need for negative pairs, +achieving state-of-the-art performance in unsupervised learning. It's particularly effective for pre-training models on +large datasets before fine-tuning for specific tasks. -**B**ootstrap **Y**our **O**wn **L**atent (BYOL) is a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. +## Model Preparation +### Prepare Resources -## Step 1: Installation +Prepare your dataset according to the +[docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset). 
Sign up and login +in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole +ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. + +The ImageNet dataset path structure should look like: + +```bash +imagenet +├── train +│ └── n01440764 +│ ├── n01440764_10026.JPEG +│ └── ... +├── train_list.txt +├── val +│ └── n01440764 +│ ├── ILSVRC2012_val_00000293.JPEG +│ └── ... +└── val_list.txt +``` + +### Install Dependencies ```bash # Install libGL @@ -34,29 +60,7 @@ sed -i 's/python /python3 /g' tools/dist_train.sh python3 setup.py install ``` -## Step 2: Preparing datasets - -Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset). -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. -Specify `/path/to/imagenet` to your ImageNet path in later training process. - -The ImageNet dataset path structure should look like: - -```bash -imagenet -├── train -│ └── n01440764 -│ ├── n01440764_10026.JPEG -│ └── ... -├── train_list.txt -├── val -│ └── n01440764 -│ ├── ILSVRC2012_val_00000293.JPEG -│ └── ... -└── val_list.txt -``` - -## Step 3: Training +## Model Training ```bash mkdir -p data @@ -70,12 +74,13 @@ model = dict( bash tools/dist_train.sh configs/byol/benchmarks/resnet50_8xb512-linear-coslr-90e_in1k.py 8 ``` -## Results -| GPUs | FPS | TOP1 Accuracy | -| ------------ | --------- | -------------- | -| BI-V100 x8 | 5408 | 71.80 | +## Model Results +| Model | GPU | FPS | TOP1 Accuracy | +|-------|------------|------|---------------| +| BYOL | BI-V100 x8 | 5408 | 71.80 | -## Reference -- [mmpretrain](https://github.com/open-mmlab/mmpretrain/) +## References +- [Paper](https://arxiv.org/abs/2006.07733) +- [mmpretrain](https://github.com/open-mmlab/mmpretrain/) diff --git a/cv/classification/cbam/pytorch/README.md b/cv/classification/cbam/pytorch/README.md index d6af1ceee..1534fa753 100644 --- a/cv/classification/cbam/pytorch/README.md +++ b/cv/classification/cbam/pytorch/README.md @@ -1,22 +1,50 @@ # CBAM -## Model description -Official PyTorch code for "[CBAM: Convolutional Block Attention Module (ECCV2018)](http://openaccess.thecvf.com/content_ECCV_2018/html/Sanghyun_Woo_Convolutional_Block_Attention_ECCV_2018_paper.html)" +## Model Description +CBAM (Convolutional Block Attention Module) is an attention mechanism that enhances CNN feature representations. It +sequentially applies channel and spatial attention to refine feature maps, improving model performance without +significant computational overhead. CBAM helps networks focus on important features while suppressing irrelevant ones, +leading to better object recognition and localization. The module is lightweight and can be easily integrated into +existing CNN architectures, making it a versatile tool for improving various computer vision tasks. -## Step 1: Installing +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. + +The ImageNet dataset path structure should look like: + +```bash +imagenet +├── train +│ └── n01440764 +│ ├── n01440764_10026.JPEG +│ └── ... +├── train_list.txt +├── val +│ └── n01440764 +│ ├── ILSVRC2012_val_00000293.JPEG +│ └── ... 
+└── val_list.txt +``` + +### Install Dependencies ```bash pip3 install torch pip3 install torchvision ``` -## Step 2: Training +## Model Training -ResNet50 based examples are included. Example scripts are included under ```./scripts/``` directory. ImageNet data should be included under ```./data/ImageNet/``` with foler named ```train``` and ```val```. -``` +ResNet50 based examples are included. Example scripts are included under ```./scripts/``` directory. + +```bash # To train with CBAM (ResNet50 backbone) # For 8 GPUs python3 train_imagenet.py --ngpu 8 --workers 20 --arch resnet --depth 50 --epochs 100 --batch-size 256 --lr 0.1 --att-type CBAM --prefix RESNET50_IMAGENET_CBAM ./data/ImageNet @@ -24,13 +52,13 @@ python3 train_imagenet.py --ngpu 8 --workers 20 --arch resnet --depth 50 --epoch python3 train_imagenet.py --ngpu 1 --workers 20 --arch resnet --depth 50 --epochs 100 --batch-size 64 --lr 0.1 --att-type CBAM --prefix RESNET50_IMAGENET_CBAM ./data/ImageNet ``` -## Result +## Model Results -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 8 cards | Prec@1 76.216 fps:83.11 | -| 1 cards | fps:2634.37 | +| Model | GPU | FP32 | +|-------|------------|---------------------------| +| CBAM | BI-V100 x8 | Prec@1 76.216 fps:83.11 | +| CBAM | BI-V100 x1 | fps:2634.37 | -## Reference +## References -- [MXNet implementation of CBAM with several modifications](https://github.com/bruinxiong/Modified-CBAMnet.mxnet) by [bruinxiong](https://github.com/bruinxiong) +- [Modified-CBAMnet.mxnet](https://github.com/bruinxiong/Modified-CBAMnet.mxnet) by [bruinxiong](https://github.com/bruinxiong) diff --git a/cv/classification/convnext/pytorch/README.md b/cv/classification/convnext/pytorch/README.md index 0f9b1553d..8795a653b 100644 --- a/cv/classification/convnext/pytorch/README.md +++ b/cv/classification/convnext/pytorch/README.md @@ -1,9 +1,11 @@ # ConvNext -## Model description +## Model Description + The ConvNeXT model was proposed in [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie. ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them. ## Step 1: Installing + ```bash pip install timm==0.4.12 tensorboardX six torch torchvision ``` @@ -26,8 +28,10 @@ imagenet └── val_list.txt ``` -## Step 2: Training +## Model Training + ### Multiple GPUs on one machine + ```bash git clone https://github.com/facebookresearch/ConvNeXt.git cd /path/to/ConvNeXt @@ -44,5 +48,6 @@ python3 -m torch.distributed.launch --nproc_per_node=8 main.py \ --output_dir /path/to/save_results ``` -## Reference -https://github.com/facebookresearch/ConvNeXt +## References + +- [ConvNeXt](https://github.com/facebookresearch/ConvNeXt) diff --git a/cv/classification/cspdarknet53/pytorch/README.md b/cv/classification/cspdarknet53/pytorch/README.md index 7d5a5938c..96bbe1862 100644 --- a/cv/classification/cspdarknet53/pytorch/README.md +++ b/cv/classification/cspdarknet53/pytorch/README.md @@ -1,6 +1,6 @@ # CspDarknet53 -## Model description +## Model Description This is an implementation of CSPDarknet53 in pytorch. @@ -10,7 +10,7 @@
pip3 install torchsummary ``` -## Step 2: Training +## Model Training ### One single GPU @@ -32,6 +32,6 @@ python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --batc | 8 cards | Acc@1 76.644 fps 1049 | | 1 card | fps 148 | -## Reference +## References https://github.com/WongKinYiu/CrossStagePartialNetworks diff --git a/cv/classification/densenet/paddlepaddle/README.md b/cv/classification/densenet/paddlepaddle/README.md index ea27049f8..38035cfac 100644 --- a/cv/classification/densenet/paddlepaddle/README.md +++ b/cv/classification/densenet/paddlepaddle/README.md @@ -1,10 +1,12 @@ # DenseNet -## Model description +## Model Description A DenseNet is a type of convolutional neural network that utilises dense connections between layers, through Dense Blocks, where we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git @@ -55,12 +57,12 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/DenseNet/DenseNet121.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Results +## Model Results | GPUs | Top1 | Top5 |ips | |-------------|-------------|----------------|----------------| | BI-V100 x 4 | 0.757 | 0.925 | 171 | -## Reference +## References - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/densenet/pytorch/README.md b/cv/classification/densenet/pytorch/README.md index ed15508c3..e25fdf825 100755 --- a/cv/classification/densenet/pytorch/README.md +++ b/cv/classification/densenet/pytorch/README.md @@ -1,6 +1,6 @@ # DenseNet -## Model description +## Model Description A DenseNet is a type of convolutional neural network that utilises dense connections between layers, through Dense Blocks, where we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. @@ -28,7 +28,7 @@ imagenet └── val_list.txt ``` -## Step 2: Training +## Model Training ### One single GPU @@ -42,6 +42,6 @@ python3 train.py --data-path /path/to/imagenet --model densenet201 --batch-size python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model densenet201 --batch-size 128 ``` -## Reference +## References [densenet](https://github.com/pytorch/vision/blob/main/torchvision/models/densenet.py) diff --git a/cv/classification/dpn107/pytorch/README.md b/cv/classification/dpn107/pytorch/README.md index b4d0964f4..7114963eb 100644 --- a/cv/classification/dpn107/pytorch/README.md +++ b/cv/classification/dpn107/pytorch/README.md @@ -1,5 +1,5 @@ # DPN107 -## Model description +## Model Description A Dual Path Network (DPN) is a convolutional neural network which presents a new topology of connection paths internally.The intuition is that ResNets enables feature re-usage while DenseNet enables new feature exploration, and both are important for learning good representations. 
To enjoy the benefits from both path topologies, Dual Path Networks share common features while maintaining the flexibility to explore new features through dual path architectures. ## Step 1: Installing @@ -27,7 +27,7 @@ imagenet :beers: Done! -## Step 2: Training +## Model Training ### Multiple GPUs on one machine (AMP) Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: @@ -37,5 +37,5 @@ bash train_dpn107_amp_dist.sh ``` :beers: Done! -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/dpn92/pytorch/README.md b/cv/classification/dpn92/pytorch/README.md index bd40d63bb..705ff175d 100644 --- a/cv/classification/dpn92/pytorch/README.md +++ b/cv/classification/dpn92/pytorch/README.md @@ -1,5 +1,5 @@ # DPN92 -## Model description +## Model Description A Dual Path Network (DPN) is a convolutional neural network which presents a new topology of connection paths internally. The intuition is that ResNets enables feature re-usage while DenseNet enables new feature exploration, and both are important for learning good representations. To enjoy the benefits from both path topologies, Dual Path Networks share common features while maintaining the flexibility to explore new features through dual path architectures. ## Step 1: Installing @@ -35,5 +35,5 @@ bash train_dpn92_amp_dist.sh ``` -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/eca_mobilenet_v2/pytorch/README.md b/cv/classification/eca_mobilenet_v2/pytorch/README.md index 05f2dcd12..8f3224512 100644 --- a/cv/classification/eca_mobilenet_v2/pytorch/README.md +++ b/cv/classification/eca_mobilenet_v2/pytorch/README.md @@ -1,6 +1,6 @@ # ECA MobileNet V2 -## Model description +## Model Description An ECA-Net is a type of convolutional neural network that utilises an Efficient Channel Attention module. @@ -28,7 +28,7 @@ imagenet └── val_list.txt ``` -## Step 2: Training +## Model Training ### Multiple GPUs on one machine (AMP) @@ -38,6 +38,6 @@ Set data path by `export DATA_PATH=/path/to/imagenet`. The following command use bash train_eca_mobilenet_v2_amp_dist.sh ``` -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/eca_resnet152/pytorch/README.md b/cv/classification/eca_resnet152/pytorch/README.md index 53f461652..d930f6ecb 100644 --- a/cv/classification/eca_resnet152/pytorch/README.md +++ b/cv/classification/eca_resnet152/pytorch/README.md @@ -1,6 +1,6 @@ # ECA ResNet152 -## Model description +## Model Description An ECA-Net is a type of convolutional neural network that utilises an Efficient Channel Attention module. @@ -28,7 +28,7 @@ imagenet └── val_list.txt ``` -## Step 2: Training +## Model Training ### Multiple GPUs on one machine (AMP) @@ -38,6 +38,6 @@ Set data path by `export DATA_PATH=/path/to/imagenet`. 
The following command use bash train_eca_resnet152_amp_dist.sh ``` -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/efficientnet_b0/paddlepaddle/README.md b/cv/classification/efficientnet_b0/paddlepaddle/README.md index aa41572f4..4ebecc744 100644 --- a/cv/classification/efficientnet_b0/paddlepaddle/README.md +++ b/cv/classification/efficientnet_b0/paddlepaddle/README.md @@ -1,10 +1,12 @@ # EfficientNetB0 -## Model description +## Model Description This model is the B0 version of the EfficientNet series, whitch can be used for image classification tasks, such as cat and dog classification, flower classification, and so on. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git @@ -67,11 +69,11 @@ export CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c ppcls/configs/ImageNet/EfficientNet/EfficientNetB0.yaml ``` -## Results +## Model Results | GPUs| ips | Top1 | Top5 | | ------ | ---------- |--------------|--------------| | BI-V100 x8 | 1065.28 | 0.7683 | 0.9316 | -## Reference +## References - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.5) \ No newline at end of file diff --git a/cv/classification/efficientnet_b4/pytorch/README.md b/cv/classification/efficientnet_b4/pytorch/README.md index aa4c4c9cf..6564fefae 100755 --- a/cv/classification/efficientnet_b4/pytorch/README.md +++ b/cv/classification/efficientnet_b4/pytorch/README.md @@ -1,6 +1,6 @@ # EfficientNetB4 -## Model description +## Model Description EfficientNet is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a compound coefficient. @@ -28,7 +28,7 @@ imagenet └── val_list.txt ``` -## Step 2: Training +## Model Training ### One single GPU @@ -42,6 +42,6 @@ python3 train.py --data-path /path/to/imagenet --model efficientnet_b4 --batch-s python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model efficientnet_b4 --batch-size 128 ``` -## Reference +## References diff --git a/cv/classification/fasternet/pytorch/README.md b/cv/classification/fasternet/pytorch/README.md index cfa2c15de..a81493a4b 100644 --- a/cv/classification/fasternet/pytorch/README.md +++ b/cv/classification/fasternet/pytorch/README.md @@ -1,6 +1,6 @@ # FasterNet -## Model description +## Model Description This is the official Pytorch/PytorchLightning implementation of the paper:
> [**Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks**](https://arxiv.org/abs/2303.03667) @@ -10,7 +10,9 @@ This is the official Pytorch/PytorchLightning implementation of the paper:
We propose a simple yet fast and effective partial convolution (**PConv**), as well as a latency-efficient family of architectures called **FasterNet**. -## Step 1: Installation +## Model Preparation + +### Install Dependencies Clone this repo and install the required packages: ```bash pip install -r requirements.txt @@ -66,12 +68,12 @@ python3 train_test.py -g 0 --num_nodes 1 -n 4 -b 512 -e 2000 \ To train other FasterNet variants, `--cfg` need to be changed. You may also want to change the training batch size `-b`. -## Results +## Model Results | GPUs | FP32 | | ----------- | ------------------------------------ | | BI-V100 x8 | test_acc1 71.832 val_acc1 71.722 | -## Reference +## References [FasterNet](https://github.com/JierunChen/FasterNet/tree/e8fba4465ae912359c9f661a72b14e39347e4954) diff --git a/cv/classification/googlenet/paddlepaddle/README.md b/cv/classification/googlenet/paddlepaddle/README.md index b1a78c558..06bbb4ee4 100644 --- a/cv/classification/googlenet/paddlepaddle/README.md +++ b/cv/classification/googlenet/paddlepaddle/README.md @@ -1,6 +1,6 @@ # GoogLeNet -## Model description +## Model Description GoogLeNet is a type of convolutional neural network based on the Inception architecture. It utilises Inception modules, which allow the network to choose between multiple convolutional filter sizes in each block. An Inception network stacks these modules on top of each other, with occasional max-pooling layers with stride 2 to halve the resolution of the grid. ## Step 1: Installing diff --git a/cv/classification/googlenet/pytorch/README.md b/cv/classification/googlenet/pytorch/README.md index ab5b3862a..2ae336b08 100755 --- a/cv/classification/googlenet/pytorch/README.md +++ b/cv/classification/googlenet/pytorch/README.md @@ -1,6 +1,6 @@ # GoogLeNet -## Model description +## Model Description GoogLeNet is a type of convolutional neural network based on the Inception architecture. It utilises Inception modules, which allow the network to choose between multiple convolutional filter sizes in each block. An Inception network stacks these modules on top of each other, with occasional max-pooling layers with stride 2 to halve the resolution of the grid. ## Step 1: Preparing @@ -23,7 +23,7 @@ imagenet └── val_list.txt ``` -## Step 2: Training +## Model Training ### One single GPU ```bash python3 train.py --data-path /path/to/imagenet --model googlenet --batch-size 512 @@ -33,5 +33,5 @@ python3 train.py --data-path /path/to/imagenet --model googlenet --batch-size 51 python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model googlenet --batch-size 512 --wd 0.000001 ``` -## Reference +## References https://github.com/pytorch/vision/blob/main/torchvision/models/googlenet.py diff --git a/cv/classification/inceptionv3/mindspore/README.md b/cv/classification/inceptionv3/mindspore/README.md index 2af9d8f88..bb6ede650 100644 --- a/cv/classification/inceptionv3/mindspore/README.md +++ b/cv/classification/inceptionv3/mindspore/README.md @@ -1,9 +1,11 @@ # InceptionV3 -## Model description +## Model Description InceptionV3 is a convolutional neural network architecture from the Inception family that makes several improvements including using Label Smoothing, Factorized 7 x 7 convolutions, and the use of an auxiliary classifier to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). 
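The factorized 7 x 7 convolutions mentioned above can be made concrete with a small sketch. This is an illustration only, not code from this repository, and the channel counts are arbitrary: a 7x7 convolution is replaced by a 1x7 convolution followed by a 7x1 convolution, which keeps the output shape and receptive field while using far fewer weights.

```python
# Illustrative sketch: factorizing a 7x7 convolution into 1x7 + 7x1,
# as done in Inception-v3. Same output shape, far fewer parameters.
import torch
import torch.nn as nn

c_in, c_out = 64, 64
full = nn.Conv2d(c_in, c_out, kernel_size=7, padding=3, bias=False)
factored = nn.Sequential(
    nn.Conv2d(c_in, c_out, kernel_size=(1, 7), padding=(0, 3), bias=False),
    nn.Conv2d(c_out, c_out, kernel_size=(7, 1), padding=(3, 0), bias=False),
)

x = torch.randn(1, c_in, 32, 32)
print(full(x).shape, factored(x).shape)  # both: torch.Size([1, 64, 32, 32])

def n_params(module):
    return sum(p.numel() for p in module.parameters())

print(n_params(full), n_params(factored))  # 200704 vs. 57344 weights here
```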
-## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash yum install -y mesa-libGL @@ -61,11 +63,11 @@ DEVICE_ID=0 bash run_eval_gpu.sh $DEVICE_ID /path/to/imagenet/val/ /path/to/checkpoint ``` -## Results +## Model Results | GPUS | ACC (epoch 108) | FPS | | ----------| --------------------------| ----- | | BI V100×4 | 'Loss': 3.9033, 'Top1-Acc': 0.4847, 'Top5-Acc': 0.7405 | 447.2 | -## Reference +## References - [MindSpore Models](https://gitee.com/mindspore/models/tree/master/official/) \ No newline at end of file diff --git a/cv/classification/inceptionv3/pytorch/README.md b/cv/classification/inceptionv3/pytorch/README.md index 731ae8113..fae45d54a 100644 --- a/cv/classification/inceptionv3/pytorch/README.md +++ b/cv/classification/inceptionv3/pytorch/README.md @@ -1,9 +1,11 @@ # InceptionV3 -## Model description +## Model Description Inception-v3 is a convolutional neural network architecture from the Inception family that makes several improvements including using Label Smoothing, Factorized 7 x 7 convolutions, and the use of an auxiliary classifer to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash pip3 install -r requirements.txt @@ -40,5 +42,5 @@ export DATA_PATH=/path/to/imagenet bash train_inception_v3_amp_dist.sh ``` -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/inceptionv3/tensorflow/README.md b/cv/classification/inceptionv3/tensorflow/README.md index 903a34ab3..4aa148b79 100644 --- a/cv/classification/inceptionv3/tensorflow/README.md +++ b/cv/classification/inceptionv3/tensorflow/README.md @@ -1,10 +1,12 @@ # InceptionV3 -## Model description +## Model Description InceptionV3 is a convolutional neural network architecture from the Inception family that makes several improvements including using Label Smoothing, Factorized 7 x 7 convolutions, and the use of an auxiliary classifer to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger @@ -47,12 +49,12 @@ bash run_train_inceptionV3_imagenet.sh bash run_train_inceptionV3_multigpu_imagenet.sh --epoch 200 ``` -## Results +## Model Results | GPUS | ACC | FPS | | ---------- | ----- | ------------ | | BI-V100 ×8 | 76.4% | 312 images/s | -## Reference +## References - [TensorFlow/benchmarks](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks) \ No newline at end of file diff --git a/cv/classification/inceptionv4/pytorch/README.md b/cv/classification/inceptionv4/pytorch/README.md index 1ebba767d..e7287e4fb 100644 --- a/cv/classification/inceptionv4/pytorch/README.md +++ b/cv/classification/inceptionv4/pytorch/README.md @@ -1,6 +1,6 @@ # InceptionV4 -## Model description +## Model Description Inception-v4 is a convolutional neural network architecture that builds on previous iterations of the Inception family by simplifying the architecture and using more inception modules than Inception-v3. ## Step 1: Installing @@ -27,7 +27,7 @@ imagenet :beers: Done! -## Step 2: Training +## Model Training ### Multiple GPUs on one machine (AMP) Set data path by `export DATA_PATH=/path/to/imagenet`. 
The following command uses all cards to train: @@ -38,5 +38,5 @@ bash train_inceptionv4_amp_dist.sh :beers: Done! -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/internimage/pytorch/README.md b/cv/classification/internimage/pytorch/README.md index 33708fe8a..5a30a3f4f 100644 --- a/cv/classification/internimage/pytorch/README.md +++ b/cv/classification/internimage/pytorch/README.md @@ -1,6 +1,6 @@ # InternImage for Image Classification -## Model description +## Model Description "INTERN-2.5" is a powerful multimodal multitask general model jointly released by SenseTime and Shanghai AI Laboratory. It consists of large-scale vision foundation model "InternImage", pre-training method "M3I-Pretraining", generic decoder "Uni-Perceiver" series, and generic encoder for autonomous driving perception "BEVFormer" series. @@ -64,7 +64,7 @@ imagenet └── val_list.txt ``` -## Step 2: Training +## Model Training ```bash # Training on 8 GPUs @@ -86,6 +86,6 @@ python3 main.py --cfg configs/internimage_t_1k_224.yaml --data-path /path/to/ima | 8 cards | Acc@1 83.440 fps 252 | | 1 card | fps 31 | -## Reference +## References https://github.com/OpenGVLab/InternImage diff --git a/cv/classification/lenet/pytorch/README.md b/cv/classification/lenet/pytorch/README.md index 39244b1c8..8c609c550 100755 --- a/cv/classification/lenet/pytorch/README.md +++ b/cv/classification/lenet/pytorch/README.md @@ -1,6 +1,6 @@ # LeNet -## Model description +## Model Description LeNet is a classic convolutional neural network employing the use of convolutions, pooling and fully connected layers. It was used for the handwritten digit recognition task with the MNIST dataset. The architectural design served as inspiration for future networks such as AlexNet and VGG. ## Step 1: Preparing @@ -23,7 +23,7 @@ imagenet └── val_list.txt ``` -## Step 2: Training +## Model Training ### One single GPU ```bash python3 train.py --data-path /path/to/imagenet --model lenet @@ -33,5 +33,5 @@ python3 train.py --data-path /path/to/imagenet --model lenet python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model lenet ``` -## Reference +## References http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf diff --git a/cv/classification/mobilenetv2/pytorch/README.md b/cv/classification/mobilenetv2/pytorch/README.md index fa81f1ee2..f85c60876 100644 --- a/cv/classification/mobilenetv2/pytorch/README.md +++ b/cv/classification/mobilenetv2/pytorch/README.md @@ -1,6 +1,6 @@ # MobileNetV2 -## Model description +## Model Description MobileNetV2 is a convolutional neural network architecture that seeks to perform well on mobile devices. It is based on an inverted residual structure where the residual connections are between the bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. As a whole, the architecture of MobileNetV2 contains the initial fully convolution layer with 32 filters, followed by 19 residual bottleneck layers. ## Step 1: Installing @@ -27,7 +27,7 @@ imagenet ``` -## Step 2: Training +## Model Training ### Multiple GPUs on one machine (AMP) Set data path by `export DATA_PATH=/path/to/imagenet`. 
The following command uses all cards to train: @@ -36,5 +36,5 @@ bash train_mobilenet_v2_amp_dist.sh ``` -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/mobilenetv3/mindspore/README.md b/cv/classification/mobilenetv3/mindspore/README.md index f72c9801d..a7aff689c 100644 --- a/cv/classification/mobilenetv3/mindspore/README.md +++ b/cv/classification/mobilenetv3/mindspore/README.md @@ -1,11 +1,13 @@ # MobileNetV3 -## Model description +## Model Description MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware- aware network architecture search (NAS) complemented by the NetAdapt algorithm and then subsequently improved through novel architecture advances.Nov 20, 2019. [Paper](https://arxiv.org/pdf/1905.02244) Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al. "Searching for mobilenetv3." In Proceedings of the IEEE International Conference on Computer Vision, pp. 1314-1324. 2019. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash # Install requirements @@ -58,7 +60,7 @@ bash run_train.sh GPU 8 0,1,2,3,4,5,6,7 /path/to/imagenet/train/ bash run_infer.sh GPU /path/to/imagenet/val/ ../train/checkpointckpt_0/mobilenetv3-300_2135.ckpt ``` -## Results +## Model Results
| GPUS | ACC (ckpt107) | FPS | @@ -67,5 +69,5 @@ bash run_infer.sh GPU /path/to/imagenet/val/ ../train/checkpointckpt_0/mobilenet
-## Reference +## References - [mindspore/models](https://gitee.com/mindspore/models) \ No newline at end of file diff --git a/cv/classification/mobilenetv3/paddlepaddle/README.md b/cv/classification/mobilenetv3/paddlepaddle/README.md index 5e9a5f9b2..0247cb51a 100644 --- a/cv/classification/mobilenetv3/paddlepaddle/README.md +++ b/cv/classification/mobilenetv3/paddlepaddle/README.md @@ -1,5 +1,5 @@ # MobileNetV3 -## Model description +## Model Description MobileNetV3 is a convolutional neural network that is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, and then subsequently improved through novel architecture advances. Advances include (1) complementary search techniques, (2) new efficient versions of nonlinearities practical for the mobile setting, (3) new efficient network design. ## Step 1: Installing @@ -42,5 +42,5 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x1_25.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Reference +## References - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/mobilenetv3/pytorch/README.md b/cv/classification/mobilenetv3/pytorch/README.md index 1e83d9b7b..38ae3ba41 100644 --- a/cv/classification/mobilenetv3/pytorch/README.md +++ b/cv/classification/mobilenetv3/pytorch/README.md @@ -1,9 +1,11 @@ # MobileNetV3 -## Model description +## Model Description MobileNetV3 is a convolutional neural network that is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, and then subsequently improved through novel architecture advances. Advances include (1) complementary search techniques, (2) new efficient versions of nonlinearities practical for the mobile setting, (3) new efficient network design. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash pip3 install -r requirements.txt ``` @@ -38,5 +40,5 @@ export DATA_PATH=/path/to/imagenet bash train_mobilenet_v3_large_amp_dist.sh ``` -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification#mobilenetv3-large--small) diff --git a/cv/classification/mobilenetv3_large_x1_0/paddlepaddle/README.md b/cv/classification/mobilenetv3_large_x1_0/paddlepaddle/README.md index 5d1796d4f..0c13400c2 100644 --- a/cv/classification/mobilenetv3_large_x1_0/paddlepaddle/README.md +++ b/cv/classification/mobilenetv3_large_x1_0/paddlepaddle/README.md @@ -1,9 +1,11 @@ # MobileNetV3_large_x1_0 -## Model description +## Model Description MobileNetV3 is a convolutional neural network that is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, and then subsequently improved through novel architecture advances. Advances include (1) complementary search techniques, (2) new efficient versions of nonlinearities practical for the mobile setting, (3) new efficient network design. 
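One concrete example of the "new efficient versions of nonlinearities" referred to above is the hard-swish activation introduced with MobileNetV3, defined as x * ReLU6(x + 3) / 6. The sketch below is an illustration rather than code from this repository; the class name is illustrative, and PyTorch's built-in `F.hardswish` is printed only to confirm the formula.

```python
# Illustrative sketch of hard-swish, MobileNetV3's cheap approximation of
# the swish/SiLU activation x * sigmoid(x). Not part of this repository.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HardSwish(nn.Module):
    def forward(self, x):
        return x * F.relu6(x + 3.0) / 6.0

x = torch.linspace(-5, 5, 11)
print(HardSwish()(x))
print(F.hardswish(x))  # built-in version gives the same values
```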
-## Step 1: Installation +## Model Preparation + +### Install Dependencies ``` git clone https://github.com/PaddlePaddle/PaddleClas.git ``` @@ -51,11 +53,11 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_large_x1_0.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Results +## Model Results | GPUs | Top1 | Top5 | ips | |-------------|-------------|----------------|----------| | BI-V100 x 4 | 0.749 | 0.922 | 512 samples/s | -## Reference +## References - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/mobileone/pytorch/README.md b/cv/classification/mobileone/pytorch/README.md index 1ac883241..d4eb8acde 100644 --- a/cv/classification/mobileone/pytorch/README.md +++ b/cv/classification/mobileone/pytorch/README.md @@ -2,7 +2,7 @@ > [An Improved One millisecond Mobile Backbone](https://arxiv.org/abs/2206.04040) -## Model description +## Model Description Mobileone is proposed by apple and based on reparameterization. On the apple chips, the accuracy of the model is close to 0.76 on the ImageNet dataset when the latency is less than 1ms. Its main improvements based on [RepVGG](../repvgg) are fllowing: @@ -12,7 +12,9 @@ Mobileone is proposed by apple and based on reparameterization. On the apple chi Efficient neural network backbones for mobile devices are often optimized for metrics such as FLOPs or parameter count. However, these metrics may not correlate well with latency of the network when deployed on a mobile device. Therefore, we perform extensive analysis of different metrics by deploying several mobile-friendly networks on a mobile device. We identify and analyze architectural and optimization bottlenecks in recent efficient neural networks and provide ways to mitigate these bottlenecks. To this end, we design an efficient backbone MobileOne, with variants achieving an inference time under 1 ms on an iPhone12 with 75.9% top-1 accuracy on ImageNet. We show that MobileOne achieves state-of-the-art performance within the efficient architectures while being many times faster on mobile. Our best model obtains similar performance on ImageNet as MobileFormer while being 38x faster. Our model obtains 2.3% better top-1 accuracy on ImageNet than EfficientNet at similar latency. Furthermore, we show that our model generalizes to multiple tasks - image classification, object detection, and semantic segmentation with significant improvements in latency and accuracy as compared to existing efficient architectures when deployed on a mobile device. 
-## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash # Install libGL @@ -67,11 +69,11 @@ imagenet bash tools/dist_train.sh configs/mobileone/mobileone-s0_8xb32_in1k.py 8 ``` -## Results +## Model Results | GPUs | FPS | TOP1 Accuracy | | ------------ | --------- |-------------- | | BI-V100 x8 | 1014 | 71.49 | -## Reference +## References - [mmpretrain](https://github.com/open-mmlab/mmpretrain/) diff --git a/cv/classification/mocov2/pytorch/README.md b/cv/classification/mocov2/pytorch/README.md index 408f1ab2f..70e654819 100644 --- a/cv/classification/mocov2/pytorch/README.md +++ b/cv/classification/mocov2/pytorch/README.md @@ -3,7 +3,7 @@ > [Improved Baselines with Momentum Contrastive Learning](https://arxiv.org/abs/2003.04297) -## Model description +## Model Description Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR. In this note, we verify the effectiveness of two of SimCLR’s design improvements by implementing them in the MoCo framework. With simple modifications to MoCo—namely, using an MLP projection head and more data augmentation—we establish stronger baselines that outperform SimCLR and do not require large training batches. We hope this will make state-of-the-art unsupervised learning research more accessible. @@ -73,10 +73,10 @@ model = dict( bash tools/dist_train.sh configs/mocov2/mocov2_resnet50_8xb32-coslr-200e_in1k.py 8 ``` -## Results +## Model Results | Model | FPS | TOP1 Accuracy | | ------------ | --------- |--------------| | BI-V100 x8 | 4663 | 67.50 | -## Reference +## References - [mmpretrain](https://github.com/open-mmlab/mmpretrain/) diff --git a/cv/classification/pp-lcnet/paddlepaddle/README.md b/cv/classification/pp-lcnet/paddlepaddle/README.md index c7d75baa9..cc4c9ebd0 100644 --- a/cv/classification/pp-lcnet/paddlepaddle/README.md +++ b/cv/classification/pp-lcnet/paddlepaddle/README.md @@ -1,9 +1,11 @@ # PP-LCNet: A Lightweight CPU Convolutional Neural Network -## Model description +## Model Description We propose a lightweight CPU network based on the MKLDNN acceleration strategy, named PP-LCNet, which improves the performance of lightweight models on multiple tasks. This paper lists technologies which can improve network accuracy while the latency is almost constant. With these improvements, the accuracy of PP-LCNet can greatly surpass the previous network structure with the same inference time for classification. It outperforms the most state-of-the-art models. And for downstream tasks of computer vision, it also performs very well, such as object detection, semantic segmentation, etc. All our experiments are implemented based on PaddlePaddle. Code and pretrained models are available at PaddleClas. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git @@ -45,7 +47,7 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Results on BI-V100 +## Model Results on BI-V100
@@ -55,6 +57,6 @@ python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/c
-## Reference +## References - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/repmlp/pytorch/README.md b/cv/classification/repmlp/pytorch/README.md index 99d354727..c96928cf6 100644 --- a/cv/classification/repmlp/pytorch/README.md +++ b/cv/classification/repmlp/pytorch/README.md @@ -1,9 +1,11 @@ # RepMLP -## Model description +## Model Description RepMLP, a multi-layer-perceptron-style neural network building block for image recognition, which is composed of a series of fully-connected (FC) layers. Compared to convolutional layers, FC layers are more efficient, better at modeling the long-range dependencies and positional patterns, but worse at capturing the local structures, hence usually less favored for image recognition. Construct convolutional layers inside a RepMLP during training and merge them into the FC for inference. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash pip3 install timm yacs @@ -44,12 +46,12 @@ sed -i "s@dataset = torchvision.datasets.ImageNet(root=config.DATA.DATA_PATH, sp python3 -m torch.distributed.launch --nproc_per_node 8 --master_port 12349 main_repmlp.py --arch RepMLPNet-B256 --batch-size 32 --tag my_experiment --opts TRAIN.EPOCHS 100 TRAIN.BASE_LR 0.001 TRAIN.WEIGHT_DECAY 0.1 TRAIN.OPTIMIZER.NAME adamw TRAIN.OPTIMIZER.MOMENTUM 0.9 TRAIN.WARMUP_LR 5e-7 TRAIN.MIN_LR 0.0 TRAIN.WARMUP_EPOCHS 10 AUG.PRESET raug15 AUG.MIXUP 0.4 AUG.CUTMIX 1.0 DATA.IMG_SIZE 256 --data-path [/path/to/imagenet] ``` -## Results +## Model Results |GPUs|FPS|ACC| |----|---|---| |BI-V100 x8|319|epoch 40: 64.866%| -## Reference +## References - [RepMLP](https://github.com/DingXiaoH/RepMLP/tree/3eff13fa0257af28663880d870f327d665f0a8e2) diff --git a/cv/classification/repvgg/paddlepaddle/README.md b/cv/classification/repvgg/paddlepaddle/README.md index 5862f58f2..28a99fea4 100644 --- a/cv/classification/repvgg/paddlepaddle/README.md +++ b/cv/classification/repvgg/paddlepaddle/README.md @@ -1,5 +1,5 @@ # RepVGG -## Model description +## Model Description A simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architecture is realized by a structural re-parameterization technique so that the model is named RepVGG. ## Step 1: Installing diff --git a/cv/classification/repvgg/pytorch/README.md b/cv/classification/repvgg/pytorch/README.md index be5907217..8061fdbcc 100755 --- a/cv/classification/repvgg/pytorch/README.md +++ b/cv/classification/repvgg/pytorch/README.md @@ -1,5 +1,5 @@ # RepVGG -## Model description +## Model Description A simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architecture is realized by a structural re-parameterization technique so that the model is named RepVGG. 
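The training-time/inference-time decoupling described above works because parallel convolution branches applied to the same input can be summed into one kernel. The sketch below merges a 3x3 branch, a 1x1 branch and an identity branch into a single 3x3 convolution; it is a simplified illustration that skips the per-branch BatchNorm folding the actual RepVGG code performs first, and all shapes are made up for the example.

```python
import torch
import torch.nn.functional as F

def merge_branches(w3, b3, w1, b1, channels):
    """Collapse parallel 3x3, 1x1 and identity branches into a single 3x3 kernel."""
    w1_as_3x3 = F.pad(w1, [1, 1, 1, 1])              # centre the 1x1 kernel inside a 3x3 one
    w_id = torch.zeros(channels, channels, 3, 3)     # identity mapping as a 3x3 kernel
    w_id[torch.arange(channels), torch.arange(channels), 1, 1] = 1.0
    return w3 + w1_as_3x3 + w_id, b3 + b1

C = 8
x = torch.randn(2, C, 16, 16)
w3, b3 = torch.randn(C, C, 3, 3), torch.randn(C)
w1, b1 = torch.randn(C, C, 1, 1), torch.randn(C)

multi_branch = F.conv2d(x, w3, b3, padding=1) + F.conv2d(x, w1, b1) + x   # training-time block
w, b = merge_branches(w3, b3, w1, b1, C)
single_conv = F.conv2d(x, w, b, padding=1)                                # inference-time block
print(torch.allclose(multi_branch, single_conv, atol=1e-4))               # True
```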
## Step 1: Installing diff --git a/cv/classification/repvit/pytorch/README.md b/cv/classification/repvit/pytorch/README.md index 1650ab34f..3438ba680 100644 --- a/cv/classification/repvit/pytorch/README.md +++ b/cv/classification/repvit/pytorch/README.md @@ -3,11 +3,13 @@ -## Model description +## Model Description Recently, lightweight Vision Transformers (ViTs) demonstrate superior performance and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on resource-constrained mobile devices. This improvement is usually attributed to the multi-head self-attention module, which enables the model to learn global representations. However, the architectural disparities between lightweight ViTs and lightweight CNNs have not been adequately examined. In this study, we revisit the efficient design of lightweight CNNs and emphasize their potential for mobile devices. We incrementally enhance the mobile-friendliness of a standard lightweight CNN, specifically MobileNetV3, by integrating the efficient architectural choices of lightweight ViTs. This ends up with a new family of pure lightweight CNNs, namely RepViT. Extensive experiments show that RepViT outperforms existing state-of-the-art lightweight ViTs and exhibits favorable latency in various vision tasks. On ImageNet, RepViT achieves over 80\% top-1 accuracy with 1ms latency on an iPhone 12, which is the first time for a lightweight model, to the best of our knowledge. Our largest model, RepViT-M2.3, obtains 83.7\% accuracy with only 2.3ms latency. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash git clone https://github.com/THU-MIG/RepViT.git @@ -55,10 +57,10 @@ wandb: (2) Use an existing W&B account wandb: (3) Don't visualize my results ``` -## Results +## Model Results |GPUs|FPS|ACC| |:---:|:---:|:---:| |BI-V100 x8|1.5984 s / it| Acc@1 78.53% | -## Reference +## References [RepViT](https://github.com/THU-MIG/RepViT/tree/298f42075eda5d2e6102559fad260c970769d34e) diff --git a/cv/classification/res2net50_14w_8s/paddlepaddle/README.md b/cv/classification/res2net50_14w_8s/paddlepaddle/README.md index d924475f5..dcf813466 100644 --- a/cv/classification/res2net50_14w_8s/paddlepaddle/README.md +++ b/cv/classification/res2net50_14w_8s/paddlepaddle/README.md @@ -1,8 +1,10 @@ # Res2Net50_14w_8s -## Model description +## Model Description Res2Net is modified from the source code of ResNet. The main function of Res2Net is to add hierarchical connections within the block and indirectly increase the receptive field while reusing the feature map. 
-## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git @@ -44,11 +46,11 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ./ppcls/configs/ImageNet/Res2Net/Res2Net50_14w_8s.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Results +## Model Results | GPUs | ACC | FPS | ---------- | ------ | --- | BI-V100 x8 | top1: 0.7943 | 338.29 images/sec -## Reference +## References - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/resnest101/pytorch/README.md b/cv/classification/resnest101/pytorch/README.md index cd8593f1b..bb373fbe8 100644 --- a/cv/classification/resnest101/pytorch/README.md +++ b/cv/classification/resnest101/pytorch/README.md @@ -1,6 +1,6 @@ # ResNeSt101 -## Model description +## Model Description A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. ## Step 1: Installing @@ -27,7 +27,7 @@ imagenet ``` -## Step 2: Training +## Model Training ### Multiple GPUs on one machine (AMP) Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: @@ -37,5 +37,5 @@ bash train_resnest101_amp_dist.sh -## Reference +## References https://github.com/zhanghang1989/ResNeSt diff --git a/cv/classification/resnest14/pytorch/README.md b/cv/classification/resnest14/pytorch/README.md index be0d67917..78f290141 100644 --- a/cv/classification/resnest14/pytorch/README.md +++ b/cv/classification/resnest14/pytorch/README.md @@ -1,6 +1,6 @@ # ResNeSt14 -## Model description +## Model Description A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. ## Step 1: Installing @@ -27,7 +27,7 @@ imagenet ``` -## Step 2: Training +## Model Training ### Multiple GPUs on one machine (AMP) Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: @@ -37,5 +37,5 @@ bash train_resnest14_amp_dist.sh -## Reference +## References https://github.com/zhanghang1989/ResNeSt diff --git a/cv/classification/resnest269/pytorch/README.md b/cv/classification/resnest269/pytorch/README.md index e51db4758..594824707 100644 --- a/cv/classification/resnest269/pytorch/README.md +++ b/cv/classification/resnest269/pytorch/README.md @@ -1,5 +1,5 @@ # ResNeSt269 -## Model description +## Model Description A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. ## Step 1: Installing @@ -27,7 +27,7 @@ imagenet :beers: Done! -## Step 2: Training +## Model Training ### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: @@ -38,5 +38,5 @@ bash train_resnest269_amp_dist.sh :beers: Done! 
-## Reference +## References https://github.com/zhanghang1989/ResNeSt diff --git a/cv/classification/resnest50/paddlepaddle/README.md b/cv/classification/resnest50/paddlepaddle/README.md index 6204dd7d1..a60d69d87 100644 --- a/cv/classification/resnest50/paddlepaddle/README.md +++ b/cv/classification/resnest50/paddlepaddle/README.md @@ -1,5 +1,5 @@ # ResNeSt50 -## Model description +## Model Description A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. ## Step 1: Installing diff --git a/cv/classification/resnest50/pytorch/README.md b/cv/classification/resnest50/pytorch/README.md index d801c719a..f8468b83a 100644 --- a/cv/classification/resnest50/pytorch/README.md +++ b/cv/classification/resnest50/pytorch/README.md @@ -1,6 +1,6 @@ # ResNeSt50 -## Model description +## Model Description A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. ## Step 1: Installing @@ -27,7 +27,7 @@ imagenet ``` -## Step 2: Training +## Model Training ### Multiple GPUs on one machine (AMP) Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: @@ -36,5 +36,5 @@ bash train_resnest50_amp_dist.sh ``` -## Reference +## References https://github.com/zhanghang1989/ResNeSt diff --git a/cv/classification/resnet101/pytorch/README.md b/cv/classification/resnet101/pytorch/README.md index 96f8c04fb..e11d2a4ab 100644 --- a/cv/classification/resnet101/pytorch/README.md +++ b/cv/classification/resnet101/pytorch/README.md @@ -1,5 +1,5 @@ # ResNet101 -## Model description +## Model Description Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. ## Step 1: Installing @@ -25,7 +25,7 @@ imagenet └── val_list.txt ``` -## Step 2: Training +## Model Training ### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: @@ -35,5 +35,5 @@ bash train_resnet101_amp_dist.sh -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/resnet152/pytorch/README.md b/cv/classification/resnet152/pytorch/README.md index 404208c3f..da9acf9c3 100644 --- a/cv/classification/resnet152/pytorch/README.md +++ b/cv/classification/resnet152/pytorch/README.md @@ -1,5 +1,5 @@ # ResNet152 -## Model description +## Model Description Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. ## Step 1: Installing @@ -26,7 +26,7 @@ imagenet ``` -## Step 2: Training +## Model Training ### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. 
The following command uses all cards to train: @@ -36,5 +36,5 @@ bash train_resnet152_amp_dist.sh -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/resnet18/pytorch/README.md b/cv/classification/resnet18/pytorch/README.md index c75289f89..7b443d0c3 100644 --- a/cv/classification/resnet18/pytorch/README.md +++ b/cv/classification/resnet18/pytorch/README.md @@ -1,5 +1,5 @@ # ResNet18 -## Model description +## Model Description Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. ## Step 1: Installing @@ -27,7 +27,7 @@ imagenet :beers: Done! -## Step 2: Training +## Model Training ### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: @@ -38,5 +38,5 @@ bash train_resnet18_amp_dist.sh :beers: Done! -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/resnet50/paddlepaddle/README.md b/cv/classification/resnet50/paddlepaddle/README.md index c4f0657c4..456023898 100644 --- a/cv/classification/resnet50/paddlepaddle/README.md +++ b/cv/classification/resnet50/paddlepaddle/README.md @@ -1,5 +1,5 @@ # ResNet50 -## Model description +## Model Description Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. ## Step 1: Installing @@ -58,7 +58,7 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Results on BI-V100 +## Model Results on BI-V100
diff --git a/cv/classification/resnet50/pytorch/README.md b/cv/classification/resnet50/pytorch/README.md index 3e3ef3f6f..16c156d77 100644 --- a/cv/classification/resnet50/pytorch/README.md +++ b/cv/classification/resnet50/pytorch/README.md @@ -1,5 +1,5 @@ # ResNet50 -## Model description +## Model Description Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. ## Step 1: Preparing @@ -24,7 +24,7 @@ imagenet -## Step 2: Training +## Model Training ### One single GPU ```bash @@ -49,7 +49,7 @@ bash scripts/amp_8cards.sh --data-path /path/to/imagenet bash scripts/fp32_16cards.sh --data-path /path/to/imagenet ``` -## Results on BI-V100 +## Model Results on BI-V100 | | FP32 | AMP+NHWC | | ----------- | ----------------------------------------------- | --------------------------------------------- | @@ -62,5 +62,5 @@ bash scripts/fp32_16cards.sh --data-path /path/to/imagenet | top1 75.9% | SDK V2.2,bs:512,8x,AMP | 5221 | 76.43% | 128\*8 | 0.97 | 29.1\*8 | 1 | -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/resnet50/tensorflow/README.md b/cv/classification/resnet50/tensorflow/README.md index e5151215c..14881ba1b 100644 --- a/cv/classification/resnet50/tensorflow/README.md +++ b/cv/classification/resnet50/tensorflow/README.md @@ -1,6 +1,6 @@ # ResNet50 -## Model description +## Model Description Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. ## Prepare diff --git a/cv/classification/resnext101_32x8d/pytorch/README.md b/cv/classification/resnext101_32x8d/pytorch/README.md index 9b6abab27..bb96b91d4 100644 --- a/cv/classification/resnext101_32x8d/pytorch/README.md +++ b/cv/classification/resnext101_32x8d/pytorch/README.md @@ -1,6 +1,6 @@ # ResNeXt101_32x8d -## Model description +## Model Description A ResNeXt repeats a building block that aggregates a set of transformations with the same topology. Compared to a ResNet, it exposes a new dimension, cardinality (the size of the set of transformations) , as an essential factor in addition to the dimensions of depth and width. ## Step 1: Installing @@ -27,7 +27,7 @@ imagenet ``` -## Step 2: Training +## Model Training ### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: @@ -37,5 +37,5 @@ bash train_resnext101_32x8d_amp_dist.sh -## Reference +## References https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L214 diff --git a/cv/classification/resnext50_32x4d/mindspore/README.md b/cv/classification/resnext50_32x4d/mindspore/README.md index 3ea8ce978..3093a1646 100644 --- a/cv/classification/resnext50_32x4d/mindspore/README.md +++ b/cv/classification/resnext50_32x4d/mindspore/README.md @@ -1,9 +1,11 @@ # ResNeXt50_32x4d -## Model description +## Model Description A ResNeXt repeats a building block that aggregates a set of transformations with the same topology. 
Compared to a ResNet, it exposes a new dimension, cardinality (the size of the set of transformations) , as an essential factor in addition to the dimensions of depth and width. -## Step 1: Installation +## Model Preparation + +### Install Dependencies Install OpenMPI and mesa-libGL ```bash wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz @@ -72,11 +74,11 @@ python3 eval.py \ --checkpoint_file_path=/path/to/checkpoint/model_data_dir/ckpt_0/ ``` -## Results +## Model Results | GPUs | FPS | ACC(TOP1) | ACC(TOP5) | |-------------|-----------|--------------|--------------| | BI-V100 x 8 | 109.97 | 78.18% | 94.03% | -## Reference +## References https://gitee.com/mindspore/models/tree/master/research/cv/ResNeXt diff --git a/cv/classification/resnext50_32x4d/pytorch/README.md b/cv/classification/resnext50_32x4d/pytorch/README.md index 4e8a75883..eca6864a7 100644 --- a/cv/classification/resnext50_32x4d/pytorch/README.md +++ b/cv/classification/resnext50_32x4d/pytorch/README.md @@ -1,6 +1,6 @@ # ResNeXt50_32x4d -## Model description +## Model Description A ResNeXt repeats a building block that aggregates a set of transformations with the same topology. Compared to a ResNet, it exposes a new dimension, cardinality (the size of the set of transformations) , as an essential factor in addition to the dimensions of depth and width. ## Step 1: Installing @@ -27,7 +27,7 @@ imagenet ``` -## Step 2: Training +## Model Training ### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: @@ -37,5 +37,5 @@ bash train_resnext50_32x4d_amp_dist.sh -## Reference +## References https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L200 diff --git a/cv/classification/se_resnet50_vd/paddlepaddle/README.md b/cv/classification/se_resnet50_vd/paddlepaddle/README.md index e7ecfbcd4..d5b1b29ec 100644 --- a/cv/classification/se_resnet50_vd/paddlepaddle/README.md +++ b/cv/classification/se_resnet50_vd/paddlepaddle/README.md @@ -1,10 +1,12 @@ # SE_ResNet50_vd -## Model description +## Model Description The SENet structure is a weighted average between graph channels that can be embedded into other network structures. SE_ResNet50_vd is a model that adds the senet structure to ResNet50, further learning the dependency relationships between graph channels to obtain better image features. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ``` pip3 install -r requirements.txt @@ -55,12 +57,12 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch --gpus="0,1,2,3" tools/train.py -c ./ppcls/configs/ImageNet/SENet/SE_ResNet50_vd.yaml ``` -## Results +## Model Results | GPUS | ACC | FPS | | ---- | ------ | --------- | | BI-V100 x8 | 79.20% | 139.63 samples/s | -## Reference +## References - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.5) diff --git a/cv/classification/seresnext/pytorch/README.md b/cv/classification/seresnext/pytorch/README.md index 641c15a35..d3cd83994 100644 --- a/cv/classification/seresnext/pytorch/README.md +++ b/cv/classification/seresnext/pytorch/README.md @@ -1,5 +1,5 @@ # SEResNeXt -## Model description +## Model Description SE ResNeXt is a variant of a ResNext that employs squeeze-and-excitation blocks to enable the network to perform dynamic channel-wise feature recalibration. 
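The dynamic channel-wise recalibration mentioned above is the squeeze-and-excitation operation: global average pooling squeezes each channel to a scalar, a small bottleneck MLP produces a per-channel gate, and the feature map is rescaled by that gate. A minimal PyTorch sketch follows; the reduction ratio of 16 is the commonly used default rather than a value read from this repository.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: recalibrate channels with learned per-channel gates."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        squeezed = x.mean(dim=(2, 3))                   # squeeze: (B, C)
        weights = self.fc(squeezed).view(b, c, 1, 1)    # excitation: per-channel gate
        return x * weights                              # recalibrated features

x = torch.randn(4, 64, 32, 32)
print(SEBlock(64)(x).shape)  # torch.Size([4, 64, 32, 32])
```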
## Step 1: Installing ```bash @@ -25,7 +25,7 @@ imagenet ``` -## Step 2: Training +## Model Training ### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: @@ -35,5 +35,5 @@ bash train_seresnext101_32x4d_amp_dist.sh -## Reference +## References https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L214 diff --git a/cv/classification/shufflenetv2/paddlepaddle/README.md b/cv/classification/shufflenetv2/paddlepaddle/README.md index 35fa71c9c..d5df18cc3 100644 --- a/cv/classification/shufflenetv2/paddlepaddle/README.md +++ b/cv/classification/shufflenetv2/paddlepaddle/README.md @@ -1,8 +1,10 @@ # ShuffleNetv2 -## Model description +## Model Description ShuffleNet v2 is a convolutional neural network optimized for a direct metric (speed) rather than indirect metrics like FLOPs. It builds upon ShuffleNet v1, which utilised pointwise group convolutions, bottleneck-like structures, and a channel shuffle operation. Differences are shown in the Figure to the right, including a new channel split operation and moving the channel shuffle operation further down the block.ShuffleNetv2 is an efficient convolutional neural network architecture for mobile devices. For more information check the paper: [ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design](https://arxiv.org/abs/1807.11164) -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git @@ -51,11 +53,11 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/ShuffleNet/ShuffleNetV2_x1_0.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Results +## Model Results | GPUs | Top1 | Top5 |ips | |-------------|-------------|----------------|----------------| | BI-V100 x 4 | 0.684 | 0.881 | 1236 | -## Reference +## References - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/shufflenetv2/pytorch/README.md b/cv/classification/shufflenetv2/pytorch/README.md index 8ed42cde5..ddcb41db8 100644 --- a/cv/classification/shufflenetv2/pytorch/README.md +++ b/cv/classification/shufflenetv2/pytorch/README.md @@ -1,5 +1,5 @@ # ShuffleNetV2 -## Model description +## Model Description ShuffleNet v2 is a convolutional neural network optimized for a direct metric (speed) rather than indirect metrics like FLOPs. It builds upon ShuffleNet v1, which utilised pointwise group convolutions, bottleneck-like structures, and a channel shuffle operation. ## Step 1: Installing ```bash @@ -25,7 +25,7 @@ imagenet ``` -## Step 2: Training +## Model Training ### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. 
The following command uses all cards to train: @@ -34,5 +34,5 @@ bash train_shufflenet_v2_x2_0_amp_dist.sh ``` -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification#shufflenet-v2) diff --git a/cv/classification/squeezenet/pytorch/README.md b/cv/classification/squeezenet/pytorch/README.md index 61637e1cf..a004257d1 100644 --- a/cv/classification/squeezenet/pytorch/README.md +++ b/cv/classification/squeezenet/pytorch/README.md @@ -1,6 +1,6 @@ # SqueezeNet -## Model description +## Model Description SqueezeNet is a convolutional neural network that employs design strategies to reduce the number of parameters, notably with the use of fire modules that "squeeze" parameters using 1x1 convolutions. ## Step 1: Installing @@ -27,7 +27,7 @@ imagenet :beers: Done! -## Step 2: Training +## Model Training ### One single GPU ```bash python3 train.py --data-path /path/to/imagenet --model squeezenet1_0 --lr 0.001 @@ -37,5 +37,5 @@ python3 train.py --data-path /path/to/imagenet --model squeezenet1_0 --lr 0.001 python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenette --model squeezenet1_0 --lr 0.001 ``` -## Reference +## References https://github.com/pytorch/vision/blob/main/torchvision/models/squeezenet.py diff --git a/cv/classification/swin_transformer/paddlepaddle/README.md b/cv/classification/swin_transformer/paddlepaddle/README.md index 218dbd9fd..6035169ee 100644 --- a/cv/classification/swin_transformer/paddlepaddle/README.md +++ b/cv/classification/swin_transformer/paddlepaddle/README.md @@ -1,5 +1,5 @@ # Swin Transformer -## Model description +## Model Description The Swin Transformer is a type of Vision Transformer. It builds hierarchical feature maps by merging image patches (shown in gray) in deeper layers and has linear computation complexity to input image size due to computation of self-attention only within each local window (shown in red). It can thus serve as a general-purpose backbone for both image classification and dense recognition tasks. ## Step 1: Installing diff --git a/cv/classification/swin_transformer/pytorch/README.md b/cv/classification/swin_transformer/pytorch/README.md index 3b2d0b280..584600b3a 100644 --- a/cv/classification/swin_transformer/pytorch/README.md +++ b/cv/classification/swin_transformer/pytorch/README.md @@ -1,5 +1,5 @@ # Swin Transformer -## Model description +## Model Description The Swin Transformer is a type of Vision Transformer. It builds hierarchical feature maps by merging image patches (shown in gray) in deeper layers and has linear computation complexity to input image size due to computation of self-attention only within each local window (shown in red). It can thus serve as a general-purpose backbone for both image classification and dense recognition tasks. 
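The local-window self-attention referred to above relies on splitting the feature map into non-overlapping windows and stitching them back together afterwards. Below is an illustrative PyTorch sketch of that partition/reverse step, assuming height and width are divisible by the window size; it mirrors the idea rather than the repository's exact implementation.

```python
import torch

def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """(B, H, W, C) -> (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

def window_reverse(windows: torch.Tensor, window_size: int, H: int, W: int) -> torch.Tensor:
    """Inverse of window_partition."""
    B = windows.shape[0] // ((H // window_size) * (W // window_size))
    x = windows.reshape(B, H // window_size, W // window_size, window_size, window_size, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

x = torch.randn(2, 56, 56, 96)                  # (B, H, W, C) feature map
windows = window_partition(x, 7)                # attention runs within each 7x7 window
print(windows.shape)                            # torch.Size([128, 7, 7, 96])
print(torch.equal(window_reverse(windows, 7, 56, 56), x))  # True
```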
## Step 1: Installing @@ -29,7 +29,7 @@ imagenet └── val_list.txt ``` -## Step 2: Training +## Model Training ### Multiple GPUs on one machine ```bash # fix --local-rank for torch 2.x @@ -40,4 +40,5 @@ python3 -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main ``` ## Reference -[Swin-Transformer](https://github.com/microsoft/Swin-Transformer) + +- [Swin-Transformer](https://github.com/microsoft/Swin-Transformer) diff --git a/cv/classification/vgg/paddlepaddle/README.md b/cv/classification/vgg/paddlepaddle/README.md index 4152d3a4b..c22107937 100644 --- a/cv/classification/vgg/paddlepaddle/README.md +++ b/cv/classification/vgg/paddlepaddle/README.md @@ -1,5 +1,5 @@ # VGG16 -## Model description +## Model Description VGG is a classical convolutional neural network architecture. It was based on an analysis of how to increase the depth of such networks. The network utilises small 3 x 3 filters. Otherwise the network is characterized by its simplicity: the only other components being pooling layers and a fully connected layer. ## Step 1: Installing @@ -53,5 +53,5 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/VGG/VGG16.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Reference +## References - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/vgg/pytorch/README.md b/cv/classification/vgg/pytorch/README.md index a55b1cb7d..197800e7b 100644 --- a/cv/classification/vgg/pytorch/README.md +++ b/cv/classification/vgg/pytorch/README.md @@ -1,9 +1,11 @@ # VGG16 -## Model description +## Model Description VGG is a classical convolutional neural network architecture. It was based on an analysis of how to increase the depth of such networks. The network utilises small 3 x 3 filters. Otherwise the network is characterized by its simplicity: the only other components being pooling layers and a fully connected layer. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash pip3 install -r requirements.txt @@ -39,5 +41,5 @@ export DATA_PATH=/path/to/imagenet bash train_vgg16_amp_dist.sh ``` -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/vgg/tensorflow/README.md b/cv/classification/vgg/tensorflow/README.md index c43928627..9c583620b 100644 --- a/cv/classification/vgg/tensorflow/README.md +++ b/cv/classification/vgg/tensorflow/README.md @@ -1,10 +1,12 @@ # VGG16 -## Model description +## Model Description VGG is a classical convolutional neural network architecture. It was based on an analysis of how to increase the depth of such networks. The network utilises small 3 x 3 filters. Otherwise the network is characterized by its simplicity: the only other components being pooling layers and a fully connected layer. 
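The VGG design described above boils down to stacking 3x3 convolutions with ReLU and shrinking the spatial size with max pooling before the fully connected classifier. A toy single stage in PyTorch, with illustrative channel counts:

```python
import torch
import torch.nn as nn

# one VGG-style stage: two 3x3 conv+ReLU layers, then 2x2 max pooling
stage = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
)
print(stage(torch.randn(1, 64, 112, 112)).shape)  # torch.Size([1, 128, 56, 56])
```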
-## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger @@ -38,11 +40,11 @@ bash run_train_vgg16_imagenet.sh bash run_train_vgg16_multigpu_imagenet.sh ``` -## Results +## Model Results | GPUS | acc | fps | | ----------| --------------------------| ----- | | BI V100×8 | acc@1=0.7160,acc@5=0.9040 | 435.9 | -## Reference +## References - [TensorFlow/benchmarks](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks) \ No newline at end of file diff --git a/cv/classification/wavemlp/pytorch/README.md b/cv/classification/wavemlp/pytorch/README.md index ac8c08c43..d4bd919a5 100644 --- a/cv/classification/wavemlp/pytorch/README.md +++ b/cv/classification/wavemlp/pytorch/README.md @@ -1,6 +1,6 @@ # Wave-MLP -## Model description +## Model Description In the field of computer vision, recent works show that a pure MLP architecture mainly stacked by fully-connected layers can achieve competing performance with CNN and transformer. An input image of vision MLP is usually split into multiple tokens (patches), while the existing MLP models directly aggregate them with fixed weights, neglecting the varying semantic information of tokens from different images. To dynamically aggregate tokens, we propose to represent each token as a wave function with two parts, amplitude and phase. Amplitude is the original feature and the phase term is a complex value changing according to the semantic contents of input images. Introducing the phase term can dynamically modulate the relationship between tokens and fixed weights in MLP. Based on the wave-like token representation, we establish a novel Wave-MLP architecture for vision tasks. Extensive experiments demonstrate that the proposed Wave-MLP is superior to the state-of-the-art MLP architectures on various vision tasks such as image classification, object detection and semantic segmentation. The source code is available at https://github.com/huawei-noah/CV-Backbones/tree/master/wavemlp_pytorch and https://gitee.com/mindspore/models/tree/master/research/cv/wave_mlp. 
@@ -30,7 +30,7 @@ imagenet └── val_list.txt ``` -## Step 2: Training +## Model Training ### WaveMLP_T*: @@ -48,7 +48,7 @@ sed -i 's/args.max_history/100/g' train.py python3 -m torch.distributed.launch --nproc_per_node 8 --nnodes=1 --node_rank=0 train.py /your_path_to/imagenet/ --output /your_path_to/output/ --model WaveMLP_T_dw --sched cosine --epochs 300 --opt adamw -j 8 --warmup-lr 1e-6 --mixup .8 --cutmix 1.0 --model-ema --model-ema-decay 0.99996 --aa rand-m9-mstd0.5-inc1 --color-jitter 0.4 --warmup-epochs 5 --opt-eps 1e-8 --repeated-aug --remode pixel --reprob 0.25 --amp --lr 1e-3 --weight-decay .05 --drop 0 --drop-path 0.1 -b 128 ``` -## Results on BI-V100 +## Model Results on BI-V100 ### FP16 @@ -69,4 +69,5 @@ python3 -m torch.distributed.launch --nproc_per_node 8 --nnodes=1 --node_rank=0 ## Reference -[wavemlp_pytorch](https://github.com/huawei-noah/Efficient-AI-Backbones/blob/master/wavemlp_pytorch/) + +- [wavemlp_pytorch](https://github.com/huawei-noah/Efficient-AI-Backbones/blob/master/wavemlp_pytorch/) diff --git a/cv/classification/wide_resnet101_2/pytorch/README.md b/cv/classification/wide_resnet101_2/pytorch/README.md index b5cdc54ab..626597583 100644 --- a/cv/classification/wide_resnet101_2/pytorch/README.md +++ b/cv/classification/wide_resnet101_2/pytorch/README.md @@ -1,6 +1,6 @@ # Wide_ResNet101_2 -## Model description +## Model Description Wide Residual Networks are a variant on ResNets where we decrease depth and increase the width of residual networks. This is achieved through the use of wide residual blocks. ## Step 1: Installing @@ -28,7 +28,7 @@ imagenet :beers: Done! -## Step 2: Training +## Model Training ### Multiple GPUs on one machine Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: @@ -39,5 +39,5 @@ bash train_wide_resnet101_2_amp_dist.sh :beers: Done! -## Reference +## References - [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/xception/paddlepaddle/README.md b/cv/classification/xception/paddlepaddle/README.md index af341bc49..ab56b9bca 100644 --- a/cv/classification/xception/paddlepaddle/README.md +++ b/cv/classification/xception/paddlepaddle/README.md @@ -1,10 +1,12 @@ # Xception -## Model description +## Model Description Xception is a convolutional neural network architecture that relies solely on depthwise separable convolution layers. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git @@ -57,10 +59,10 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 tools/train.py -c ./ppcls/configs/ImageNet/Xception/Xception41.yaml ``` -## Results +## Model Results | GPUs | TOP1 | TOP5 | ips | |:-----------:|:-----------:|:-----------:|:-----------:| | BI-V100 x 8 |0.783 | 0.941 | 537.04 | -## Reference +## References - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/xception/pytorch/README.md b/cv/classification/xception/pytorch/README.md index a22a73fc6..1acdb3cf1 100755 --- a/cv/classification/xception/pytorch/README.md +++ b/cv/classification/xception/pytorch/README.md @@ -1,6 +1,6 @@ # Xception -## Model description +## Model Description Xception is a convolutional neural network architecture that relies solely on depthwise separable convolution layers. 
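The depthwise separable convolution that Xception is built from factorizes a standard convolution into a per-channel spatial filter followed by a 1x1 pointwise convolution that mixes channels. A minimal PyTorch sketch, again with illustrative channel counts:

```python
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Depthwise separable convolution: per-channel spatial filter + 1x1 channel mixing."""
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 56, 56)
print(SeparableConv2d(64, 128)(x).shape)  # torch.Size([1, 128, 56, 56])
```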
## Step 1: Installing @@ -26,7 +26,7 @@ imagenet └── val_list.txt ``` -## Step 2: Training +## Model Training ### One single GPU ```bash python3 train.py --data-path /path/to/imagenet --model xception @@ -36,5 +36,5 @@ python3 train.py --data-path /path/to/imagenet --model xception python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model xception ``` -## Reference +## References https://github.com/tstandley/Xception-PyTorch -- Gitee From 49dceb7b203b4b4b51ad5fa8d128ab877c1d43a0 Mon Sep 17 00:00:00 2001 From: "mingjiang.li" Date: Fri, 14 Mar 2025 14:57:49 +0800 Subject: [PATCH 2/3] unify model readme format 2nd part - cv/classification Signed-off-by: mingjiang.li --- cv/classification/README.md | 1 - .../alexnet/tensorflow/README.md | 2 + cv/classification/cbam/pytorch/README.md | 5 +- cv/classification/convnext/pytorch/README.md | 30 ++++--- .../cspdarknet53/pytorch/README.md | 50 ++++++++--- .../densenet/paddlepaddle/README.md | 57 ++++++------ cv/classification/densenet/pytorch/README.md | 29 ++++--- cv/classification/dpn107/pytorch/README.md | 30 ++++--- cv/classification/dpn92/pytorch/README.md | 32 ++++--- .../eca_mobilenet_v2/pytorch/README.md | 27 ++++-- .../eca_resnet152/pytorch/README.md | 27 ++++-- .../efficientnet_b0/paddlepaddle/README.md | 58 +++++++------ .../efficientnet_b4/pytorch/README.md | 29 ++++--- cv/classification/fasternet/pytorch/README.md | 54 ++++++------ .../googlenet/paddlepaddle/README.md | 28 +++--- cv/classification/googlenet/pytorch/README.md | 25 ++++-- .../inceptionv3/mindspore/README.md | 71 +++++++-------- .../inceptionv3/pytorch/README.md | 6 +- .../inceptionv3/tensorflow/README.md | 4 +- .../inceptionv4/pytorch/README.md | 34 +++++--- .../internimage/pytorch/README.md | 87 ++++++++++--------- cv/classification/lenet/pytorch/README.md | 25 ++++-- .../mobilenetv2/pytorch/README.md | 29 +++++-- .../mobilenetv3/mindspore/README.md | 66 +++++++------- .../mobilenetv3/paddlepaddle/README.md | 38 +++++--- .../mobilenetv3/pytorch/README.md | 29 ++++--- .../paddlepaddle/README.md | 52 ++++++----- cv/classification/mobileone/pytorch/README.md | 67 +++++++------- cv/classification/mocov2/pytorch/README.md | 66 +++++++------- .../pp-lcnet/paddlepaddle/README.md | 49 ++++++----- cv/classification/repmlp/pytorch/README.md | 39 +++++---- .../repvgg/paddlepaddle/README.md | 38 +++++--- cv/classification/repvgg/pytorch/README.md | 54 +++++++----- cv/classification/repvit/pytorch/README.md | 52 ++++++----- .../res2net50_14w_8s/paddlepaddle/README.md | 40 +++++---- .../resnest101/pytorch/README.md | 31 ++++--- cv/classification/resnest14/pytorch/README.md | 31 ++++--- .../resnest269/pytorch/README.md | 34 +++++--- .../resnest50/paddlepaddle/README.md | 36 +++++--- cv/classification/resnest50/pytorch/README.md | 30 +++++-- cv/classification/resnet101/pytorch/README.md | 33 ++++--- cv/classification/resnet152/pytorch/README.md | 32 ++++--- cv/classification/resnet18/pytorch/README.md | 33 ++++--- .../resnet50/paddlepaddle/README.md | 48 +++++----- cv/classification/resnet50/pytorch/README.md | 53 +++++------ .../resnet50/tensorflow/README.md | 42 ++++----- .../resnext101_32x8d/pytorch/README.md | 31 ++++--- .../resnext50_32x4d/mindspore/README.md | 71 ++++++++------- .../resnext50_32x4d/pytorch/README.md | 30 ++++--- .../se_resnet50_vd/paddlepaddle/README.md | 50 ++++++----- cv/classification/seresnext/pytorch/README.md | 33 ++++--- .../shufflenetv2/paddlepaddle/README.md | 53 ++++++----- 
.../shufflenetv2/pytorch/README.md | 31 +++++-- .../squeezenet/pytorch/README.md | 36 +++++--- .../swin_transformer/paddlepaddle/README.md | 38 +++++--- .../swin_transformer/pytorch/README.md | 38 +++++--- cv/classification/vgg/paddlepaddle/README.md | 40 ++++++--- cv/classification/vgg/pytorch/README.md | 29 ++++--- cv/classification/vgg/tensorflow/README.md | 42 +++++---- cv/classification/wavemlp/pytorch/README.md | 51 ++++++----- .../wide_resnet101_2/pytorch/README.md | 32 ++++--- .../xception/paddlepaddle/README.md | 46 ++++++---- cv/classification/xception/pytorch/README.md | 35 +++++--- 63 files changed, 1463 insertions(+), 956 deletions(-) delete mode 100644 cv/classification/README.md diff --git a/cv/classification/README.md b/cv/classification/README.md deleted file mode 100644 index 468826e5f..000000000 --- a/cv/classification/README.md +++ /dev/null @@ -1 +0,0 @@ -# Image Classification diff --git a/cv/classification/alexnet/tensorflow/README.md b/cv/classification/alexnet/tensorflow/README.md index 89d391d2b..f95aad1c9 100644 --- a/cv/classification/alexnet/tensorflow/README.md +++ b/cv/classification/alexnet/tensorflow/README.md @@ -1,5 +1,7 @@ # AlexNet +## Model Description + AlexNet is a groundbreaking deep convolutional neural network that revolutionized computer vision. It introduced key innovations like ReLU activations, dropout regularization, and GPU acceleration. With its 8-layer architecture featuring 5 convolutional and 3 fully-connected layers, AlexNet achieved record-breaking performance on ImageNet in 2012. Its diff --git a/cv/classification/cbam/pytorch/README.md b/cv/classification/cbam/pytorch/README.md index 1534fa753..43d353876 100644 --- a/cv/classification/cbam/pytorch/README.md +++ b/cv/classification/cbam/pytorch/README.md @@ -46,9 +46,10 @@ ResNet50 based examples are included. Example scripts are included under ```./sc ```bash # To train with CBAM (ResNet50 backbone) -# For 8 GPUs +## For 8 GPUs python3 train_imagenet.py --ngpu 8 --workers 20 --arch resnet --depth 50 --epochs 100 --batch-size 256 --lr 0.1 --att-type CBAM --prefix RESNET50_IMAGENET_CBAM ./data/ImageNet -# For 1 GPUs + +## For 1 GPUs python3 train_imagenet.py --ngpu 1 --workers 20 --arch resnet --depth 50 --epochs 100 --batch-size 64 --lr 0.1 --att-type CBAM --prefix RESNET50_IMAGENET_CBAM ./data/ImageNet ``` diff --git a/cv/classification/convnext/pytorch/README.md b/cv/classification/convnext/pytorch/README.md index 8795a653b..aab8f3f66 100644 --- a/cv/classification/convnext/pytorch/README.md +++ b/cv/classification/convnext/pytorch/README.md @@ -2,15 +2,19 @@ ## Model Description -The ConvNeXT model was proposed in [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie. ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them. +ConvNext is a modern convolutional neural network architecture that bridges the gap between traditional ConvNets and +Vision Transformers. Inspired by Transformer designs, it incorporates techniques like large kernel sizes, layer +normalization, and inverted bottlenecks to achieve state-of-the-art performance. ConvNext demonstrates that properly +modernized ConvNets can match or exceed Transformer-based models in accuracy and efficiency across various vision tasks. 
+Its simplicity and strong performance make it a compelling choice for image classification and other computer vision +applications. -## Step 1: Installing +## Model Preparation -```bash -pip install timm==0.4.12 tensorboardX six torch torchvision -``` +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -28,14 +32,20 @@ imagenet └── val_list.txt ``` -## Model Training - -### Multiple GPUs on one machine +### Install Dependencies ```bash +pip install timm==0.4.12 tensorboardX six torch torchvision + git clone https://github.com/facebookresearch/ConvNeXt.git -cd /path/to/ConvNeXt +cd ConvNeXt/ git checkout 048efcea897d999aed302f2639b6270aedf8d4c8 +``` + +## Model Training + +```bash +# Multiple GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 main.py \ --model convnext_tiny \ --drop_path 0.1 \ diff --git a/cv/classification/cspdarknet53/pytorch/README.md b/cv/classification/cspdarknet53/pytorch/README.md index 96bbe1862..c4812e97f 100644 --- a/cv/classification/cspdarknet53/pytorch/README.md +++ b/cv/classification/cspdarknet53/pytorch/README.md @@ -2,9 +2,36 @@ ## Model Description -This is an implementation of CSPDarknet53 in pytorch. +CspDarknet53 is an efficient backbone network for object detection, combining Cross Stage Partial (CSP) connections with +the Darknet architecture. It reduces computational complexity while maintaining feature richness by splitting feature +maps across stages. The model achieves better gradient flow and reduces memory usage compared to traditional Darknet +architectures. CspDarknet53 is particularly effective in real-time detection tasks, offering a good balance between +accuracy and speed, making it popular in modern object detection frameworks like YOLOv4. -## Step 1: Installing +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. + +The ImageNet dataset path structure should look like: + +```bash +imagenet +├── train +│ └── n01440764 +│ ├── n01440764_10026.JPEG +│ └── ... +├── train_list.txt +├── val +│ └── n01440764 +│ ├── ILSVRC2012_val_00000293.JPEG +│ └── ... 
+└── val_list.txt +``` + +### Install Dependencies ```bash pip3 install torchsummary @@ -12,26 +39,23 @@ pip3 install torchsummary ## Model Training -### One single GPU - ```bash +# One single GPU export CUDA_VISIBLE_DEVICES=0 python3 train.py --batch-size 64 --epochs 120 --data-path /home/datasets/cv/imagenet -``` -### 8 GPUs on one machine -```bash +# 8 GPUs on one machine export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --batch-size 64 --epochs 120 --data-path /home/datasets/cv/imagenet ``` -## Result +## Model Results -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 8 cards | Acc@1 76.644 fps 1049 | -| 1 card | fps 148 | +| Model | GPU | FP32 | +|--------------|------------|---------------------------| +| CspDarknet53 | BI-V100 x8 | Acc@1 76.644 fps 1049 | +| CspDarknet53 | BI-V100 x1 | fps 148 | ## References -https://github.com/WongKinYiu/CrossStagePartialNetworks +- [CrossStagePartialNetworks](https://github.com/WongKinYiu/CrossStagePartialNetworks) diff --git a/cv/classification/densenet/paddlepaddle/README.md b/cv/classification/densenet/paddlepaddle/README.md index 38035cfac..678b4e196 100644 --- a/cv/classification/densenet/paddlepaddle/README.md +++ b/cv/classification/densenet/paddlepaddle/README.md @@ -2,29 +2,18 @@ ## Model Description -A DenseNet is a type of convolutional neural network that utilises dense connections between layers, through Dense Blocks, where we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. +DenseNet is an innovative convolutional neural network architecture that introduces dense connections between layers. In +each dense block, every layer receives feature maps from all preceding layers and passes its own features to all +subsequent layers. This dense connectivity pattern improves gradient flow, encourages feature reuse, and reduces +vanishing gradient problems. DenseNet achieves state-of-the-art performance with fewer parameters compared to +traditional CNNs, making it efficient for various computer vision tasks like image classification and object detection. ## Model Preparation -### Install Dependencies - -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git - -cd PaddleClas - -yum install mesa-libGL -y - -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.13 - -python3 setup.py install -``` - -## Step 2: Preparing Datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -42,11 +31,25 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git + +cd PaddleClas/ + +yum install mesa-libGL -y + +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.13 + +python3 setup.py install +``` + +## Model Training ```bash -# Make sure your dataset path is the same as above -cd PaddleClas # Link your dataset to default location ln -s /path/to/imagenet ./dataset/ILSVRC2012 @@ -54,14 +57,16 @@ export FLAGS_cudnn_exhaustive_search=True export FLAGS_cudnn_batchnorm_spatial_persistent=True export CUDA_VISIBLE_DEVICES=0,1,2,3 -python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/DenseNet/DenseNet121.yaml -o Arch.pretrained=False -o Global.device=gpu +python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py \ + -c ppcls/configs/ImageNet/DenseNet/DenseNet121.yaml \ + -o Arch.pretrained=False -o Global.device=gpu ``` ## Model Results -| GPUs | Top1 | Top5 |ips | -|-------------|-------------|----------------|----------------| -| BI-V100 x 4 | 0.757 | 0.925 | 171 | +| Model | GPU | Top1 | Top5 | ips | +|-----------|-------------|-------|-------|-----| +| DenseNet | BI-V100 x 4 | 0.757 | 0.925 | 171 | ## References diff --git a/cv/classification/densenet/pytorch/README.md b/cv/classification/densenet/pytorch/README.md index e25fdf825..454f846ae 100755 --- a/cv/classification/densenet/pytorch/README.md +++ b/cv/classification/densenet/pytorch/README.md @@ -2,15 +2,18 @@ ## Model Description -A DenseNet is a type of convolutional neural network that utilises dense connections between layers, through Dense Blocks, where we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. +DenseNet is an innovative convolutional neural network architecture that introduces dense connections between layers. In +each dense block, every layer receives feature maps from all preceding layers and passes its own features to all +subsequent layers. This dense connectivity pattern improves gradient flow, encourages feature reuse, and reduces +vanishing gradient problems. DenseNet achieves state-of-the-art performance with fewer parameters compared to +traditional CNNs, making it efficient for various computer vision tasks like image classification and object detection. -## Step 1: Installing +## Model Preparation -```bash -pip install torch torchvision -``` +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.
The ImageNet dataset path structure should look like: @@ -28,20 +31,22 @@ imagenet └── val_list.txt ``` -## Model Training - -### One single GPU +### Install Dependencies ```bash -python3 train.py --data-path /path/to/imagenet --model densenet201 --batch-size 128 +pip install torch torchvision ``` -### Multiple GPUs on one machine +## Model Training ```bash +# One single GPU +python3 train.py --data-path /path/to/imagenet --model densenet201 --batch-size 128 + +# Multiple GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model densenet201 --batch-size 128 ``` ## References -[densenet](https://github.com/pytorch/vision/blob/main/torchvision/models/densenet.py) +- [vision](https://github.com/pytorch/vision/blob/main/torchvision/models/densenet.py) diff --git a/cv/classification/dpn107/pytorch/README.md b/cv/classification/dpn107/pytorch/README.md index 7114963eb..3bf787dee 100644 --- a/cv/classification/dpn107/pytorch/README.md +++ b/cv/classification/dpn107/pytorch/README.md @@ -1,13 +1,19 @@ # DPN107 + ## Model Description -A Dual Path Network (DPN) is a convolutional neural network which presents a new topology of connection paths internally.The intuition is that ResNets enables feature re-usage while DenseNet enables new feature exploration, and both are important for learning good representations. To enjoy the benefits from both path topologies, Dual Path Networks share common features while maintaining the flexibility to explore new features through dual path architectures. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +DPN107 is an advanced dual-path network that combines the feature reuse capability of ResNet with the feature +exploration of DenseNet. This architecture enables efficient learning by maintaining two parallel paths: one for +preserving important features and another for discovering new ones. DPN107 achieves state-of-the-art performance in +image classification tasks while maintaining computational efficiency. Its unique design makes it particularly effective +for complex visual recognition tasks, offering a balance between model accuracy and resource utilization. + +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,17 +31,21 @@ imagenet └── val_list.txt ``` -:beers: Done! +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` ## Model Training -### Multiple GPUs on one machine (AMP) Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_dpn107_amp_dist.sh ``` -:beers: Done! 
## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/dpn92/pytorch/README.md b/cv/classification/dpn92/pytorch/README.md index 705ff175d..9dc35d694 100644 --- a/cv/classification/dpn92/pytorch/README.md +++ b/cv/classification/dpn92/pytorch/README.md @@ -1,13 +1,19 @@ # DPN92 + ## Model Description -A Dual Path Network (DPN) is a convolutional neural network which presents a new topology of connection paths internally. The intuition is that ResNets enables feature re-usage while DenseNet enables new feature exploration, and both are important for learning good representations. To enjoy the benefits from both path topologies, Dual Path Networks share common features while maintaining the flexibility to explore new features through dual path architectures. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +DPN92 is a dual-path network that combines the strengths of ResNet and DenseNet architectures. It features two parallel +paths: one for feature reuse (like ResNet) and another for feature exploration (like DenseNet). This dual-path approach +enables efficient learning of both shared and new features. DPN92 achieves state-of-the-art performance in image +classification tasks while maintaining computational efficiency. Its unique architecture makes it particularly effective +for tasks requiring both feature preservation and discovery. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,15 +31,21 @@ imagenet └── val_list.txt ``` -## Step2: Training +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training -### Multiple GPUs on one machine (AMP) Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_dpn92_amp_dist.sh ``` - ## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/eca_mobilenet_v2/pytorch/README.md b/cv/classification/eca_mobilenet_v2/pytorch/README.md index 8f3224512..7a0a34717 100644 --- a/cv/classification/eca_mobilenet_v2/pytorch/README.md +++ b/cv/classification/eca_mobilenet_v2/pytorch/README.md @@ -2,15 +2,19 @@ ## Model Description -An ECA-Net is a type of convolutional neural network that utilises an Efficient Channel Attention module. +ECA MobileNet V2 is an efficient convolutional neural network that combines MobileNet V2's lightweight architecture with +an Efficient Channel Attention (ECA) module. The ECA module enhances feature representation by adaptively recalibrating +channel-wise feature responses without dimensionality reduction. This integration improves model performance while +maintaining computational efficiency, making it suitable for mobile and edge devices. 
ECA MobileNet V2 achieves better +accuracy than standard MobileNet V2 with minimal additional parameters, making it ideal for resource-constrained image +classification tasks. -## Step 1: Installing +## Model Preparation -```bash -pip3 install -r requirements.txt -``` +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -28,16 +32,21 @@ imagenet └── val_list.txt ``` -## Model Training +### Install Dependencies -### Multiple GPUs on one machine (AMP) +```bash +pip3 install -r requirements.txt +``` + +## Model Training Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_eca_mobilenet_v2_amp_dist.sh ``` ## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/eca_resnet152/pytorch/README.md b/cv/classification/eca_resnet152/pytorch/README.md index d930f6ecb..93d955757 100644 --- a/cv/classification/eca_resnet152/pytorch/README.md +++ b/cv/classification/eca_resnet152/pytorch/README.md @@ -2,15 +2,19 @@ ## Model Description -An ECA-Net is a type of convolutional neural network that utilises an Efficient Channel Attention module. +ECA ResNet152 is an enhanced version of ResNet152 that incorporates the Efficient Channel Attention (ECA) module. This +module improves feature representation by adaptively recalibrating channel-wise feature responses without dimensionality +reduction. The ECA mechanism boosts model performance while maintaining computational efficiency. ECA ResNet152 achieves +superior accuracy in image classification tasks compared to standard ResNet152, making it particularly effective for +complex visual recognition problems. Its architecture balances performance and efficiency, making it suitable for +various computer vision applications. -## Step 1: Installing +## Model Preparation -```bash -pip3 install -r requirements.txt -``` +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -28,16 +32,21 @@ imagenet └── val_list.txt ``` -## Model Training +### Install Dependencies -### Multiple GPUs on one machine (AMP) +```bash +pip3 install -r requirements.txt +``` + +## Model Training Set data path by `export DATA_PATH=/path/to/imagenet`. 
The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_eca_resnet152_amp_dist.sh ``` ## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/efficientnet_b0/paddlepaddle/README.md b/cv/classification/efficientnet_b0/paddlepaddle/README.md index 4ebecc744..af8c0c9a3 100644 --- a/cv/classification/efficientnet_b0/paddlepaddle/README.md +++ b/cv/classification/efficientnet_b0/paddlepaddle/README.md @@ -2,33 +2,24 @@ ## Model Description -This model is the B0 version of the EfficientNet series, whitch can be used for image classification tasks, such as cat and dog classification, flower classification, and so on. +EfficientNetB0 is the baseline model in the EfficientNet series, known for its exceptional balance between accuracy and +efficiency. It uses compound scaling to uniformly scale up network width, depth, and resolution, achieving +state-of-the-art performance with minimal computational resources. The model employs mobile inverted bottleneck +convolution (MBConv) blocks with squeeze-and-excitation optimization. EfficientNetB0 is particularly effective for +mobile and edge devices, offering high accuracy in image classification tasks while maintaining low computational +requirements. ## Model Preparation -### Install Dependencies - -```bash -git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git - -cd PaddleClas -pip3 install -r requirements.txt -pip3 install paddleclas -pip3 install protobuf==3.20.3 -yum install mesa-libGL -pip3 install urllib3==1.26.15 - -``` +### Prepare Resources - -## Step 2: Preparing datasets - -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `./PaddleClas/dataset/` to your ImageNet path in later training process. 
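The squeeze-and-excitation optimization inside the MBConv blocks mentioned in the description above can be sketched in
a few lines. This is a standalone, illustrative PyTorch snippet rather than PaddleClas code; the module name and
reduction ratio are assumptions made for the example:

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Channel-wise gating as used inside MBConv blocks (illustrative sketch)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average over H x W
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.SiLU(),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = self.gate(self.pool(x))      # excitation: per-channel weights in (0, 1)
        return x * scale                     # recalibrate the feature map

x = torch.randn(1, 32, 56, 56)
print(SqueezeExcite(32)(x).shape)            # torch.Size([1, 32, 56, 56])
```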
The ImageNet dataset path structure should look like: ```bash -PaddleClas/dataset/ILSVRC2012/ +ILSVRC2012 ├── train │ └── n01440764 │ ├── n01440764_10026.JPEG @@ -41,9 +32,9 @@ PaddleClas/dataset/ILSVRC2012/ └── val_list.txt ``` -**Tips** +Tips: for `PaddleClas` training, the image path in train_list.txt and val_list.txt must contain `train/` and `val/` +directories: -For `PaddleClas` training, the image path in train_list.txt and val_list.txt must contain `train/` and `val/` directories: - train_list.txt: train/n01440764/n01440764_10026.JPEG 0 - val_list.txt: val/n01667114/ILSVRC2012_val_00000229.JPEG 35 @@ -53,7 +44,21 @@ sed -i 's#^#train/#g' train_list.txt sed -i 's#^#val/#g' val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +yum install -y mesa-libGL + +git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas/ +pip3 install -r requirements.txt +pip3 install paddleclas +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.15 + +``` + +## Model Training ```bash # Link your dataset to default location @@ -71,9 +76,10 @@ python3 tools/train.py -c ppcls/configs/ImageNet/EfficientNet/EfficientNetB0.yam ## Model Results -| GPUs| ips | Top1 | Top5 | -| ------ | ---------- |--------------|--------------| -| BI-V100 x8 | 1065.28 | 0.7683 | 0.9316 | +| Model | GPU | ips | Top1 | Top5 | +|----------------|------------|---------|--------|--------| +| EfficientNetB0 | BI-V100 x8 | 1065.28 | 0.7683 | 0.9316 | ## References -- [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.5) \ No newline at end of file + +- [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/release/2.5) diff --git a/cv/classification/efficientnet_b4/pytorch/README.md b/cv/classification/efficientnet_b4/pytorch/README.md index 6564fefae..91585de19 100755 --- a/cv/classification/efficientnet_b4/pytorch/README.md +++ b/cv/classification/efficientnet_b4/pytorch/README.md @@ -2,15 +2,18 @@ ## Model Description -EfficientNet is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth/width/resolution using a compound coefficient. +EfficientNetB4 is a scaled-up version of the EfficientNet architecture, using compound scaling to balance network width, +depth, and resolution. It builds upon the efficient MBConv blocks with squeeze-and-excitation optimization, achieving +superior accuracy compared to smaller EfficientNet variants. The model maintains computational efficiency while handling +more complex visual recognition tasks. EfficientNetB4 is particularly effective for high-accuracy image classification +scenarios where computational resources are available, offering a good trade-off between performance and efficiency. -## Step 1: Installing +## Model Preparation -```bash -pip3 install torch torchvision -``` +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
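The compound scaling rule mentioned in the EfficientNetB4 description above can be made concrete with a small sketch.
It assumes the base coefficients alpha=1.2, beta=1.1, gamma=1.15 reported in the EfficientNet paper; the released B4
checkpoint uses hand-rounded factors, so treat the numbers below as illustrative only:

```python
# Compound scaling: depth, width and input resolution all grow with one coefficient phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # per-dimension bases from the EfficientNet paper

def scale_factors(phi: float):
    depth = ALPHA ** phi       # multiplier on the number of layers
    width = BETA ** phi        # multiplier on the number of channels
    resolution = GAMMA ** phi  # multiplier on the input image size
    return depth, width, resolution

for phi in (0, 4):             # 0 = B0 baseline, 4 = roughly the B4 operating point
    d, w, r = scale_factors(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```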
The ImageNet dataset path structure should look like: @@ -28,20 +31,22 @@ imagenet └── val_list.txt ``` -## Model Training - -### One single GPU +### Install Dependencies ```bash -python3 train.py --data-path /path/to/imagenet --model efficientnet_b4 --batch-size 128 +pip3 install torch torchvision ``` -### Multiple GPUs on one machine +## Model Training ```bash +# One single GPU +python3 train.py --data-path /path/to/imagenet --model efficientnet_b4 --batch-size 128 + +# Multiple GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model efficientnet_b4 --batch-size 128 ``` ## References - +- [vision](https://github.com/pytorch/vision/blob/main/torchvision/models/efficientnet.py) diff --git a/cv/classification/fasternet/pytorch/README.md b/cv/classification/fasternet/pytorch/README.md index a81493a4b..09b0e132a 100644 --- a/cv/classification/fasternet/pytorch/README.md +++ b/cv/classification/fasternet/pytorch/README.md @@ -2,28 +2,19 @@ ## Model Description -This is the official Pytorch/PytorchLightning implementation of the paper:
-> [**Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks**](https://arxiv.org/abs/2303.03667)
-> Jierun Chen, Shiu-hong Kao, Hao He, Weipeng Zhuo, Song Wen, Chul-Ho Lee, S.-H. Gary Chan
-> *IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023*
->
-
-We propose a simple yet fast and effective partial convolution (**PConv**), as well as a latency-efficient family of architectures called **FasterNet**.
+FasterNet is a high-speed neural network architecture that introduces Partial Convolution (PConv) to optimize
+computational efficiency. It achieves superior performance by reducing redundant computations while maintaining feature
+learning capabilities. FasterNet is designed for real-time applications, offering an excellent balance between accuracy
+and speed. Its innovative architecture makes it particularly effective for mobile and edge devices, where computational
+resources are limited. The model demonstrates state-of-the-art results in various computer vision tasks while
+maintaining low latency.
 
 ## Model Preparation
 
-### Install Dependencies
-Clone this repo and install the required packages:
-```bash
-pip install -r requirements.txt
-git clone https://github.com/JierunChen/FasterNet.git
-cd FasterNet
-git checkout e8fba4465ae912359c9f661a72b14e39347e4954
-```
+### Prepare Resources
 
-## Step 2: Preparing datasets
-
-Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.
+Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to
+download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.
 
 The ImageNet dataset path structure should look like:
@@ -41,8 +32,21 @@ imagenet
 └── val_list.txt
 ```
 
-## Step 3: Training
-**Remark**: Training will prompt wondb visualization options, you'll need a W&B account to visualize, choose "3" if you don't need to.
+### Install Dependencies
+
+Clone this repo and install the required packages:
+
+```bash
+pip install -r requirements.txt
+git clone https://github.com/JierunChen/FasterNet.git
+cd FasterNet
+git checkout e8fba4465ae912359c9f661a72b14e39347e4954
+```
+
+## Model Training
+
+**Remark**: Training will prompt for wandb visualization options; you'll need a W&B account to visualize, or choose "3"
+if you don't need it.
 
 FasterNet-T0 training on ImageNet with a 8-GPU node:
 
@@ -66,14 +70,14 @@ python3 train_test.py -g 0 --num_nodes 1 -n 4 -b 512 -e 2000 \
 --cfg cfg/fasternet_t0.yaml
 ```
 
-To train other FasterNet variants, `--cfg` need to be changed. You may also want to change the training batch size `-b`.
+To train other FasterNet variants, `--cfg` needs to be changed. You may also want to change the training batch size `-b`.
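The partial convolution (PConv) behind FasterNet's speed is easy to sketch: only a fraction of the channels is
convolved and the remaining channels are passed through untouched. The snippet below is an independent PyTorch
illustration, not the repository's own implementation, and the class and argument names are made up for the example:

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """Apply a 3x3 convolution to the first channels/n_div channels; identity on the rest."""
    def __init__(self, channels: int, n_div: int = 4):
        super().__init__()
        self.dim_conv = channels // n_div
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, kernel_size=3,
                              padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.dim_conv, x.size(1) - self.dim_conv], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)  # untouched channels are concatenated back

x = torch.randn(2, 64, 56, 56)
print(PartialConv(64)(x).shape)  # torch.Size([2, 64, 56, 56])
```

With `n_div = 4`, the expensive 3x3 filtering runs on only a quarter of the channels, which is where the FLOPs saving
comes from.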
## Model Results -| GPUs | FP32 | -| ----------- | ------------------------------------ | -| BI-V100 x8 | test_acc1 71.832 val_acc1 71.722 | +| Model | GPU | FP32 | +|-----------|------------|----------------------------------| +| FasterNet | BI-V100 x8 | test_acc1 71.832 val_acc1 71.722 | ## References -[FasterNet](https://github.com/JierunChen/FasterNet/tree/e8fba4465ae912359c9f661a72b14e39347e4954) +- [FasterNet](https://github.com/JierunChen/FasterNet/tree/e8fba4465ae912359c9f661a72b14e39347e4954) diff --git a/cv/classification/googlenet/paddlepaddle/README.md b/cv/classification/googlenet/paddlepaddle/README.md index 06bbb4ee4..6f4139d9c 100644 --- a/cv/classification/googlenet/paddlepaddle/README.md +++ b/cv/classification/googlenet/paddlepaddle/README.md @@ -1,19 +1,19 @@ # GoogLeNet ## Model Description -GoogLeNet is a type of convolutional neural network based on the Inception architecture. It utilises Inception modules, which allow the network to choose between multiple convolutional filter sizes in each block. An Inception network stacks these modules on top of each other, with occasional max-pooling layers with stride 2 to halve the resolution of the grid. -## Step 1: Installing +GoogLeNet is a pioneering deep convolutional neural network that introduced the Inception architecture. It features +multiple parallel convolutional filters of different sizes within Inception modules, allowing efficient feature +extraction at various scales. The network uses 1x1 convolutions for dimensionality reduction, making it computationally +efficient. GoogLeNet achieved state-of-the-art performance in image classification tasks while maintaining relatively +low computational complexity. Its innovative design has influenced many subsequent CNN architectures in computer vision. -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install -r requirements.txt -``` +## Model Preparation -## Step 2: Download data +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -31,7 +31,15 @@ imagenet └── val_list.txt ``` -## Step 3: Run GoogLeNet AMP +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install -r requirements.txt +``` + +## Model Training ```bash # Modify the file: PaddleClas/ppcls/configs/ImageNet/Inception/GoogLeNet.yaml to add the option of AMP diff --git a/cv/classification/googlenet/pytorch/README.md b/cv/classification/googlenet/pytorch/README.md index 2ae336b08..759dd4d28 100755 --- a/cv/classification/googlenet/pytorch/README.md +++ b/cv/classification/googlenet/pytorch/README.md @@ -1,11 +1,19 @@ # GoogLeNet ## Model Description -GoogLeNet is a type of convolutional neural network based on the Inception architecture. It utilises Inception modules, which allow the network to choose between multiple convolutional filter sizes in each block. An Inception network stacks these modules on top of each other, with occasional max-pooling layers with stride 2 to halve the resolution of the grid. 
-## Step 1: Preparing +GoogLeNet is a pioneering deep convolutional neural network that introduced the Inception architecture. It features +multiple parallel convolutional filters of different sizes within Inception modules, allowing efficient feature +extraction at various scales. The network uses 1x1 convolutions for dimensionality reduction, making it computationally +efficient. GoogLeNet achieved state-of-the-art performance in image classification tasks while maintaining relatively +low computational complexity. Its innovative design has influenced many subsequent CNN architectures in computer vision. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -24,14 +32,15 @@ imagenet ``` ## Model Training -### One single GPU + ```bash +# One single GPU python3 train.py --data-path /path/to/imagenet --model googlenet --batch-size 512 -``` -### 8 GPUs on one machine -```bash + +# 8 GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model googlenet --batch-size 512 --wd 0.000001 ``` ## References -https://github.com/pytorch/vision/blob/main/torchvision/models/googlenet.py + +- [vision](https://github.com/pytorch/vision/blob/main/torchvision/models/googlenet.py) diff --git a/cv/classification/inceptionv3/mindspore/README.md b/cv/classification/inceptionv3/mindspore/README.md index bb6ede650..45c8a0701 100644 --- a/cv/classification/inceptionv3/mindspore/README.md +++ b/cv/classification/inceptionv3/mindspore/README.md @@ -1,26 +1,20 @@ # InceptionV3 ## Model Description -InceptionV3 is a convolutional neural network architecture from the Inception family that makes several improvements including using Label Smoothing, Factorized 7 x 7 convolutions, and the use of an auxiliary classifier to propagate label information lower down the network (along with the use of batch normalization for layers in the sidehead). -## Model Preparation +InceptionV3 is an advanced convolutional neural network architecture that improves upon previous Inception models with +several key innovations. It introduces factorized convolutions, label smoothing, and an auxiliary classifier to enhance +feature extraction and training stability. The network utilizes batch normalization in side branches to improve gradient +flow and convergence. InceptionV3 achieves state-of-the-art performance in image classification tasks while maintaining +computational efficiency, making it suitable for various computer vision applications requiring high accuracy and robust +feature learning. 
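The factorized convolutions mentioned in the description replace a single k x k filter with a 1 x k filter followed by
a k x 1 filter, covering the same receptive field with far fewer weights. A minimal PyTorch sketch of the idea
(illustrative only; it is not taken from the MindSpore implementation used in this project):

```python
import torch
import torch.nn as nn

# A 7x7 convolution factorized into 1x7 followed by 7x1.
# Per input channel, each filter needs 7 + 7 = 14 weights instead of 7 * 7 = 49.
factorized_7x7 = nn.Sequential(
    nn.Conv2d(192, 192, kernel_size=(1, 7), padding=(0, 3), bias=False),
    nn.Conv2d(192, 192, kernel_size=(7, 1), padding=(3, 0), bias=False),
)

x = torch.randn(1, 192, 17, 17)
print(factorized_7x7(x).shape)  # torch.Size([1, 192, 17, 17]), same spatial size as a padded 7x7
```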
-### Install Dependencies +## Model Preparation -```bash -yum install -y mesa-libGL -pip3 install -r requirements.txt -wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz -tar xf openmpi-4.0.7.tar.gz -cd openmpi-4.0.7/ -./configure --prefix=/usr/local/bin --with-orte -make -j4 && make install -export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH -export PATH=/usr/local/openmpi/bin:$PATH -``` +### Prepare Resources -## Step 2: Preparing Datasets -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -38,36 +32,45 @@ imagenet └── val_list.txt ``` +### Install Dependencies -## Step 3: Training -```shell -ln -sf $(which python3) $(which python) +```bash +yum install -y mesa-libGL +pip3 install -r requirements.txt +wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz +tar xf openmpi-4.0.7.tar.gz +cd openmpi-4.0.7/ +./configure --prefix=/usr/local/bin --with-orte +make -j4 && make install +export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH +export PATH=/usr/local/openmpi/bin:$PATH ``` -### On single GPU -```shell -bash scripts/run_standalone_train_gpu.sh DEVICE_ID DATA_DIR CKPT_PATH -# example: bash scripts/run_standalone_train_gpu.sh /path/to/imagenet/train ./ckpt/ -``` +## Model Training -### Multiple GPUs on one machine ```shell -bash scripts/run_distribute_train_gpu.sh DATA_DIR CKPT_PATH -# example: bash scripts/run_distribute_train_gpu.sh /path/to/imagenet/train ./ckpt/ -``` +ln -sf $(which python3) $(which python) -### Use checkpoint to eval -```shell +# On single GPU +## bash scripts/run_standalone_train_gpu.sh DEVICE_ID DATA_DIR CKPT_PATH +bash scripts/run_standalone_train_gpu.sh /path/to/imagenet/train ./ckpt/ + +# Multiple GPUs on one machine +## bash scripts/run_distribute_train_gpu.sh DATA_DIR CKPT_PATH +bash scripts/run_distribute_train_gpu.sh /path/to/imagenet/train ./ckpt/ + +# Evaluation cd scripts/ DEVICE_ID=0 bash run_eval_gpu.sh $DEVICE_ID /path/to/imagenet/val/ /path/to/checkpoint ``` ## Model Results -| GPUS | ACC (epoch 108) | FPS | -| ----------| --------------------------| ----- | -| BI V100×4 | 'Loss': 3.9033, 'Top1-Acc': 0.4847, 'Top5-Acc': 0.7405 | 447.2 | +| Model | GPU | epoch | Loss | ACC | FPS | +|-------------|-----------|-------|--------|----------------------------------------|-------| +| InceptionV3 | BI-V100×4 | 108 | 3.9033 | 'Top1-Acc': 0.4847, 'Top5-Acc': 0.7405 | 447.2 | ## References -- [MindSpore Models](https://gitee.com/mindspore/models/tree/master/official/) \ No newline at end of file + +- [mindspore/models](https://gitee.com/mindspore/models/tree/master/official/) diff --git a/cv/classification/inceptionv3/pytorch/README.md b/cv/classification/inceptionv3/pytorch/README.md index fae45d54a..c8f869658 100644 --- a/cv/classification/inceptionv3/pytorch/README.md +++ b/cv/classification/inceptionv3/pytorch/README.md @@ -11,7 +11,7 @@ Inception-v3 is a convolutional neural network architecture from the Inception f pip3 install -r requirements.txt ``` -## Step 2: Preparing datasets +### Prepare Resources Sign up and login in [ImageNet official 
website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. @@ -31,7 +31,7 @@ imagenet └── val_list.txt ``` -## Step 3: Training +## Model Training ```bash @@ -43,4 +43,4 @@ bash train_inception_v3_amp_dist.sh ``` ## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/inceptionv3/tensorflow/README.md b/cv/classification/inceptionv3/tensorflow/README.md index 4aa148b79..ae20907e7 100644 --- a/cv/classification/inceptionv3/tensorflow/README.md +++ b/cv/classification/inceptionv3/tensorflow/README.md @@ -12,7 +12,7 @@ InceptionV3 is a convolutional neural network architecture from the Inception fa pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger ``` -## Step 2: Preparing datasets +### Prepare Resources Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. @@ -39,7 +39,7 @@ Refer below links to convert ImageNet data to TFrecord data. Put the TFrecord data in "./imagenet_tfrecord" directory. -## Step 3: Training +## Model Training ```bash # 1 GPU diff --git a/cv/classification/inceptionv4/pytorch/README.md b/cv/classification/inceptionv4/pytorch/README.md index e7287e4fb..625b75f18 100644 --- a/cv/classification/inceptionv4/pytorch/README.md +++ b/cv/classification/inceptionv4/pytorch/README.md @@ -1,13 +1,20 @@ # InceptionV4 ## Model Description -Inception-v4 is a convolutional neural network architecture that builds on previous iterations of the Inception family by simplifying the architecture and using more inception modules than Inception-v3. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +InceptionV4 is an advanced convolutional neural network architecture that refines the Inception family of models. It +simplifies previous designs while incorporating more inception modules for enhanced feature extraction. The architecture +achieves state-of-the-art performance in image classification tasks by efficiently balancing model depth and width. +InceptionV4 demonstrates improved accuracy over its predecessors while maintaining computational efficiency, making it +suitable for various computer vision applications. Its design focuses on optimizing network structure for better feature +representation and classification performance. + +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,18 +32,21 @@ imagenet └── val_list.txt ``` -:beers: Done! +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` ## Model Training -### Multiple GPUs on one machine (AMP) + Set data path by `export DATA_PATH=/path/to/imagenet`. 
The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_inceptionv4_amp_dist.sh ``` -:beers: Done! - - ## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/internimage/pytorch/README.md b/cv/classification/internimage/pytorch/README.md index 5a30a3f4f..6f61e8052 100644 --- a/cv/classification/internimage/pytorch/README.md +++ b/cv/classification/internimage/pytorch/README.md @@ -1,15 +1,43 @@ -# InternImage for Image Classification +# InternImage ## Model Description -"INTERN-2.5" is a powerful multimodal multitask general model jointly released by SenseTime and Shanghai AI Laboratory. It consists of large-scale vision foundation model "InternImage", pre-training method "M3I-Pretraining", generic decoder "Uni-Perceiver" series, and generic encoder for autonomous driving perception "BEVFormer" series. +InternImage is a large-scale vision foundation model developed by SenseTime and Shanghai AI Laboratory. It's part of the +INTERN-2.5 multimodal multitask general model, designed for comprehensive visual understanding tasks. The architecture +leverages advanced techniques to achieve state-of-the-art performance in image classification and other vision tasks. +InternImage demonstrates exceptional scalability and efficiency, making it suitable for various applications from +general image recognition to complex autonomous driving perception systems. Its design focuses on balancing model +capacity with computational efficiency. -## Step 1: Installing +## Model Preparation -### Environment Preparation +### Prepare Resources -- `CUDA>=10.2` with `cudnn>=7` -- `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2` +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. + +The ImageNet dataset path structure should look like: + +```bash +imagenet +├── train +│ └── n01440764 +│ ├── n01440764_10026.JPEG +│ └── ... +├── train_list.txt +├── val +│ └── n01440764 +│ ├── ILSVRC2012_val_00000293.JPEG +│ └── ... +└── val_list.txt +``` + +### Install Dependencies + +Environment Preparation. + +- `CUDA>=10.2` with `cudnn>=7` +- `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2` ```bash # Install libGL @@ -18,50 +46,27 @@ yum install -y mesa-libGL ## Ubuntu apt install -y libgl1-mesa-glx -## Install mmcv +# Install mmcv cd mmcv/ bash clean_mmcv.sh bash build_mmcv.sh bash install_mmcv.sh cd ../ -## Install timm and mmdet +# Install timm and mmdet pip3 install timm==0.6.11 mmdet==2.28.1 -``` -- Install other requirements: - -```bash +# Install other requirements: pip3 install addict yapf opencv-python termcolor yacs pyyaml scipy -``` -- Compiling CUDA operators -```bash +# Compiling CUDA operators cd ./ops_dcnv3 sh ./make.sh + # unit test (should see all checking is True) python3 test.py -cd ../ -``` - -### Data Preparation -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. - -The ImageNet dataset path structure should look like: - -```bash -imagenet -├── train -│ └── n01440764 -│ ├── n01440764_10026.JPEG -│ └── ... 
-├── train_list.txt -├── val -│ └── n01440764 -│ ├── ILSVRC2012_val_00000293.JPEG -│ └── ... -└── val_list.txt +cd ../ ``` ## Model Training @@ -79,13 +84,13 @@ python3 main.py --cfg configs/internimage_t_1k_224.yaml --data-path /path/to/ima ``` -## Result +## Model Results -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 8 cards | Acc@1 83.440 fps 252 | -| 1 card | fps 31 | +| Model | GPU | FP32 | +|-------------|------------|--------------------------| +| InternImage | BI-V100 x8 | Acc@1 83.440 fps 252 | +| InternImage | BI-V100 x1 | fps 31 | ## References -https://github.com/OpenGVLab/InternImage +- [InternImage](https://github.com/OpenGVLab/InternImage) diff --git a/cv/classification/lenet/pytorch/README.md b/cv/classification/lenet/pytorch/README.md index 8c609c550..a6b0bcd6f 100755 --- a/cv/classification/lenet/pytorch/README.md +++ b/cv/classification/lenet/pytorch/README.md @@ -1,11 +1,19 @@ # LeNet ## Model Description -LeNet is a classic convolutional neural network employing the use of convolutions, pooling and fully connected layers. It was used for the handwritten digit recognition task with the MNIST dataset. The architectural design served as inspiration for future networks such as AlexNet and VGG. -## Step 1: Preparing +LeNet is a pioneering convolutional neural network architecture developed for handwritten digit recognition. It +introduced fundamental concepts like convolutional layers, pooling, and fully connected layers, laying the groundwork +for modern deep learning. Designed for the MNIST dataset, LeNet demonstrated the effectiveness of CNNs for image +recognition tasks. Its simple yet effective architecture inspired subsequent networks like AlexNet and VGG, making it a +cornerstone in the evolution of deep learning for computer vision applications. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -24,14 +32,15 @@ imagenet ``` ## Model Training -### One single GPU + ```bash +# One single GPU python3 train.py --data-path /path/to/imagenet --model lenet -``` -### 8 GPUs on one machine -```bash + +# 8 GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model lenet ``` ## References -http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf + +- [Paper](http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf) diff --git a/cv/classification/mobilenetv2/pytorch/README.md b/cv/classification/mobilenetv2/pytorch/README.md index f85c60876..3f1723793 100644 --- a/cv/classification/mobilenetv2/pytorch/README.md +++ b/cv/classification/mobilenetv2/pytorch/README.md @@ -1,14 +1,19 @@ # MobileNetV2 ## Model Description -MobileNetV2 is a convolutional neural network architecture that seeks to perform well on mobile devices. It is based on an inverted residual structure where the residual connections are between the bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. 
As a whole, the architecture of MobileNetV2 contains the initial fully convolution layer with 32 filters, followed by 19 residual bottleneck layers. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +MobileNetV2 is an efficient convolutional neural network designed for mobile and embedded vision applications. It +introduces inverted residual blocks with linear bottlenecks, using depthwise separable convolutions to reduce +computational complexity. This architecture maintains high accuracy while significantly decreasing model size and +latency compared to traditional CNNs. MobileNetV2's design focuses on balancing performance and efficiency, making it +ideal for real-time applications on resource-constrained devices like smartphones and IoT devices. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -26,15 +31,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` ## Model Training -### Multiple GPUs on one machine (AMP) + Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_mobilenet_v2_amp_dist.sh ``` - ## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/mobilenetv3/mindspore/README.md b/cv/classification/mobilenetv3/mindspore/README.md index a7aff689c..068999ce2 100644 --- a/cv/classification/mobilenetv3/mindspore/README.md +++ b/cv/classification/mobilenetv3/mindspore/README.md @@ -1,32 +1,20 @@ # MobileNetV3 ## Model Description -MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware- aware network architecture search (NAS) complemented by the NetAdapt algorithm and then subsequently improved through novel architecture advances.Nov 20, 2019. -[Paper](https://arxiv.org/pdf/1905.02244) Howard, Andrew, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang et al. "Searching for mobilenetv3." In Proceedings of the IEEE International Conference on Computer Vision, pp. 1314-1324. 2019. +MobileNetV3 is an efficient convolutional neural network optimized for mobile devices, combining hardware-aware neural +architecture search with novel design techniques. It introduces improved nonlinearities and efficient network structures +to reduce computational complexity while maintaining accuracy. MobileNetV3 achieves state-of-the-art performance in +mobile vision tasks, offering variants for different computational budgets. Its design focuses on minimizing latency and +power consumption, making it ideal for real-time applications on resource-constrained devices like smartphones and +embedded systems. 
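Concretely, the "improved nonlinearities" and "efficient network structures" referred to above are the hard-swish
activation and depthwise-separable convolutions. A short PyTorch sketch of both building blocks (illustrative; the
MindSpore code in this project defines its own equivalents):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def hard_swish(x: torch.Tensor) -> torch.Tensor:
    # h-swish(x) = x * ReLU6(x + 3) / 6, a cheap piecewise-linear stand-in for swish
    return x * F.relu6(x + 3.0) / 6.0

# Depthwise-separable convolution: per-channel 3x3 filtering, then a 1x1 pointwise mix.
depthwise_separable = nn.Sequential(
    nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32, bias=False),  # depthwise
    nn.Conv2d(32, 64, kernel_size=1, bias=False),                        # pointwise
)

x = torch.randn(1, 32, 112, 112)
print(hard_swish(x).shape)           # torch.Size([1, 32, 112, 112])
print(depthwise_separable(x).shape)  # torch.Size([1, 64, 112, 112])
```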
## Model Preparation -### Install Dependencies +### Prepare Resources -```bash -# Install requirements -pip3 install easydict -yum install mesa-libGL - -# Install openmpi -wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz -tar xf openmpi-4.0.7.tar.gz -cd openmpi-4.0.7/ -./configure --prefix=/usr/local/bin --with-orte -make -j4 && make install -export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/ -``` - - -## Step 2: Preparing datasets - -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -43,7 +31,24 @@ imagenet │ └── ... └── val_list.txt ``` -## Step 3: Training + +### Install Dependencies + +```bash +# Install requirements +pip3 install easydict +yum install mesa-libGL + +# Install openmpi +wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz +tar xf openmpi-4.0.7.tar.gz +cd openmpi-4.0.7/ +./configure --prefix=/usr/local/bin --with-orte +make -j4 && make install +export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/ +``` + +## Model Training ```bash cd ../scripts @@ -53,21 +58,18 @@ bash run_train.sh GPU 1 0 /path/to/imagenet/train/ # 8 GPUs bash run_train.sh GPU 8 0,1,2,3,4,5,6,7 /path/to/imagenet/train/ -``` -## Step 4: Inference -```bash +# Inference bash run_infer.sh GPU /path/to/imagenet/val/ ../train/checkpointckpt_0/mobilenetv3-300_2135.ckpt ``` ## Model Results -
- -| GPUS | ACC (ckpt107) | FPS | -| ---------- | ---------- | ---- | -| BI-V100 ×8 | 0.55 | 378.43 | -
+| Model | GPU | ACC (ckpt107) | FPS | +|-------------|------------|---------------|--------| +| MobileNetV3 | BI-V100 ×8 | 0.55 | 378.43 | +| | ## References -- [mindspore/models](https://gitee.com/mindspore/models) \ No newline at end of file + +- [mindspore/models](https://gitee.com/mindspore/models) diff --git a/cv/classification/mobilenetv3/paddlepaddle/README.md b/cv/classification/mobilenetv3/paddlepaddle/README.md index 0247cb51a..1b707203d 100644 --- a/cv/classification/mobilenetv3/paddlepaddle/README.md +++ b/cv/classification/mobilenetv3/paddlepaddle/README.md @@ -1,20 +1,20 @@ # MobileNetV3 + ## Model Description -MobileNetV3 is a convolutional neural network that is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, and then subsequently improved through novel architecture advances. Advances include (1) complementary search techniques, (2) new efficient versions of nonlinearities practical for the mobile setting, (3) new efficient network design. -## Step 1: Installing -``` -git clone https://github.com/PaddlePaddle/PaddleClas.git -``` +MobileNetV3 is an efficient convolutional neural network optimized for mobile devices, combining hardware-aware neural +architecture search with novel design techniques. It introduces improved nonlinearities and efficient network structures +to reduce computational complexity while maintaining accuracy. MobileNetV3 achieves state-of-the-art performance in +mobile vision tasks, offering variants for different computational budgets. Its design focuses on minimizing latency and +power consumption, making it ideal for real-time applications on resource-constrained devices like smartphones and +embedded systems. -```bash -cd PaddleClas -pip3 install -r requirements.txt -``` +## Model Preparation -## Step 2: Prepare Datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -32,9 +32,20 @@ imagenet └── val_list.txt ``` -## Step 3: Training -**Notice**: modify PaddleClas/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x1_25.yaml file, modify the datasets path as yours. +### Install Dependencies + +```bash +git clone https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas/ +pip3 install -r requirements.txt ``` + +## Model Training + +**Notice**: modify PaddleClas/ppcls/configs/ImageNet/MobileNetV3/MobileNetV3_small_x1_25.yaml file, modify the datasets +path as yours. 
+ +```bash cd PaddleClas export FLAGS_cudnn_exhaustive_search=True export FLAGS_cudnn_batchnorm_spatial_persistent=True @@ -43,4 +54,5 @@ python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/c ``` ## References + - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/mobilenetv3/pytorch/README.md b/cv/classification/mobilenetv3/pytorch/README.md index 38ae3ba41..29c69a342 100644 --- a/cv/classification/mobilenetv3/pytorch/README.md +++ b/cv/classification/mobilenetv3/pytorch/README.md @@ -1,18 +1,20 @@ # MobileNetV3 ## Model Description -MobileNetV3 is a convolutional neural network that is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, and then subsequently improved through novel architecture advances. Advances include (1) complementary search techniques, (2) new efficient versions of nonlinearities practical for the mobile setting, (3) new efficient network design. -## Model Preparation +MobileNetV3 is an efficient convolutional neural network optimized for mobile devices, combining hardware-aware neural +architecture search with novel design techniques. It introduces improved nonlinearities and efficient network structures +to reduce computational complexity while maintaining accuracy. MobileNetV3 achieves state-of-the-art performance in +mobile vision tasks, offering variants for different computational budgets. Its design focuses on minimizing latency and +power consumption, making it ideal for real-time applications on resource-constrained devices like smartphones and +embedded systems. -### Install Dependencies -```bash -pip3 install -r requirements.txt -``` +## Model Preparation -## Step 2: Preparing datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -30,7 +32,13 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training ```bash # Set data path @@ -41,4 +49,5 @@ bash train_mobilenet_v3_large_amp_dist.sh ``` ## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification#mobilenetv3-large--small) + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification#mobilenetv3-large--small) diff --git a/cv/classification/mobilenetv3_large_x1_0/paddlepaddle/README.md b/cv/classification/mobilenetv3_large_x1_0/paddlepaddle/README.md index 0c13400c2..996ddf9f6 100644 --- a/cv/classification/mobilenetv3_large_x1_0/paddlepaddle/README.md +++ b/cv/classification/mobilenetv3_large_x1_0/paddlepaddle/README.md @@ -1,27 +1,20 @@ # MobileNetV3_large_x1_0 ## Model Description -MobileNetV3 is a convolutional neural network that is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, and then subsequently improved through novel architecture advances. 
Advances include (1) complementary search techniques, (2) new efficient versions of nonlinearities practical for the mobile setting, (3) new efficient network design. -## Model Preparation +MobileNetV3_large_x1_0 is an efficient convolutional neural network optimized for mobile devices. It combines +hardware-aware neural architecture search with novel design techniques, including improved nonlinearities and efficient +network structures. This variant offers a balance between accuracy and computational efficiency, achieving 74.9% top-1 +accuracy on ImageNet. Its design focuses on reducing latency while maintaining performance, making it suitable for +mobile applications. MobileNetV3_large_x1_0 serves as a general-purpose backbone for various computer vision tasks on +resource-constrained devices. -### Install Dependencies -``` -git clone https://github.com/PaddlePaddle/PaddleClas.git -``` - -```bash -cd PaddleClas -yum install mesa-libGL -y +## Model Preparation -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.13 -python3 setup.py install -``` +### Prepare Resources -## Step 2: Preparing Datasets -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -39,11 +32,25 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +yum install mesa-libGL -y + +git clone https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas/ +pip3 install -r requirements.txt +python3 setup.py install + +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.13 +``` + +## Model Training ```bash # Make sure your dataset path is the same as above -cd PaddleClas +cd PaddleClas/ # Link your dataset to default location ln -s /path/to/imagenet ./dataset/ILSVRC2012 @@ -54,10 +61,11 @@ python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/c ``` ## Model Results -| GPUs | Top1 | Top5 | ips | -|-------------|-------------|----------------|----------| -| BI-V100 x 4 | 0.749 | 0.922 | 512 samples/s | +| Model | GPU | Top1 | Top5 | ips | +|------------------------|------------|-------|-------|---------------| +| MobileNetV3_large_x1_0 | BI-V100 x4 | 0.749 | 0.922 | 512 samples/s | ## References + - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/mobileone/pytorch/README.md b/cv/classification/mobileone/pytorch/README.md index d4eb8acde..f1770469e 100644 --- a/cv/classification/mobileone/pytorch/README.md +++ b/cv/classification/mobileone/pytorch/README.md @@ -1,18 +1,38 @@ # MobileOne -> [An Improved One millisecond Mobile Backbone](https://arxiv.org/abs/2206.04040) - ## Model Description -Mobileone is proposed by apple and based on reparameterization. On the apple chips, the accuracy of the model is close to 0.76 on the ImageNet dataset when the latency is less than 1ms. Its main improvements based on [RepVGG](../repvgg) are fllowing: +MobileOne is an efficient neural network backbone designed for mobile devices, focusing on real-world latency rather +than just FLOPs or parameter count. 
It uses reparameterization with depthwise and pointwise convolutions, optimizing for +speed on mobile chips. Achieving under 1ms inference time on iPhone 12 with 75.9% ImageNet accuracy, MobileOne +outperforms other efficient architectures in both speed and accuracy. It's versatile for tasks like image +classification, object detection, and segmentation, making it ideal for mobile deployment. -- Reparameterization using Depthwise convolution and Pointwise convolution instead of normal convolution. -- Removal of the residual structure which is not friendly to access memory. +## Model Preparation +### Prepare Resources -Efficient neural network backbones for mobile devices are often optimized for metrics such as FLOPs or parameter count. However, these metrics may not correlate well with latency of the network when deployed on a mobile device. Therefore, we perform extensive analysis of different metrics by deploying several mobile-friendly networks on a mobile device. We identify and analyze architectural and optimization bottlenecks in recent efficient neural networks and provide ways to mitigate these bottlenecks. To this end, we design an efficient backbone MobileOne, with variants achieving an inference time under 1 ms on an iPhone12 with 75.9% top-1 accuracy on ImageNet. We show that MobileOne achieves state-of-the-art performance within the efficient architectures while being many times faster on mobile. Our best model obtains similar performance on ImageNet as MobileFormer while being 38x faster. Our model obtains 2.3% better top-1 accuracy on ImageNet than EfficientNet at similar latency. Furthermore, we show that our model generalizes to multiple tasks - image classification, object detection, and semantic segmentation with significant improvements in latency and accuracy as compared to existing efficient architectures when deployed on a mobile device. +Prepare your dataset according to the +[docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset). -## Model Preparation +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. + +The ImageNet dataset path structure should look like: + +```bash +imagenet +├── train +│ └── n01440764 +│ ├── n01440764_10026.JPEG +│ └── ... +├── train_list.txt +├── val +│ └── n01440764 +│ ├── ILSVRC2012_val_00000293.JPEG +│ └── ... +└── val_list.txt +``` ### Install Dependencies @@ -41,39 +61,18 @@ sed -i 's/python /python3 /g' tools/dist_train.sh python3 setup.py install ``` -## Step 2: Preparing datasets - -Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset). -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. -Specify `/path/to/imagenet` to your ImageNet path in later training process. - -The ImageNet dataset path structure should look like: - -```bash -imagenet -├── train -│ └── n01440764 -│ ├── n01440764_10026.JPEG -│ └── ... -├── train_list.txt -├── val -│ └── n01440764 -│ ├── ILSVRC2012_val_00000293.JPEG -│ └── ... 
-└── val_list.txt -``` - -## Step 3: Training +## Model Training ```bash bash tools/dist_train.sh configs/mobileone/mobileone-s0_8xb32_in1k.py 8 ``` ## Model Results -| GPUs | FPS | TOP1 Accuracy | -| ------------ | --------- |-------------- | -| BI-V100 x8 | 1014 | 71.49 | + +| Model | GPU | FPS | TOP1 Accuracy | +|-----------|------------|------|---------------| +| MobileOne | BI-V100 x8 | 1014 | 71.49 | ## References -- [mmpretrain](https://github.com/open-mmlab/mmpretrain/) +- [mmpretrain](https://github.com/open-mmlab/mmpretrain/) diff --git a/cv/classification/mocov2/pytorch/README.md b/cv/classification/mocov2/pytorch/README.md index 70e654819..9c81bd977 100644 --- a/cv/classification/mocov2/pytorch/README.md +++ b/cv/classification/mocov2/pytorch/README.md @@ -1,13 +1,39 @@ # MoCoV2 -> [Improved Baselines with Momentum Contrastive Learning](https://arxiv.org/abs/2003.04297) +## Model Description +MoCoV2 is an improved version of Momentum Contrast (MoCo) for unsupervised learning, combining the strengths of +contrastive learning with momentum-based updates. It introduces an MLP projection head and enhanced data augmentation +techniques to boost performance without requiring large batch sizes. This approach enables effective feature learning +from unlabeled data, establishing strong baselines for self-supervised learning. MoCoV2 outperforms previous methods +like SimCLR while maintaining computational efficiency, making it accessible for various computer vision tasks. -## Model Description +## Model Preparation + +### Prepare Resources + +Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset). + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. -Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR. In this note, we verify the effectiveness of two of SimCLR’s design improvements by implementing them in the MoCo framework. With simple modifications to MoCo—namely, using an MLP projection head and more data augmentation—we establish stronger baselines that outperform SimCLR and do not require large training batches. We hope this will make state-of-the-art unsupervised learning research more accessible. +The ImageNet dataset path structure should look like: + +```bash +imagenet +├── train +│ └── n01440764 +│ ├── n01440764_10026.JPEG +│ └── ... +├── train_list.txt +├── val +│ └── n01440764 +│ ├── ILSVRC2012_val_00000293.JPEG +│ └── ... +└── val_list.txt +``` -## Installation +### Install Dependencies ```bash # Install libGL @@ -34,29 +60,7 @@ sed -i 's/python /python3 /g' tools/dist_train.sh python3 setup.py install ``` -## Preparing datasets - -Prepare your dataset according to the [docs](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html#prepare-dataset). -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. -Specify `/path/to/imagenet` to your ImageNet path in later training process. - -The ImageNet dataset path structure should look like: - -```bash -imagenet -├── train -│ └── n01440764 -│ ├── n01440764_10026.JPEG -│ └── ... -├── train_list.txt -├── val -│ └── n01440764 -│ ├── ILSVRC2012_val_00000293.JPEG -│ └── ... 
-└── val_list.txt -``` - -## Training +## Model Training ```bash # get mocov2_resnet50_8xb32-coslr-200e_in1k_20220825-b6d23c86.pth @@ -74,9 +78,11 @@ bash tools/dist_train.sh configs/mocov2/mocov2_resnet50_8xb32-coslr-200e_in1k.py ``` ## Model Results -| Model | FPS | TOP1 Accuracy | -| ------------ | --------- |--------------| -| BI-V100 x8 | 4663 | 67.50 | + + | Model | GPU | FPS | TOP1 Accuracy | + |--------|------------|------|---------------| + | MoCoV2 | BI-V100 x8 | 4663 | 67.50 | ## References + - [mmpretrain](https://github.com/open-mmlab/mmpretrain/) diff --git a/cv/classification/pp-lcnet/paddlepaddle/README.md b/cv/classification/pp-lcnet/paddlepaddle/README.md index cc4c9ebd0..54decf1f0 100644 --- a/cv/classification/pp-lcnet/paddlepaddle/README.md +++ b/cv/classification/pp-lcnet/paddlepaddle/README.md @@ -1,22 +1,20 @@ -# PP-LCNet: A Lightweight CPU Convolutional Neural Network +# PP-LCNet ## Model Description -We propose a lightweight CPU network based on the MKLDNN acceleration strategy, named PP-LCNet, which improves the performance of lightweight models on multiple tasks. This paper lists technologies which can improve network accuracy while the latency is almost constant. With these improvements, the accuracy of PP-LCNet can greatly surpass the previous network structure with the same inference time for classification. It outperforms the most state-of-the-art models. And for downstream tasks of computer vision, it also performs very well, such as object detection, semantic segmentation, etc. All our experiments are implemented based on PaddlePaddle. Code and pretrained models are available at PaddleClas. -## Model Preparation - -### Install Dependencies +PP-LCNet is a lightweight CPU-optimized neural network designed for efficient inference on edge devices. It leverages +MKLDNN acceleration strategies to enhance performance while maintaining low latency. The architecture achieves +state-of-the-art accuracy for lightweight models in image classification tasks and performs well in downstream computer +vision applications like object detection and semantic segmentation. PP-LCNet's design focuses on maximizing accuracy +with minimal computational overhead, making it ideal for resource-constrained environments requiring fast and efficient +inference. -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install -r requirements.txt -python3 setup.py install -``` +## Model Preparation -## Step 2: Preparing datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -34,7 +32,16 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install -r requirements.txt +python3 setup.py install +``` + +## Model Training ```bash # Make sure your dataset path is the same as above @@ -47,16 +54,12 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/PPLCNet/PPLCNet_x1_0.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Model Results on BI-V100 - -
- -| Method | Crop Size | FPS (BI x 4) | TOP1 Accuracy | -| ------ | --------- | -------- |--------------:| -| PPLCNet_x1_0 | 224x224 | 2537 | 0.7062 | +## Model Results -
+| Model | GPU | Crop Size | FPS | TOP1 Accuracy | +|--------------|------------|-----------|------|---------------| +| PPLCNet_x1_0 | BI-V100 x4 | 224x224 | 2537 | 0.7062 | ## References -- [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) +- [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/repmlp/pytorch/README.md b/cv/classification/repmlp/pytorch/README.md index c96928cf6..fbc158f2d 100644 --- a/cv/classification/repmlp/pytorch/README.md +++ b/cv/classification/repmlp/pytorch/README.md @@ -1,22 +1,20 @@ # RepMLP ## Model Description -RepMLP, a multi-layer-perceptron-style neural network building block for image recognition, which is composed of a series of fully-connected (FC) layers. Compared to convolutional layers, FC layers are more efficient, better at modeling the long-range dependencies and positional patterns, but worse at capturing the local structures, hence usually less favored for image recognition. Construct convolutional layers inside a RepMLP during training and merge them into the FC for inference. -## Model Preparation - -### Install Dependencies +RepMLP is an innovative neural network architecture that combines the strengths of fully-connected (FC) layers and +convolutional operations. It uses FC layers for efficient long-range dependency modeling while incorporating +convolutional layers during training to capture local structures. Through structural re-parameterization, RepMLP merges +these components into pure FC layers for inference, achieving both high accuracy and computational efficiency. This +architecture is particularly effective for image recognition tasks, offering a novel approach to balance global and +local feature learning. -```bash -pip3 install timm yacs -git clone https://github.com/DingXiaoH/RepMLP.git -cd RepMLP -git checkout 3eff13fa0257af28663880d870f327d665f0a8e2 -``` +## Model Preparation -## Step 2: Preparing datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
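The key idea behind RepMLP's merging step is that a convolution over a fixed spatial size is a linear map, so it can be absorbed into an FC weight for inference. A toy sketch of that conversion, not the repository's merging code — the layer sizes and the `conv_to_fc` helper are made up for illustration:

```python
import torch
import torch.nn as nn

def conv_to_fc(conv: nn.Conv2d, in_shape):
    """Build the FC weight equivalent to a bias-free `conv` on inputs of shape (C, H, W)."""
    c, h, w = in_shape
    # Each row of the identity matrix becomes one one-hot input image.
    eye = torch.eye(c * h * w).reshape(c * h * w, c, h, w)
    with torch.no_grad():
        out = conv(eye)                         # (C*H*W, O, H', W')
    return out.reshape(c * h * w, -1).t()       # (O*H'*W', C*H*W)

# Sanity check: the FC weight reproduces the convolution on a random input.
conv = nn.Conv2d(8, 8, kernel_size=3, padding=1, bias=False)
x = torch.randn(1, 8, 14, 14)
fc_weight = conv_to_fc(conv, (8, 14, 14))
print(torch.allclose(conv(x).flatten(), fc_weight @ x.flatten(), atol=1e-4))  # True
```

RepMLP applies the same principle when it folds its training-time convolutional branches into FC layers for inference.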
The ImageNet dataset path structure should look like: @@ -34,7 +32,16 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +pip3 install timm yacs +git clone https://github.com/DingXiaoH/RepMLP.git +cd RepMLP +git checkout 3eff13fa0257af28663880d870f327d665f0a8e2 +``` + +## Model Training ```bash # fix --local-rank for torch 2.x @@ -48,9 +55,9 @@ python3 -m torch.distributed.launch --nproc_per_node 8 --master_port 12349 main_ ## Model Results -|GPUs|FPS|ACC| -|----|---|---| -|BI-V100 x8|319|epoch 40: 64.866%| +| Model | GPU | FPS | ACC | +|--------|------------|-----|-------------------| +| RepMLP | BI-V100 x8 | 319 | epoch 40: 64.866% | ## References diff --git a/cv/classification/repvgg/paddlepaddle/README.md b/cv/classification/repvgg/paddlepaddle/README.md index 28a99fea4..0f638266b 100644 --- a/cv/classification/repvgg/paddlepaddle/README.md +++ b/cv/classification/repvgg/paddlepaddle/README.md @@ -1,18 +1,20 @@ # RepVGG + ## Model Description - A simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architecture is realized by a structural re-parameterization technique so that the model is named RepVGG. -## Step 1: Installing +RepVGG is a simple yet powerful convolutional neural network architecture that combines training-time multi-branch +topology with inference-time VGG-like simplicity. It uses structural re-parameterization to convert complex training +models into efficient inference models composed solely of 3x3 convolutions and ReLU activations. This approach achieves +state-of-the-art performance in image classification tasks while maintaining high speed and efficiency. RepVGG's design +is particularly suitable for applications requiring both high accuracy and fast inference, making it ideal for +real-world deployment scenarios. -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install -r requirements.txt -``` +## Model Preparation -## Step 2: Download data +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -30,7 +32,15 @@ imagenet └── val_list.txt ``` -## Step 3: Run RepVGG +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install -r requirements.txt +``` + +## Model Training ```bash cd PaddleClas @@ -42,6 +52,8 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 tools/train.py -c ppcls/configs/ImageNet/RepVGG/RepVGG_A0.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 8 cards | Acc@1=0.6990 | +## Model Results + +| Model | GPU | FP32 | +|--------|------------|--------------| +| RepVGG | BI-V100 x8 | Acc@1=0.6990 | diff --git a/cv/classification/repvgg/pytorch/README.md b/cv/classification/repvgg/pytorch/README.md index 8061fdbcc..2b23b995f 100755 --- a/cv/classification/repvgg/pytorch/README.md +++ b/cv/classification/repvgg/pytorch/README.md @@ -1,19 +1,20 @@ # RepVGG + ## Model Description - A simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architecture is realized by a structural re-parameterization technique so that the model is named RepVGG. -## Step 1: Installing +RepVGG is a simple yet powerful convolutional neural network architecture that combines training-time multi-branch +topology with inference-time VGG-like simplicity. It uses structural re-parameterization to convert complex training +models into efficient inference models composed solely of 3x3 convolutions and ReLU activations. This approach achieves +state-of-the-art performance in image classification tasks while maintaining high speed and efficiency. RepVGG's design +is particularly suitable for applications requiring both high accuracy and fast inference, making it ideal for +real-world deployment scenarios. -```bash -pip3 install timm yacs -git clone https://github.com/DingXiaoH/RepVGG.git -cd RepVGG -git checkout eae7c5204001eaf195bbe2ee72fb6a37855cce33 -``` +## Model Preparation -## Step 2: Download data +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
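To make the structural re-parameterization concrete, here is a toy version of merging parallel 3x3, 1x1, and identity branches into a single 3x3 kernel. It deliberately leaves out the batch-norm folding that the real conversion performs, so treat `SimpleRepBlock` as an illustration rather than the repository's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleRepBlock(nn.Module):
    """Training-time block with parallel 3x3, 1x1 and identity branches (BN omitted)."""
    def __init__(self, channels):
        super().__init__()
        self.conv3x3 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.conv1x1 = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        return F.relu(self.conv3x3(x) + self.conv1x1(x) + x)

    def reparameterize(self):
        """Merge the three branches into one 3x3 conv for inference."""
        k3 = self.conv3x3.weight.data.clone()
        # Pad the 1x1 kernel with zeros so it can be added to the 3x3 kernel.
        k1 = F.pad(self.conv1x1.weight.data, [1, 1, 1, 1])
        # The identity branch is a 3x3 kernel with a 1 at the centre of channel i -> i.
        c = k3.shape[0]
        kid = torch.zeros_like(k3)
        for i in range(c):
            kid[i, i, 1, 1] = 1.0
        fused = nn.Conv2d(c, c, 3, padding=1, bias=False)
        fused.weight.data = k3 + k1 + kid
        return fused

# The fused conv matches the multi-branch block (up to the shared ReLU).
block = SimpleRepBlock(16).eval()
fused = block.reparameterize()
x = torch.randn(2, 16, 32, 32)
print(torch.allclose(block(x), F.relu(fused(x)), atol=1e-5))  # True
```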
The ImageNet dataset path structure should look like: @@ -31,7 +32,16 @@ imagenet └── val_list.txt ``` -## Step 3: Run RepVGG +### Install Dependencies + +```bash +pip3 install timm yacs +git clone https://github.com/DingXiaoH/RepVGG.git +cd RepVGG +git checkout eae7c5204001eaf195bbe2ee72fb6a37855cce33 +``` + +## Model Training ```bash # fix --local-rank for torch 2.x @@ -44,7 +54,10 @@ sed -i "s@dataset = torchvision.datasets.ImageNet(root=config.DATA.DATA_PATH, sp python3 -m torch.distributed.launch --nproc_per_node 4 --master_port 12349 main.py --arch RepVGG-A0 --data-path ./imagenet --batch-size 32 --tag train_from_scratch --output ./ --opts TRAIN.EPOCHS 300 TRAIN.BASE_LR 0.1 TRAIN.WEIGHT_DECAY 1e-4 TRAIN.WARMUP_EPOCHS 5 MODEL.LABEL_SMOOTHING 0.1 AUG.PRESET weak AUG.MIXUP 0.0 DATA.DATASET imagenet DATA.IMG_SIZE 224 ``` -The original RepVGG models were trained in 120 epochs with cosine learning rate decay from 0.1 to 0. We used 8 GPUs, global batch size of 256, weight decay of 1e-4 (no weight decay on fc.bias, bn.bias, rbr_dense.bn.weight and rbr_1x1.bn.weight) (weight decay on rbr_identity.weight makes little difference, and it is better to use it in most of the cases), and the same simple data preprocssing as the PyTorch official example: +The original RepVGG models were trained in 120 epochs with cosine learning rate decay from 0.1 to 0. We used 8 GPUs, +global batch size of 256, weight decay of 1e-4 (no weight decay on fc.bias, bn.bias, rbr_dense.bn.weight and +rbr_1x1.bn.weight) (weight decay on rbr_identity.weight makes little difference, and it is better to use it in most of +the cases), and the same simple data preprocssing as the PyTorch official example: ```py trans = transforms.Compose([ @@ -54,16 +67,15 @@ The original RepVGG models were trained in 120 epochs with cosine learning rate transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]) ``` -The valid model names include (--arch [model name]) +The valid model names include (--arch [model name]): -``` -RepVGGplus-L2pse, RepVGG-A0, RepVGG-A1, RepVGG-A2, RepVGG-B0, RepVGG-B1, RepVGG-B1g2, RepVGG-B1g4, RepVGG-B2, RepVGG-B2g2, RepVGG-B2g4, RepVGG-B3, RepVGG-B3g2, RepVGG-B3g4 -``` +RepVGGplus-L2pse, RepVGG-A0, RepVGG-A1, RepVGG-A2, RepVGG-B0, RepVGG-B1, RepVGG-B1g2, RepVGG-B1g4, RepVGG-B2, +RepVGG-B2g2, RepVGG-B2g4, RepVGG-B3, RepVGG-B3g2, RepVGG-B3g4. -| model | GPU | FP32 | -|----------| ----------- | ------------------------------------ | -| RepVGG-A0| 8 cards | Acc@1=0.7241 | +| Model | GPU | FP32 | +|-----------|------------|--------------| +| RepVGG-A0 | BI-V100 x8 | Acc@1=0.7241 | -## Reference +## References -- [RepMLP](https://github.com/DingXiaoH/RepVGG/tree/eae7c5204001eaf195bbe2ee72fb6a37855cce33) \ No newline at end of file +- [RepVGG](https://github.com/DingXiaoH/RepVGG/tree/eae7c5204001eaf195bbe2ee72fb6a37855cce33) diff --git a/cv/classification/repvit/pytorch/README.md b/cv/classification/repvit/pytorch/README.md index 3438ba680..05c43e7fe 100644 --- a/cv/classification/repvit/pytorch/README.md +++ b/cv/classification/repvit/pytorch/README.md @@ -1,26 +1,19 @@ -# RepViT -> [RepViT: Revisiting Mobile CNN From ViT Perspective](https://arxiv.org/abs/2307.09283) - - +# RepViT ## Model Description -Recently, lightweight Vision Transformers (ViTs) demonstrate superior performance and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on resource-constrained mobile devices. 
This improvement is usually attributed to the multi-head self-attention module, which enables the model to learn global representations. However, the architectural disparities between lightweight ViTs and lightweight CNNs have not been adequately examined. In this study, we revisit the efficient design of lightweight CNNs and emphasize their potential for mobile devices. We incrementally enhance the mobile-friendliness of a standard lightweight CNN, specifically MobileNetV3, by integrating the efficient architectural choices of lightweight ViTs. This ends up with a new family of pure lightweight CNNs, namely RepViT. Extensive experiments show that RepViT outperforms existing state-of-the-art lightweight ViTs and exhibits favorable latency in various vision tasks. On ImageNet, RepViT achieves over 80\% top-1 accuracy with 1ms latency on an iPhone 12, which is the first time for a lightweight model, to the best of our knowledge. Our largest model, RepViT-M2.3, obtains 83.7\% accuracy with only 2.3ms latency. +RepViT is an efficient lightweight vision model that combines the strengths of CNNs and Transformers for mobile devices. +It enhances MobileNetV3 architecture with Transformer-inspired design choices, achieving superior performance and lower +latency than lightweight ViTs. RepViT demonstrates state-of-the-art accuracy on ImageNet while maintaining fast +inference speeds, making it ideal for resource-constrained applications. Its pure CNN architecture ensures +mobile-friendliness, with the largest variant achieving 83.7% accuracy at just 2.3ms latency on an iPhone 12. ## Model Preparation -### Install Dependencies - -```bash -git clone https://github.com/THU-MIG/RepViT.git -cd RepViT -git checkout 298f42075eda5d2e6102559fad260c970769d34e -pip3 install -r requirements.txt -``` - -## Step 2: Preparing datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in the later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in the later training process. The ImageNet dataset path structure should look like: @@ -38,7 +31,16 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +git clone https://github.com/THU-MIG/RepViT.git +cd RepViT +git checkout 298f42075eda5d2e6102559fad260c970769d34e +pip3 install -r requirements.txt +``` + +## Model Training ```bash # On single GPU @@ -47,8 +49,10 @@ python3 main.py --model repvit_m0_9 --data-path /path/to/imagenet --dist-eval # Multiple GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 --master_port 12346 --use_env main.py --model repvit_m0_9 --data-path /path/to/imagenet --dist-eval ``` -Tips: -- Specify your data path and model name! + +Tips: + +- Specify your data path and model name! - Choose "3" when getting the output log below during training. 
```bash @@ -58,9 +62,11 @@ wandb: (3) Don't visualize my results ``` ## Model Results -|GPUs|FPS|ACC| -|:---:|:---:|:---:| -|BI-V100 x8|1.5984 s / it| Acc@1 78.53% | + +| Model | GPU | FPS | ACC | +|--------|------------|---------------|--------------| +| RepViT | BI-V100 x8 | 1.5984 s / it | Acc@1 78.53% | ## References -[RepViT](https://github.com/THU-MIG/RepViT/tree/298f42075eda5d2e6102559fad260c970769d34e) + +- [RepViT](https://github.com/THU-MIG/RepViT/tree/298f42075eda5d2e6102559fad260c970769d34e) diff --git a/cv/classification/res2net50_14w_8s/paddlepaddle/README.md b/cv/classification/res2net50_14w_8s/paddlepaddle/README.md index dcf813466..c34af0204 100644 --- a/cv/classification/res2net50_14w_8s/paddlepaddle/README.md +++ b/cv/classification/res2net50_14w_8s/paddlepaddle/README.md @@ -1,22 +1,19 @@ # Res2Net50_14w_8s ## Model Description -Res2Net is modified from the source code of ResNet. The main function of Res2Net is to add hierarchical connections within the block and indirectly increase the receptive field while reusing the feature map. -## Model Preparation -### Install Dependencies +Res2Net50_14w_8s is a convolutional neural network that enhances ResNet architecture by introducing hierarchical +residual-like connections within individual blocks. It increases the receptive field while reusing feature maps, +improving feature representation. The 14w_8s variant uses 14 width and 8 scales, achieving state-of-the-art performance +in image classification tasks. This architecture effectively balances model complexity and computational efficiency, +making it suitable for various computer vision applications requiring both high accuracy and efficient processing. -```bash -git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 urllib3==1.26.6 -yum install -y mesa-libGL -``` +## Model Preparation -## Step 2: Preparing datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
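The "14w_8s" suffix refers to 14 channels per split and 8 scales inside each bottleneck. A rough sketch of the hierarchical split connections (a simplified toy, not the PaddleClas implementation; real blocks add 1x1 convolutions and batch norm around this stage):

```python
import torch
import torch.nn as nn

class Res2NetSplit(nn.Module):
    """Middle stage of a Res2Net bottleneck: hierarchical 3x3 convs over channel splits."""
    def __init__(self, channels, scales=8):
        super().__init__()
        assert channels % scales == 0
        self.scales = scales
        width = channels // scales
        # One 3x3 conv per split except the first, which passes through unchanged.
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, 3, padding=1, bias=False) for _ in range(scales - 1)
        )

    def forward(self, x):
        splits = torch.chunk(x, self.scales, dim=1)
        outs = [splits[0]]                       # first split: identity
        prev = None
        for i, conv in enumerate(self.convs):
            inp = splits[i + 1] if prev is None else splits[i + 1] + prev
            prev = conv(inp)                     # each split also sees the previous output
            outs.append(prev)
        return torch.cat(outs, dim=1)

block = Res2NetSplit(channels=112, scales=8)     # toy "14w_8s": width 14 per split, 8 scales
y = block(torch.randn(1, 112, 56, 56))
print(y.shape)  # torch.Size([1, 112, 56, 56])
```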
The ImageNet dataset path structure should look like: @@ -34,7 +31,17 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 urllib3==1.26.6 +yum install -y mesa-libGL +``` + +## Model Training ```bash cd PaddleClas @@ -48,9 +55,10 @@ python3 -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ./ppcls/co ## Model Results -| GPUs | ACC | FPS -| ---------- | ------ | --- -| BI-V100 x8 | top1: 0.7943 | 338.29 images/sec +| Model | GPU | ACC | FPS | +|------------------|------------|--------------|-------------------| +| Res2Net50_14w_8s | BI-V100 x8 | top1: 0.7943 | 338.29 images/sec | ## References + - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/resnest101/pytorch/README.md b/cv/classification/resnest101/pytorch/README.md index bb373fbe8..8177b6a51 100644 --- a/cv/classification/resnest101/pytorch/README.md +++ b/cv/classification/resnest101/pytorch/README.md @@ -1,14 +1,20 @@ # ResNeSt101 ## Model Description -A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +ResNeSt101 is a deep convolutional neural network that enhances ResNet architecture with Split-Attention blocks. It +introduces channel-wise attention mechanisms to improve feature representation, combining multiple feature-map groups +with adaptive feature aggregation. The architecture achieves state-of-the-art performance in image classification tasks +by effectively balancing computational efficiency and model capacity. ResNeSt101's design is particularly suitable for +large-scale visual recognition tasks, offering improved accuracy over standard ResNet variants while maintaining +efficient training and inference capabilities. + +## Model Preparation -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -26,16 +32,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` ## Model Training -### Multiple GPUs on one machine (AMP) + Set data path by `export DATA_PATH=/path/to/imagenet`. 
The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_resnest101_amp_dist.sh ``` - - ## References -https://github.com/zhanghang1989/ResNeSt + +- [ResNeSt](https://github.com/zhanghang1989/ResNeSt) diff --git a/cv/classification/resnest14/pytorch/README.md b/cv/classification/resnest14/pytorch/README.md index 78f290141..230d10c6a 100644 --- a/cv/classification/resnest14/pytorch/README.md +++ b/cv/classification/resnest14/pytorch/README.md @@ -1,14 +1,20 @@ # ResNeSt14 ## Model Description -A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +ResNeSt14 is a lightweight convolutional neural network that combines ResNet architecture with Split-Attention blocks. +It introduces channel-wise attention mechanisms to enhance feature representation, using adaptive feature aggregation +across multiple groups. The architecture achieves efficient performance in image classification tasks by balancing model +complexity and computational efficiency. ResNeSt14's design is particularly suitable for applications with limited +resources, offering improved accuracy over standard ResNet variants while maintaining fast training and inference +capabilities. + +## Model Preparation -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -26,16 +32,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` ## Model Training -### Multiple GPUs on one machine (AMP) + Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_resnest14_amp_dist.sh ``` - - ## References -https://github.com/zhanghang1989/ResNeSt + +- [ResNeSt](https://github.com/zhanghang1989/ResNeSt) diff --git a/cv/classification/resnest269/pytorch/README.md b/cv/classification/resnest269/pytorch/README.md index 594824707..d19cd53da 100644 --- a/cv/classification/resnest269/pytorch/README.md +++ b/cv/classification/resnest269/pytorch/README.md @@ -1,13 +1,20 @@ # ResNeSt269 + ## Model Description -A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +ResNeSt269 is an advanced convolutional neural network that enhances ResNet architecture with Split-Attention blocks. It +introduces channel-wise attention mechanisms to improve feature representation, combining multiple feature-map groups +with adaptive feature aggregation. 
The architecture achieves state-of-the-art performance in image classification tasks +by effectively balancing computational efficiency and model capacity. ResNeSt269's design is particularly suitable for +large-scale visual recognition tasks, offering improved accuracy over standard ResNet variants while maintaining +efficient training and inference capabilities. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,18 +32,21 @@ imagenet └── val_list.txt ``` -:beers: Done! +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` ## Model Training -### Multiple GPUs on one machine + Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_resnest269_amp_dist.sh ``` -:beers: Done! - - ## References -https://github.com/zhanghang1989/ResNeSt + +- [ResNeSt](https://github.com/zhanghang1989/ResNeSt) diff --git a/cv/classification/resnest50/paddlepaddle/README.md b/cv/classification/resnest50/paddlepaddle/README.md index a60d69d87..9717d0caa 100644 --- a/cv/classification/resnest50/paddlepaddle/README.md +++ b/cv/classification/resnest50/paddlepaddle/README.md @@ -1,18 +1,20 @@ # ResNeSt50 + ## Model Description -A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. -## Step 1: Installing +ResNeSt50 is a convolutional neural network that enhances ResNet architecture with Split-Attention blocks. It introduces +channel-wise attention mechanisms to improve feature representation, combining multiple feature-map groups with adaptive +feature aggregation. The architecture achieves state-of-the-art performance in image classification tasks by effectively +balancing computational efficiency and model capacity. ResNeSt50's design is particularly suitable for large-scale +visual recognition tasks, offering improved accuracy over standard ResNet variants while maintaining efficient training +and inference capabilities. -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install -r requirements.txt -``` +## Model Preparation -## Step 2: Download data +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
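The Split-Attention block is the main change over a plain ResNet bottleneck. A simplified sketch of the radix-softmax attention it applies across parallel feature groups — illustrative only, since the real block also uses batch norm and cardinality groups:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitAttention(nn.Module):
    """Simplified Split-Attention: radix feature groups weighted by a learned softmax."""
    def __init__(self, channels, radix=2, reduction=4):
        super().__init__()
        self.radix = radix
        inter = max(channels // reduction, 32)
        self.fc1 = nn.Conv2d(channels, inter, 1)
        self.fc2 = nn.Conv2d(inter, channels * radix, 1)

    def forward(self, x):
        # x holds `radix` parallel feature groups stacked on the channel axis.
        b, rc, h, w = x.shape
        c = rc // self.radix
        splits = x.view(b, self.radix, c, h, w)
        gap = splits.sum(dim=1).mean(dim=(2, 3), keepdim=True)    # (B, C, 1, 1)
        attn = self.fc2(F.relu(self.fc1(gap)))                    # (B, radix*C, 1, 1)
        attn = F.softmax(attn.view(b, self.radix, c, 1, 1), dim=1)
        return (attn * splits).sum(dim=1)                         # (B, C, H, W)

sa = SplitAttention(channels=64, radix=2)
out = sa(torch.randn(2, 128, 28, 28))   # two radix groups of 64 channels each
print(out.shape)  # torch.Size([2, 64, 28, 28])
```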
The ImageNet dataset path structure should look like: @@ -30,7 +32,15 @@ imagenet └── val_list.txt ``` -## Step 3: Run ResNeSt50 +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install -r requirements.txt +``` + +## Model Training ```bash cd PaddleClas @@ -42,6 +52,6 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/ResNeSt/ResNeSt50.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 4 cards | Acc@1=0.7677 | +| Model | GPU | FP32 | +|-----------|------------|--------------| +| ResNeSt50 | BI-V100 x4 | Acc@1=0.7677 | diff --git a/cv/classification/resnest50/pytorch/README.md b/cv/classification/resnest50/pytorch/README.md index f8468b83a..8f3fba24a 100644 --- a/cv/classification/resnest50/pytorch/README.md +++ b/cv/classification/resnest50/pytorch/README.md @@ -1,14 +1,20 @@ # ResNeSt50 ## Model Description -A ResNest is a variant on a ResNet, which instead stacks Split-Attention blocks. The cardinal group representations are then concatenated along the channel dimension.As in standard residual blocks, the final output of otheur Split-Attention block is produced using a shortcut connection. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +ResNeSt50 is a convolutional neural network that enhances ResNet architecture with Split-Attention blocks. It introduces +channel-wise attention mechanisms to improve feature representation, combining multiple feature-map groups with adaptive +feature aggregation. The architecture achieves state-of-the-art performance in image classification tasks by effectively +balancing computational efficiency and model capacity. ResNeSt50's design is particularly suitable for large-scale +visual recognition tasks, offering improved accuracy over standard ResNet variants while maintaining efficient training +and inference capabilities. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -26,15 +32,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` ## Model Training -### Multiple GPUs on one machine (AMP) + Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine (AMP) bash train_resnest50_amp_dist.sh ``` - ## References -https://github.com/zhanghang1989/ResNeSt + +- [ResNeSt](https://github.com/zhanghang1989/ResNeSt) diff --git a/cv/classification/resnet101/pytorch/README.md b/cv/classification/resnet101/pytorch/README.md index e11d2a4ab..ec5c7a7f5 100644 --- a/cv/classification/resnet101/pytorch/README.md +++ b/cv/classification/resnet101/pytorch/README.md @@ -1,13 +1,20 @@ # ResNet101 + ## Model Description -Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. 
Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +ResNet101 is a deep convolutional neural network with 101 layers, building upon the ResNet architecture's residual +learning framework. It extends ResNet50's capabilities with additional layers for more complex feature extraction. The +model uses skip connections to address vanishing gradient problems, enabling effective training of very deep networks. +ResNet101 achieves state-of-the-art performance in image classification tasks while maintaining computational +efficiency. Its architecture is widely used as a backbone for various computer vision applications, including object +detection and segmentation. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,15 +32,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + ## Model Training -### Multiple GPUs on one machine + Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_resnet101_amp_dist.sh ``` - - ## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/resnet152/pytorch/README.md b/cv/classification/resnet152/pytorch/README.md index da9acf9c3..d21818f66 100644 --- a/cv/classification/resnet152/pytorch/README.md +++ b/cv/classification/resnet152/pytorch/README.md @@ -1,13 +1,20 @@ # ResNet152 + ## Model Description -Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +ResNet152 is a deep convolutional neural network with 152 layers, representing one of the largest variants in the ResNet +family. It builds upon the residual learning framework, using skip connections to enable effective training of very deep +networks. The model achieves state-of-the-art performance in image classification tasks by extracting complex +hierarchical features. ResNet152's architecture is particularly effective for large-scale visual recognition tasks, +offering improved accuracy over smaller ResNet variants while maintaining computational efficiency through its residual +connections. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
+## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,16 +32,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` ## Model Training -### Multiple GPUs on one machine + Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_resnet152_amp_dist.sh ``` - - ## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/resnet18/pytorch/README.md b/cv/classification/resnet18/pytorch/README.md index 7b443d0c3..e2004f281 100644 --- a/cv/classification/resnet18/pytorch/README.md +++ b/cv/classification/resnet18/pytorch/README.md @@ -1,13 +1,19 @@ # ResNet18 + ## Model Description -Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +ResNet18 is a lightweight convolutional neural network with 18 layers, featuring residual connections that enable +efficient training of deep networks. It introduces skip connections that bypass layers, addressing vanishing gradient +problems and allowing for better feature learning. ResNet18 achieves strong performance in image classification tasks +while maintaining computational efficiency. Its compact architecture makes it suitable for applications with limited +resources, serving as a backbone for various computer vision tasks like object detection and segmentation. -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,18 +31,21 @@ imagenet └── val_list.txt ``` -:beers: Done! +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` ## Model Training -### Multiple GPUs on one machine + Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_resnet18_amp_dist.sh ``` -:beers: Done! 
- - ## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/resnet50/paddlepaddle/README.md b/cv/classification/resnet50/paddlepaddle/README.md index 456023898..5151cfb89 100644 --- a/cv/classification/resnet50/paddlepaddle/README.md +++ b/cv/classification/resnet50/paddlepaddle/README.md @@ -1,21 +1,19 @@ # ResNet50 + ## Model Description -Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. -## Step 1: Installing +ResNet50 is a deep convolutional neural network with 50 layers, known for its innovative residual learning framework. It +introduces skip connections that bypass layers, enabling the training of very deep networks by addressing vanishing +gradient problems. This architecture achieved breakthrough performance in image classification tasks, winning the 2015 +ImageNet competition. ResNet50's efficient design and strong feature extraction capabilities make it widely used in +computer vision applications, serving as a backbone for various tasks like object detection and segmentation. -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install -r requirements.txt -yum install mesa-libGL -y -pip3 install urllib3==1.26.6 -pip3 install protobuf==3.20.3 -``` +## Model Preparation -## Step 2: Download data +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -33,9 +31,19 @@ imagenet └── val_list.txt ``` -**Tips** +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install -r requirements.txt +yum install mesa-libGL -y +pip3 install urllib3==1.26.6 +pip3 install protobuf==3.20.3 +``` + +Tips: for `PaddleClas` training, the images path in train_list.txt and val_list.txt must contain `train/` and `val/` directories: -For `PaddleClas` training, the images path in train_list.txt and val_list.txt must contain `train/` and `val/` directories: - train_list.txt: train/n01440764/n01440764_10026.JPEG 0 - val_list.txt: val/n01667114/ILSVRC2012_val_00000229.JPEG 35 @@ -58,12 +66,8 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -## Model Results on BI-V100 - -
- -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 4 cards | Acc@1=76.27,FPS=80.37,BatchSize=64 | +## Model Results -
+| Model | GPU | FP32 | +|----------|------------|------------------------------------| +| ResNet50 | BI-V100 x4 | Acc@1=76.27,FPS=80.37,BatchSize=64 | diff --git a/cv/classification/resnet50/pytorch/README.md b/cv/classification/resnet50/pytorch/README.md index 16c156d77..ce76b5930 100644 --- a/cv/classification/resnet50/pytorch/README.md +++ b/cv/classification/resnet50/pytorch/README.md @@ -1,10 +1,19 @@ # ResNet50 + ## Model Description -Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. -## Step 1: Preparing +ResNet50 is a deep convolutional neural network with 50 layers, known for its innovative residual learning framework. It +introduces skip connections that bypass layers, enabling the training of very deep networks by addressing vanishing +gradient problems. This architecture achieved breakthrough performance in image classification tasks, winning the 2015 +ImageNet competition. ResNet50's efficient design and strong feature extraction capabilities make it widely used in +computer vision applications, serving as a backbone for various tasks like object detection and segmentation. + +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
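Before launching the training scripts, it can help to confirm that the layout shown below is readable as a standard `ImageFolder` dataset. A small sanity-check sketch, assuming torchvision is installed and `/path/to/imagenet` is your local root:

```python
import torchvision.datasets as datasets
import torchvision.transforms as transforms

# Quick check that /path/to/imagenet is laid out the way the training scripts expect.
root = "/path/to/imagenet"
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder(f"{root}/train", transform=transform)
val_set = datasets.ImageFolder(f"{root}/val", transform=transform)
print(len(train_set.classes), len(train_set), len(val_set))  # e.g. 1000 1281167 50000
image, label = train_set[0]
print(image.shape, label)  # torch.Size([3, 224, 224]) 0
```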
The ImageNet dataset path structure should look like: @@ -22,45 +31,39 @@ imagenet └── val_list.txt ``` - - ## Model Training -### One single GPU ```bash +# One single GPU bash scripts/fp32_1card.sh --data-path /path/to/imagenet -``` -### One single GPU (AMP) -```bash + +# One single GPU (AMP) bash scripts/amp_1card.sh --data-path /path/to/imagenet -``` -### Multiple GPUs on one machine -```bash + +# Multiple GPUs on one machine bash scripts/fp32_4cards.sh --data-path /path/to/imagenet bash scripts/fp32_8cards.sh --data-path /path/to/imagenet -``` -### Multiple GPUs on one machine (AMP) -```bash + +# Multiple GPUs on one machine (AMP) bash scripts/amp_4cards.sh --data-path /path/to/imagenet bash scripts/amp_8cards.sh --data-path /path/to/imagenet -``` + ### Multiple GPUs on two machines -```bash bash scripts/fp32_16cards.sh --data-path /path/to/imagenet ``` -## Model Results on BI-V100 +## Model Results -| | FP32 | AMP+NHWC | -| ----------- | ----------------------------------------------- | --------------------------------------------- | -| single card | Acc@1=76.02,FPS=330,Time=4d3h,BatchSize=280 | Acc@1=75.56,FPS=550,Time=2d13h,BatchSize=300 | -| 4 cards | Acc@1=75.89,FPS=1233,Time=1d2h,BatchSize=300 | Acc@1=79.04,FPS=2400,Time=11h,BatchSize=512 | -| 8 cards | Acc@1=74.98,FPS=2150,Time=12h43m,BatchSize=300 | Acc@1=76.43,FPS=4200,Time=8h,BatchSize=480 | +| Model | GPU | FP32 | AMP+NHWC | +|----------|------------|-------------------------------------------------|-----------------------------------------------| +| ResNet50 | BI-V100 x1 | Acc@1=76.02,FPS=330,Time=4d3h,BatchSize=280 | Acc@1=75.56,FPS=550,Time=2d13h,BatchSize=300 | +| ResNet50 | BI-V100 x4 | Acc@1=75.89,FPS=1233,Time=1d2h,BatchSize=300 | Acc@1=79.04,FPS=2400,Time=11h,BatchSize=512 | +| ResNet50 | BI-V100 x8 | Acc@1=74.98,FPS=2150,Time=12h43m,BatchSize=300 | Acc@1=76.43,FPS=4200,Time=8h,BatchSize=480 | | Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power(W) | Scalability | Memory utilization(G) | Stability | |----------------------|------------------------------------------|-------------|----------|------------|-------------|-------------------------|-----------| | top1 75.9% | SDK V2.2,bs:512,8x,AMP | 5221 | 76.43% | 128\*8 | 0.97 | 29.1\*8 | 1 | - ## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/resnet50/tensorflow/README.md b/cv/classification/resnet50/tensorflow/README.md index 14881ba1b..c12d7e5e3 100644 --- a/cv/classification/resnet50/tensorflow/README.md +++ b/cv/classification/resnet50/tensorflow/README.md @@ -1,41 +1,41 @@ # ResNet50 ## Model Description -Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. -## Prepare +ResNet50 is a deep convolutional neural network with 50 layers, known for its innovative residual learning framework. It +introduces skip connections that bypass layers, enabling the training of very deep networks by addressing vanishing +gradient problems. This architecture achieved breakthrough performance in image classification tasks, winning the 2015 +ImageNet competition. 
ResNet50's efficient design and strong feature extraction capabilities make it widely used in +computer vision applications, serving as a backbone for various tasks like object detection and segmentation. -### Install packages +## Model Preparation -```shell -pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger -``` +### Prepare Resources -### Download datasets +Download and convert to TFRecord format following [ImageNet-to-TFrecord](https://github.com/kmonachopoulos/ImageNet-to-TFrecord). +Or [here](https://github.com/tensorflow/models/tree/master/research/slim#downloading-and-converting-to-tfrecord-format) -[Downloading and converting to TFRecord format](https://github.com/kmonachopoulos/ImageNet-to-TFrecord) or -[here](https://github.com/tensorflow/models/tree/master/research/slim#downloading-and-converting-to-tfrecord-format) -make a file named imagenet_tfrecord, and store imagenet datasest convert to imagenet_tfrecord +Make a file named imagenet_tfrecord, and store imagenet datasest convert to imagenet_tfrecord +### Install Dependencies +```shell +pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger +``` -## Training - -### Training on single card +## Model Training ```shell +# Training on single card bash run_train_resnet50_imagenette.sh -``` -### Training on mutil-cards -```shell +# Training on mutil-cards bash run_train_resnet50_multigpu_imagenette.sh ``` +## Model Results -## Result - -| | acc | fps | -| --- | --- | --- | -| multi_card | 0.9860 | 236.9 | +| Model | GPU | acc | fps | +|----------|------------|--------|-------| +| ResNet50 | BI-V100 x8 | 0.9860 | 236.9 | diff --git a/cv/classification/resnext101_32x8d/pytorch/README.md b/cv/classification/resnext101_32x8d/pytorch/README.md index bb96b91d4..fa33d3a43 100644 --- a/cv/classification/resnext101_32x8d/pytorch/README.md +++ b/cv/classification/resnext101_32x8d/pytorch/README.md @@ -1,14 +1,20 @@ # ResNeXt101_32x8d ## Model Description -A ResNeXt repeats a building block that aggregates a set of transformations with the same topology. Compared to a ResNet, it exposes a new dimension, cardinality (the size of the set of transformations) , as an essential factor in addition to the dimensions of depth and width. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +ResNeXt101 is a deep convolutional network that extends ResNet architecture by introducing cardinality as a new +dimension. The 32x8d variant uses 32 groups with 8-dimensional transformations in each block. This grouped convolution +approach improves feature representation while maintaining computational efficiency. ResNeXt101 achieves +state-of-the-art performance in image classification tasks by combining the benefits of residual learning with +multi-branch transformations. Its architecture is particularly effective for large-scale visual recognition tasks, +offering improved accuracy over standard ResNet models. + +## Model Preparation -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
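The cardinality dimension described above comes down to grouped convolutions. A tiny sketch comparing the 32x8d transform with a dense 3x3 convolution of the same width (parameter counts are for this toy layer only):

```python
import torch
import torch.nn as nn

# The "32x8d" transform: 32 groups, each 8-dimensional, realised as one grouped convolution.
width = 32 * 8   # cardinality x bottleneck width per group
grouped = nn.Conv2d(width, width, kernel_size=3, padding=1, groups=32, bias=False)

# A plain 3x3 conv of the same width for comparison.
dense = nn.Conv2d(width, width, kernel_size=3, padding=1, bias=False)

x = torch.randn(1, width, 56, 56)
print(grouped(x).shape)                               # torch.Size([1, 256, 56, 56])
print(sum(p.numel() for p in grouped.parameters()))   # 18432  (32 x 8 x 8 x 3 x 3)
print(sum(p.numel() for p in dense.parameters()))     # 589824 (256 x 256 x 3 x 3)
```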
The ImageNet dataset path structure should look like: @@ -26,16 +32,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` ## Model Training -### Multiple GPUs on one machine + Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_resnext101_32x8d_amp_dist.sh ``` - - ## References -https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L214 + +- [imgclsmob](https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L214) diff --git a/cv/classification/resnext50_32x4d/mindspore/README.md b/cv/classification/resnext50_32x4d/mindspore/README.md index 3093a1646..6a1ea169b 100644 --- a/cv/classification/resnext50_32x4d/mindspore/README.md +++ b/cv/classification/resnext50_32x4d/mindspore/README.md @@ -1,29 +1,19 @@ # ResNeXt50_32x4d ## Model Description -A ResNeXt repeats a building block that aggregates a set of transformations with the same topology. Compared to a ResNet, it exposes a new dimension, cardinality (the size of the set of transformations) , as an essential factor in addition to the dimensions of depth and width. -## Model Preparation +ResNeXt50 is an enhanced version of ResNet50 that introduces cardinality as a new dimension alongside depth and width. +It uses grouped convolutions to create multiple parallel transformation paths within each block, improving feature +representation. The 32x4d variant has 32 groups with 4-dimensional transformations. This architecture achieves better +accuracy than ResNet50 with similar computational complexity, making it efficient for image classification tasks. +ResNeXt50's design has influenced many subsequent CNN architectures in computer vision. -### Install Dependencies -Install OpenMPI and mesa-libGL -```bash -wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz -tar -xvf openmpi-4.0.7.tar.gz -cd openmpi-4.0.7 -./configure --prefix=/usr/local/bin --with-orte -make all -make install -vim ~/.bashrc -export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/ -source ~/.bashrc -yum install openssh-server, openssh-clients -yum install mesa-libGL -``` +## Model Preparation -## Step 2:Preparing datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -41,20 +31,38 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +Install OpenMPI and mesa-libGL + +```bash +wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz +tar -xvf openmpi-4.0.7.tar.gz +cd openmpi-4.0.7 +./configure --prefix=/usr/local/bin --with-orte +make all +make install +vim ~/.bashrc +export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/ +source ~/.bashrc +yum install openssh-server, openssh-clients +yum install mesa-libGL +``` + +## Model Training + set `/path/to/checkpoint` to save the model. 
-single gpu: + ```bash +# Single gpu export CUDA_VISIBLE_DEVICES=0 python3 train.py \ --run_distribute=0 \ --device_target="GPU" \ --data_path=/path/to/imagenet/train \ --output_path /path/to/checkpoint -``` -multi-gpu: -```bash +# Multi-gpu export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 mpirun --allow-run-as-root -n 8 --output-filename log_output --merge-stderr-to-stdout \ python3 train.py \ @@ -64,10 +72,12 @@ mpirun --allow-run-as-root -n 8 --output-filename log_output --merge-stderr-to-s --output_path /path/to/checkpoint ``` -validate: -the " model_data_dir " in checkpoint_file_path should look like: `2022-02-02_time_02_22_22`, you should fill in +The " model_data_dir " in checkpoint_file_path should look like: `2022-02-02_time_02_22_22`, you should fill in the value based on your actual situation. + ```bash +# Evaluation +export CUDA_VISIBLE_DEVICES=0 python3 eval.py \ --data_path=/path/to/imagenet/val \ --device_target="GPU" \ @@ -76,9 +86,10 @@ python3 eval.py \ ## Model Results -| GPUs | FPS | ACC(TOP1) | ACC(TOP5) | -|-------------|-----------|--------------|--------------| -| BI-V100 x 8 | 109.97 | 78.18% | 94.03% | +| Model | GPU | FPS | ACC(TOP1) | ACC(TOP5) | +|-----------|------------|--------|-----------|-----------| +| ResNeXt50 | BI-V100 x8 | 109.97 | 78.18% | 94.03% | ## References -https://gitee.com/mindspore/models/tree/master/research/cv/ResNeXt + +- [ResNeXt](https://gitee.com/mindspore/models/tree/master/research/cv/ResNeXt) diff --git a/cv/classification/resnext50_32x4d/pytorch/README.md b/cv/classification/resnext50_32x4d/pytorch/README.md index eca6864a7..011b82080 100644 --- a/cv/classification/resnext50_32x4d/pytorch/README.md +++ b/cv/classification/resnext50_32x4d/pytorch/README.md @@ -1,14 +1,19 @@ # ResNeXt50_32x4d ## Model Description -A ResNeXt repeats a building block that aggregates a set of transformations with the same topology. Compared to a ResNet, it exposes a new dimension, cardinality (the size of the set of transformations) , as an essential factor in addition to the dimensions of depth and width. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +ResNeXt50 is an enhanced version of ResNet50 that introduces cardinality as a new dimension alongside depth and width. +It uses grouped convolutions to create multiple parallel transformation paths within each block, improving feature +representation. The 32x4d variant has 32 groups with 4-dimensional transformations. This architecture achieves better +accuracy than ResNet50 with similar computational complexity, making it efficient for image classification tasks. +ResNeXt50's design has influenced many subsequent CNN architectures in computer vision. + +## Model Preparation -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -26,16 +31,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` ## Model Training -### Multiple GPUs on one machine + Set data path by `export DATA_PATH=/path/to/imagenet`. 
The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_resnext50_32x4d_amp_dist.sh ``` - - ## References -https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L200 + +- [imgclsmob](https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L200) diff --git a/cv/classification/se_resnet50_vd/paddlepaddle/README.md b/cv/classification/se_resnet50_vd/paddlepaddle/README.md index d5b1b29ec..9f74f52cb 100644 --- a/cv/classification/se_resnet50_vd/paddlepaddle/README.md +++ b/cv/classification/se_resnet50_vd/paddlepaddle/README.md @@ -2,23 +2,18 @@ ## Model Description -The SENet structure is a weighted average between graph channels that can be embedded into other network structures. SE_ResNet50_vd is a model that adds the senet structure to ResNet50, further learning the dependency relationships between graph channels to obtain better image features. +SE_ResNet50_vd is an enhanced version of ResNet50 that incorporates Squeeze-and-Excitation (SE) blocks and variant +downsampling. The SE blocks adaptively recalibrate channel-wise feature responses, improving feature representation. The +variant downsampling preserves more information during feature map reduction. This architecture achieves better accuracy +than standard ResNet50 while maintaining computational efficiency. SE_ResNet50_vd is particularly effective for image +classification tasks, offering improved performance through better feature learning and channel attention mechanisms. ## Model Preparation -### Install Dependencies - -``` -pip3 install -r requirements.txt -python3 -m pip install urllib3==1.26.6 -yum install -y mesa-libGL +### Prepare Resources -git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git -``` - -## Step 2: Preparing datasets - -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `./PaddleClas/dataset/` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `./PaddleClas/dataset/` to your ImageNet path in later training process. 
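For reference, the squeeze-and-excitation recalibration that gives the model its name fits in a few lines of PyTorch. This is a generic SE block, not the PaddleClas implementation, which wires it into the ResNet50-vd bottlenecks:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global pooling followed by a two-layer channel gate."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        scale = self.fc(x.mean(dim=(2, 3)))      # squeeze to (B, C), excite back to (B, C)
        return x * scale.view(b, c, 1, 1)        # recalibrate each channel

se = SEBlock(256)
out = se(torch.randn(2, 256, 14, 14))
print(out.shape)  # torch.Size([2, 256, 14, 14])
```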
The ImageNet dataset path structure should look like: @@ -36,22 +31,31 @@ ILSVRC2012 └── val_list.txt ``` -**Tips** - -For `PaddleClas` training, the image path in train_list.txt and val_list.txt must contain `train/` and `val/` directories: +Tips: for `PaddleClas` training, the image path in train_list.txt and val_list.txt must contain `train/` and `val/` +directories: -* train_list.txt: train/n01440764/n01440764_10026.JPEG 0 -* val_list.txt: val/n01667114/ILSVRC2012_val_00000229.JPEG 35 +- train_list.txt: train/n01440764/n01440764_10026.JPEG 0 +- val_list.txt: val/n01667114/ILSVRC2012_val_00000229.JPEG 35 -``` +```bash # add "train/" and "val/" to head of lines sed -i 's#^#train/#g' train_list.txt sed -i 's#^#val/#g' val_list.txt ``` -## Step 3: Training +### Install Dependencies +```bash +pip3 install -r requirements.txt +python3 -m pip install urllib3==1.26.6 +yum install -y mesa-libGL + +git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git ``` + +## Model Training + +```bash cd PaddleClas/ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch --gpus="0,1,2,3" tools/train.py -c ./ppcls/configs/ImageNet/SENet/SE_ResNet50_vd.yaml @@ -59,9 +63,9 @@ python3 -m paddle.distributed.launch --gpus="0,1,2,3" tools/train.py -c ./ppcls/ ## Model Results -| GPUS | ACC | FPS | -| ---- | ------ | --------- | -| BI-V100 x8 | 79.20% | 139.63 samples/s | +| Model | GPU | ACC | FPS | +|----------------|------------|--------|------------------| +| SE_ResNet50_vd | BI-V100 x8 | 79.20% | 139.63 samples/s | ## References diff --git a/cv/classification/seresnext/pytorch/README.md b/cv/classification/seresnext/pytorch/README.md index d3cd83994..ffc3ffb75 100644 --- a/cv/classification/seresnext/pytorch/README.md +++ b/cv/classification/seresnext/pytorch/README.md @@ -1,12 +1,20 @@ # SEResNeXt + ## Model Description -SE ResNeXt is a variant of a ResNext that employs squeeze-and-excitation blocks to enable the network to perform dynamic channel-wise feature recalibration. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +SEResNeXt is an advanced convolutional neural network that combines ResNeXt's grouped convolution with +Squeeze-and-Excitation (SE) blocks. It introduces channel attention mechanisms to adaptively recalibrate feature +responses, improving feature representation. The architecture leverages multiple parallel transformation paths within +each block while maintaining computational efficiency. SEResNeXt achieves state-of-the-art performance in image +classification tasks by effectively combining multi-branch transformations with channel-wise attention, making it +particularly suitable for complex visual recognition problems. + +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -24,16 +32,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` ## Model Training -### Multiple GPUs on one machine + Set data path by `export DATA_PATH=/path/to/imagenet`. 
The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_seresnext101_32x4d_amp_dist.sh ``` - - ## References -https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L214 + +- [imgclsmob](https://github.com/osmr/imgclsmob/blob/f2993d3ce73a2f7ddba05da3891defb08547d504/pytorch/pytorchcv/models/seresnext.py#L214) diff --git a/cv/classification/shufflenetv2/paddlepaddle/README.md b/cv/classification/shufflenetv2/paddlepaddle/README.md index d5df18cc3..65bdbbf95 100644 --- a/cv/classification/shufflenetv2/paddlepaddle/README.md +++ b/cv/classification/shufflenetv2/paddlepaddle/README.md @@ -1,26 +1,19 @@ # ShuffleNetv2 -## Model Description -ShuffleNet v2 is a convolutional neural network optimized for a direct metric (speed) rather than indirect metrics like FLOPs. It builds upon ShuffleNet v1, which utilised pointwise group convolutions, bottleneck-like structures, and a channel shuffle operation. Differences are shown in the Figure to the right, including a new channel split operation and moving the channel shuffle operation further down the block.ShuffleNetv2 is an efficient convolutional neural network architecture for mobile devices. For more information check the paper: [ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design](https://arxiv.org/abs/1807.11164) - -## Model Preparation - -### Install Dependencies -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas +## Model Description -yum install mesa-libGL -y +ShuffleNetv2 is an efficient convolutional neural network designed specifically for mobile devices. It introduces +practical guidelines for CNN architecture design, focusing on direct speed optimization rather than indirect metrics +like FLOPs. The model features a channel split operation and optimized channel shuffle mechanism, improving both +accuracy and inference speed. ShuffleNetv2 achieves state-of-the-art performance in mobile image classification tasks +while maintaining low computational complexity, making it ideal for resource-constrained applications. -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.13 +## Model Preparation -python3 setup.py install -``` +### Prepare Resources -## Step 2: Preparing Datasets -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -38,11 +31,26 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +yum install -y mesa-libGL + +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas/ +pip3 install -r requirements.txt +python3 setup.py install + +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.13 + +``` + +## Model Training ```bash # Make sure your dataset path is the same as above -cd PaddleClas +cd PaddleClas/ # Link your dataset to default location ln -s /path/to/imagenet ./dataset/ILSVRC2012 @@ -55,9 +63,10 @@ python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/c ## Model Results -| GPUs | Top1 | Top5 |ips | -|-------------|-------------|----------------|----------------| -| BI-V100 x 4 | 0.684 | 0.881 | 1236 | +| Model | GPU | Top1 | Top5 | ips | +|--------------|------------|-------|-------|------| +| ShuffleNetv2 | BI-V100 x4 | 0.684 | 0.881 | 1236 | ## References + - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/shufflenetv2/pytorch/README.md b/cv/classification/shufflenetv2/pytorch/README.md index ddcb41db8..1f28a20ca 100644 --- a/cv/classification/shufflenetv2/pytorch/README.md +++ b/cv/classification/shufflenetv2/pytorch/README.md @@ -1,12 +1,19 @@ # ShuffleNetV2 + ## Model Description -ShuffleNet v2 is a convolutional neural network optimized for a direct metric (speed) rather than indirect metrics like FLOPs. It builds upon ShuffleNet v1, which utilised pointwise group convolutions, bottleneck-like structures, and a channel shuffle operation. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +ShuffleNetv2 is an efficient convolutional neural network designed specifically for mobile devices. It introduces +practical guidelines for CNN architecture design, focusing on direct speed optimization rather than indirect metrics +like FLOPs. The model features a channel split operation and optimized channel shuffle mechanism, improving both +accuracy and inference speed. ShuffleNetv2 achieves state-of-the-art performance in mobile image classification tasks +while maintaining low computational complexity, making it ideal for resource-constrained applications. + +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -24,15 +31,21 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` ## Model Training -### Multiple GPUs on one machine + Set data path by `export DATA_PATH=/path/to/imagenet`. 
The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_shufflenet_v2_x2_0_amp_dist.sh ``` - ## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification#shufflenet-v2) + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification#shufflenet-v2) diff --git a/cv/classification/squeezenet/pytorch/README.md b/cv/classification/squeezenet/pytorch/README.md index a004257d1..b084df03d 100644 --- a/cv/classification/squeezenet/pytorch/README.md +++ b/cv/classification/squeezenet/pytorch/README.md @@ -1,13 +1,20 @@ # SqueezeNet ## Model Description -SqueezeNet is a convolutional neural network that employs design strategies to reduce the number of parameters, notably with the use of fire modules that "squeeze" parameters using 1x1 convolutions. -## Step 1: Installing -```bash -pip3 install torch torchvision -``` -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +SqueezeNet is a lightweight convolutional neural network designed for efficient deployment on resource-constrained +devices. It achieves AlexNet-level accuracy with 50x fewer parameters through innovative "fire modules" that combine 1x1 +"squeeze" convolutions with 1x1 and 3x3 "expand" convolutions. The architecture focuses on model compression while +maintaining good classification performance. SqueezeNet is particularly suitable for mobile and embedded applications +where model size and computational efficiency are critical, offering a balance between accuracy and resource +requirements. + +## Model Preparation + +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -25,17 +32,22 @@ imagenet └── val_list.txt ``` -:beers: Done! +### Install Dependencies -## Model Training -### One single GPU ```bash -python3 train.py --data-path /path/to/imagenet --model squeezenet1_0 --lr 0.001 +pip3 install torch torchvision ``` -### Multiple GPUs on one machine + +## Model Training + ```bash +# One single GPU +python3 train.py --data-path /path/to/imagenet --model squeezenet1_0 --lr 0.001 + +# Multiple GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenette --model squeezenet1_0 --lr 0.001 ``` ## References -https://github.com/pytorch/vision/blob/main/torchvision/models/squeezenet.py + +- [vision](https://github.com/pytorch/vision/blob/main/torchvision/models/squeezenet.py) diff --git a/cv/classification/swin_transformer/paddlepaddle/README.md b/cv/classification/swin_transformer/paddlepaddle/README.md index 6035169ee..d63622abf 100644 --- a/cv/classification/swin_transformer/paddlepaddle/README.md +++ b/cv/classification/swin_transformer/paddlepaddle/README.md @@ -1,18 +1,20 @@ # Swin Transformer + ## Model Description -The Swin Transformer is a type of Vision Transformer. It builds hierarchical feature maps by merging image patches (shown in gray) in deeper layers and has linear computation complexity to input image size due to computation of self-attention only within each local window (shown in red). 
It can thus serve as a general-purpose backbone for both image classification and dense recognition tasks. -## Step 1: Installing +The Swin Transformer is a hierarchical vision transformer that introduces shifted windows for efficient self-attention +computation. It processes images in local windows, reducing computational complexity while maintaining global modeling +capabilities. The architecture builds hierarchical feature maps by merging image patches in deeper layers, making it +suitable for both image classification and dense prediction tasks. Swin Transformer achieves state-of-the-art +performance in various vision tasks, offering a powerful alternative to traditional convolutional networks with its +transformer-based approach. -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install -r requirements.txt -``` +## Model Preparation -## Step 2: Download data +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -30,7 +32,15 @@ imagenet └── val_list.txt ``` -## Step 3: Run Swin-Transformer +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install -r requirements.txt +``` + +## Model Training ```bash cd PaddleClas @@ -42,6 +52,8 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -u -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 tools/train.py -c ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_tiny_patch4_window7_224.yaml -o Arch.pretrained=False -o Global.device=gpu ``` -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 8 cards | Acc@1=0.8024 | +## Model Results + +| Model | GPU | FP32 | +|------------------|------------|--------------| +| Swin Transformer | BI-V100 x8 | Acc@1=0.8024 | diff --git a/cv/classification/swin_transformer/pytorch/README.md b/cv/classification/swin_transformer/pytorch/README.md index 584600b3a..d62655a1a 100644 --- a/cv/classification/swin_transformer/pytorch/README.md +++ b/cv/classification/swin_transformer/pytorch/README.md @@ -1,17 +1,20 @@ # Swin Transformer + ## Model Description -The Swin Transformer is a type of Vision Transformer. It builds hierarchical feature maps by merging image patches (shown in gray) in deeper layers and has linear computation complexity to input image size due to computation of self-attention only within each local window (shown in red). It can thus serve as a general-purpose backbone for both image classification and dense recognition tasks. -## Step 1: Installing +The Swin Transformer is a hierarchical vision transformer that introduces shifted windows for efficient self-attention +computation. It processes images in local windows, reducing computational complexity while maintaining global modeling +capabilities. The architecture builds hierarchical feature maps by merging image patches in deeper layers, making it +suitable for both image classification and dense prediction tasks. 
Swin Transformer achieves state-of-the-art
+performance in various vision tasks, offering a powerful alternative to traditional convolutional networks with its
+transformer-based approach.

-```bash
-git clone https://github.com/microsoft/Swin-Transformer.git
-git checkout f82860bfb5225915aca09c3227159ee9e1df874d
-cd Swin-Transformer
-pip install timm==0.4.12 yacs
-```
+## Model Preparation

-Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.
+### Prepare Resources
+
+Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to
+download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.

The ImageNet dataset path structure should look like:

@@ -29,16 +32,27 @@ imagenet
└── val_list.txt
```

+### Install Dependencies
+
+```bash
+git clone https://github.com/microsoft/Swin-Transformer.git
+cd Swin-Transformer
+git checkout f82860bfb5225915aca09c3227159ee9e1df874d
+pip install timm==0.4.12 yacs
+```
+
## Model Training
-### Multiple GPUs on one machine
+
```bash
-# fix --local-rank for torch 2.x
+# Multiple GPUs on one machine
+
+## fix --local-rank for torch 2.x
sed -i 's/--local_rank/--local-rank/g' main.py
python3 -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py \
--cfg configs/swin/swin_tiny_patch4_window7_224.yaml --data-path /path/to/imagenet --batch-size 128
```

-## Reference
+## References

- [Swin-Transformer](https://github.com/microsoft/Swin-Transformer)
diff --git a/cv/classification/vgg/paddlepaddle/README.md b/cv/classification/vgg/paddlepaddle/README.md
index c22107937..df26ef190 100644
--- a/cv/classification/vgg/paddlepaddle/README.md
+++ b/cv/classification/vgg/paddlepaddle/README.md
@@ -1,21 +1,19 @@
 # VGG16
+
 ## Model Description
-VGG is a classical convolutional neural network architecture. It was based on an analysis of how to increase the depth of such networks. The network utilises small 3 x 3 filters. Otherwise the network is characterized by its simplicity: the only other components being pooling layers and a fully connected layer.
-## Step 1: Installing
+VGG is a classic convolutional neural network architecture known for its simplicity and depth. It uses small 3x3
+convolutional filters stacked in multiple layers, allowing for effective feature extraction. The architecture typically
+includes 16 or 19 weight layers, with VGG16 being the most popular variant. VGG achieved state-of-the-art performance in
+image classification tasks and became a benchmark for subsequent CNN architectures. Its uniform structure and deep
+design have influenced many modern deep learning models in computer vision.

-```bash
-git clone https://github.com/PaddlePaddle/PaddleClas.git
-```
+## Model Preparation

-```bash
-cd PaddleClas
-pip3 install -r requirements.txt
-```
-
-## Step 2: Prepare Datasets
+### Prepare Resources

-Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.
+Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to
+download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process.
The ImageNet dataset path structure should look like: @@ -33,8 +31,21 @@ imagenet └── val_list.txt ``` -## Step 3: Training -Notice:if use AMP, modify PaddleClas/ppcls/configs/ImageNet/VGG/VGG16.yaml, +### Install Dependencies + +```bash +git clone https://github.com/PaddlePaddle/PaddleClas.git +``` + +```bash +cd PaddleClas +pip3 install -r requirements.txt +``` + +## Model Training + +Notice:if use AMP, modify PaddleClas/ppcls/configs/ImageNet/VGG/VGG16.yaml, + ```yaml AMP: scale_loss: 128.0 @@ -54,4 +65,5 @@ python3 -u -m paddle.distributed.launch --gpus=0,1,2,3 tools/train.py -c ppcls/c ``` ## References + - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/vgg/pytorch/README.md b/cv/classification/vgg/pytorch/README.md index 197800e7b..34ce4101f 100644 --- a/cv/classification/vgg/pytorch/README.md +++ b/cv/classification/vgg/pytorch/README.md @@ -1,19 +1,19 @@ # VGG16 + ## Model Description -VGG is a classical convolutional neural network architecture. It was based on an analysis of how to increase the depth of such networks. The network utilises small 3 x 3 filters. Otherwise the network is characterized by its simplicity: the only other components being pooling layers and a fully connected layer. +VGG is a classic convolutional neural network architecture known for its simplicity and depth. It uses small 3x3 +convolutional filters stacked in multiple layers, allowing for effective feature extraction. The architecture typically +includes 16 or 19 weight layers, with VGG16 being the most popular variant. VGG achieved state-of-the-art performance in +image classification tasks and became a benchmark for subsequent CNN architectures. Its uniform structure and deep +design have influenced many modern deep learning models in computer vision. ## Model Preparation -### Install Dependencies - -```bash -pip3 install -r requirements.txt -``` - -## Step 2: Preparing datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -31,7 +31,13 @@ imagenet └── val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` + +## Model Training ```bash # Set data path @@ -42,4 +48,5 @@ bash train_vgg16_amp_dist.sh ``` ## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/vgg/tensorflow/README.md b/cv/classification/vgg/tensorflow/README.md index 9c583620b..0a1ec7786 100644 --- a/cv/classification/vgg/tensorflow/README.md +++ b/cv/classification/vgg/tensorflow/README.md @@ -2,21 +2,22 @@ ## Model Description -VGG is a classical convolutional neural network architecture. It was based on an analysis of how to increase the depth of such networks. The network utilises small 3 x 3 filters. Otherwise the network is characterized by its simplicity: the only other components being pooling layers and a fully connected layer. 
+VGG is a classic convolutional neural network architecture known for its simplicity and depth. It uses small 3x3 +convolutional filters stacked in multiple layers, allowing for effective feature extraction. The architecture typically +includes 16 or 19 weight layers, with VGG16 being the most popular variant. VGG achieved state-of-the-art performance in +image classification tasks and became a benchmark for subsequent CNN architectures. Its uniform structure and deep +design have influenced many modern deep learning models in computer vision. ## Model Preparation -### Install Dependencies - -```bash -pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger -``` - -## Step 2: Preparing datasets +### Prepare Resources You can get ImageNet 1K TFrecords ILSVRC2012 dataset directly from below links: -- [ImageNet 1K TFrecords ILSVRC2012 - part 0](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) -- [ImageNet 1K TFrecords ILSVRC2012 - part 1](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-1) + +- [ImageNet 1K TFrecords ILSVRC2012 - part + 0](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0) +- [ImageNet 1K TFrecords ILSVRC2012 - part + 1](https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-1) The ImageNet TFrecords dataset path structure should look like: @@ -30,8 +31,16 @@ imagenet_tfrecord └── validation-00127-of-00128 ``` -## Step 3: Training -**Put the TFrecords data in "./imagenet_tfrecord" directory or create a soft link.** +### Install Dependencies + +```bash +pip3 install absl-py git+https://github.com/NVIDIA/dllogger#egg=dllogger +``` + +## Model Training + +Put the TFrecords data in "./imagenet_tfrecord" directory or create a soft link. + ```bash # 1 GPU bash run_train_vgg16_imagenet.sh @@ -42,9 +51,10 @@ bash run_train_vgg16_multigpu_imagenet.sh ## Model Results -| GPUS | acc | fps | -| ----------| --------------------------| ----- | -| BI V100×8 | acc@1=0.7160,acc@5=0.9040 | 435.9 | +| Model | GPU | acc | fps | +|-------|------------|---------------------------|-------| +| VGG16 | BI-V100 ×8 | acc@1=0.7160,acc@5=0.9040 | 435.9 | ## References -- [TensorFlow/benchmarks](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks) \ No newline at end of file + +- [tensorflow/benchmarks](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks) diff --git a/cv/classification/wavemlp/pytorch/README.md b/cv/classification/wavemlp/pytorch/README.md index d4bd919a5..b9a884a00 100644 --- a/cv/classification/wavemlp/pytorch/README.md +++ b/cv/classification/wavemlp/pytorch/README.md @@ -2,17 +2,18 @@ ## Model Description -In the field of computer vision, recent works show that a pure MLP architecture mainly stacked by fully-connected layers can achieve competing performance with CNN and transformer. An input image of vision MLP is usually split into multiple tokens (patches), while the existing MLP models directly aggregate them with fixed weights, neglecting the varying semantic information of tokens from different images. To dynamically aggregate tokens, we propose to represent each token as a wave function with two parts, amplitude and phase. Amplitude is the original feature and the phase term is a complex value changing according to the semantic contents of input images. Introducing the phase term can dynamically modulate the relationship between tokens and fixed weights in MLP. 
Based on the wave-like token representation, we establish a novel Wave-MLP architecture for vision tasks. Extensive experiments demonstrate that the proposed Wave-MLP is superior to the state-of-the-art MLP architectures on various vision tasks such as image classification, object detection and semantic segmentation. The source code is available at https://github.com/huawei-noah/CV-Backbones/tree/master/wavemlp_pytorch and https://gitee.com/mindspore/models/tree/master/research/cv/wave_mlp. +Wave-MLP is an innovative vision architecture that represents image tokens as wave functions with amplitude and phase +components. It dynamically modulates token relationships through phase terms, adapting to varying semantic information +in different images. This approach enhances feature aggregation in pure MLP architectures, outperforming traditional +CNNs and transformers in tasks like image classification and object detection. Wave-MLP offers efficient computation +while maintaining high accuracy, making it suitable for various computer vision applications. -## Step 1: Installing -```bash -pip install thop timm==0.4.5 torchprofile -git clone https://github.com/huawei-noah/Efficient-AI-Backbones.git -cd Efficient-AI-Backbones/wavemlp_pytorch/ -git checkout 25531f7fdcf61e300b47c52ba80973d0af8bb011 -``` +## Model Preparation + +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -30,9 +31,18 @@ imagenet └── val_list.txt ``` +### Install Dependencies + +```bash +pip install thop timm==0.4.5 torchprofile +git clone https://github.com/huawei-noah/Efficient-AI-Backbones.git +cd Efficient-AI-Backbones/wavemlp_pytorch/ +git checkout 25531f7fdcf61e300b47c52ba80973d0af8bb011 +``` + ## Model Training -### WaveMLP_T*: +### WaveMLP_T* ### Multiple GPUs on one machine @@ -52,22 +62,17 @@ python3 -m torch.distributed.launch --nproc_per_node 8 --nnodes=1 --node_rank=0 ### FP16 -| card-batchsize-AMP opt-level | 1 card | 8 cards | -| :-----| ----: | :----: | -| BI-bs126-O1 | 114.76 | 884.27 | - - -### FP32 - -| batch_size | 1 card | 8 cards | -| :-----| ----: | :----: | -| 128 | 140.48 | 1068.15 | +| Model | GPU | precision | batchsize | opt-level | fps | +|----------|-----------|-----------|-----------|-----------|---------| +| Wave-MLP | BI-V100x8 | FP16 | 128 | O1 | 884.27 | +| Wave-MLP | BI-V100x1 | FP16 | 128 | O1 | 114.76 | +| Wave-MLP | BI-V100x8 | FP32 | 128 | O1 | 1068.15 | +| Wave-MLP | BI-V100x1 | FP32 | 128 | O1 | 140.48 | | Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power(W) | Scalability | Memory utilization(G) | Stability | |----------------------|------------------------------------------|-------------|----------|------------|-------------|-------------------------|-----------| | 80.1 | SDK V2.2,bs:256,8x,fp32 | 1026 | 83.1 | 198\*8 | 0.98 | 29.4\*8 | 1 | +## References -## Reference - -- [wavemlp_pytorch](https://github.com/huawei-noah/Efficient-AI-Backbones/blob/master/wavemlp_pytorch/) +- [Efficient-AI-Backbones](https://github.com/huawei-noah/Efficient-AI-Backbones/blob/master/wavemlp_pytorch/) diff --git a/cv/classification/wide_resnet101_2/pytorch/README.md b/cv/classification/wide_resnet101_2/pytorch/README.md index 626597583..495bec852 100644 --- a/cv/classification/wide_resnet101_2/pytorch/README.md +++ b/cv/classification/wide_resnet101_2/pytorch/README.md @@ -1,14 +1,19 @@ # Wide_ResNet101_2 ## Model Description -Wide Residual Networks are a variant on ResNets where we decrease depth and increase the width of residual networks. This is achieved through the use of wide residual blocks. -## Step 1: Installing -```bash -pip3 install -r requirements.txt -``` +Wide_ResNet101_2 is an enhanced version of Wide_ResNet101 that further increases network width while maintaining +residual connections. It uses wider residual blocks with more filters per layer, enabling richer feature representation. +This architecture achieves superior performance in image classification tasks by balancing increased capacity with +efficient training. Wide_ResNet101_2 demonstrates improved accuracy over standard ResNet variants while maintaining +computational efficiency, making it suitable for complex visual recognition tasks requiring high performance. + +## Model Preparation -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -26,18 +31,21 @@ imagenet └── val_list.txt ``` -:beers: Done! 
+### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` ## Model Training -### Multiple GPUs on one machine + Set data path by `export DATA_PATH=/path/to/imagenet`. The following command uses all cards to train: ```bash +# Multiple GPUs on one machine bash train_wide_resnet101_2_amp_dist.sh ``` -:beers: Done! - - ## References -- [torchvision](https://github.com/pytorch/vision/tree/main/references/classification) + +- [vision](https://github.com/pytorch/vision/tree/main/references/classification) diff --git a/cv/classification/xception/paddlepaddle/README.md b/cv/classification/xception/paddlepaddle/README.md index ab56b9bca..425e10787 100644 --- a/cv/classification/xception/paddlepaddle/README.md +++ b/cv/classification/xception/paddlepaddle/README.md @@ -2,27 +2,24 @@ ## Model Description -Xception is a convolutional neural network architecture that relies solely on depthwise separable convolution layers. +Xception is a deep convolutional neural network that extends the Inception architecture by replacing standard +convolutions with depthwise separable convolutions. This modification significantly reduces computational complexity +while maintaining high accuracy. Xception introduces extreme Inception modules that completely separate channel and +spatial correlations. The architecture achieves state-of-the-art performance in image classification tasks, offering an +efficient alternative to traditional CNNs. Its design is particularly suitable for applications requiring both high +accuracy and computational efficiency. ## Model Preparation -### Install Dependencies - -```bash -git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git -cd PaddleClas -pip3 install scikit-learn easydict visualdl==2.2.0 urllib3==1.26.6 -yum install -y mesa-libGL -``` +### Prepare Resources -## Step 2: Preparing datasets - -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `./PaddleClas/dataset/` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: ```bash -imagenet +ILSVRC2012 ├── train │ └── n01440764 │ ├── n01440764_10026.JPEG @@ -35,9 +32,9 @@ imagenet └── val_list.txt ``` -**Tips** +Tips: for `PaddleClas` training, the image path in train_list.txt and val_list.txt must contain `train/` and `val/` +directories: -For `PaddleClas` training, the image path in train_list.txt and val_list.txt must contain `train/` and `val/` directories: - train_list.txt: train/n01440764/n01440764_10026.JPEG 0 - val_list.txt: val/n01667114/ILSVRC2012_val_00000229.JPEG 35 @@ -47,7 +44,16 @@ sed -i 's#^#train/#g' train_list.txt sed -i 's#^#val/#g' val_list.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleClas.git +cd PaddleClas +pip3 install scikit-learn easydict visualdl==2.2.0 urllib3==1.26.6 +yum install -y mesa-libGL +``` + +## Model Training ```bash # Make sure your dataset path is the same as above @@ -60,9 +66,11 @@ python3 -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 tools/train.py -c ./ ``` ## Model Results -| GPUs | TOP1 | TOP5 | ips | -|:-----------:|:-----------:|:-----------:|:-----------:| -| BI-V100 x 8 |0.783 | 0.941 | 537.04 | + +| Model | GPU | TOP1 | TOP5 | ips | +|----------|------------|-------|-------|--------| +| Xception | BI-V100 x8 | 0.783 | 0.941 | 537.04 | ## References + - [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) diff --git a/cv/classification/xception/pytorch/README.md b/cv/classification/xception/pytorch/README.md index 1acdb3cf1..beb31b487 100755 --- a/cv/classification/xception/pytorch/README.md +++ b/cv/classification/xception/pytorch/README.md @@ -1,14 +1,20 @@ # Xception ## Model Description -Xception is a convolutional neural network architecture that relies solely on depthwise separable convolution layers. -## Step 1: Installing -```bash -pip3 install torch torchvision -``` +Xception is a deep convolutional neural network that extends the Inception architecture by replacing standard +convolutions with depthwise separable convolutions. This modification significantly reduces computational complexity +while maintaining high accuracy. Xception introduces extreme Inception modules that completely separate channel and +spatial correlations. The architecture achieves state-of-the-art performance in image classification tasks, offering an +efficient alternative to traditional CNNs. Its design is particularly suitable for applications requiring both high +accuracy and computational efficiency. + +## Model Preparation -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +### Prepare Resources + +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -26,15 +32,22 @@ imagenet └── val_list.txt ``` -## Model Training -### One single GPU +### Install Dependencies + ```bash -python3 train.py --data-path /path/to/imagenet --model xception +pip3 install torch torchvision ``` -### Multiple GPUs on one machine + +## Model Training + ```bash +# One single GPU +python3 train.py --data-path /path/to/imagenet --model xception + +# Multiple GPUs on one machine python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path /path/to/imagenet --model xception ``` ## References -https://github.com/tstandley/Xception-PyTorch + +- [Xception-PyTorch](https://github.com/tstandley/Xception-PyTorch) -- Gitee From 8f4f7ce71c858e72d1664623e24d535a975cd69d Mon Sep 17 00:00:00 2001 From: "mingjiang.li" Date: Fri, 14 Mar 2025 15:49:43 +0800 Subject: [PATCH 3/3] unify model readme format - cv/3d_xx --- .../hashnerf/pytorch/README.md | 35 ++-- cv/3d_detection/bevformer/pytorch/README.md | 118 ++++++------- cv/3d_detection/centerpoint/pytorch/README.md | 72 ++++---- cv/3d_detection/paconv/pytorch/README.md | 45 ++--- .../part_a2_anchor/pytorch/README.md | 64 +++---- .../part_a2_free/pytorch/README.md | 58 +++---- cv/3d_detection/pointnet2/pytorch/README.md | 46 ++++-- .../pointpillars/pytorch/README.md | 156 ++++++++++-------- cv/3d_detection/pointrcnn/pytorch/README.md | 87 +++++----- .../pointrcnn_iou/pytorch/README.md | 74 +++++---- cv/3d_detection/second/pytorch/README.md | 74 +++++---- cv/3d_detection/second_iou/pytorch/README.md | 75 +++++---- 12 files changed, 472 insertions(+), 432 deletions(-) diff --git a/cv/3d-reconstruction/hashnerf/pytorch/README.md b/cv/3d-reconstruction/hashnerf/pytorch/README.md index 3cdc74193..3b9044668 100644 --- a/cv/3d-reconstruction/hashnerf/pytorch/README.md +++ b/cv/3d-reconstruction/hashnerf/pytorch/README.md @@ -2,16 +2,15 @@ ## Model description -A PyTorch implementation (Hash) of the NeRF part (grid encoder, density grid ray sampler) in instant-ngp, as described -in Instant Neural Graphics Primitives with a Multiresolution Hash Encoding. +HashNeRF is an efficient implementation of Neural Radiance Fields (NeRF) using a multiresolution hash encoding +technique. It accelerates 3D scene reconstruction and novel view synthesis by optimizing memory usage and computational +efficiency. Based on instant-ngp's approach, HashNeRF employs a grid encoder and density grid ray sampler to achieve +high-quality rendering results. The model supports various datasets and custom scenes, making it suitable for +applications in computer graphics, virtual reality, and 3D reconstruction tasks. -## Step 1: Installation +## Model Preparation -```sh -pip3 install -r requirements.txt -``` - -## Step 2: Preparing datasets +### Prepare Resources We use the same data format as instant-ngp, [fox](https://github.com/NVlabs/instant-ngp/tree/master/data/nerf/fox) and blender dataset [nerf_synthetic](https://drive.google.com/drive/folders/128yBriW1IG_3NJ5Rp7APSTZsJqdJdfc1).Please @@ -23,19 +22,23 @@ For custom dataset, you should: 2. put the video under a path like ./data/custom/video.mp4 or the images under ./data/custom/images/*.jpg. 3. call the preprocess code: (should install ffmpeg and colmap first! 
refer to the file for more options) -```sh +```bash python3 scripts/colmap2nerf.py --video ./data/custom/video.mp4 --run_colmap # if use video python3 scripts/colmap2nerf.py --images ./data/custom/images/ --run_colmap # if use images ``` -## Step 3: Training and test +### Install Dependencies + +```bash +pip3 install -r requirements.txt +``` -### One single GPU +## Model Training First time running will take some time to compile the CUDA extensions. -```sh -# train with fox dataset +```bash +# train with fox dataset on One single GPU python3 main_nerf.py data/fox --workspace trial_nerf -O # data/fox is dataset path; --workspace means output path; @@ -43,22 +46,18 @@ python3 main_nerf.py data/fox --workspace trial_nerf -O # test mode python3 main_nerf.py data/fox --workspace trial_nerf -O --test -``` -```sh # train with the blender dataset, you should add `--bound 1.0 --scale 0.8 --dt_gamma 0` # --bound means the scene is assumed to be inside box[-bound, bound] # --scale adjusts the camera locaction to make sure it falls inside the above bounding box. # --dt_gamma controls the adaptive ray marching speed, set to 0 turns it off. python3 main_nerf.py data/nerf_synthetic/lego --workspace trial_nerf -O --bound 1.0 --scale 0.8 --dt_gamma 0 -``` -```sh # train with custom dataset(you'll need to tune the scale & bound if necessary): python3 main_nerf.py data/custom_data --workspace trial_nerf -O ``` -## Results +## Model Results | Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power(W) | Scalability | Memory utilization(G) | Stability | |----------------------|------------------------------------------|-------------|----------|------------|-------------|-------------------------|-----------| diff --git a/cv/3d_detection/bevformer/pytorch/README.md b/cv/3d_detection/bevformer/pytorch/README.md index 22d54f0e5..7783512dd 100755 --- a/cv/3d_detection/bevformer/pytorch/README.md +++ b/cv/3d_detection/bevformer/pytorch/README.md @@ -1,99 +1,99 @@ # BEVFormer -## Model description -In this work, the authors present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, the authors design a spatial cross-attention that each BEV query extracts the spatial features from the regions of interest across camera views. For temporal information, the authors propose a temporal self-attention to recurrently fuse the history BEV information. -The proposed approach achieves the new state-of-the-art **56.9\%** in terms of NDS metric on the nuScenes test set, which is **9.0** points higher than previous best arts and on par with the performance of LiDAR-based baselines. +## Model Description +BEVFormer is a transformer-based framework for autonomous driving perception that learns unified Bird's Eye View (BEV) +representations. It combines spatial and temporal information through innovative attention mechanisms: spatial +cross-attention extracts features from camera views, while temporal self-attention fuses historical BEV data. This +approach achieves state-of-the-art performance on nuScenes dataset, matching LiDAR-based systems. 
BEVFormer supports +multiple perception tasks simultaneously, making it a versatile solution for comprehensive scene understanding in +autonomous driving applications. -## Prepare -**Install mmcv-full.** -```shell -cd mmcv -bash clean_mmcv.sh -bash build_mmcv.sh -bash install_mmcv.sh -``` +## Model Preparation -**Install mmdet and mmseg.** -```shell -pip3 install mmdet==2.25.0 -pip3 install mmsegmentation==0.25.0 -``` +### Prepare Resources -**Install mmdet3d from source code.** -```shell -cd ../mmdetection3d -pip3 install -r requirements.txt,OR pip3 install -r requirements/optional.txt,pip3 install -r requirements/runtime.txt,pip3 install -r requirements/tests.txt -python3 setup.py install -``` - -**Install timm.** -```shell -pip3 install timm -``` +Download nuScenes V1.0-mini data and CAN bus expansion data from [HERE](https://www.nuscenes.org/download). Prepare +nuscenes data by running. -## NuScenes -Download nuScenes V1.0-mini data and CAN bus expansion data [HERE](https://www.nuscenes.org/download). Prepare nuscenes data by running - - -**Download CAN bus expansion** -``` -cd .. +```bash mkdir data -cd data +cd data/ + # download 'can_bus.zip' unzip can_bus.zip + # move can_bus to data dir ``` -**Prepare nuScenes data** +Prepare nuScenes data. -*We genetate custom annotation files which are different from mmdet3d's* -``` -cd .. +We genetate custom annotation files which are different from mmdet3d's + +```bash +cd ../ python3 tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes --version v1.0-mini --canbus ./data ``` Using the above code will generate `nuscenes_infos_temporal_{train,val}.pkl`. -## Prepare pretrained models +Prepare pretrained models. + ```shell mkdir ckpts cd ckpts & wget https://github.com/zhiqi-li/storage/releases/download/v1.0/bevformer_r101_dcn_24ep.pth -cd .. +cd ../ ``` -## Prerequisites +### Install Dependencies -**Please ensure you have prepared the environment and the nuScenes dataset.** +```shell +# Install mmcv-full +cd mmcv/ +bash clean_mmcv.sh +bash build_mmcv.sh +bash install_mmcv.sh -## Train and Test +# Install mmdet and mmseg +pip3 install mmdet==2.25.0 +pip3 install mmsegmentation==0.25.0 + +# Install mmdet3d from source code +cd ../mmdetection3d +pip3 install -r requirements.txt,OR pip3 install -r requirements/optional.txt,pip3 install -r requirements/runtime.txt,pip3 install -r requirements/tests.txt +python3 setup.py install -Train BEVFormer with 8 GPUs +# Install timm +pip3 install timm ``` + +## Model Training + +```bash +# Train BEVFormer with 8 GPUs ./tools/dist_train.sh ./projects/configs/bevformer/bevformer_base.py 8 -``` -Eval BEVFormer with 8 GPUs -``` +# Eval BEVFormer with 8 GPUs ./tools/dist_test.sh ./projects/configs/bevformer/bevformer_base.py ./path/to/ckpts.pth 8 ``` -Note: using 1 GPU to eval can obtain slightly higher performance because continuous video may be truncated with multiple GPUs. By default we report the score evaled with 8 GPUs. - +Note: using 1 GPU to eval can obtain slightly higher performance because continuous video may be truncated with multiple +GPUs. By default we report the score evaled with 8 GPUs. -## Using FP16 to train the model. -The above training script can not support FP16 training, +The above training script can not support FP16 training, and we provide another script to train BEVFormer with FP16. 
-``` +```bash +# Using FP16 to train the model ./tools/fp16/dist_train.sh ./projects/configs/bevformer_fp16/bevformer_tiny_fp16.py 8 ``` -## Results on BI-V100 -| GPUs | model | NDS | mAP | -|------|----------------|--------|--------| -| 1x8 | bevformer_base | 0.3516 | 0.3701 | +## Model Results + +| Model | GPU | model | NDS | mAP | +|-----------|------------|----------------|--------|--------| +| BEVFormer | BI-V100 x8 | bevformer_base | 0.3516 | 0.3701 | + +## References -## Reference: -[BEVFormer](https://github.com/fundamentalvision/BEVFormer/tree/master) +[BEVFormer](https://github.com/fundamentalvision/BEVFormer/tree/master) diff --git a/cv/3d_detection/centerpoint/pytorch/README.md b/cv/3d_detection/centerpoint/pytorch/README.md index e6de3a160..27a9d5d67 100644 --- a/cv/3d_detection/centerpoint/pytorch/README.md +++ b/cv/3d_detection/centerpoint/pytorch/README.md @@ -1,10 +1,35 @@ # CenterPoint -## Model description -Three-dimensional objects are commonly represented as 3D boxes in a point-cloud. This representation mimics the well-studied image-based 2D bounding-box detection but comes with additional challenges. Objects in a 3D world do not follow any particular orientation, and box-based detectors have difficulties enumerating all orientations or fitting an axis-aligned bounding box to rotated objects. In this paper, we instead propose to represent, detect, and track 3D objects as points. Our framework, CenterPoint, first detects centers of objects using a keypoint detector and regresses to other attributes, including 3D size, 3D orientation, and velocity. In a second stage, it refines these estimates using additional point features on the object. In CenterPoint, 3D object tracking simplifies to greedy closest-point matching. The resulting detection and tracking algorithm is simple, efficient, and effective. CenterPoint achieved state-of-the-art performance on the nuScenes benchmark for both 3D detection and tracking, with 65.5 NDS and 63.8 AMOTA for a single model. On the Waymo Open Dataset, CenterPoint outperforms all previous single model method by a large margin and ranks first among all Lidar-only submissions. +## Model Description + +CenterPoint is a state-of-the-art 3D object detection and tracking framework that represents objects as points rather +than bounding boxes. It first detects object centers using a keypoint detector, then regresses other attributes like +size, orientation, and velocity. A second stage refines these estimates using additional point features. This approach +simplifies 3D tracking to greedy closest-point matching, achieving top performance on nuScenes and Waymo datasets while +maintaining efficiency and simplicity in implementation. + +## Model Preparation + +### Prepare Resources + +Download nuScenes from . 
+ +```bash +mkdir -p data/nuscenes +# For nuScenes Dataset +└── NUSCENES_DATASET_ROOT + ├── samples <-- key frames + ├── sweeps <-- frames without annotation + ├── maps <-- unused + ├── v1.0-trainval <-- metadata + +python3 tools/create_data.py nuscenes_data_prep --root-path ./data/nuscenes --version="v1.0-trainval" --nsweeps=10 -## Step 1: Installation ``` + +### Install Dependencies + +```bash ## install libGL and libboost yum install mesa-libGL yum install boost-devel @@ -30,48 +55,25 @@ bash setup.sh export PYTHONPATH="${PYTHONPATH}:PATH_TO_CENTERPOINT" ``` -## Step 2: Preparing datasets -Download nuScenes from https://www.nuscenes.org/download -``` -mkdir -p data/nuscenes -# For nuScenes Dataset -└── NUSCENES_DATASET_ROOT - ├── samples <-- key frames - ├── sweeps <-- frames without annotation - ├── maps <-- unused - ├── v1.0-trainval <-- metadata - -python3 tools/create_data.py nuscenes_data_prep --root-path ./data/nuscenes --version="v1.0-trainval" --nsweeps=10 - -``` - - -## Step 3: Training - -### Single GPU training +## Model Training ```bash +# Single GPU training python3 ./tools/train.py ./configs/nusc/voxelnet/nusc_centerpoint_voxelnet_01voxel.py -``` - -### Multiple GPU training -```bash +# Multiple GPU training python3 -m torch.distributed.launch --nproc_per_node=8 ./tools/train.py ./configs/nusc/voxelnet/nusc_centerpoint_voxelnet_01voxel.py -``` -### Evaluation - -```bash +# Evaluation python3 ./tools/dist_test.py ./configs/nusc/voxelnet/nusc_centerpoint_voxelnet_01voxel.py --work_dir work_dirs/nusc_centerpoint_voxelnet_01voxel --checkpoint work_dirs/nusc_centerpoint_voxelnet_01voxel/latest.pth ``` -## Results +## Model Results -GPUs | FPS | ACC ----- | --- | --- -BI-V100 x8 | 2.423 s/step | mAP: 0.5654 +| Model | GPU | FPS | ACC | +|-------------|------------|--------------|-------------| +| CenterPoint | BI-V100 x8 | 2.423 s/step | mAP: 0.5654 | +## References -## Reference - [CenterPoint](https://github.com/tianweiy/CenterPoint) diff --git a/cv/3d_detection/paconv/pytorch/README.md b/cv/3d_detection/paconv/pytorch/README.md index 3ff390de8..69a10d090 100644 --- a/cv/3d_detection/paconv/pytorch/README.md +++ b/cv/3d_detection/paconv/pytorch/README.md @@ -1,9 +1,24 @@ -# PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds +# PAConv -## Model description -We introduce Position Adaptive Convolution (PAConv), a generic convolution operation for 3D point cloud processing. The key of PAConv is to construct the convolution kernel by dynamically assembling basic weight matrices stored in Weight Bank, where the coefficients of these weight matrices are self-adaptively learned from point positions through ScoreNet. In this way, the kernel is built in a data-driven manner, endowing PAConv with more flexibility than 2D convolutions to better handle the irregular and unordered point cloud data. Besides, the complexity of the learning process is reduced by combining weight matrices instead of brutally predicting kernels from point positions. Furthermore, different from the existing point convolution operators whose network architectures are often heavily engineered, we integrate our PAConv into classical MLP-based point cloud pipelines without changing network configurations. Even built on simple networks, our method still approaches or even surpasses the state-of-the-art models, and significantly improves baseline performance on both classification and segmentation tasks, yet with decent efficiency. 
Thorough ablation studies and visualizations are provided to understand PAConv. +## Model Description -## Step 1: Installation +PAConv (Position Adaptive Convolution) is an innovative convolution operation for 3D point cloud processing that +dynamically assembles convolution kernels. It constructs kernels by adaptively combining weight matrices from a Weight +Bank, with coefficients learned from point positions through ScoreNet. This data-driven approach provides flexibility to +handle irregular point cloud data efficiently. PAConv integrates seamlessly with existing MLP-based pipelines, achieving +state-of-the-art performance in classification and segmentation tasks while maintaining computational efficiency. + +## Model Preparation + +### Prepare Resources + +```bash +cd data/s3dis/ +``` + +Enter the data/s3dis/ folder, then prepare the dataset according to readme instructions in data/s3dis/ folder. + +### Install Dependencies ```bash # Install libGL @@ -18,14 +33,7 @@ cd mmdetection3d pip install -v -e . ``` -## Step 2: Preparing datasets - -```bash -cd data/s3dis/ -``` -Enter the data/s3dis/ folder, then prepare the dataset according to readme instructions in data/s3dis/ folder. - -## Step 3: Training +## Model Training ```bash # Single GPU training @@ -36,13 +44,12 @@ sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/paconv/paconv_cuda_ssg_8x8_cosine_200e_s3dis_seg-3d-13class.py 8 ``` -## Results +## Model Results -classes | ceiling | floor | wall | beam | column | window | door | table | chair | sofa | bookcase | board | clutter | miou | acc | acc_cls ----------|---------|--------|--------|--------|--------|--------|--------|--------|--------|--------|----------|--------|---------|--------|--------|--------- -results | 0.9488 | 0.9838 | 0.8184 | 0.0000 | 0.1682 | 0.5836 | 0.7387 | 0.7782 | 0.8832 | 0.6101 | 0.7081 | 0.6876 | 0.5810 | 0.6530 | 0.8910 | 0.7131 +| Model | ceiling | floor | wall | beam | column | window | door | table | chair | sofa | bookcase | board | clutter | miou | acc | acc_cls | fps | +|--------|---------|--------|--------|--------|--------|--------|--------|--------|--------|--------|----------|--------|---------|--------|--------|---------|------------------| +| PAConv | 0.9488 | 0.9838 | 0.8184 | 0.0000 | 0.1682 | 0.5836 | 0.7387 | 0.7782 | 0.8832 | 0.6101 | 0.7081 | 0.6876 | 0.5810 | 0.6530 | 0.8910 | 0.7131 | 65.3 samples/sec | -fps = batchsize*8/1batchtime = 65.3 samples/sec +## References -## Reference -[mmdetection3d](https://github.com/open-mmlab/mmdetection3d/tree/v1.4.0) +- [mmdetection3d](https://github.com/open-mmlab/mmdetection3d/tree/v1.4.0) diff --git a/cv/3d_detection/part_a2_anchor/pytorch/README.md b/cv/3d_detection/part_a2_anchor/pytorch/README.md index 66b4549eb..3e45faedc 100644 --- a/cv/3d_detection/part_a2_anchor/pytorch/README.md +++ b/cv/3d_detection/part_a2_anchor/pytorch/README.md @@ -1,33 +1,16 @@ # Part-A2-Anchor -## Model description +## Model Description -3D object detection from LiDAR point cloud is a challenging problem in 3D scene understanding and has many practical applications. In this paper, we extend our preliminary work PointRCNN to a novel and strong point-cloud-based 3D object detection framework, the part-aware and aggregation neural network (Part-A2 net). The whole framework consists of the part-aware stage and the part-aggregation stage. 
Firstly, the part-aware stage for the first time fully utilizes free-of-charge part supervisions derived from 3D ground-truth boxes to simultaneously predict high quality 3D proposals and accurate intra-object part locations. The predicted intra-object part locations within the same proposal are grouped by our new-designed RoI-aware point cloud pooling module, which results in an effective representation to encode the geometry-specific features of each 3D proposal. Then the part-aggregation stage learns to re-score the box and refine the box location by exploring the spatial relationship of the pooled intra-object part locations. Extensive experiments are conducted to demonstrate the performance improvements from each component of our proposed framework. Our Part-A2 net outperforms all existing 3D detection methods and achieves new state-of-the-art on KITTI 3D object detection dataset by utilizing only the LiDAR point cloud data. +Part-A2-Anchor is an advanced 3D object detection framework for LiDAR point clouds, extending PointRCNN with enhanced +part-aware and aggregation capabilities. It operates in two stages: first, it predicts 3D proposals and intra-object +part locations using free part supervisions; second, it aggregates these parts to refine box scores and locations. This +approach effectively captures object geometry, achieving state-of-the-art performance on the KITTI dataset while +maintaining computational efficiency for practical applications. -## Step 1: Installation +## Model Preparation -```bash -## install libGL and libboost -yum install mesa-libGL -yum install boost-devel - -## switch to devtoolset-7 env -source /opt/rh/devtoolset-7/enable - -# Install spconv -cd toolbox/spconv -bash clean_spconv.sh -bash build_spconv.sh -bash install_spconv.sh - -# Install openpcdet -cd toolbox/openpcdet -pip3 install -r requirements.txt -bash build_openpcdet.sh -bash install_openpcdet.sh -``` - -## Step 2: Preparing datasets +### Prepare Resources Download the kitti dataset from @@ -52,17 +35,36 @@ cd toolbox/openpcdet python3 -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml ``` -## Step 3: Training - -### Single GPU training +### Install Dependencies ```bash -cd tools -python3 train.py --cfg_file cfgs/kitti_models/PartA2.yaml +## install libGL and libboost +yum install mesa-libGL +yum install boost-devel + +## switch to devtoolset-7 env +source /opt/rh/devtoolset-7/enable + +# Install spconv +cd toolbox/spconv +bash clean_spconv.sh +bash build_spconv.sh +bash install_spconv.sh + +# Install openpcdet +cd toolbox/openpcdet +pip3 install -r requirements.txt +bash build_openpcdet.sh +bash install_openpcdet.sh ``` -### Multiple GPU training +## Model Training ```bash +# Single GPU training +cd tools +python3 train.py --cfg_file cfgs/kitti_models/PartA2.yaml + +# Multiple GPU training bash scripts/dist_train.sh 16 --cfg_file cfgs/kitti_models/PartA2.yaml ``` diff --git a/cv/3d_detection/part_a2_free/pytorch/README.md b/cv/3d_detection/part_a2_free/pytorch/README.md index e64bebc0e..189ba0b7e 100644 --- a/cv/3d_detection/part_a2_free/pytorch/README.md +++ b/cv/3d_detection/part_a2_free/pytorch/README.md @@ -1,30 +1,16 @@ # Part-A2-Free -## Model description +## Model Description -In this work, we propose the part-aware and aggregation neural network (PartA2-Net). The whole framework consists of the part-aware stage and the part-aggregation stage. 
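The KITTI preparation steps above generate info files from the raw velodyne scans. If you want to sanity-check a downloaded scan before running `create_kitti_infos`, each `.bin` file is a flat float32 array of (x, y, z, reflectance) rows; a minimal reader is sketched below, with an example path that assumes the dataset layout shown above.

```python
import numpy as np

def load_kitti_velodyne(bin_path: str) -> np.ndarray:
    """Load one KITTI LiDAR scan as an (N, 4) array of (x, y, z, reflectance)."""
    return np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)

# e.g. pts = load_kitti_velodyne("data/kitti/training/velodyne/000000.bin"); print(pts.shape)
```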
Firstly, the part-aware stage for the first time fully utilizes free-of-charge part supervisions derived from 3D ground-truth boxes to simultaneously predict high quality 3D proposals and accurate intra-object part locations. The predicted intra-object part locations within the same proposal are grouped by our new-designed RoI-aware point cloud pooling module, which results in an effective representation to encode the geometry-specific features of each 3D proposal. Then the part-aggregation stage learns to re-score the box and refine the box location by exploring the spatial relationship of the pooled intra-object part locations. At the time of submission (July-9 2019), our PartA2-Net outperforms all existing 3D detection methods and achieves new state-of-the-art on KITTI 3D object detection learderbaord by utilizing only the LiDAR point cloud data. +Part-A2-Free is an advanced 3D object detection framework for LiDAR point clouds, leveraging part-aware and aggregation +techniques. It operates in two stages: first predicting 3D proposals and intra-object part locations using free part +supervisions, then aggregating these parts to refine box scores and locations. This approach effectively captures object +geometry through a novel RoI-aware point cloud pooling module, achieving state-of-the-art performance on the KITTI +dataset while maintaining computational efficiency for practical applications. -## Step 1: Installation +## Model Preparation -```bash -## install libGL and libboost -yum install mesa-libGL -yum install boost-devel - -# Install spconv -cd toolbox/spconv -bash clean_spconv.sh -bash build_spconv.sh -bash install_spconv.sh - -# Install openpcdet -cd toolbox/openpcdet -pip3 install -r requirements.txt -bash build_openpcdet.sh -bash install_openpcdet.sh -``` - -## Step 2: Preparing datasets +### Prepare Resources Download the kitti dataset from @@ -49,17 +35,33 @@ cd toolbox/openpcdet python3 -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml ``` -## Step 3: Training - -### Single GPU training +### Install Dependencies ```bash -cd tools -python3 train.py --cfg_file cfgs/kitti_models/PartA2_free.yaml +## install libGL and libboost +yum install mesa-libGL +yum install boost-devel + +# Install spconv +cd toolbox/spconv +bash clean_spconv.sh +bash build_spconv.sh +bash install_spconv.sh + +# Install openpcdet +cd toolbox/openpcdet +pip3 install -r requirements.txt +bash build_openpcdet.sh +bash install_openpcdet.sh ``` -### Multiple GPU training +## Model Training ```bash +# Single GPU training +cd tools/ +python3 train.py --cfg_file cfgs/kitti_models/PartA2_free.yaml + +# Multiple GPU training bash scripts/dist_train.sh 16 --cfg_file cfgs/kitti_models/PartA2_free.yaml ``` diff --git a/cv/3d_detection/pointnet2/pytorch/README.md b/cv/3d_detection/pointnet2/pytorch/README.md index 3b8dfcdc7..bd3547177 100644 --- a/cv/3d_detection/pointnet2/pytorch/README.md +++ b/cv/3d_detection/pointnet2/pytorch/README.md @@ -1,10 +1,25 @@ -# PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space -> [PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space](https://arxiv.org/abs/1706.02413) +# PointNet++ -## Model description -Few prior works study deep learning on point sets. PointNet by Qi et al. is a pioneer in this direction. 
However, by design PointNet does not capture local structures induced by the metric space points live in, limiting its ability to recognize fine-grained patterns and generalizability to complex scenes. In this work, we introduce a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. By exploiting metric space distances, our network is able to learn local features with increasing contextual scales. With further observation that point sets are usually sampled with varying densities, which results in greatly decreased performance for networks trained on uniform densities, we propose novel set learning layers to adaptively combine features from multiple scales. +## Model Description + +PointNet++ is a hierarchical neural network for processing 3D point cloud data, extending the capabilities of PointNet. +It recursively applies PointNet on nested partitions of the input point set, enabling the learning of local features at +multiple scales. The network adapts to varying point densities through novel set learning layers, improving performance +on complex scenes. PointNet++ excels in tasks like 3D object classification and segmentation by effectively capturing +fine-grained geometric patterns in point clouds. + +## Model Preparation + +### Prepare Resources + +```bash +cd data/s3dis/ +``` + +Enter the data/s3dis/ folder, then prepare the dataset according to readme instructions in data/s3dis/ folder. + +### Install Dependencies -## Installing packages ```bash # Install libGL ## CentOS @@ -18,13 +33,8 @@ cd mmdetection3d pip install -v -e . ``` -## Prepare S3DIS Data -``` -cd data/s3dis/ -``` -Enter the data/s3dis/ folder, then prepare the dataset according to readme instructions in data/s3dis/ folder. 
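The PointNet++ description above relies on repeatedly picking well-spread centroid points before grouping their neighborhoods at each scale. As a rough illustration of that sampling step only (the training here uses mmdetection3d's compiled CUDA op, not this code), a naive farthest point sampling pass over an (N, 3) array can be written as follows.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Greedy FPS over an (N, 3) array; returns indices of the sampled points."""
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    dist = np.full(n, np.inf)          # distance of each point to the selected set
    selected[0] = 0                    # start from an arbitrary point
    for i in range(1, n_samples):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        selected[i] = int(np.argmax(dist))
    return selected

# e.g. idx = farthest_point_sampling(np.random.rand(1024, 3), 256)
```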
+## Model Training -## Training ```bash # Single GPU training python3 tools/train.py configs/pointnet2/pointnet2_msg_2xb16-cosine-80e_s3dis-seg.py @@ -34,10 +44,12 @@ sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/pointnet2/pointnet2_msg_2xb16-cosine-80e_s3dis-seg.py 8 ``` -## Training Results -| Classes | ceiling | floor | wall | beam | column | window | door | table | chair | sofa | bookcase | board | clutter | miou | acc | acc_cls | -| --------| ------- | ----- | ------ |------ |------ |------ |------ |------ |------ |------ |------ |------ |------ |------ |------ |------ | -| Results | 0.9147 | 0.9742 | 0.7800 | 0.0000 | 0.1881 | 0.5361 | 0.2265 | 0.6922 | 0.8249 | 0.3303 | 0.6585 | 0.5422 | 0.4607 | 0.5483 | 0.8490 | 0.6168 | +## Model Results + +| Model | ceiling | floor | wall | beam | column | window | door | table | chair | sofa | bookcase | board | clutter | miou | acc | acc_cls | +|------------|---------|--------|--------|--------|--------|--------|--------|--------|--------|--------|----------|--------|---------|--------|--------|---------| +| PointNet++ | 0.9147 | 0.9742 | 0.7800 | 0.0000 | 0.1881 | 0.5361 | 0.2265 | 0.6922 | 0.8249 | 0.3303 | 0.6585 | 0.5422 | 0.4607 | 0.5483 | 0.8490 | 0.6168 | + +## References -## Reference -[mmdetection3d](https://github.com/open-mmlab/mmdetection3d/tree/v1.4.0) \ No newline at end of file +- [mmdetection3d](https://github.com/open-mmlab/mmdetection3d/tree/v1.4.0) diff --git a/cv/3d_detection/pointpillars/pytorch/README.md b/cv/3d_detection/pointpillars/pytorch/README.md index 29d0d0c37..ce5c69321 100755 --- a/cv/3d_detection/pointpillars/pytorch/README.md +++ b/cv/3d_detection/pointpillars/pytorch/README.md @@ -1,13 +1,86 @@ # PointPillars -## Model description -A Simple PointPillars PyTorch Implenmentation for 3D Lidar(KITTI) Detection. +## Model Description -- It can be run without installing [mmcv](https://github.com/open-mmlab/mmcv), [Spconv](https://github.com/traveller59/spconv), [mmdet](https://github.com/open-mmlab/mmdetection) or [mmdet3d](https://github.com/open-mmlab/mmdetection3d). -- Only one detection network (PointPillars) was implemented in this repo, so the code may be more easy to read. -- Sincere thanks for the great open-souce architectures [mmcv](https://github.com/open-mmlab/mmcv), [mmdet](https://github.com/open-mmlab/mmdetection) and [mmdet3d](https://github.com/open-mmlab/mmdetection3d), which helps me to learn 3D detetion and implement this repo. +PointPillars is an efficient 3D object detection framework designed for LiDAR point cloud data. It organizes point +clouds into vertical columns (pillars) to create a pseudo-image representation, enabling the use of 2D convolutional +networks for processing. This approach balances accuracy and speed, making it suitable for real-time applications like +autonomous driving. PointPillars achieves state-of-the-art performance on the KITTI dataset while maintaining +computational efficiency through its pillar-based encoding and simplified network architecture. 
+ +## Model Preparation + +### Prepare Resources + +Download: + +- [point cloud (29GB)](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_velodyne.zip) +- [images (12 GB)](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_image_2.zip) +- [calibration files (16 MB)](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_calib.zip) +- [labels (5 MB)](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_label_2.zip) + +Format the datasets as follows: + +```bash +kitti + |- ImageSets + |- train.txt + |- val.txt + |- test.txt + |- trainval.txt + |- training + |- calib (#7481 .txt) + |- image_2 (#7481 .png) + |- label_2 (#7481 .txt) + |- velodyne (#7481 .bin) + |- testing + |- calib (#7518 .txt) + |- image_2 (#7518 .png) + |- velodyne (#7418 .bin) +``` + +The train.txt、val.txt、test.txt and trainval.txt you can get from: + +```bash +wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt +wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt +wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt +wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt +``` + +Pre-process KITTI datasets First. + +```bash +ln -s path/to/kitti/ImageSets ./dataset +python3 pre_process_kitti.py --data_root your_path_to_kitti +``` + +Now, we have datasets as follows: + +```bash +kitti + |- training + |- calib (#7481 .txt) + |- image_2 (#7481 .png) + |- label_2 (#7481 .txt) + |- velodyne (#7481 .bin) + |- velodyne_reduced (#7481 .bin) + |- testing + |- calib (#7518 .txt) + |- image_2 (#7518 .png) + |- velodyne (#7518 .bin) + |- velodyne_reduced (#7518 .bin) + |- kitti_gt_database (# 19700 .bin) + |- kitti_infos_train.pkl + |- kitti_infos_val.pkl + |- kitti_infos_trainval.pkl + |- kitti_infos_test.pkl + |- kitti_dbinfos_train.pkl + +``` + +### Install Dependencies -## [Compile] ```bash # Install libGL ## CentOS @@ -22,72 +95,13 @@ python3 setup.py build_ext --inplace pip install . ``` -## [Datasets] - -1. Download - - Download [point cloud](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_velodyne.zip)(29GB), [images](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_image_2.zip)(12 GB), [calibration files](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_calib.zip)(16 MB)和[labels](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_label_2.zip)(5 MB)。 - Format the datasets as follows: - ``` - kitti - |- ImageSets - |- train.txt - |- val.txt - |- test.txt - |- trainval.txt - |- training - |- calib (#7481 .txt) - |- image_2 (#7481 .png) - |- label_2 (#7481 .txt) - |- velodyne (#7481 .bin) - |- testing - |- calib (#7518 .txt) - |- image_2 (#7518 .png) - |- velodyne (#7418 .bin) - ``` - The train.txt、val.txt、test.txt and trainval.txt you can get from: - ``` - wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt - wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt - wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt - wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt - ``` -2. 
Pre-process KITTI datasets First - - ``` - ln -s path/to/kitti/ImageSets ./dataset - python3 pre_process_kitti.py --data_root your_path_to_kitti - ``` - - Now, we have datasets as follows: - ``` - kitti - |- training - |- calib (#7481 .txt) - |- image_2 (#7481 .png) - |- label_2 (#7481 .txt) - |- velodyne (#7481 .bin) - |- velodyne_reduced (#7481 .bin) - |- testing - |- calib (#7518 .txt) - |- image_2 (#7518 .png) - |- velodyne (#7518 .bin) - |- velodyne_reduced (#7518 .bin) - |- kitti_gt_database (# 19700 .bin) - |- kitti_infos_train.pkl - |- kitti_infos_val.pkl - |- kitti_infos_trainval.pkl - |- kitti_infos_test.pkl - |- kitti_dbinfos_train.pkl - - ``` - -## [Training] - -### Single GPU training +## Model Training + ```bash +# Single GPU training python3 train.py --data_root your_path_to_kitti ``` -## Reference -[PointPillars](https://github.com/zhulf0804/PointPillars/tree/620e6b0d07e4cb37b7b0114f26b934e8be92a0ba) \ No newline at end of file +## References + +- [PointPillars](https://github.com/zhulf0804/PointPillars/tree/620e6b0d07e4cb37b7b0114f26b934e8be92a0ba) diff --git a/cv/3d_detection/pointrcnn/pytorch/README.md b/cv/3d_detection/pointrcnn/pytorch/README.md index 573be6585..42d36c3dd 100644 --- a/cv/3d_detection/pointrcnn/pytorch/README.md +++ b/cv/3d_detection/pointrcnn/pytorch/README.md @@ -1,31 +1,22 @@ # PointRCNN -## Model description -PointRCNN 3D object detector to directly generated accurate 3D box proposals from raw point cloud in a bottom-up manner, which are then refined in the canonical coordinate by the proposed bin-based 3D box regression loss. To the best of our knowledge, PointRCNN is the first two-stage 3D object detector for 3D object detection by using only the raw point cloud as input. PointRCNN is evaluated on the KITTI dataset and achieves state-of-the-art performance on the KITTI 3D object detection leaderboard among all published works at the time of submission. +## Model Description -## Step 1: Installation -```bash -## install libGL -yum install -y mesa-libGL +PointRCNN is a two-stage 3D object detection framework that directly processes raw point cloud data. In the first stage, +it generates accurate 3D box proposals in a bottom-up manner. The second stage refines these proposals using a bin-based +3D box regression loss in canonical coordinates. As the first two-stage detector using only raw point clouds, PointRCNN +achieves state-of-the-art performance on the KITTI dataset, demonstrating superior accuracy in 3D object detection +tasks. 
-pip3 install easydict tensorboardX shapely fire scikit-image +## Model Preparation -bash build_and_install.sh +### Prepare Resources -## install numba -pushd numba/ -bash clean_numba.sh -bash build_numba.sh -bash install_numba.sh -popd -``` - -## Step 2: Preparing datasets Download the kitti dataset from Download the "planes" subdataset from -``` +```bash PointRCNN ├── data │ ├── KITTI @@ -41,60 +32,64 @@ PointRCNN ``` Generate gt database + ```bash pushd tools/ python3 generate_gt_database.py --class_name 'Car' --split train popd ``` - -## Step 3: Training -### Training of RPN stage +### Install Dependencies ```bash -pushd tools/ +## install libGL +yum install -y mesa-libGL -# Single GPU training -export CUDA_VISIBLE_DEVICES=0 -python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 16 --train_mode rpn --epochs 200 +pip3 install easydict tensorboardX shapely fire scikit-image -# Multiple GPU training -CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 32 --train_mode rpn --epochs 200 --mgpus +bash build_and_install.sh +## install numba +pushd numba/ +bash clean_numba.sh +bash build_numba.sh +bash install_numba.sh popd ``` -### Training of RCNN stage +## Model Training ```bash -pushd tools/ +cd tools/ -# Single GPU training +# Training of RPN stage +## Single GPU training export CUDA_VISIBLE_DEVICES=0 -python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 32 --train_mode rcnn --epochs 70 --ckpt_save_interval 2 --rpn_ckpt ../output/rpn/default/ckpt/checkpoint_epoch_200.pth +python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 16 --train_mode rpn --epochs 200 -# Multiple GPU training -CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 32 --train_mode rcnn --epochs 70 --ckpt_save_interval 2 --rpn_ckpt ../output/rpn/default/ckpt/checkpoint_epoch_200.pth --mgpus +## Multiple GPU training +CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 32 --train_mode rpn --epochs 200 --mgpus -popd -``` -## Step 4: Evaluation +# Training of RCNN stage +## Single GPU training +export CUDA_VISIBLE_DEVICES=0 +python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 32 --train_mode rcnn --epochs 70 --ckpt_save_interval 2 --rpn_ckpt ../output/rpn/default/ckpt/checkpoint_epoch_200.pth -```bash -pushd tools/ +## Multiple GPU training +CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 train_rcnn.py --cfg_file cfgs/default.yaml --batch_size 32 --train_mode rcnn --epochs 70 --ckpt_save_interval 2 --rpn_ckpt ../output/rpn/default/ckpt/checkpoint_epoch_200.pth --mgpus +# Evaluation python3 eval_rcnn.py --cfg_file cfgs/default.yaml --ckpt ../output/rpn/default/ckpt/checkpoint_epoch_200.pth --batch_size 4 --eval_mode rpn python3 eval_rcnn.py --cfg_file cfgs/default.yaml --ckpt ../output/rcnn/default/ckpt/checkpoint_epoch_70.pth --batch_size 4 --eval_mode rcnn - -popd ``` -## Results +## Model Results + +| Model | GPU | Stage | FPS | ACC | +|-----------|------------|-------|-------------|-----------------------| +| PointRCNN | BI-V100 x8 | RPN | 127.56 s/it | iou avg: 0.5417 | +| PointRCNN | BI-V100 x8 | RCNN | 975.71 s/it | avg detections: 7.243 | -GPUs|Stage|FPS|ACC -----|-----|---|--- -BI-V100 x8|RPN| 127.56 s/it | iou avg: 0.5417 -BI-V100 x8|RCNN| 975.71 s/it | avg detections: 7.243 +## References -## Reference - [PointRCNN](https://github.com/sshaoshuai/PointRCNN) diff --git a/cv/3d_detection/pointrcnn_iou/pytorch/README.md 
b/cv/3d_detection/pointrcnn_iou/pytorch/README.md index 3c3feb003..fe3b2e578 100644 --- a/cv/3d_detection/pointrcnn_iou/pytorch/README.md +++ b/cv/3d_detection/pointrcnn_iou/pytorch/README.md @@ -1,38 +1,16 @@ # PointRCNN-IoU -## Model description +## Model Description -PointRCNN-IoU is an extension of the PointRCNN object detection framework that incorporates Intersection over Union (IoU) as a metric for evaluation. IoU is a common metric used in object detection tasks to measure the overlap between predicted bounding boxes and ground truth bounding boxes. +PointRCNN-IoU is an enhanced version of the PointRCNN framework that incorporates Intersection over Union (IoU) +optimization for 3D object detection. It processes raw point cloud data in two stages: first generating 3D proposals, +then refining them with IoU-aware regression. This approach improves bounding box accuracy by directly optimizing the +overlap between predicted and ground truth boxes. PointRCNN-IoU maintains the efficiency of its predecessor while +achieving higher precision in 3D object detection tasks. -## Step 1: Installation +## Model Preparation -```bash -## install libGL and libboost -yum install mesa-libGL -yum install boost-devel - -# Install numba -pushd /toolbox/numba -python3 setup.py bdist_wheel -d build_pip 2>&1 | tee compile.log -pip3 install build_pip/numba-0.56.4-cp310-cp310-linux_x86_64.whl -popd - -# Install spconv -pushd /toolbox/spconv -bash clean_spconv.sh -bash build_spconv.sh -bash install_spconv.sh -popd - -# Install openpcdet -pushd /toolbox/openpcdet -pip3 install -r requirements.txt -bash build_openpcdet.sh -bash install_openpcdet.sh -popd -``` - -## Step 2: Preparing datasets +### Prepare Resources Download the kitti dataset from @@ -57,17 +35,41 @@ cd /toolbox/openpcdet python3 -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml ``` -## Step 3: Training - -### Single GPU training +### Install Dependencies ```bash -cd tools -python3 train.py --cfg_file cfgs/kitti_models/pointrcnn_iou.yaml +## install libGL and libboost +yum install mesa-libGL +yum install boost-devel + +# Install numba +pushd /toolbox/numba +python3 setup.py bdist_wheel -d build_pip 2>&1 | tee compile.log +pip3 install build_pip/numba-0.56.4-cp310-cp310-linux_x86_64.whl +popd + +# Install spconv +pushd /toolbox/spconv +bash clean_spconv.sh +bash build_spconv.sh +bash install_spconv.sh +popd + +# Install openpcdet +pushd /toolbox/openpcdet +pip3 install -r requirements.txt +bash build_openpcdet.sh +bash install_openpcdet.sh +popd ``` -### Multiple GPU training +## Model Training ```bash +# Single GPU training +cd tools/ +python3 train.py --cfg_file cfgs/kitti_models/pointrcnn_iou.yaml + +# Multiple GPU training bash scripts/dist_train.sh 16 --cfg_file cfgs/kitti_models/pointrcnn_iou.yaml ``` diff --git a/cv/3d_detection/second/pytorch/README.md b/cv/3d_detection/second/pytorch/README.md index 52d028ddd..e28fe30e9 100644 --- a/cv/3d_detection/second/pytorch/README.md +++ b/cv/3d_detection/second/pytorch/README.md @@ -1,38 +1,16 @@ # SECOND -## Model description +## Model Description -LiDAR-based or RGB-D-based object detection is used in numerous applications, ranging from autonomous driving to robot vision. Voxel-based 3D convolutional networks have been used for some time to enhance the retention of information when processing point cloud LiDAR data. However, problems remain, including a slow inference speed and low orientation estimation performance. 
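PointRCNN-IoU (and SECOND-IoU further down) refine boxes by optimizing the overlap between predicted and ground-truth boxes. For intuition only, the overlap computation in its simplest axis-aligned form is sketched below; real KITTI boxes are rotated, so the repositories rely on rotated-IoU kernels rather than this simplification.

```python
import numpy as np

def axis_aligned_iou_3d(box_a, box_b) -> float:
    """IoU of two axis-aligned 3D boxes given as (x_min, y_min, z_min, x_max, y_max, z_max)."""
    a, b = np.asarray(box_a, dtype=float), np.asarray(box_b, dtype=float)
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0, None))        # overlap volume, 0 if disjoint
    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])
    return float(inter / (vol_a + vol_b - inter))

# e.g. axis_aligned_iou_3d((0, 0, 0, 2, 2, 2), (1, 1, 1, 3, 3, 3)) -> 1/15, about 0.067
```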
We therefore investigate an improved sparse convolution method for such networks, which significantly increases the speed of both training and inference. We also introduce a new form of angle loss regression to improve the orientation estimation performance and a new data augmentation approach that can enhance the convergence speed and performance. The proposed network produces state-of-the-art results on the KITTI 3D object detection benchmarks while maintaining a fast inference speed. +SECOND is an efficient 3D object detection framework for LiDAR point cloud data, utilizing sparse convolutional networks +to enhance information retention. It introduces improved sparse convolution methods for faster training and inference, +along with novel angle loss regression for better orientation estimation. The framework also incorporates a unique data +augmentation approach to boost convergence speed and performance. SECOND achieves state-of-the-art results on the KITTI +benchmark while maintaining rapid inference, making it suitable for real-time applications like autonomous driving. -## Step 1: Installation +## Model Preparation -```bash -## install libGL and libboost -yum install mesa-libGL -yum install boost-devel - -# Install numba -pushd /toolbox/numba -python3 setup.py bdist_wheel -d build_pip 2>&1 | tee compile.log -pip3 install build_pip/numba-0.56.4-cp310-cp310-linux_x86_64.whl -popd - -# Install spconv -pushd /toolbox/spconv -bash clean_spconv.sh -bash build_spconv.sh -bash install_spconv.sh -popd - -# Install openpcdet -pushd /toolbox/openpcdet -pip3 install -r requirements.txt -bash build_openpcdet.sh -bash install_openpcdet.sh -popd -``` - -## Step 2: Preparing datasets +### Prepare Resources Download the kitti dataset from @@ -57,17 +35,41 @@ cd /toolbox/openpcdet python3 -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml ``` -## Step 3: Training - -### Single GPU training +### Install Dependencies ```bash -cd tools -python3 train.py --cfg_file cfgs/kitti_models/second.yaml +## install libGL and libboost +yum install mesa-libGL +yum install boost-devel + +# Install numba +pushd /toolbox/numba +python3 setup.py bdist_wheel -d build_pip 2>&1 | tee compile.log +pip3 install build_pip/numba-0.56.4-cp310-cp310-linux_x86_64.whl +popd + +# Install spconv +pushd /toolbox/spconv +bash clean_spconv.sh +bash build_spconv.sh +bash install_spconv.sh +popd + +# Install openpcdet +pushd /toolbox/openpcdet +pip3 install -r requirements.txt +bash build_openpcdet.sh +bash install_openpcdet.sh +popd ``` -### Multiple GPU training +## Model Training ```bash +# Single GPU training +cd tools/ +python3 train.py --cfg_file cfgs/kitti_models/second.yaml + +# Multiple GPU training bash scripts/dist_train.sh 16 --cfg_file cfgs/kitti_models/second.yaml ``` diff --git a/cv/3d_detection/second_iou/pytorch/README.md b/cv/3d_detection/second_iou/pytorch/README.md index aaef1ec2c..a36cabb15 100644 --- a/cv/3d_detection/second_iou/pytorch/README.md +++ b/cv/3d_detection/second_iou/pytorch/README.md @@ -1,38 +1,17 @@ # SECOND-IoU -## Model description +## Model Description -we present a novel approach called SECOND (Sparsely Embedded CONvolutional Detection), which addresses these challenges in 3D convolution-based detection by maximizing the use of the rich 3D information present in point cloud data. This method incorporates several improvements to the existing convolutional network architecture. 
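SECOND's sparse convolutions run on a voxelized scan rather than on raw points. The snippet below is a deliberately naive illustration of that grouping step, with made-up voxel size and range values; the real pipeline uses the compiled spconv voxel generator installed in the steps above.

```python
import numpy as np

def voxelize(points: np.ndarray,
             voxel_size=(0.05, 0.05, 0.1),
             pc_range=(0.0, -40.0, -3.0, 70.4, 40.0, 1.0)):
    """Group (x, y, z, r) points by voxel index; returns {voxel_index: (n_i, 4) array}."""
    low, high = np.array(pc_range[:3]), np.array(pc_range[3:])
    mask = np.all((points[:, :3] >= low) & (points[:, :3] < high), axis=1)
    pts = points[mask]
    coords = ((pts[:, :3] - low) / np.array(voxel_size)).astype(np.int32)
    voxels = {}
    for c, p in zip(map(tuple, coords), pts):
        voxels.setdefault(c, []).append(p)
    return {k: np.stack(v) for k, v in voxels.items()}

# e.g. voxels = voxelize(scan) where scan is an (N, 4) float32 LiDAR array
```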
Spatially sparse convolutional networks are introduced for LiDAR-based detection and are used to extract information from the z-axis before the 3D data are downsampled to something akin to 2D image data. +SECOND-IoU is an enhanced version of the SECOND framework that incorporates Intersection over Union (IoU) optimization +for 3D object detection from LiDAR point clouds. It leverages sparse convolutional networks to efficiently process 3D +data while maintaining spatial information. The model introduces IoU-aware regression to improve bounding box accuracy +and orientation estimation. SECOND-IoU achieves state-of-the-art performance on 3D detection benchmarks, offering faster +inference speeds and better precision than traditional methods, making it suitable for real-time applications like +autonomous driving. -## Step 1: Installation +## Model Preparation -```bash -## install libGL and libboost -yum install mesa-libGL -yum install boost-devel - -# Install numba -pushd /toolbox/numba -python3 setup.py bdist_wheel -d build_pip 2>&1 | tee compile.log -pip3 install build_pip/numba-0.56.4-cp310-cp310-linux_x86_64.whl -popd - -# Install spconv -pushd /toolbox/spconv -bash clean_spconv.sh -bash build_spconv.sh -bash install_spconv.sh -popd - -# Install openpcdet -pushd /toolbox/openpcdet -pip3 install -r requirements.txt -bash build_openpcdet.sh -bash install_openpcdet.sh -popd -``` - -## Step 2: Preparing datasets +### Prepare Resources Download the kitti dataset from @@ -57,17 +36,41 @@ cd /toolbox/openpcdet python3 -m pcdet.datasets.kitti.kitti_dataset create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml ``` -## Step 3: Training - -### Single GPU training +### Install Dependencies ```bash -cd tools -python3 train.py --cfg_file cfgs/kitti_models/second_iou.yaml +## install libGL and libboost +yum install mesa-libGL +yum install boost-devel + +# Install numba +pushd /toolbox/numba +python3 setup.py bdist_wheel -d build_pip 2>&1 | tee compile.log +pip3 install build_pip/numba-0.56.4-cp310-cp310-linux_x86_64.whl +popd + +# Install spconv +pushd /toolbox/spconv +bash clean_spconv.sh +bash build_spconv.sh +bash install_spconv.sh +popd + +# Install openpcdet +pushd /toolbox/openpcdet +pip3 install -r requirements.txt +bash build_openpcdet.sh +bash install_openpcdet.sh +popd ``` -### Multiple GPU training +## Model Training ```bash +# Single GPU training +cd tools/ +python3 train.py --cfg_file cfgs/kitti_models/second_iou.yaml + +# Multiple GPU training bash scripts/dist_train.sh 16 --cfg_file cfgs/kitti_models/second_iou.yaml ``` -- Gitee