diff --git a/README.md b/README.md index 4af94fb0fcae0d8c6d3eede474bd40c53db9e15b..471ea88c5122ca46d418aba54edae45b65191d6a 100644 --- a/README.md +++ b/README.md @@ -8,6 +8,7 @@ DeepSparkHub甄选上百个应用算法和模型,覆盖AI和通用计算各领 - [Classification](#classification) - [Detection](#detection) + - [3D-Detection](#3d-detection) - [OCR](#ocr) - [Point Cloud](#point-cloud) - [Pose](#pose) @@ -18,6 +19,8 @@ DeepSparkHub甄选上百个应用算法和模型,覆盖AI和通用计算各领 - [Multimodal](#multimodal模型列表) +- [GNN](#gnn模型列表) + - Natural Language Processing(NLP) - [Cloze Test](#cloze-test) @@ -64,13 +67,17 @@ DeepSparkHub甄选上百个应用算法和模型,覆盖AI和通用计算各领 [MobileNetV2](cv/classification/mobilenetv2/pytorch/README.md) | PyTorch | ImageNet [MobileNetV3](cv/classification/mobilenetv3/pytorch/README.md) | PyTorch | ImageNet [MobileNetV3](cv/classification/mobilenetv3/paddlepaddle/README.md) | PaddlePaddle | ImageNet +[RepVGG](cv/classification/repvgg/pytorch/README.md) | PyTorch | ImageNet +[RepVGG](cv/classification/repvgg/paddlepaddle/README.md) | PaddlePaddle | ImageNet [ResNeSt14](cv/classification/resnest14/pytorch/README.md) | PyTorch | ImageNet [ResNeSt50](cv/classification/resnest50/pytorch/README.md) | PyTorch | ImageNet +[ResNeSt50](cv/classification/resnest50/paddlepaddle/README.md) | PaddlePaddle | ImageNet [ResNeSt101](cv/classification/resnest101/pytorch/README.md) | PyTorch | ImageNet [ResNeSt269](cv/classification/resnest269/pytorch/README.md) | PyTorch | ImageNet [ResNet18](cv/classification/resnet18/pytorch/README.md) | PyTorch | ImageNet [ResNet50](cv/classification/resnet50/pytorch/README.md) | PyTorch | ImageNet [ResNet50](cv/classification/resnet50/paddlepaddle/README.md) | PaddlePaddle | ImageNet +[ResNet50](cv/classification/resnet50/tensorflow/README.md) | TensorFlow | ImageNet [ResNet101](cv/classification/resnet101/pytorch/README.md) | PyTorch | ImageNet [ResNet152](cv/classification/resnet152/pytorch/README.md) | PyTorch | ImageNet [ResNeXt50_32x4d](cv/classification/resnext50_32x4d/pytorch/README.md) | PyTorch | ImageNet @@ -79,41 +86,73 @@ DeepSparkHub甄选上百个应用算法和模型,覆盖AI和通用计算各领 [ShuffleNetV2](cv/classification/shufflenetv2/pytorch/README.md) | PyTorch | ImageNet [SqueezeNet](cv/classification/squeezenet/pytorch/README.md) | PyTorch | ImageNet [Swin Transformer](cv/classification/swin_transformer/pytorch/README.md) | PyTorch | ImageNet +[Swin Transformer](cv/classification/swin_transformer/paddlepaddle/README.md) | PaddlePaddle | ImageNet [VGG16](cv/classification/vgg/pytorch/README.md) | PyTorch | ImageNet [VGG16](cv/classification/vgg/paddlepaddle/README.md) | PaddlePaddle | ImageNet [Wave-MLP](cv/classification/wavemlp/pytorch/README.md) | PyTorch | ImageNet [Wide_ResNet101_2](cv/classification/wide_resnet101_2/pytorch/README.md) | PyTorch | ImageNet [Xception](cv/classification/xception/pytorch/README.md) | PyTorch | ImageNet + +### Image Generation + +模型名称 | 框架 | 数据集 +-------- | ------ | ---- +[DCGAN](cv/image_generation/dcgan/MindSpore/README.md) | MindSpore | ImageNet + ### Detection 模型名称 | 框架 | 数据集 -------- | ------ | ---- [AutoAssign](cv/detection/autoassign/pytorch/README.md) | PyTorch | COCO [CenterNet](cv/detection/centernet/pytorch/README.md) | PyTorch | COCO +[CenterNet](cv/detection/centernet/paddlepaddle/README.md) | PaddlePaddle | COCO +[DeepSORT](cv/tracking/deep_sort/pytorch/README.md) | PyTorch | Market-1501 +[DETR](cv/detection/detr/paddlepaddle/README.md) | PaddlePaddle | COCO [Faster R-CNN](cv/detection/fasterrcnn/pytorch/README.md) | PyTorch | COCO +[FCOS](cv/detection/fcos/paddlepaddle/README.md) | 
PaddlePaddle | COCO +[FCOS](cv/detection/fcos/pytorch/README.md) | PyTorch | COCO [Mask R-CNN](cv/detection/maskrcnn/pytorch/README.md) | PyTorch | COCO [Mask R-CNN](cv/detection/maskrcnn/paddlepaddle/README.md) | PaddlePaddle | COCO +[PP-YOLOE](cv/detection/pp-yoloe/paddlepaddle/README.md) | PaddlePaddle | COCO [PVANet](cv/detection/pvanet/pytorch/README.md) | PyTorch | COCO +[RetinaFace](cv/face/retinaface/pytorch/README.md) | PyTorch | WiderFace [RetinaNet](cv/detection/retinanet/pytorch/README.md) | PyTorch | COCO +[RetinaNet](cv/detection/retinanet/paddlepaddle/README.md) | PaddlePaddle | COCO [SSD](cv/detection/ssd/pytorch/README.md) | PyTorch | COCO [SSD](cv/detection/ssd/paddlepaddle/README.md) | PaddlePaddle | COCO +[SSD](cv/detection/ssd/tensorflow/README.md) | TensorFlow | COCO +[SSD](cv/detection/ssd/MindSpore/README.md) | MindSpore | COCO +[SOLO](cv/instance_segmentation/SOLO/pytorch/README.md) | PyTorch | COCO +[SOLOv2](cv/detection/solov2/paddlepaddle/README.md) | PaddlePaddle | COCO +[YOLACT++](cv/instance_segmentation/yolact/pytorch/README.md) | PyTorch | COCO [YOLOF](cv/detection/yolof/pytorch/README.md) | PyTorch | COCO [YOLOv3](cv/detection/yolov3/pytorch/README.md) | PyTorch | COCO [YOLOv3](cv/detection/yolov3/paddlepaddle/README.md) | PaddlePaddle | COCO +[YOLOv3](cv/detection/yolov3/tensorflow/README.md) | TensorFlow | COCO [YOLOv5](cv/detection/yolov5/pytorch/README.md) | PyTorch | COCO -[PP-YOLOE](cv/detection/pp-yoloe/paddlepaddle/README.md) | PaddlePaddle | COCO +[YOLOv7](cv/detection/yolov7/pytorch/README.md) | PyTorch | COCO + +### 3D-Detection + +模型名称 | 框架 | 数据集 +-------- | ------ | ---- +[BEVFormer](cv/3d_detection/BEVFormer/pytorch/README.md) | PyTorch | nuScenes&CAN bus +[PointPillars](cv/3d_detection/pointpillars/pytorch/README.md) | PyTorch | KITTI ### OCR 模型名称 | 框架 | 数据集 -------- | ------ | ---- -[SAR](cv/ocr/sar/pytorch/README.md) | PyTorch | OCR_Recog -[SATRN](cv/ocr/satrn/pytorch/base/README.md) | PyTorch | OCR_Recog -[PSE](cv/ocr/pse/paddlepaddle/README.md) | PaddlePaddle | OCR_Recog [CRNN](cv/ocr/crnn/paddlepaddle/README.md) | PaddlePaddle | LMDB +[DBNet](cv/ocr/dbnet/pytorch/README.md) | PyTorch | ICDAR2015 [PP-OCR-DB](cv/ocr/pp-ocr-db/paddlepaddle/README.md) | PaddlePaddle | ICDAR2015 +[PSE](cv/ocr/pse/paddlepaddle/README.md) | PaddlePaddle | OCR_Recog +[SAR](cv/ocr/sar/pytorch/README.md) | PyTorch | OCR_Recog +[SATRN](cv/ocr/satrn/pytorch/base/README.md) | PyTorch | OCR_Recog + + ### Point Cloud 模型名称 | 框架 | 数据集 @@ -134,14 +173,18 @@ DeepSparkHub甄选上百个应用算法和模型,覆盖AI和通用计算各领 -------- | ------ | ---- [APCNet](cv/semantic_segmentation/apcnet/pytorch/README.md) | PyTorch | CityScapes [BiSeNet](cv/semantic_segmentation/bisenet/pytorch/README.md) | PyTorch | COCO +[BiSeNetV2](cv/semantic_segmentation/bisenetv2/paddlepaddle/README.md) | PaddlePaddle | CityScapes [CGNet](cv/semantic_segmentation/cgnet/pytorch/README.md) | PyTorch | COCO [ContextNet](cv/semantic_segmentation/contextnet/pytorch/README.md) | PyTorch | COCO [DabNet](cv/semantic_segmentation/dabnet/pytorch/README.md) | PyTorch | COCO [DANet](cv/semantic_segmentation/danet/pytorch/README.md) | PyTorch | COCO [DeepLab](cv/semantic_segmentation/deeplabv3/pytorch/README.md) | PyTorch | COCO [DeepLab](cv/semantic_segmentation/deeplabv3/paddlepaddle/README.md) | PaddlePaddle | COCO +[DeepLabV3](cv/semantic_segmentation/deeplabv3/MindSpore/README.md) | MindSpore | VOC +[DeepLabV3+](cv/semantic_segmentation/deeplabv3plus/paddlepaddle/README.md) | PaddlePaddle | COCO 
[DenseASPP](cv/semantic_segmentation/denseaspp/pytorch/README.md) | PyTorch | COCO [DFANet](cv/semantic_segmentation/dfanet/pytorch/README.md) | PyTorch | COCO +[dnlnet](cv/semantic_segmentation/dnlnet/paddlepaddle/README.md) | PaddlePaddle | CityScapes [DUNet](cv/semantic_segmentation/dunet/pytorch/README.md) | PyTorch | COCO [EncNet](cv/semantic_segmentation/encnet/pytorch/README.md) | PyTorch | COCO [ENet](cv/semantic_segmentation/enet/pytorch/README.md) | PyTorch | COCO @@ -162,8 +205,9 @@ DeepSparkHub甄选上百个应用算法和模型,覆盖AI和通用计算各领 [SegNet](cv/semantic_segmentation/segnet/pytorch/README.md) | PyTorch | COCO [UNet](cv/semantic_segmentation/unet/pytorch/README.md) | PyTorch | COCO [UNet](cv/semantic_segmentation/unet/paddlepaddle/README.md) | PaddlePaddle | CityScapes +[VNet](cv/semantic_segmentation/vnet/tensorflow/README.md) | TensorFlow | Hippocampus [3D-UNet](cv/semantic_segmentation/unet3d/pytorch/README.md) | PyTorch | kits19 -[dnlnet](cv/semantic_segmentation/dnlnet/paddlepaddle/README.md) | PaddlePaddle | CityScapes + ### Super Resolution @@ -198,13 +242,21 @@ DeepSparkHub甄选上百个应用算法和模型,覆盖AI和通用计算各领 -------- | ------ | ---- [CLIP](multimodal/Language-Image_Pre-Training/clip/pytorch/README.md) | PyTorch | CIFAR100 [L-Verse](multimodal/Language-Image_Pre-Training/L-Verse/pytorch/README.md) | PyTorch | ImageNet +[Stable Diffusion](multimodal/diffusion/stable-diffusion/training/README.md) | PyTorch | pokemon-images + + +## GNN模型列表 + +### Text Classification +模型名称 | 框架 | 数据集 +-------- | ------ | ---- +[GCN](gnn/GCN/README.md) | MindSpore | Cora&Citeseer ## NLP模型列表 ### Cloze Test - 模型名称 | 框架 | 数据集 -------- | ------ | ---- [GLM](nlp/cloze_test/glm/pytorch/GLMForMultiTokenCloze/README.md) | PyTorch | GLMForMultiTokenCloze @@ -220,8 +272,14 @@ DeepSparkHub甄选上百个应用算法和模型,覆盖AI和通用计算各领 模型名称 | 框架 | 数据集 -------- | ------ | ---- +[BERT NER](nlp/ner/bert/pytorch/README.md) | PyTorch | CoNLL-2003 [BERT Pretraining](nlp/language_model/bert/pytorch/README.md) | PyTorch | MLCommon Wikipedia (2048_shards_uncompressed) [BERT Pretraining](nlp/language_model/bert/paddlepaddle/README.md) | PaddlePaddle | MNLI +[BERT Pretraining](nlp/language_model/bert/tensorflow/base/README.md) | TensorFlow | MNLI +[BERT Pretraining](nlp/language_model/bert/MindSpore/README.md) | MindSpore | SQuAD +[BERT Text Classification](nlp/text_classification/bert/pytorch/README.md) |PyTorch | GLUE +[BERT Text Summerization](nlp/text_summarisation/bert/pytorch/README.md) | PyTorch | cnn_dailymail +[BERT Question Answering](nlp/question_answering/bert/pytorch/README.md) | PyTorch | SQuAD ### Text Correction 模型名称 | 框架 | 数据集 diff --git a/cv/3d_detection/BEVFormer/pytorch/README.md b/cv/3d_detection/BEVFormer/pytorch/README.md index e11c0eb7f83df86220dcd60968b8d2f015b5f189..cc6fa0f63586cac4524fe98465f99fcae68392be 100755 --- a/cv/3d_detection/BEVFormer/pytorch/README.md +++ b/cv/3d_detection/BEVFormer/pytorch/README.md @@ -1,14 +1,15 @@ - # BEVFormer: a Cutting-edge Baseline for Camera-based Detection +# BEVFormer: a Cutting-edge Baseline for Camera-based Detection + +## Model description > **BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers**, ECCV 2022 > - [Paper in arXiv](http://arxiv.org/abs/2203.17270) -# Abstract In this work, the authors present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. 
In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, the authors design a spatial cross-attention that each BEV query extracts the spatial features from the regions of interest across camera views. For temporal information, the authors propose a temporal self-attention to recurrently fuse the history BEV information. The proposed approach achieves the new state-of-the-art **56.9\%** in terms of NDS metric on the nuScenes test set, which is **9.0** points higher than previous best arts and on par with the performance of LiDAR-based baselines. - -**c. Install mmcv-full.** +## Prepare +**Install mmcv-full.** ```shell cd mmcv bash clean_mmcv.sh @@ -16,20 +17,20 @@ bash build_mmcv.sh bash install_mmcv.sh ``` -**d. Install mmdet and mmseg.** +**Install mmdet and mmseg.** ```shell pip3 install mmdet==2.25.0 pip3 install mmsegmentation==0.25.0 ``` -**e. Install mmdet3d from source code.** +**Install mmdet3d from source code.** ```shell cd ../mmdetection3d pip3 install -r requirements.txt,OR pip3 install -r requirements/optional.txt,pip3 install -r requirements/runtime.txt,pip3 install -r requirements/tests.txt python3 setup.py install ``` -**f. Install timm.** +**Install timm.** ```shell pip3 install timm ``` @@ -58,18 +59,18 @@ python3 tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./da Using the above code will generate `nuscenes_infos_temporal_{train,val}.pkl`. -# Prepare pretrained models +## Prepare pretrained models ```shell mkdir ckpts cd ckpts & wget https://github.com/zhiqi-li/storage/releases/download/v1.0/bevformer_r101_dcn_24ep.pth cd .. ``` -# Prerequisites +## Prerequisites **Please ensure you have prepared the environment and the nuScenes dataset.** -# Train and Test +## Train and Test Train BEVFormer with 8 GPUs ``` @@ -84,7 +85,7 @@ Note: using 1 GPU to eval can obtain slightly higher performance because continu -# Using FP16 to train the model. +## Using FP16 to train the model. The above training script can not support FP16 training, and we provide another script to train BEVFormer with FP16. diff --git a/cv/3d_detection/pointpillars/pytorch/README.md b/cv/3d_detection/pointpillars/pytorch/README.md index 038037e3efc3b602c1ac88471e88c1fac8ad9ed7..258691056e301c765b9f1c764a95446a1e887e56 100755 --- a/cv/3d_detection/pointpillars/pytorch/README.md +++ b/cv/3d_detection/pointpillars/pytorch/README.md @@ -1,5 +1,6 @@ # [PointPillars: Fast Encoders for Object Detection from Point Clouds](https://arxiv.org/abs/1812.05784) +## Model description A Simple PointPillars PyTorch Implenmentation for 3D Lidar(KITTI) Detection. - It can be run without installing [mmcv](https://github.com/open-mmlab/mmcv), [Spconv](https://github.com/traveller59/spconv), [mmdet](https://github.com/open-mmlab/mmdetection) or [mmdet3d](https://github.com/open-mmlab/mmdetection3d). diff --git a/cv/classification/repvgg/pytorch/README.md b/cv/classification/repvgg/pytorch/README.md index be9cb1ea0edfc12c90723fb39c055796b3969645..e4e4397f17bab890fe5fa76d5ac5caa2688a1ec2 100755 --- a/cv/classification/repvgg/pytorch/README.md +++ b/cv/classification/repvgg/pytorch/README.md @@ -1,4 +1,3 @@ - # RepVGG ## Model description A simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology. 
Such decoupling of the training-time and inference-time architecture is realized by a structural re-parameterization technique so that the model is named RepVGG. diff --git a/cv/classification/resnet50/tensorflow/README.md b/cv/classification/resnet50/tensorflow/README.md index 66083d9525952a2446fb9c0a4baea5842808d16a..e5151215cca78ee03a987c3544781569aba9e5d3 100644 --- a/cv/classification/resnet50/tensorflow/README.md +++ b/cv/classification/resnet50/tensorflow/README.md @@ -1,3 +1,7 @@ +# ResNet50 + +## Model description +Residual Networks, or ResNets, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Instead of hoping each few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. ## Prepare @@ -34,4 +38,4 @@ bash run_train_resnet50_multigpu_imagenette.sh | | acc | fps | | --- | --- | --- | -| multi_card | 0.9860 | 236.9 | \ No newline at end of file +| multi_card | 0.9860 | 236.9 | diff --git a/cv/classification/swin_transformer/paddlepaddle/README.md b/cv/classification/swin_transformer/paddlepaddle/README.md index def17535ba792f87fe7ff3e0ce3ba0eb231a8d69..47d906ab435f64f9306c0cb5b125aefc102ac7af 100644 --- a/cv/classification/swin_transformer/paddlepaddle/README.md +++ b/cv/classification/swin_transformer/paddlepaddle/README.md @@ -1,4 +1,4 @@ -# Swin-Transformer +# Swin Transformer ## Model description The Swin Transformer is a type of Vision Transformer. It builds hierarchical feature maps by merging image patches (shown in gray) in deeper layers and has linear computation complexity to input image size due to computation of self-attention only within each local window (shown in red). It can thus serve as a general-purpose backbone for both image classification and dense recognition tasks. diff --git a/cv/detection/fcos/paddlepaddle/README.md b/cv/detection/fcos/paddlepaddle/README.md index f9a57b0eb25173b5ded0fdd5fbb32659ac9351be..1924f0ecfd95a6f2064578f810d1d36713baa34b 100644 --- a/cv/detection/fcos/paddlepaddle/README.md +++ b/cv/detection/fcos/paddlepaddle/README.md @@ -1,15 +1,15 @@ -# Fcos +# FCOS ## Model description FCOS (Fully Convolutional One-Stage Object Detection) is a fast anchor-free object detection framework with strong performance. -## 克隆代码 +## Get PaddleDetection source code ``` git clone https://github.com/PaddlePaddle/PaddleDetection.git ``` -## 安装PaddleDetection +## Install PaddleDetection ``` cd PaddleDetection @@ -17,13 +17,13 @@ pip install -r requirements.txt python3 setup.py install ``` -## 下载COCO数据集 +## Prepare datasets ``` python3 dataset/coco/download_coco.py ``` -## 运行代码 +## Train ``` # GPU多卡训练 @@ -44,4 +44,4 @@ python3 tools/train.py -c configs/fcos/fcos_r50_fpn_1x_coco.yml --eval | GPUs | FPS | Train Epochs | Box AP | |------|-----|--------------|------| -| 1x8 | 8.24 | 12 | 39.7 | \ No newline at end of file +| 1x8 | 8.24 | 12 | 39.7 | diff --git a/cv/detection/fcos/pytorch/README.md b/cv/detection/fcos/pytorch/README.md index 6e7b561ce3a6fe90518b14848ca14917a120aa88..144111befb78b45858b52fd07f674acda38fa813 100755 --- a/cv/detection/fcos/pytorch/README.md +++ b/cv/detection/fcos/pytorch/README.md @@ -1,16 +1,17 @@ -# FCOS: Fully Convolutional One-Stage Object Detection +# FCOS + ## Model description FCOS (Fully Convolutional One-Stage Object Detection) is a fast anchor-free object detection framework with strong performance. The full paper is available at: [https://arxiv.org/abs/1904.01355](https://arxiv.org/abs/1904.01355). 
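To make the anchor-free idea above concrete, the following is a minimal, self-contained PyTorch sketch of an FCOS-style prediction head. It is illustrative only and is not the code used in this repository; the class name `TinyFCOSHead` and the layer sizes are assumptions made for the example. For every feature-map location the head predicts class scores, the four distances to the box sides (l, t, r, b) and a centerness score, with no anchor boxes involved.

```python
# Illustrative sketch only (not this repository's implementation) of an
# FCOS-style anchor-free head: per-location class scores, (l, t, r, b)
# distances to the box sides, and a centerness score.
import torch
import torch.nn as nn

class TinyFCOSHead(nn.Module):          # hypothetical name, for illustration
    def __init__(self, in_channels=256, num_classes=80):
        super().__init__()
        self.cls_conv = nn.Conv2d(in_channels, num_classes, 3, padding=1)
        self.reg_conv = nn.Conv2d(in_channels, 4, 3, padding=1)  # l, t, r, b
        self.ctr_conv = nn.Conv2d(in_channels, 1, 3, padding=1)  # centerness

    def forward(self, feat):
        cls_logits = self.cls_conv(feat)
        ltrb = torch.exp(self.reg_conv(feat))  # distances must stay positive
        centerness = self.ctr_conv(feat)
        return cls_logits, ltrb, centerness

feat = torch.randn(1, 256, 100, 152)           # one FPN level
cls_logits, ltrb, centerness = TinyFCOSHead()(feat)
print(cls_logits.shape, ltrb.shape, centerness.shape)
```

At inference time the centerness score is used to down-weight low-quality predictions far from object centers before NMS.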
-## install +## Install requirements ``` pip3 install -r requirements.txt python3 setup.py develop ``` -## datasets +## Prepare datasets ``` mkdir data cd data diff --git a/cv/detection/pp-yoloe/paddlepaddle/README.md b/cv/detection/pp-yoloe/paddlepaddle/README.md index b64c707293083e88d1ad82c81c5842bd2723e07f..e303e32e54a6f91c69cece6ffdc5261cd6588d3a 100644 --- a/cv/detection/pp-yoloe/paddlepaddle/README.md +++ b/cv/detection/pp-yoloe/paddlepaddle/README.md @@ -1,4 +1,5 @@ # PP-YOLOE + ## Model description PP-YOLOE is an excellent single-stage anchor-free model based on PP-YOLOv2, surpassing a variety of popular YOLO models. PP-YOLOE has a series of models, named s/m/l/x, which are configured through width multiplier and depth multiplier. PP-YOLOE avoids using special operators, such as Deformable Convolution or Matrix NMS, to be deployed friendly on various hardware. @@ -36,4 +37,4 @@ python3 -u -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ``` ## Reference -- [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) \ No newline at end of file +- [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) diff --git a/cv/detection/retinanet/paddlepaddle/README.md b/cv/detection/retinanet/paddlepaddle/README.md index 6cf0fe36f2e5c63897e35f2da0739c5f2ab65043..6a680eb6e03711f2c10fe0e360edfd43c8a01397 100644 --- a/cv/detection/retinanet/paddlepaddle/README.md +++ b/cv/detection/retinanet/paddlepaddle/README.md @@ -3,13 +3,13 @@ ## Model description The paper proposes a method to convert a deep learning object detector into an equivalent spiking neural network. The aim is to provide a conversion framework that is not constrained to shallow network structures and classification problems as in state-of-the-art conversion libraries. The results show that models of higher complexity, such as the RetinaNet object detector, can be converted with limited loss in performance. -## 克隆代码 +## Get PaddleDetection source code ``` git clone https://github.com/PaddlePaddle/PaddleDetection.git ``` -## 安装PaddleDetection +## Install PaddleDetection ``` cd PaddleDetection @@ -17,13 +17,13 @@ pip install -r requirements.txt python3 setup.py install ``` -## 下载COCO数据集 +## Prepare datasets ``` python3 dataset/coco/download_coco.py ``` -## 运行代码 +## Train ``` # GPU多卡训练 @@ -44,4 +44,4 @@ python3 tools/train.py -c configs/retinanet/retinanet_r50_fpn_1x_coco.yml --eval | GPUs | FPS | Train Epochs | Box AP | |------|-----|--------------|------| -| 1x8 | 6.58 | 12 | 37.3 | \ No newline at end of file +| 1x8 | 6.58 | 12 | 37.3 | diff --git a/cv/detection/ssd/tensorflow/README.md b/cv/detection/ssd/tensorflow/README.md index 0a130af1089b2aad31a20bdcb1143fa22136f65f..a36dbc7ce8f540acabe86677661c229035c78364 100644 --- a/cv/detection/ssd/tensorflow/README.md +++ b/cv/detection/ssd/tensorflow/README.md @@ -1,3 +1,11 @@ +# SSD + +## Model description + +We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. 
Our SSD model is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stage and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. Compared to other single stage methods, SSD has much better accuracy, even with a smaller input image size. For 300x300 input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on a Nvidia Titan X and for 500x500 input, SSD achieves 75.1% mAP, outperforming a comparable state of the art Faster R-CNN model. Code is available at https://github.com/weiliu89/caffe/tree/ssd . + +## Prepare + ### Download the VOC dataset ``` cd dataset @@ -39,4 +47,4 @@ python3 train_ssd.py --batch_size 16 | | acc | fps | | --- | --- | --- | -| multi_card | 0.783513 | 3.177 | \ No newline at end of file +| multi_card | 0.783513 | 3.177 | diff --git a/cv/detection/yolov3/tensorflow/README.md b/cv/detection/yolov3/tensorflow/README.md index c7ab776c360c95ef509c6f7dbb33e2de748d5c05..4f1618d0c2d991d991e506213bbd64528693c4b6 100644 --- a/cv/detection/yolov3/tensorflow/README.md +++ b/cv/detection/yolov3/tensorflow/README.md @@ -1,11 +1,17 @@ +# YOLOv3 + +## Model description + +We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that’s pretty swell. It’s a little bigger than last time but more accurate. It’s still fast though, don’t worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 AP50 in 51 ms on a Titan X, compared to 57.5 AP50 in 198 ms by RetinaNet, similar performance but 3.8× faster. As always, all the code is online at https://pjreddie.com/yolo/. 
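The "mAP at 0.5 IoU" numbers quoted above count a prediction as correct when its intersection-over-union with a same-class ground-truth box is at least 0.5. Below is a small illustrative helper, not part of this repository, that computes that overlap for axis-aligned boxes in (x1, y1, x2, y2) format.

```python
# Illustrative helper only: IoU between two axis-aligned boxes, the quantity
# behind the "mAP at 0.5 IoU" detection metric mentioned above.
def iou(box_a, box_b):
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.14, below the 0.5 threshold
```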
+ ## Prepare + ``` bash init_tf.sh ``` ## Download dataset and checkpoint -## Download dataset and checkpoint ### Download VOC PASCAL trainval and test data ``` diff --git a/cv/detection/yolov7/pytorch/README.md b/cv/detection/yolov7/pytorch/README.md index d27d8037cba552c0e9ff96d61c4bea703f557345..4ec70a204ba2b587dfab40c2792c2c32ff65ea99 100644 --- a/cv/detection/yolov7/pytorch/README.md +++ b/cv/detection/yolov7/pytorch/README.md @@ -1,5 +1,6 @@ -# Official YOLOv7 +# YOLOv7 +## Model description Implementation of paper - [YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors](https://arxiv.org/abs/2207.02696) ## Step 1: Installing packages @@ -74,4 +75,4 @@ python3 detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source infere ## Reference -https://github.com/WongKinYiu/yolov7 \ No newline at end of file +https://github.com/WongKinYiu/yolov7 diff --git a/cv/image_generation/dcgan/MindSpore/README.md b/cv/image_generation/dcgan/MindSpore/README.md index a12eb335f1e114481c765d2c66edf3b46d522ba6..b45be88341d363f88ee46027550a2586036fea09 100644 --- a/cv/image_generation/dcgan/MindSpore/README.md +++ b/cv/image_generation/dcgan/MindSpore/README.md @@ -1,5 +1,5 @@ - # DCGAN + ## Model description The deep convolutional generative adversarial networks (DCGANs) first introduced CNN into the GAN structure, and the strong feature extraction ability of convolution layer was used to improve the generation effect of GAN. diff --git a/cv/instance_segmentation/yolact/pytorch/README.md b/cv/instance_segmentation/yolact/pytorch/README.md index dc04d7924beafcdbff36ef6f8311b31f6ab3d72f..35b94871b095cc7c90aaa10b440f854d00a3a270 100644 --- a/cv/instance_segmentation/yolact/pytorch/README.md +++ b/cv/instance_segmentation/yolact/pytorch/README.md @@ -1,12 +1,6 @@ -# **Y**ou **O**nly **L**ook **A**t **C**oefficien**T**s -``` - ██╗ ██╗ ██████╗ ██╗ █████╗ ██████╗████████╗ - ╚██╗ ██╔╝██╔═══██╗██║ ██╔══██╗██╔════╝╚══██╔══╝ - ╚████╔╝ ██║ ██║██║ ███████║██║ ██║ - ╚██╔╝ ██║ ██║██║ ██╔══██║██║ ██║ - ██║ ╚██████╔╝███████╗██║ ██║╚██████╗ ██║ - ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═╝ ╚═════╝ ╚═╝ -``` +# YOLACT + +## Model description A simple, fully convolutional model for real-time instance segmentation. This is the code for papers: - [YOLACT: Real-time Instance Segmentation](https://arxiv.org/abs/1904.02689) - [YOLACT++: Better Real-time Instance Segmentation](https://arxiv.org/abs/1912.06218) @@ -83,4 +77,4 @@ python3 train.py --config=yolact_base_config --batch_size 64 --lr 0.000125 | 550 | Resnet50-FPN |22.63 | ## Reference -https://github.com/dbolya/yolact \ No newline at end of file +https://github.com/dbolya/yolact diff --git a/cv/ocr/dbnet/pytorch/README.md b/cv/ocr/dbnet/pytorch/README.md index d92a6cfcab2e892ede674e6526ed1daab5f844ea..6f0bf3b73a327a7e524754362eaeb023a9751142 100755 --- a/cv/ocr/dbnet/pytorch/README.md +++ b/cv/ocr/dbnet/pytorch/README.md @@ -1,4 +1,5 @@ -# DBnet +# DBNet + ## Model description Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text. However, the post-processing of binarization is essential for segmentation-based detection, which converts probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network. 
Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency. ## Step 2: Preparing datasets diff --git a/cv/ocr/pse/paddlepaddle/README.md b/cv/ocr/pse/paddlepaddle/README.md index f9297786233b607eb2ff2ebbd24508afb440fb17..18e6078e3e7182e37cc97523c497fd33cbdf79a8 100644 --- a/cv/ocr/pse/paddlepaddle/README.md +++ b/cv/ocr/pse/paddlepaddle/README.md @@ -1,5 +1,8 @@ # PSE +## Model description +[Shape robust text detection with progressive scale expansion network](https://arxiv.org/abs/1903.12473) Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai CVPR, 2019 + ## Step 1: Installing ```bash git clone --recursive https://github.com/PaddlePaddle/PaddleOCR.git diff --git a/cv/ocr/sar/pytorch/README.md b/cv/ocr/sar/pytorch/README.md index 1d792b27b92e049f8c1b386fda3a48b46be841ea..0535e3aa993d3c44fe0f28d7e6eb83d53e4d0f79 100755 --- a/cv/ocr/sar/pytorch/README.md +++ b/cv/ocr/sar/pytorch/README.md @@ -1,4 +1,4 @@ -# sar +# SAR ## Model description diff --git a/cv/semantic_segmentation/deeplabv3/MindSpore/README.md b/cv/semantic_segmentation/deeplabv3/MindSpore/README.md index e293df7fb5058d62fcccbbf04cf470a5d8192545..52aee14e11db1245a041b9365131e186019a6547 100755 --- a/cv/semantic_segmentation/deeplabv3/MindSpore/README.md +++ b/cv/semantic_segmentation/deeplabv3/MindSpore/README.md @@ -1,5 +1,5 @@ - # DeepLabV3 + ## Model description DeepLab is a series of image semantic segmentation models, DeepLabV3 improves significantly over previous versions. Two keypoints of DeepLabV3: Its multi-grid atrous convolution makes it better to deal with segmenting objects at multiple scales, and augmented ASPP makes image-level features available to capture long range information. This repository provides a script and recipe to DeepLabV3 model and achieve state-of-the-art performance. 
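As a rough illustration of the ASPP idea described above, parallel atrous (dilated) convolutions at several rates combined with an image-level pooling branch, here is a minimal PyTorch sketch. This repository's implementation is in MindSpore and differs in detail; the class name `TinyASPP`, the channel widths and the dilation rates are assumptions made for the example.

```python
# Minimal ASPP sketch (illustrative, not this repository's MindSpore code):
# parallel dilated 3x3 convolutions plus image-level pooling, concatenated
# and fused by a 1x1 convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyASPP(nn.Module):                      # hypothetical name
    def __init__(self, in_ch=256, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1)]
            + [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates]
        )
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))

out = TinyASPP()(torch.randn(1, 256, 33, 33))
print(out.shape)  # torch.Size([1, 256, 33, 33])
```

Larger dilation rates enlarge the receptive field without adding parameters, which is how the model handles objects at multiple scales while the pooled branch supplies the image-level context mentioned above.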
diff --git a/cv/semantic_segmentation/dnlnet/paddlepaddle/README.md b/cv/semantic_segmentation/dnlnet/paddlepaddle/README.md index a038e832cf750bb52217096758cab16a49fc4185..c23f2e0ae52536920ebea60d77b1ee3f3a5996e0 100644 --- a/cv/semantic_segmentation/dnlnet/paddlepaddle/README.md +++ b/cv/semantic_segmentation/dnlnet/paddlepaddle/README.md @@ -1,5 +1,8 @@ # dnlnet +## Model description +[Disentangled Non-Local Neural Networks](https://arxiv.org/abs/2006.06668) Minghao Yin, Zhuliang Yao, Yue Cao, Xiu Li, Zheng Zhang, Stephen Lin, Han Hu + ## Step 1: Installing ``` diff --git a/cv/semantic_segmentation/vnet/tensorflow/README.md b/cv/semantic_segmentation/vnet/tensorflow/README.md index ca6535d847c43bf576c41117470a3d0c54bf7274..56c2d8cc5d4e336d20eeb6f62a66cf49f0aadefe 100644 --- a/cv/semantic_segmentation/vnet/tensorflow/README.md +++ b/cv/semantic_segmentation/vnet/tensorflow/README.md @@ -1,3 +1,7 @@ +# VNet + +## Model description +[V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation](https://arxiv.org/abs/1606.04797) Fausto Milletari, Nassir Navab, Seyed-Ahmad Ahmadi ## Prepare ``` @@ -26,4 +30,4 @@ python3 examples/vnet_train_and_evaluate.py --gpus 8 --batch_size 8 --base_lr 0. | | background_dice | anterior_dice | posterior_dice | | --- | --- | --- | --- | -| multi_card | 0.9912699 | 0.83743376 | 0.81537557 | \ No newline at end of file +| multi_card | 0.9912699 | 0.83743376 | 0.81537557 | diff --git a/cv/tracking/deep_sort/pytorch/README.md b/cv/tracking/deep_sort/pytorch/README.md index 98180bc42fae201fe01ffb08a40e7d48858d8d1c..5a263a014b16440286b120f5f17cf56322ddada4 100644 --- a/cv/tracking/deep_sort/pytorch/README.md +++ b/cv/tracking/deep_sort/pytorch/README.md @@ -1,4 +1,6 @@ -## Introduction +# DeepSORT + +## Model description This is an implement of MOT tracking algorithm deep sort. Deep sort is basicly the same with sort but added a CNN model to extract features in image of human part bounded by a detector. This CNN model is indeed a RE-ID model and the detector used in [PAPER](https://arxiv.org/abs/1703.07402) is FasterRCNN , and the original source code is [HERE](https://github.com/nwojke/deep_sort). However in original code, the CNN model is implemented with tensorflow, which I'm not familier with. SO I re-implemented the CNN feature extraction model with PyTorch, and changed the CNN model a little bit. Also, I use **YOLOv3** to generate bboxes instead of FasterRCNN. diff --git a/gnn/GCN/README.md b/gnn/GCN/README.md index f3d096bab12615c8602b10e1387488b3350f7ca0..8e66455df4ef32000684b8fcd172bae29a649ea2 100755 --- a/gnn/GCN/README.md +++ b/gnn/GCN/README.md @@ -1,4 +1,5 @@ # GCN + ## Model description GCN(Graph Convolutional Networks) was proposed in 2016 and designed to do semi-supervised learning on graph-structured data. A scalable approach based on an efficient variant of convolutional neural networks which operate directly on graphs was presented. The model scales linearly in the number of graph edges and learns hidden layer representations that encode both local graph structure and features of nodes. @@ -24,21 +25,22 @@ Note that you can run the scripts based on the dataset mentioned in original pap cd scripts bash train_gcn_1p.sh ``` -### [Evaluation] +## Evaluation ```bash cd .. 
python3 eval.py --data_dir=scripts/data_mr/cora --device_target="GPU" --model_ckpt scripts/train/ckpt/ckpt_gcn-200_1.ckpt &> eval.log & ``` -### [Evaluation result] -### 性能数据:BI -## Results on BI-V100 + +## Evaluation result + +### Results on BI-V100 | GPUs | per step time | Acc | |------|-------------- |-------| | 1 | 4.454 | 0.8711| -### 性能数据:NV -## Results on NV-V100s + +### Results on NV-V100s | GPUs | per step time | Acc | |------|-------------- |-------| diff --git a/multimodal/diffusion/stable-diffusion/training/README.md b/multimodal/diffusion/stable-diffusion/training/README.md index 3ad91a916d4e6aa0ed40112ae6e0ee6cbd5e6b60..5692785588a419aca16bf35a0591df54bb57464c 100755 --- a/multimodal/diffusion/stable-diffusion/training/README.md +++ b/multimodal/diffusion/stable-diffusion/training/README.md @@ -1,16 +1,17 @@ # Stable Diffusion +## Model description Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from CompVis, Stability AI, LAION and RunwayML. It's trained on 512x512 images from a subset of the LAION-5B database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 4GB VRAM. See the model card for more information. -# Training +## Prepare -## setup env +### Setup env ```bash pip3 install -r requirements.txt ``` -## download +### Download ```bash $ wget http://10.150.9.95/swapp/datasets/multimodal/stable_diffusion/pokemon-images.zip @@ -19,9 +20,9 @@ $ wget http://10.150.9.95/swapp/pretrained/multimodal/stable-diffusion/stable-di $ unzip stable-diffusion-v1-4.zip ``` -## train +## Train -### step 1 使用accelerate初始化训练环境 +### step 1 使用accelerate初始化训练环境 ```bash accelerate config # 这里可以选择单卡或者多卡训练 @@ -44,10 +45,11 @@ multi-gpu accelerate launch --mixed_precision="fp16" train_text_to_image.py --pretrained_model_name_or_path=./stable-diffusion-v1-4 --use_ema --resolution=512 --center_crop --random_flip --train_batch_size=1 --gradient_accumulation_steps=4 --gradient_checkpointing --max_train_steps=15000 --learning_rate=1e-05 --max_grad_norm=1 --lr_scheduler="constant" --lr_warmup_steps=0 --output_dir="sd-pokemon-model" --caption_column 'additional_feature' --train_data_dir pokemon-images/datasets/images/train ``` -## test +## Test ```bash python3 test.py ``` prompt:A pokemon with green eyes and red legs -result: + +## Result ![image](IMG/pokemon.png) diff --git a/nlp/language_model/bert/MindSpore/README.md b/nlp/language_model/bert/MindSpore/README.md index 6b5e32c6cadb65b359d097d0b7ae6e3dbda7369d..ad49d268befb40bb420b3db0cca0ae282356b2b4 100644 --- a/nlp/language_model/bert/MindSpore/README.md +++ b/nlp/language_model/bert/MindSpore/README.md @@ -1,4 +1,5 @@ # BERT + ## Model description The BERT network was proposed by Google in 2018. The network has made a breakthrough in the field of NLP. The network uses pre-training to achieve a large network structure without modifying, and only by adding an output layer to achieve multiple text-based tasks in fine-tuning. The backbone code of BERT adopts the Encoder structure of Transformer. The attention mechanism is introduced to enable the output layer to capture high-latitude global semantic information. The pre-training uses denoising and self-encoding tasks, namely MLM(Masked Language Model) and NSP(Next Sentence Prediction). 
No need to label data, pre-training can be performed on massive text data, and only a small amount of data to fine-tuning downstream tasks to obtain good results. The pre-training plus fune-tuning mode created by BERT is widely adopted by subsequent NLP networks. diff --git a/nlp/language_model/bert/tensorflow/base/README.md b/nlp/language_model/bert/tensorflow/base/README.md index b3378ee7c84a78f0c8b2f1e65421b6148804832d..a77ae4cec2e61233508fcc956d835fad549a17a3 100644 --- a/nlp/language_model/bert/tensorflow/base/README.md +++ b/nlp/language_model/bert/tensorflow/base/README.md @@ -1,3 +1,7 @@ +# BERT Pretraining + +## Model description +BERT, or Bidirectional Encoder Representations from Transformers, improves upon standard Transformers by removing the unidirectionality constraint by using a masked language model (MLM) pre-training objective. The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked word based only on its context. Unlike left-to-right language model pre-training, the MLM objective enables the representation to fuse the left and the right context, which allows us to pre-train a deep bidirectional Transformer. In addition to the masked language model, BERT uses a next sentence prediction task that jointly pre-trains text-pair representations. ## Prepare @@ -42,4 +46,4 @@ bash run_multi_card_FPS.sh | | acc | fps | | --- | --- | --- | -| multi_card | 0.424126 | 0.267241| \ No newline at end of file +| multi_card | 0.424126 | 0.267241| diff --git a/nlp/ner/bert/pytorch/README.md b/nlp/ner/bert/pytorch/README.md index 258e95f877b56b41dddef2aac8d1e8bbee247fa0..c528ec1b09fa3cdc45f435d56394017dc26b4b23 100644 --- a/nlp/ner/bert/pytorch/README.md +++ b/nlp/ner/bert/pytorch/README.md @@ -1,8 +1,8 @@ -# Bert-base ner +# BERT NER ## Model description -Bert-base ner task Fine-tuning +BERT-base NER task Fine-tuning ## Step 1: Installing packages @@ -32,4 +32,4 @@ bash run_dist.sh | 1x8 | 252 | 0.0688 | ## Reference -https://github.com/huggingface/ \ No newline at end of file +https://github.com/huggingface/ diff --git a/nlp/question_answering/bert/pytorch/README.md b/nlp/question_answering/bert/pytorch/README.md index f2a99c7a57d0cd5bf182c980879560ca1d003927..5adbdb524303e48bdb00c9d94303819d87668813 100644 --- a/nlp/question_answering/bert/pytorch/README.md +++ b/nlp/question_answering/bert/pytorch/README.md @@ -1,8 +1,8 @@ -# Bert-base squad +# BERT Question Answering ## Model description -Bert-base squad task Fine-tuning +BERT-base SQuAD task Fine-tuning ## Step 1: Installing packages diff --git a/nlp/text_classification/bert/pytorch/README.md b/nlp/text_classification/bert/pytorch/README.md index c82b996d19e04b68a6c9e98d74af3b18315f4ff1..f1c8e77a6e8939a74178acf265dcdd16b07e8cdc 100644 --- a/nlp/text_classification/bert/pytorch/README.md +++ b/nlp/text_classification/bert/pytorch/README.md @@ -1,10 +1,8 @@ -# Text Classification - -# Bert-base WNLI +# BERT Text Classification ## Model description -Bert-base WNLI task Fine-tuning +BERT-base WNLI task Fine-tuning ## Step 1: Installing packages diff --git a/nlp/text_summarisation/bert/pytorch/README.md b/nlp/text_summarisation/bert/pytorch/README.md index e26daf415e5a2b45eb3abda27cb6f7107e660ee8..91358b787f4317cf9f6a48e57cceff594e3d9e43 100644 --- a/nlp/text_summarisation/bert/pytorch/README.md +++ b/nlp/text_summarisation/bert/pytorch/README.md @@ -1,8 +1,8 @@ -# Bert-base summarization +# BERT Text Summarization ## Model description -Bert-base 
summarization task Fine-tuning +BERT-base summarization task Fine-tuning ## Step 1: Installing packages