# sar-pytorch

**Repository Path**: wuzetian/sar-pytorch

## Basic Information

- **Project Name**: sar-pytorch
- **Description**: Text recognition with a ResNet backbone and a 2D attentional LSTM
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2020-12-11
- **Last Updated**: 2022-05-07

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Show, Attend and Read - A PyTorch Implementation

A PyTorch (>= v1.4.0) implementation of "Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" (AAAI 2019).

## Task

- [x] Backbone model
- [x] Encoder model
- [x] Decoder model
- [x] Integrated model
- [x] Data processing
- [x] Training pipeline
- [x] Inference pipeline

## Supported Datasets

- [x] Street View Text: http://vision.ucsd.edu/~kai/svt/
- [x] IIIT5K: https://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset
- [x] Syn90k: https://www.robots.ox.ac.uk/~vgg/data/text/
- [x] SynthText: https://www.robots.ox.ac.uk/~vgg/data/scenetext/

## Commands

### Training

```
python train.py --batch 32 --epoch 5000 --dataset ./svt --dataset_type svt --gpu True
```

### Inference

```
python inference.py --batch 32 --input input_folder --model model_path --gpu True
```

## Results

### SVT

![Statistics for SVT training](https://github.com/liuch37/sar-pytorch/blob/master/misc/svt_results.png)

### IIIT5K

![Statistics for IIIT5K training](https://github.com/liuch37/sar-pytorch/blob/master/misc/iiit5k_results.png)

Input:

![Input image](https://github.com/liuch37/sar-pytorch/blob/master/misc/iiit_0.jpg)

Output attention map per character:

![Attention map for char 0](https://github.com/liuch37/sar-pytorch/blob/master/misc/iiit_0_0.png)
![Attention map for char 1](https://github.com/liuch37/sar-pytorch/blob/master/misc/iiit_0_1.png)

### Syn90k (10k for training / 3k for testing)

![Statistics for Syn90k training](https://github.com/liuch37/sar-pytorch/blob/master/misc/syn90k_results.png)

Input:

![Input image](https://github.com/liuch37/sar-pytorch/blob/master/misc/syn90k_0.jpg)

Output attention map per character:

![Attention map for char 0](https://github.com/liuch37/sar-pytorch/blob/master/misc/syn90k_0_0.png)
![Attention map for char 1](https://github.com/liuch37/sar-pytorch/blob/master/misc/syn90k_0_1.png)
![Attention map for char 2](https://github.com/liuch37/sar-pytorch/blob/master/misc/syn90k_0_2.png)
![Attention map for char 3](https://github.com/liuch37/sar-pytorch/blob/master/misc/syn90k_0_3.png)
![Attention map for char 4](https://github.com/liuch37/sar-pytorch/blob/master/misc/syn90k_0_4.png)
![Attention map for char 5](https://github.com/liuch37/sar-pytorch/blob/master/misc/syn90k_0_5.png)
![Attention map for char 6](https://github.com/liuch37/sar-pytorch/blob/master/misc/syn90k_0_6.png)
![Attention map for char 7](https://github.com/liuch37/sar-pytorch/blob/master/misc/syn90k_0_7.png)

### SynthText (80k for training / 20k for testing)

![Statistics for SynthText training](https://github.com/liuch37/sar-pytorch/blob/master/misc/synthtext_results.png)

Input:

![Input image](https://github.com/liuch37/sar-pytorch/blob/master/misc/synthtext_0.jpg)

Output attention map per character:

![Attention map for char 0](https://github.com/liuch37/sar-pytorch/blob/master/misc/synthtext_0_0.png)
![Attention map for char 1](https://github.com/liuch37/sar-pytorch/blob/master/misc/synthtext_0_1.png)
![Attention map for char 2](https://github.com/liuch37/sar-pytorch/blob/master/misc/synthtext_0_2.png)
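The per-character attention maps shown above are the decoder's 2D attention weights over the backbone feature map. As a rough illustration of how one decoding step produces such a map, here is a minimal sketch of a SAR-style 2D attention step following the formulation in [1]; the module name `Attention2D`, the dimensions, and the 3x3 convolution on the feature map are illustrative assumptions, not this repository's exact code.

```python
# Hypothetical sketch of one SAR-style 2D attention step, not this repo's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention2D(nn.Module):
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.conv_f = nn.Conv2d(feat_dim, attn_dim, kernel_size=3, padding=1)
        self.fc_h = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Conv2d(attn_dim, 1, kernel_size=1)

    def forward(self, feats, hidden):
        # feats:  (B, C, H, W) feature map from the ResNet backbone
        # hidden: (B, hidden_dim) current decoder LSTM hidden state
        B, C, H, W = feats.shape
        f = self.conv_f(feats)                                     # (B, A, H, W)
        h = self.fc_h(hidden).view(B, -1, 1, 1)                    # (B, A, 1, 1), broadcast over H, W
        e = self.score(torch.tanh(f + h))                          # (B, 1, H, W) attention scores
        alpha = F.softmax(e.view(B, -1), dim=1).view(B, 1, H, W)   # normalize over all H*W positions
        glimpse = (alpha * feats).flatten(2).sum(dim=2)            # (B, C) weighted sum of features
        return glimpse, alpha.squeeze(1)

# Example with assumed sizes: a feature map of 12x40 with 512 channels.
attn = Attention2D(feat_dim=512, hidden_dim=512, attn_dim=512)
feats = torch.randn(2, 512, 12, 40)
hidden = torch.randn(2, 512)
glimpse, alpha = attn(feats, hidden)
print(glimpse.shape, alpha.shape)  # torch.Size([2, 512]) torch.Size([2, 12, 40])
```

At each step, `alpha` is the per-character map visualized in the figures above, and `glimpse` is combined with the LSTM hidden state to predict the next character.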
## Source

[1] Original paper: https://arxiv.org/abs/1811.00751

[2] Official code by the authors, in Torch: https://github.com/wangpengnorman/SAR-Strong-Baseline-for-Text-Recognition

[3] A TensorFlow implementation: https://github.com/Pay20Y/SAR_TF