# AttentionOCR

**Repository Path**: atari/AttentionOCR

## Basic Information

- **Project Name**: AttentionOCR
- **Description**: Synced from https://github.com/zhang0jhon/AttentionOCR
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2020-08-07
- **Last Updated**: 2021-03-18

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# AttentionOCR for Arbitrary-Shaped Scene Text Recognition

## Introduction

This is the **No. 1 ranked** TensorFlow-based scene text spotting algorithm on the [__ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text__](https://rrc.cvc.uab.es/?ch=14) (Latin Only, Latin and Chinese). Furthermore, the algorithm is also adopted in the [__ICDAR2019 Robust Reading Challenge on Large-scale Street View Text with Partial Labeling__](https://rrc.cvc.uab.es/?ch=16) and the [__ICDAR2019 Robust Reading Challenge on Reading Chinese Text on Signboard__](https://rrc.cvc.uab.es/?ch=12). The scene text detection algorithm is modified from [__Tensorpack FasterRCNN__](https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN); this repository only open-sources the code for scene text recognition. The ICDAR2019 ArT competition model has been uploaded to Docker Hub, please refer to [Docker](#Docker). For more details, please refer to our [__arXiv technical report__](https://arxiv.org/abs/1912.04561).

Our text recognition algorithm not only recognizes Latin and non-Latin characters, but also supports horizontal and vertical text recognition in one model. This makes it convenient for multi-lingual arbitrary-shaped text recognition.

**Note that the competition model in the docker container as described in [__our technical report__](https://arxiv.org/abs/1912.04561) is slightly different from the recognition model trained from this updated repository.**

## Dependencies

```
python 3
tensorflow-gpu 1.14
tensorpack 0.9.8
pycocotools
```

## Usage

First, download and extract the text datasets into a base text directory; please refer to dataset.py for dataset preprocessing and the supported datasets.

### Multiple Datasets

```
$(base_dir)/lsvt
$(base_dir)/art
$(base_dir)/rects
$(base_dir)/icdar2017rctw
```

You can also synthesize text recognition data for data augmentation; please refer to [__TextRecognitionDataGenerator__](https://github.com/Belval/TextRecognitionDataGenerator). This is helpful for long text recognition and the attention-based language model, because you can synthesize text images directly from an NLP corpus. In that case you should rewrite dataset.py for the synthetic text dataset.

```
$(base_dir)/synthetic_text
```

### Train

First, download the pretrained [__Inception v4 checkpoint__](https://github.com/tensorflow/models/tree/master/research/slim) and put it in the ./pretrain folder. Then modify the GPU list in config.py to select specific GPUs and run:

```
python train.py
```

You can visualize the training steps via TensorBoard:

```
tensorboard --logdir='./checkpoint'
```

![](imgs/loss.png)

ICDAR2019-LSVT, ICDAR2019-ArT, and ICDAR2019-ReCTS are used for training by default; you can change this to your own training data.

### Evaluation

```
python eval.py --checkpoint_path=$(Your model path)
```

ICDAR2017RCTW is used for evaluation by default, with the Normalized Edit Distance metric (specifically 1-N.E.D.); you can change this to your own evaluation data. A minimal sketch of the metric is shown below.
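For clarity, the snippet below is a minimal sketch of how a 1-N.E.D. score is typically computed: the edit distance between each prediction and its ground truth, normalized by the longer string's length, averaged over all samples, and subtracted from 1. The authoritative implementation for this repository is in eval.py; this sketch is for illustration only.

```python
# Minimal sketch of the 1-N.E.D. score (not the exact eval.py implementation).

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via single-row dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion from a
                        dp[j - 1] + 1,        # insertion into a
                        prev + (ca != cb))    # substitution (free if equal)
            prev = cur
    return dp[-1]

def one_minus_ned(predictions, ground_truths):
    """Average 1 - NED over prediction / ground-truth string pairs."""
    scores = []
    for pred, gt in zip(predictions, ground_truths):
        denom = max(len(pred), len(gt))
        scores.append(0.0 if denom == 0 else edit_distance(pred, gt) / denom)
    return 1.0 - sum(scores) / len(scores)

if __name__ == "__main__":
    # "world" vs "word" has edit distance 1, normalized by 5 -> 0.2
    print(one_minus_ned(["hello", "world"], ["hello", "word"]))  # 0.9
```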
### Export

Export a checkpoint to a TensorFlow pb model for inference.

```
python export.py --pb_path=$(Your tensorflow pb model save path) --checkpoint_path=$(Your trained model path)
```

### Test

Load the TensorFlow pb model for text recognition (a generic standalone loading sketch is included at the end of this README):

```
python test.py --pb_path=$(Your tensorflow pb model save path) --img_folder=$(Your test img folder)
```

ICDAR2019-ArT is used for testing by default; you can change this to your own test data.

## Visualization

Scene text detection and recognition result:

![](imgs/viz.png)

Scene text recognition attention maps:

![](imgs/attention_maps_gt_1.jpg)

![](imgs/attention_maps_gt_8454.jpg)

To learn more about the attention mechanism, please refer to [Attention Mechanism in Deep Learning](reference/Attention_Mechanism_in_Deep_Learning.pdf).

## Docker

The ICDAR2019 scene text recognition model, including text detection and recognition, is uploaded to [__Docker Hub__](https://hub.docker.com/repository/docker/zhang0jhon/demo). After [__nvidia-docker__](https://github.com/NVIDIA/nvidia-docker) is installed, run:

```
docker pull zhang0jhon/demo:ocr
docker run -it -p 5000:5000 --gpus all zhang0jhon/demo:ocr bash
cd /ocr/ocr
python flaskapp.py
```

Then you can test with your data via a browser:

```
$(localhost or remote server ip address):5000
```

![](imgs/web.png)
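As a supplement to the Export and Test steps above, the following is a minimal sketch of the generic TensorFlow 1.x workflow for running inference with a frozen pb graph such as the one produced by export.py. The tensor names, input shape, and file paths here are placeholder assumptions for illustration only; the actual names and preprocessing are defined in export.py and test.py in this repository.

```python
# Generic TensorFlow 1.x pattern for inference with a frozen pb graph.
# NOTE: the tensor names and input shape below are ASSUMPTIONS, not the names
# used by this repository -- check export.py / test.py (or list them with
# graph.get_operations()) before running against a real exported model.

import numpy as np
import tensorflow as tf

PB_PATH = "model.pb"                    # path passed to export.py --pb_path
INPUT_TENSOR = "image_placeholder:0"    # assumed name, check export.py
OUTPUT_TENSOR = "prediction:0"          # assumed name, check export.py

def load_frozen_graph(pb_path):
    """Parse a frozen GraphDef and import it into a fresh tf.Graph."""
    with tf.gfile.GFile(pb_path, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(graph_def, name="")
    return graph

if __name__ == "__main__":
    graph = load_frozen_graph(PB_PATH)
    with tf.Session(graph=graph) as sess:
        # Dummy batch of one image; the real input size and normalization
        # must match what test.py uses.
        dummy = np.zeros((1, 299, 299, 3), dtype=np.float32)
        feed = {graph.get_tensor_by_name(INPUT_TENSOR): dummy}
        prediction = sess.run(graph.get_tensor_by_name(OUTPUT_TENSOR), feed)
        print(prediction)
```

If the placeholder names above do not match, `graph.get_operations()` lists every operation in the imported graph, which is a quick way to recover the real input and output tensor names.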