# TextBoxes_plusplus

**Repository Path**: celllab/TextBoxes_plusplus

## Basic Information

- **Project Name**: TextBoxes_plusplus
- **Description**: TextBoxes++: A Single-Shot Oriented Scene Text Detector
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2019-12-05
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# TextBoxes++: A Single-Shot Oriented Scene Text Detector

### Introduction

This is an application for scene text detection (TextBoxes++) and recognition (CRNN).

TextBoxes++ is a unified framework for oriented scene text detection with a single network. It is an extension of [TextBoxes](https://github.com/MhLiao/TextBoxes). [CRNN](https://github.com/bgshih/crnn) is an open-source text recognizer. The code of TextBoxes++ is based on [SSD](https://github.com/weiliu89/caffe/tree/ssd) and [TextBoxes](https://github.com/MhLiao/TextBoxes). The code of CRNN is modified from [CRNN](https://github.com/bgshih/crnn).

For more details, please refer to our [arXiv paper](https://arxiv.org/abs/1801.02765).

### Citing the related works

Please cite the related works in your publications if they help your research:

    @article{Liao2018Text,
      title   = {{TextBoxes++}: A Single-Shot Oriented Scene Text Detector},
      author  = {Minghui Liao and Baoguang Shi and Xiang Bai},
      journal = {{IEEE} Transactions on Image Processing},
      doi     = {10.1109/TIP.2018.2825107},
      url     = {https://doi.org/10.1109/TIP.2018.2825107},
      volume  = {27},
      number  = {8},
      pages   = {3676--3690},
      year    = {2018}
    }

    @inproceedings{LiaoSBWL17,
      author    = {Minghui Liao and Baoguang Shi and Xiang Bai and Xinggang Wang and Wenyu Liu},
      title     = {TextBoxes: {A} Fast Text Detector with a Single Deep Neural Network},
      booktitle = {AAAI},
      year      = {2017}
    }

    @article{ShiBY17,
      author  = {Baoguang Shi and Xiang Bai and Cong Yao},
      title   = {An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition},
      journal = {{IEEE} TPAMI},
      volume  = {39},
      number  = {11},
      pages   = {2298--2304},
      year    = {2017}
    }

### Contents

1. [Requirements](#requirements)
2. [Installation](#installation)
3. [Docker](#docker)
4. [Models](#models)
5. [Demo](#demo)
6. [Train](#train)

### Requirements

**NOTE**: There is partial support for a docker image. See `docker/README.md`. (Thanks to [@mdbenito](https://github.com/mdbenito) for the PR.)

- Torch7 (for CRNN)
- g++-5
- CUDA 8.0
- cuDNN v5.1 (cuDNN 6 and cuDNN 7 may fail)
- OpenCV 3.0

Please refer to [Caffe Installation](http://caffe.berkeleyvision.org/install_apt.html) for the remaining dependencies.

### Installation

1. Compile TextBoxes++ (this is a modified version of Caffe, so you do not need to install the official Caffe; a PYTHONPATH sanity check follows after this list):

```Shell
# Modify Makefile.config according to your Caffe installation.
cp Makefile.config.example Makefile.config
make -j8
# Make sure to include $CAFFE_ROOT/python in your PYTHONPATH.
make py
```

2. Compile CRNN (please refer to [CRNN](https://github.com/bgshih/crnn) if you have trouble with the compilation):

```Shell
cd crnn/src/
sh build_cpp.sh
```
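The `make py` step only builds the pycaffe bindings; nothing is installed system-wide, so Python has to be pointed at them explicitly. A minimal sanity check, assuming `/path/to/TextBoxes_plusplus` stands in for your local clone:

```Shell
# Expose the pycaffe bindings built by `make py`.
# /path/to/TextBoxes_plusplus is a placeholder for your local clone.
export CAFFE_ROOT=/path/to/TextBoxes_plusplus
export PYTHONPATH=$CAFFE_ROOT/python:$PYTHONPATH

# If the build succeeded and the path is right, this import works.
python -c "import caffe; print(caffe.__file__)"
```

If the import fails with missing shared libraries, double-check the CUDA and cuDNN paths in `Makefile.config`.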
### Docker

(Thanks to [@idotobi](https://github.com/idotobi) for the PR.)

Build the Docker image:

```Shell
docker build -t tbpp_crnn:gpu .
```

This can take more than an hour, so go get a coffee ;)

Once this is done you can start a container via `nvidia-docker`:

```Shell
nvidia-docker run -it --rm tbpp_crnn:gpu bash
```

To check whether the GPU is available inside the container, run `nvidia-smi`.

It is recommended to mount the `./models` and `./crnn/model/` directories to include the downloaded [models](#models):

```Shell
nvidia-docker run -it \
               --rm \
               -v ${PWD}/models:/opt/caffe/models \
               -v ${PWD}/crnn/model:/opt/caffe/crnn/model \
               tbpp_crnn:gpu bash
```

For convenience this command is executed when running `./run.bash`.

### Models

1. Pre-trained model on SynthText (used for training): [Dropbox](https://www.dropbox.com/s/kpv17f3syio95vn/model_pre_train_syn.caffemodel?dl=0); [BaiduYun](https://pan.baidu.com/s/1htV2j4K)
2. Model trained on ICDAR 2015 Incidental Text (used for testing): [Dropbox](https://www.dropbox.com/s/9znpiqpah8rir9c/model_icdar15.caffemodel?dl=0); [BaiduYun](https://pan.baidu.com/s/1bqekTun)

   Please place the above models in "./models/".

   If your data differs substantially from ICDAR 2015 Incidental Text, you should train on your own data, starting from the model pre-trained on SynthText.

3. CRNN model: [Dropbox](https://www.dropbox.com/s/kmi62qxm9z08o6h/model_crnn.t7?dl=0); [BaiduYun](https://pan.baidu.com/s/1jJwmneI)

   Please place the CRNN model in "./crnn/model/".

### Demo

Download the ICDAR 2015 model, place it in "./models/", and run:

```Shell
python examples/text/demo.py
```

The detection and recognition results are written to "./demo_images".

### Train

#### Create lmdb data

1. Convert the ground truth into "xml" form: [example.xml](./data/example.xml)
2. Create train/test lists (train.txt / test.txt) in "./data/text/" with the following form (a helper sketch for generating these lists follows at the end of this section):

        path_to_example1.jpg path_to_example1.xml
        path_to_example2.jpg path_to_example2.xml

3. Run "./data/text/creat_data.sh"

#### Start training

1. Modify the lmdb path in modelConfig.py
2. Run "python examples/text/train.py"
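For step 2 of the lmdb preparation, the lists can be written by hand or generated with a small helper like the one below. This is an illustrative sketch, not a script shipped with this repository: it assumes each image sits next to its xml annotation under a hypothetical `/path/to/train_data` directory, and that `creat_data.sh` accepts the resulting paths as-is.

```Shell
# Hypothetical helper: write "image.jpg image.xml" pairs to ./data/text/train.txt.
# /path/to/train_data is a placeholder for your converted dataset.
DATA_DIR=/path/to/train_data
OUT=data/text/train.txt

: > "$OUT"  # truncate any previous list
for img in "$DATA_DIR"/*.jpg; do
  xml="${img%.jpg}.xml"
  # Only list images that have a matching annotation file.
  if [ -f "$xml" ]; then
    echo "$img $xml" >> "$OUT"
  fi
done
```

The same loop over a held-out directory produces test.txt.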