# ScanSSD
**Repository Path**: macqueen/ScanSSD
## Basic Information
- **Project Name**: ScanSSD
- **Description**: ScanSSD
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-08-05
- **Last Updated**: 2020-12-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# ScanSSD: Scanning Single Shot Detector for Math in Document Images
A [PyTorch](http://pytorch.org/) implementation of ScanSSD, the [Scanning Single Shot MultiBox Detector](https://paragmali.me/scanning-single-shot-detector-for-math-in-document-images/), by [**Parag Mali**](https://github.com/MaliParag/). It was developed using the SSD implementation by [**Max deGroot**](https://github.com/amdegroot).
Developed using CUDA 9.1.85 and PyTorch 1.1.0.
## Table of Contents
- Installation
- Code Organization
- Training
- Testing
- Performance
## Installation
- Install [PyTorch](http://pytorch.org/)
- Clone this repository. Python 3 is required.
- Download the dataset by following the instructions at <https://github.com/MaliParag/TFD-ICDAR2019>.
- Install [Visdom](https://github.com/facebookresearch/visdom) for real-time loss visualization during training!
* To use Visdom in the browser:
```Shell
# First install Python server and client
pip install visdom
# Start the server (probably in a screen or tmux)
python -m visdom.server
```
* Then (during training) navigate to http://localhost:8097/ (see the Training ScanSSD section below for training details).
## Code Organization
The SSD model is built in `ssd.py`. Training and testing are managed by `train.py` and `test.py`. All the training code is in the `layers` directory. Hyperparameters for training and testing can be specified on the command line or in the `config.py` file inside the `data` directory.
The `data` directory also contains the `gtdb_new.py` data reader, which uses sliding windows to generate sub-images of each page for training. All the scripts for stitching the sub-image-level detections are in the `gtdb` directory.
Functions for data augmentation and for visualizing bounding boxes and heatmaps are in `utils`.
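The sliding-window idea can be sketched roughly as follows. This is an illustration only, not the code in `gtdb_new.py`: the function names, the 512-pixel window, and the 256-pixel stride are assumptions chosen for the example. It computes the crop rectangles for a page; the actual cropping (and any padding for pages smaller than the window) is left to the caller.

```python
def window_origins(length, window, stride):
    """Return start offsets so windows of size `window` cover `length` pixels."""
    if length <= window:
        return [0]
    origins = list(range(0, length - window + 1, stride))
    # Add a final window flush with the edge if the regular grid falls short.
    if origins[-1] + window < length:
        origins.append(length - window)
    return origins


def window_boxes(page_width, page_height, window=512, stride=256):
    """Return (x1, y1, x2, y2) crop rectangles in row-major order.

    The default window and stride are hypothetical values for illustration.
    """
    return [
        (x, y, x + window, y + window)
        for y in window_origins(page_height, window, stride)
        for x in window_origins(page_width, window, stride)
    ]


boxes = window_boxes(700, 1000)
print(len(boxes))  # 6
print(boxes[-1])   # (188, 488, 700, 1000)
```

Overlapping windows mean a formula cut by one window boundary is likely to appear whole in a neighboring window, which is why the per-window detections need to be stitched afterwards.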
## Setting up data for training
If you are not sure how to set up the data, see the [dir_struct](https://github.com/MaliParag/ScanSSD/blob/master/dir_struct) file. It shows one possible directory structure that you can use to set up data for training and testing.
To generate `.pmath` or `.pchar` files, you can use [this](https://github.com/MaliParag/ScanSSD/blob/master/gtdb/split_annotations_per_page.py) script.
## Training ScanSSD
- First, download the fc-reduced [VGG-16](https://arxiv.org/abs/1409.1556) PyTorch base network weights [here](https://drive.google.com/file/d/1GqiyZ1TglNW5GrNQfXQ72S8mChhJ4_sD/view?usp=sharing).
- By default, we assume you have downloaded the file into the `scanssd/weights` directory.
- Run the command:
```Shell
python3 train.py \
  --dataset GTDB \
  --dataset_root ~/data/GTDB/ \
  --cuda True \
  --visdom True \
  --batch_size 16 \
  --num_workers 4 \
  --exp_name IOU512_iter1 \
  --model_type 512 \
  --training_data training_data \
  --cfg hboxes512 \
  --loss_fun ce \
  --kernel 1 5 \
  --padding 0 2 \
  --neg_mining True \
  --pos_thresh 0.75
```
- Note:
* For training, an NVIDIA GPU is strongly recommended for speed.
* For instructions on Visdom usage/installation, see the Installation section.
* You can resume training from a checkpoint by specifying its path as one of the training parameters (again, see `train.py` for options).
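The training command passes values such as `--cuda True` and `--kernel 1 5`. As a rough sketch of how flags of this shape can be parsed with `argparse`, consider the example below. It is not copied from `train.py`: the `str2bool` helper, the `build_parser` function, and all default values are assumptions made for illustration.

```python
import argparse


def str2bool(value):
    """Interpret common textual spellings of a boolean flag value."""
    return value.lower() in ("yes", "true", "t", "1")


def build_parser():
    """Build a parser for a few flags; the defaults here are hypothetical."""
    parser = argparse.ArgumentParser(description="Illustrative flag parsing")
    parser.add_argument("--cuda", type=str2bool, default=True)
    parser.add_argument("--batch_size", type=int, default=16)
    # nargs=2 collects two integers, as in `--kernel 1 5`.
    parser.add_argument("--kernel", type=int, nargs=2, default=[3, 3])
    parser.add_argument("--pos_thresh", type=float, default=0.5)
    return parser


args = build_parser().parse_args(["--cuda", "False", "--kernel", "1", "5"])
print(args.cuda, args.kernel, args.batch_size)  # False [1, 5] 16
```

A helper like `str2bool` is needed because `type=bool` would treat any non-empty string, including `"False"`, as true.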
## Pre-Trained weights
For quick testing, pre-trained weights are available [here.](https://drive.google.com/file/d/1bGNvg9uLCTbVE9hk1yWE-2tLgX1eg_me/view?usp=sharing)
## Testing
To test a trained network:
```Shell
python3 test.py \
  --dataset_root ../ \
  --trained_model HBOXES512_iter1GTDB.pth \
  --visual_threshold 0.25 \
  --cuda True \
  --exp_name test_real_world_iter1 \
  --test_data testing_data \
  --model_type 512 \
  --cfg hboxes512 \
  --padding 3 3 \
  --kernel 1 1 \
  --batch_size 8
```
You can set the parameters listed in the `eval.py` file either by passing them as command-line flags or by changing their default values manually.
## Stitching the patch level results
```
python3 /ssd/gtdb/stitch_patches_pdf.py \
  --data_file /train_pdf \
  --output_dir /ssd/eval/stitched_HBOXES512_e4/ \
  --math_dir /ssd/eval/test_HBOXES512_e4/ \
  --stitching_algo equal \
  --algo_threshold 30 \
  --num_workers 8 \
  --postprocess True \
  --home_images /images/
```
- `math_dir` is the output directory generated by `test.py`.
- `output_dir` is the directory where the final output is written.
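Before patch-level detections can be combined, each box has to be expressed in page coordinates rather than window coordinates. The sketch below shows only that translation step and does not reproduce the pooling logic in `stitch_patches_pdf.py`. The function name, the `(x1, y1, x2, y2, score)` tuple layout, and the example values are assumptions.

```python
def to_page_coords(window_origin, detections, page_size):
    """Shift window-local boxes by the window origin and clip to the page.

    window_origin: (x, y) of the window's top-left corner on the page.
    detections:    iterable of (x1, y1, x2, y2, score) in window coordinates.
    page_size:     (width, height) of the full page in pixels.
    """
    origin_x, origin_y = window_origin
    page_w, page_h = page_size
    page_boxes = []
    for x1, y1, x2, y2, score in detections:
        page_boxes.append((
            max(0, x1 + origin_x),
            max(0, y1 + origin_y),
            min(page_w, x2 + origin_x),
            min(page_h, y2 + origin_y),
            score,
        ))
    return page_boxes


print(to_page_coords((256, 488), [(10, 20, 110, 60, 0.9)], (700, 1000)))
# [(266, 508, 366, 548, 0.9)]
```

Once all boxes share one coordinate system, overlapping detections from neighboring windows can be merged or filtered.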
## Evaluate
```
python3 /ICDAR2019/TFD-ICDAR2019v2/Evaluation/IOULib/IOUevaluater.py \
  --ground_truth /ICDAR2019/TFD-ICDAR2019v2/Train/math_gt/ \
  --detections /ssd/eval/stitched_HBOXES512_e4/
```
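The evaluation compares detected boxes with ground-truth boxes using intersection over union (IoU). A minimal sketch of the IoU computation for two axis-aligned boxes is shown below. It is not taken from `IOUevaluater.py`; the function name and the `(x1, y1, x2, y2)` box layout are assumptions.

```python
def iou(box_a, box_b):
    """Return intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # 1.0
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # 0.0
```

Under the usual convention, a detection counts as correct when its IoU with a ground-truth box meets a chosen threshold, for example 0.5 or 0.75.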
## Performance
#### TFD-ICDAR 2019 Version1 Test
| Metric | Precision | Recall | F-score |
|:-:|:-:|:-:|:-:|
| IOU50 | 85.05% | 75.85% | 80.19% |
| IOU75 | 77.38% | 69.01% | 72.96% |
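The F-score column is consistent with the harmonic mean of precision and recall (the F1 score). The short check below assumes that definition; the function name is illustrative.

```python
def f_score(precision, recall):
    """Return the harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_score(85.05, 75.85), 2))  # 80.19
print(round(f_score(77.38, 69.01), 2))  # 72.96
```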
##### FPS
**GTX 1080:** ~27 FPS for 512 × 512 input images
## Related publications
Mali, Parag, et al. “ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images.” ArXiv:2003.08005 [Cs], Mar. 2020. arXiv.org, http://arxiv.org/abs/2003.08005.
P. S. Mali, ["Scanning Single Shot Detector for Math in Document Images."](https://scholarworks.rit.edu/theses/10210/) Order No. 22622391, Rochester Institute of Technology, Ann Arbor, 2019.
M. Mahdavi, R. Zanibbi, H. Mouchere, and Utpal Garain (2019). [ICDAR 2019 CROHME + TFD: Competition on Recognition of Handwritten Mathematical Expressions and Typeset Formula Detection.](https://www.cs.rit.edu/~rlaz/files/CROHME+TFD%E2%80%932019.pdf) Proc. International Conference on Document Analysis and Recognition, Sydney, Australia (to appear).
## Acknowledgements
- [**Max deGroot**](https://github.com/amdegroot) for providing open-source SSD code