# ScanSSD

**Repository Path**: macqueen/ScanSSD

## Basic Information

- **Project Name**: ScanSSD
- **Description**: ScanSSD
- **License**: MIT
- **Default Branch**: master
- **Created**: 2020-08-05
- **Last Updated**: 2020-12-19

## README

# ScanSSD: Scanning Single Shot Detector for Math in Document Images

A [PyTorch](http://pytorch.org/) implementation of ScanSSD, the [Scanning Single Shot MultiBox Detector](https://paragmali.me/scanning-single-shot-detector-for-math-in-document-images/), by [**Parag Mali**](https://github.com/MaliParag/). It was developed using the SSD implementation by [**Max deGroot**](https://github.com/amdegroot).

Developed using CUDA 9.1.85 and PyTorch 1.1.0.

## Table of Contents
- Installation
- Code Organization
- Training
- Testing
- Performance

## Installation
- Install [PyTorch](http://pytorch.org/).
- Clone this repository. Python 3 is required.
- Download the dataset by following the instructions at https://github.com/MaliParag/TFD-ICDAR2019.
- Install [Visdom](https://github.com/facebookresearch/visdom) for real-time loss visualization during training.
  * To use Visdom in the browser:
  ```Shell
  # First install Python server and client
  pip install visdom
  # Start the server (probably in a screen or tmux)
  python -m visdom.server
  ```
  * Then (during training) navigate to http://localhost:8097/ (see the Training ScanSSD section below for training details).

## Code Organization

The SSD model is built in `ssd.py`. Training and testing of the SSD are managed in `train.py` and `test.py`. All the training code is in the `layers` directory. Hyper-parameters for training and testing can be specified on the command line and through the `config.py` file inside the `data` directory.
The `data` directory also contains the `gtdb_new.py` data reader, which uses sliding windows to generate sub-images of each page for training. All the scripts for stitching the sub-image-level detections are in the `gtdb` directory. Functions for data augmentation and for visualizing bounding boxes and heatmaps are in `utils`.

## Setting up data for training

If you are not sure how to set up the data, see the [dir_struct](https://github.com/MaliParag/ScanSSD/blob/master/dir_struct) file. It shows one possible directory structure that you can use for setting up data for training and testing.

To generate .pmath or .pchar files, you can use [this](https://github.com/MaliParag/ScanSSD/blob/master/gtdb/split_annotations_per_page.py) script.

## Training ScanSSD

- First download the fc-reduced [VGG-16](https://arxiv.org/abs/1409.1556) PyTorch base network weights [here](https://drive.google.com/file/d/1GqiyZ1TglNW5GrNQfXQ72S8mChhJ4_sD/view?usp=sharing).
- By default, we assume you have downloaded the file into the `scanssd/weights` dir.
- Run the command:

```Shell
python3 train.py \
  --dataset GTDB \
  --dataset_root ~/data/GTDB/ \
  --cuda True \
  --visdom True \
  --batch_size 16 \
  --num_workers 4 \
  --exp_name IOU512_iter1 \
  --model_type 512 \
  --training_data training_data \
  --cfg hboxes512 \
  --loss_fun ce \
  --kernel 1 5 \
  --padding 0 2 \
  --neg_mining True \
  --pos_thresh 0.75
```

- Note:
  * For training, an NVIDIA GPU is strongly recommended for speed.
  * For instructions on Visdom usage/installation, see the Installation section.
  * You can pick up training from a checkpoint by specifying its path as one of the training parameters (again, see `train.py` for options).

## Pre-Trained weights

For quick testing, pre-trained weights are available [here](https://drive.google.com/file/d/1bGNvg9uLCTbVE9hk1yWE-2tLgX1eg_me/view?usp=sharing).

## Testing

To test a trained network:

```Shell
python3 test.py \
  --dataset_root ../ \
  --trained_model HBOXES512_iter1GTDB.pth \
  --visual_threshold 0.25 \
  --cuda True \
  --exp_name test_real_world_iter1 \
  --test_data testing_data \
  --model_type 512 \
  --cfg hboxes512 \
  --padding 3 3 \
  --kernel 1 1 \
  --batch_size 8
```

You can set the parameters listed in the `eval.py` file either by passing them as command-line flags or by changing their values in the file.

## Stitching the patch level results

```
python3 /ssd/gtdb/stitch_patches_pdf.py \
  --data_file /train_pdf \
  --output_dir /ssd/eval/stitched_HBOXES512_e4/ \
  --math_dir /ssd/eval/test_HBOXES512_e4/ \
  --stitching_algo equal \
  --algo_threshold 30 \
  --num_workers 8 \
  --postprocess True \
  --home_images /images/
```

- `math_dir` is the output directory generated by `test.py`.
- `output_dir` is where you want to generate the final output.

## Evaluate

```
python3 /ICDAR2019/TFD-ICDAR2019v2/Evaluation/IOULib/IOUevaluater.py \
  --ground_truth /ICDAR2019/TFD-ICDAR2019v2/Train/math_gt/ \
  --detections /ssd/eval/stitched_HBOXES512_e4/
```

## Performance

#### TFD-ICDAR 2019 Version1 Test

| Metric | Precision | Recall | F-score |
|:-:|:-:|:-:|:-:|
| IOU50 | 85.05% | 75.85% | 80.19% |
| IOU75 | 77.38% | 69.01% | 72.96% |

##### FPS

**GTX 1080:** ~27 FPS for 512 × 512 input images

## Related publications

P. Mali, et al. “ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images.” ArXiv:2003.08005 [Cs], Mar. 2020. arXiv.org, http://arxiv.org/abs/2003.08005.

P. S. Mali, ["Scanning Single Shot Detector for Math in Document Images."](https://scholarworks.rit.edu/theses/10210/) Order No. 22622391, Rochester Institute of Technology, Ann Arbor, 2019.

M. Mahdavi, R. Zanibbi, H. Mouchere, and U. Garain (2019). [ICDAR 2019 CROHME + TFD: Competition on Recognition of Handwritten Mathematical Expressions and Typeset Formula Detection.](https://www.cs.rit.edu/~rlaz/files/CROHME+TFD%E2%80%932019.pdf) Proc. International Conference on Document Analysis and Recognition, Sydney, Australia (to appear).

## Acknowledgements

- [**Max deGroot**](https://github.com/amdegroot) for providing open-source SSD code
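
## Appendix: reading the Performance table

The Performance table reports precision, recall and F-score at two settings labelled IOU50 and IOU75. The sketch below assumes these labels refer to intersection-over-union (IoU) thresholds of 0.5 and 0.75 between a detected box and a ground-truth box, and that the F-score is the usual harmonic mean of precision and recall. The function names `iou` and `f_score` are hypothetical helpers written for illustration; they are not taken from `IOUevaluater.py`, which may match and count detections differently.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this layout is an assumption
    made for the example, not the repository's annotation format.
    """
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f_score(precision, recall):
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reproduce the F-score column from the precision and recall columns.
print(round(f_score(85.05, 75.85), 2))  # 80.19 (IOU50 row)
print(round(f_score(77.38, 69.01), 2))  # 72.96 (IOU75 row)

# Two 10x10 boxes offset by 5 px horizontally overlap on a 5x10 strip:
# intersection 50, union 150, so IoU is 1/3 (below a 0.5 threshold).
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Both F-score values agree with the table, which suggests the reported F-score is computed from the rounded precision and recall in the same row.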