# labelme_converter

**Repository Path**: monkeycc/labelme_converter

## Basic Information

- **Project Name**: labelme_converter
- **Description**: LabelMe to MsCOCO, PascalVOC, Yolo https://github.com/VladimirSinitsin/labelme_converter
- **Primary Language**: Python
- **License**: GPL-3.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-04-01
- **Last Updated**: 2023-06-03

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

English | [Русский](README_ru.md)

![GitHub top language](https://img.shields.io/github/languages/top/VladimirSinitsin/labelme_converter)
![GitHub](https://img.shields.io/github/license/VladimirSinitsin/labelme_converter)
![GitHub search hit counter](https://img.shields.io/github/search/VladimirSinitsin/labelme_converter/goto?color=gree)

# LABELME-CONVERTER

_**Converter from LabelMe to the MsCOCO, PascalVOC and Yolo formats. It can also split a dataset into training, test and validation samples, augment the converted dataset, print and save statistics on the images, and resize the whole dataset to a single resolution.**_

```
Installing the necessary packages
>> pip install -r requirements.txt
```

The list of classes, in the order they appear in the statistics, is written in a `.txt` file (`labels.txt` by default). The file name is set in `config.py`.
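To illustrate what the Yolo conversion involves, here is a minimal sketch (not the converter's actual code) that turns one LabelMe annotation into YOLO label lines of the form `class x_center y_center width height`, normalized to the image size. It assumes the usual LabelMe JSON fields `imageWidth`, `imageHeight` and `shapes` (each shape with `label` and `points`), and it assumes that a class's YOLO index is its 0-based line position in `labels.txt`. The functions `load_class_ids` and `labelme_to_yolo` are hypothetical names.

```python
import json
from typing import Dict, List


def load_class_ids(lines: List[str]) -> Dict[str, int]:
    """Map class name -> 0-based index, in listed order (assumed: one name per line)."""
    names = [line.strip() for line in lines if line.strip()]
    return {name: idx for idx, name in enumerate(names)}


def labelme_to_yolo(annotation: dict, class_ids: Dict[str, int]) -> List[str]:
    """Convert one LabelMe annotation dict to normalized YOLO lines."""
    img_w = annotation["imageWidth"]
    img_h = annotation["imageHeight"]
    lines = []
    for shape in annotation["shapes"]:
        # Rectangles store 2 corner points, polygons store N points;
        # in both cases the bounding box is the min/max over all points.
        xs = [p[0] for p in shape["points"]]
        ys = [p[1] for p in shape["points"]]
        x_min, x_max = min(xs), max(xs)
        y_min, y_max = min(ys), max(ys)
        # YOLO uses the box centre and size, divided by the image size.
        xc = (x_min + x_max) / 2 / img_w
        yc = (y_min + y_max) / 2 / img_h
        w = (x_max - x_min) / img_w
        h = (y_max - y_min) / img_h
        cls = class_ids[shape["label"]]
        lines.append(f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


if __name__ == "__main__":
    # In practice the lines would come from open("labels.txt").readlines().
    class_ids = load_class_ids(["meter\n", "value\n"])
    annotation = json.loads(
        '{"imageWidth": 200, "imageHeight": 100, "shapes": ['
        '{"label": "value", "points": [[50, 25], [150, 75]], "shape_type": "rectangle"}]}'
    )
    print(labelme_to_yolo(annotation, class_ids))
    # prints ['1 0.500000 0.500000 0.500000 0.500000']
```

Taking the min/max over all points lets the same code handle both rectangles and polygons; if the real tool orders class indices differently, only `load_class_ids` would need to change.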
## Converting

### Start

The file `labelme_converter.py` is used for conversion, with the following arguments:

- `--input` - directory with the dataset in LabelMe format
- `--output` - (_optional_) directory for storing the converted dataset (default: `./current_dataset`)
- `--format` - target format (`yolo`, `voc`, `coco`)
- `--create-marked` - (_optional_) specify if you need images with the markup visualized
- `--poly` - (_optional_) specify if the visualization should use polygons (not rectangles)

### Example

```
Converting the dataset from ./meters01_labelme to Yolo format with markup visualization
>> python labelme_converter.py --input meters01_labelme --output current_dataset --format yolo --create-marked
Input path: labelme
Output path: current_dataset
Reading labelme files: 100%|██████████████████████████████████████████| 24/24 [00:00<00:00, 98.69it/s]
Create marked images: 100%|██████████████████████████████████████████| 24/24 [00:00<00:00, 50.44it/s]
Converting labelme to Yolo: 100%|██████████████████████████████████████████| 24/24 [00:00<00:00, 8058.87it/s]
```

## Splitting into train, test, val and trainval sets

### Start

The file `split_dataset.py` is used for splitting, with the following arguments:

- `--input` - directory with a dataset in one of the markup formats: MsCOCO, PascalVOC, Yolo
- `--output` - (_optional_) directory for storing the split dataset (default: `./splitted_dataset`)
- `--train` - percentage of all images that should go into the training set
- `--test` - percentage of all images that should go into the test set
- `--val` - percentage of all images that should go into the validation set
- `--seed` - (_optional_) seed of the random number generator (default: `42`)

The `trainval` set is the union of the training and validation sets.

### Example

```
Splitting the dataset from ./current_dataset in the ratio 80/15/5 with the seed set to 101
>> python split_dataset.py --input current_dataset --train 80 --test 15 --val 5 --seed 101
Dataset on current_dataset is being split!
Output split dataset path: splitted_dataset
Dataset is splitted!
```

## Dataset augmentation

### Start

The file `augment_dataset.py` is used for augmentation, with the following arguments:

- `--input` - directory with a dataset in one of the markup formats: MsCOCO, PascalVOC, Yolo (the dataset can be split)
- `--output` - (_optional_) directory for storing the augmented dataset (default: `./augmented_dataset`)
- `--full` - (_optional_) specify to augment all sets of a split dataset
- `--train` - (_optional_) augment the training set of a split dataset
- `--test` - (_optional_) augment the test set of a split dataset
- `--val` - (_optional_) augment the validation set of a split dataset
- `--count` - number of augmented copies of each image

### Examples

```
Augmentation of the entire unsplit dataset stored in ./current_dataset with 3 copies of each image
>> python augment_dataset.py --input current_dataset --count 3
Augmentation train_test_val set: 100%|██████████████████████████████████████████| 24/24 [00:06<00:00, 3.52it/s]
```

```
Augmentation of the training and test sets of a split dataset (the trainval set changes accordingly)
>> python augment_dataset.py --input splitted_dataset --train --test --count 5
Augmentation train set: 100%|██████████████████████████████████████████| 19/19 [00:07<00:00, 2.50it/s]
Augmentation test set: 100%|██████████████████████████████████████████| 3/3 [00:01<00:00, 2.62it/s]
```

## Creating statistics on datasets

### Start

The file `statistics_dataset.py` is used to print and save statistics, with the following arguments:

- `--input` - directory with a dataset in one of the markup formats: MsCOCO, PascalVOC, Yolo (the dataset can be split or augmented)
- `--save` - (_optional_) specify to save the statistics to a file (`stat.txt`)
- `--save_path` - (_optional_) directory to save the file in (by default, it is saved in the dataset directory)

### Examples

```
Display statistics for a converted dataset and save them to a file
>> python statistics_dataset.py --input current_dataset --save
+------------------------------------------------------------------+
| train_test_val                   | count of images: 24           |
+---------+-------------------+-----------+------------+-----------+
| class   | number of objects | avg_width | avg_height |  avg_area |
+---------+-------------------+-----------+------------+-----------+
| meter   |                24 |    410.25 |     417.42 | 182052.79 |
| value   |                24 |    191.29 |      51.21 |  11516.71 |
| seal2   |                29 |     82.45 |      72.07 |   7140.14 |
| model   |                17 |    119.24 |      26.94 |   3488.65 |
| serial  |                23 |    115.52 |      18.70 |   2652.00 |
| seal    |                23 |    164.30 |     154.57 |  24377.17 |
| mag     |                18 |     34.50 |      38.56 |   1516.61 |
| breaker |                16 |    155.88 |     196.06 |  31059.19 |
+---------+-------------------+-----------+------------+-----------+
```

```
Display statistics for a split dataset without saving them to a file
>> python statistics_dataset.py --input splitted_dataset
+------------------------------------------------------------------+
| train                            | count of images: 19           |
+---------+-------------------+-----------+------------+-----------+
| class   | number of objects | avg_width | avg_height |  avg_area |
+---------+-------------------+-----------+------------+-----------+
| meter   |                19 |    395.58 |     411.05 | 171485.26 |
| value   |                19 |    183.05 |      51.11 |  10799.26 |
| seal    |                16 |    155.19 |     151.06 |  21259.50 |
| mag     |                13 |     30.62 |      33.54 |   1071.46 |
| seal2   |                24 |     78.75 |      70.04 |   6600.00 |
| model   |                14 |    107.64 |      25.93 |   2955.64 |
| serial  |                18 |    110.44 |      17.33 |   2226.56 |
| breaker |                12 |    160.50 |     184.33 |  31833.42 |
+---------+-------------------+-----------+------------+-----------+
+------------------------------------------------------------------+
| test                             | count of images: 3            |
+---------+-------------------+-----------+------------+-----------+
| class   | number of objects | avg_width | avg_height |  avg_area |
+---------+-------------------+-----------+------------+-----------+
| meter   |                 3 |    420.33 |     366.33 | 163746.67 |
| value   |                 3 |    217.33 |      47.00 |  13863.67 |
| seal    |                 5 |    146.20 |     151.40 |  22713.60 |
| mag     |                 3 |     38.33 |      50.67 |   2490.00 |
| model   |                 2 |    158.00 |      26.00 |   4578.00 |
| serial  |                 3 |    123.33 |      19.33 |   3868.33 |
| breaker |                 4 |    142.00 |     231.25 |  28736.50 |
| seal2   |                 3 |     65.33 |      52.67 |   3784.67 |
+---------+-------------------+-----------+------------+-----------+
+-----------------------------------------------------------------+
| val                              | count of images: 2           |
+--------+-------------------+-----------+------------+-----------+
| class  | number of objects | avg_width | avg_height |  avg_area |
+--------+-------------------+-----------+------------+-----------+
| meter  |                 2 |    534.50 |     554.50 | 309903.50 |
| value  |                 2 |    230.50 |      58.50 |  14812.00 |
| seal   |                 2 |    282.50 |     190.50 |  53477.50 |
| mag    |                 2 |     54.00 |      53.00 |   2950.00 |
| seal2  |                 2 |    152.50 |     125.50 |  18655.00 |
| serial |                 2 |    149.50 |      30.00 |   4656.50 |
| model  |                 1 |    204.00 |      43.00 |   8772.00 |
+--------+-------------------+-----------+------------+-----------+
+------------------------------------------------------------------+
| trainval                         | count of images: 21           |
+---------+-------------------+-----------+------------+-----------+
| class   | number of objects | avg_width | avg_height |  avg_area |
+---------+-------------------+-----------+------------+-----------+
| meter   |                21 |    408.81 |     424.71 | 184667.95 |
| value   |                21 |    187.57 |      51.81 |  11181.43 |
| seal    |                18 |    169.33 |     155.44 |  24839.28 |
| mag     |                15 |     33.73 |      36.13 |   1321.93 |
| seal2   |                26 |     84.42 |      74.31 |   7527.31 |
| model   |                15 |    114.07 |      27.07 |   3343.40 |
| serial  |                20 |    114.35 |      18.60 |   2469.55 |
| breaker |                12 |    160.50 |     184.33 |  31833.42 |
+---------+-------------------+-----------+------------+-----------+
```

```
Display statistics for an augmented unsplit dataset with 5 copies of each image
The statistics are also saved to ./stat/stat.txt
>> python statistics_dataset.py --input augmented_dataset --save --save_path stat
+------------------------------------------------------------------+
| train_test_val                   | count of images: 120          |
+---------+-------------------+-----------+------------+-----------+
| class   | number of objects | avg_width | avg_height |  avg_area |
+---------+-------------------+-----------+------------+-----------+
| meter   |               120 |    426.29 |     441.95 | 201052.84 |
| value   |               120 |    197.42 |      60.44 |  14066.77 |
| seal2   |               145 |     85.43 |      74.40 |   7367.63 |
| model   |                85 |    123.21 |      32.36 |   4433.02 |
| serial  |               115 |    118.58 |      23.93 |   3487.95 |
| seal    |               115 |    172.86 |     164.56 |  27600.59 |
| mag     |                90 |     35.52 |      41.13 |   1628.20 |
| breaker |                80 |    163.86 |     193.43 |  32002.40 |
+---------+-------------------+-----------+------------+-----------+
```

## Resizing all dataset images to the same resolution (with the markup coordinates transformed accordingly)

### Start

The file `resize_dataset.py` is used for resizing, with the following arguments:

- `--input` - directory with a dataset in one of the markup formats: MsCOCO, PascalVOC, Yolo (the dataset can be split or augmented)
- `--output` - (_optional_) directory for saving the new dataset (default: `./resized_dataset`)
- `--new_w` - new image width in pixels
- `--new_h` - new image height in pixels

### Example

```
Resize a split dataset to 512x512 (for a split dataset in MsCOCO format, one progress bar is shown per set; in the other formats, a single one)
Save to the default directory: ./resized_dataset
>> python resize_dataset.py --input splitted_dataset --new_w 512 --new_h 512
Resize images in Train set: 100%|██████████████████████████████████████████| 19/19 [00:00<00:00, 27.08it/s]
Resize images in Test set: 100%|██████████████████████████████████████████| 3/3 [00:00<00:00, 25.54it/s]
Resize images in Val set: 100%|██████████████████████████████████████████| 2/2 [00:00<00:00, 28.34it/s]
Resize images in TrainVal set: 100%|██████████████████████████████████████████| 21/21 [00:00<00:00, 33.62it/s]
```
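Resizing to a fixed resolution means the markup coordinates must be scaled by the same per-axis factors as the image. The sketch below shows that arithmetic for a PascalVOC-style pixel box `(xmin, ymin, xmax, ymax)` and a MsCOCO-style box `[x, y, width, height]`. It is not the script's actual implementation: the function names are hypothetical, and it assumes a plain stretch to `new_w x new_h` with no aspect-ratio-preserving padding. Under that assumption, normalized Yolo coordinates would not change at all, because both the coordinate and the image size are multiplied by the same factor.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]
Size = Tuple[int, int]  # (width, height) in pixels


def scale_factors(old_size: Size, new_size: Size) -> Tuple[float, float]:
    """Per-axis scale factors for stretching old (w, h) to new (w, h)."""
    return new_size[0] / old_size[0], new_size[1] / old_size[1]


def scale_voc_box(box: Box, old_size: Size, new_size: Size) -> Box:
    """Scale a corner box (xmin, ymin, xmax, ymax) to the new resolution."""
    sx, sy = scale_factors(old_size, new_size)
    xmin, ymin, xmax, ymax = box
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)


def scale_coco_box(box: Box, old_size: Size, new_size: Size) -> Box:
    """Scale an (x, y, width, height) box; width/height scale like x/y."""
    sx, sy = scale_factors(old_size, new_size)
    x, y, w, h = box
    return (x * sx, y * sy, w * sx, h * sy)


if __name__ == "__main__":
    # A 1024x256 image stretched to 512x512: x halves, y doubles.
    print(scale_voc_box((100, 50, 300, 250), (1024, 256), (512, 512)))
    # prints (50.0, 100.0, 150.0, 500.0)
```

Keeping the two axes separate matters whenever the source aspect ratio differs from `--new_w`/`--new_h`, as in the 1024x256 example above.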