# data-gradients

**Repository Path**: qianyiz/data-gradients

## Basic Information

- **Project Name**: data-gradients
- **Description**: synced from https://github.com/Deci-AI/data-gradients#label-adapter
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2023-07-31
- **Last Updated**: 2024-01-22

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# DataGradients

DataGradients is an open-source, Python-based library designed for computer vision dataset analysis. It automatically extracts features from your datasets and combines them into a single user-friendly report.

- **Detect Common Data Issues** - corrupted data, labeling errors, underlying biases, data leakage, duplications, faulty augmentations, and disparities between train and validation sets.
- **Extract Insights for Better Model Design** - make informed decisions when designing your model, based on data characteristics such as object size and location distributions, number of objects per image, and high-frequency details.
- **Reduce Guesswork Searching For The Right Hyperparameters** - define the correct NMS and filtering parameters, identify class distribution issues and define loss function weights accordingly, define proper augmentations according to data variability, and calibrate metrics to monitor your unique dataset.

To better understand how to tackle the data issues highlighted by DataGradients, explore our comprehensive online course on analyzing computer vision datasets.

- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
  - [Prerequisites](#prerequisites)
  - [Dataset Analysis](#dataset-analysis)
  - [Report](#report)
- [Feature Configuration](#feature-configuration)
- [Dataset Adapters](#dataset-adapters)
  - [Image Adapter](#image-adapter)
  - [Label Adapter](#label-adapter)
  - [Example](#example)
- [License](#license)

## Features

- Image-Level Evaluation: DataGradients evaluates key image features such as resolution, color distribution, and average brightness.
- Class Distribution: The library extracts statistics that show which classes appear most often, how many objects each image contains, how many images have no label at all, and more.
- Heatmap Generation: DataGradients produces heatmaps of bounding boxes or masks, allowing you to check whether objects are positioned in the expected areas.
- And [many more](./documentation/feature_description.md)!
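To make the class distribution statistics above more concrete, here is a minimal pure-Python sketch of the kind of numbers involved. It is not how DataGradients computes them: the function name `summarize_class_distribution` and the input format (one list of class ids per image) are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, List, Sequence


def summarize_class_distribution(
    labels_per_image: Sequence[Sequence[int]],
    class_names: List[str],
) -> Dict[str, object]:
    """Hypothetical helper: basic class statistics from per-image class ids."""
    class_counts: Counter = Counter()
    objects_per_image = []
    for class_ids in labels_per_image:
        # Number of labeled objects in this image (0 means an unlabeled image).
        objects_per_image.append(len(class_ids))
        # Count each object under its human-readable class name.
        class_counts.update(class_names[class_id] for class_id in class_ids)

    n_images = len(labels_per_image)
    return {
        "objects_per_class": dict(class_counts),
        "mean_objects_per_image": sum(objects_per_image) / n_images if n_images else 0.0,
        "images_without_labels": sum(1 for n in objects_per_image if n == 0),
    }


# Three images: two persons and a car, no labels, one bicycle.
stats = summarize_class_distribution(
    labels_per_image=[[0, 0, 2], [], [1]],
    class_names=["person", "bicycle", "car"],
)
print(stats)
```

A strongly skewed `objects_per_class` or a large `images_without_labels` count is the kind of signal the report is meant to surface before training.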

Example of pages from the Report

Example of specific features

## Examples

- [COCO 2017 Detection report](documentation/assets/Report_COCO.pdf)
- [Cityscapes Segmentation report](documentation/assets/Report_CityScapes.pdf)

Example notebook on Colab

## Installation

You can install DataGradients with pip.

```
pip install data-gradients
```

## Quick Start

### Prerequisites

- **Dataset**: Includes a **Train** set and a **Validation** or a **Test** set.
- **Class Names**: A list of the unique categories present in your dataset.
- **Iterable**: A method to iterate over your Dataset providing images and labels. Can be any of the following:
  - PyTorch Dataloader
  - PyTorch Dataset
  - Generator that yields image/label pairs
  - Any other iterable you use for model training/validation

Please ensure all the points above are checked before you proceed with **DataGradients**.

**Good to Know**: DataGradients will try to find out how the dataset returns images and labels.

- If something cannot be automatically determined, you will be asked to provide some extra information through a text input.
- In some extreme cases, the process will crash and invite you to implement a custom dataset adapter (see [Dataset Adapters](#dataset-adapters)).

**Heads up**: We currently don't provide out-of-the-box dataset/dataloader implementations. You can find multiple dataset implementations in [PyTorch](https://pytorch.org/vision/stable/datasets.html) or [SuperGradients](https://docs.deci.ai/super-gradients/src/super_gradients/training/datasets/Dataset_Setup_Instructions.html).

**Example**

```python
from torchvision.datasets import CocoDetection

train_data = CocoDetection(...)
val_data = CocoDetection(...)
class_names = ["person", "bicycle", "car", "motorcycle", ...]
```

### Dataset Analysis

You are now ready to go: choose the relevant analyzer for your task and run it over your datasets!

**Object Detection**

```python
from data_gradients.managers.detection_manager import DetectionAnalysisManager

train_data = ...
val_data = ...
class_names = ...

analyzer = DetectionAnalysisManager(
    report_title="Testing Data-Gradients Object Detection",
    train_data=train_data,
    val_data=val_data,
    class_names=class_names,
)

analyzer.run()
```

**Semantic Segmentation**

```python
from data_gradients.managers.segmentation_manager import SegmentationAnalysisManager

train_data = ...
val_data = ...
class_names = ...

analyzer = SegmentationAnalysisManager(
    report_title="Testing Data-Gradients Segmentation",
    train_data=train_data,
    val_data=val_data,
    class_names=class_names,
)

analyzer.run()
```

**Example**

You can test the segmentation analysis tool with the following [example](https://github.com/Deci-AI/data-gradients/blob/master/examples/segmentation_example.py), which does not require you to download any additional data.

### Report

Once the analysis is done, the path to your PDF report will be printed.

## Feature Configuration

The feature configuration allows you to run the analysis on a subset of features or adjust the parameters of existing features. If you are interested in customizing this configuration, check out the [documentation](documentation/feature_configuration.md) on that topic.

## Dataset Adapters

Before implementing a Dataset Adapter, try running without it. In many cases DataGradients will support your dataset without any extra code.

Two types of Dataset Adapters are available: `images_extractor` and `labels_extractor`. These functions should be passed to the constructor of the analysis manager.

```python
from data_gradients.managers.segmentation_manager import SegmentationAnalysisManager

train_data = ...
val_data = ...

# Let's assume that in this case, train_data and val_data return data in this format:
# (image, {"masks", "bboxes"})
images_extractor = lambda data: data[0]  # Extract the image
labels_extractor = lambda data: data[1]['masks']  # Extract the masks

# In case of segmentation.
SegmentationAnalysisManager(
    report_title="Test with Adapters",
    train_data=train_data,
    val_data=val_data,
    images_extractor=images_extractor,
    labels_extractor=labels_extractor,
)

# For Detection, just change the Manager and the labels_extractor definition.
```

### Image Adapter

Image Adapter functions should respect the following:

`images_extractor(data: Any) -> torch.Tensor`

- `data` being the output of the dataset/dataloader that you provided.
- The function should return a Tensor representing your image(s). One of:
  - `(BS, C, H, W)`, `(BS, H, W, C)`, `(BS, H, W)` for batch
  - `(C, H, W)`, `(H, W, C)`, `(H, W)` for single image
    - With `C`: number of channels (3 for RGB)

### Label Adapter

Label Adapter functions should respect the following:

`labels_extractor(data: Any) -> torch.Tensor`

- `data` being the output of the dataset/dataloader that you provided.
- The function should return a Tensor representing your label(s):
  - For **Segmentation**, one of:
    - `(BS, C, H, W)`, `(BS, H, W, C)`, `(BS, H, W)` for batch
    - `(C, H, W)`, `(H, W, C)`, `(H, W)` for single image
      - `BS`: Batch Size
      - `C`: number of channels - 3 for RGB
      - `H`, `W`: Height and Width
  - For **Detection**, one of:
    - `(BS, N, 5)`, `(N, 6)` for batch
    - `(N, 5)` for single image
      - `BS`: Batch Size
      - `N`: Padding size
      - The last dimension should include your `class_id` and `bbox` - `class_id, x, y, x, y` for instance

### Example

Let's imagine that your dataset returns a pair `(image, annotation)`, with `annotation` as below:

```python
annotation = [
    {"bbox_coordinates": [1.08, 187.69, 611.59, 285.84], "class_id": 51},
    {"bbox_coordinates": [5.02, 321.39, 234.33, 365.42], "class_id": 52},
    ...
]
```

Because this dataset uses a custom `annotation` format, you will need to implement your own `labels_extractor`. The labels here are bounding boxes, so the extractor is passed to the detection manager:

```python
from typing import Dict, List, Tuple

import PIL.Image
import torch

from data_gradients.managers.detection_manager import DetectionAnalysisManager


def labels_extractor(data: Tuple[PIL.Image.Image, List[Dict]]) -> torch.Tensor:
    _image, annotations = data[:2]
    labels = []
    for annotation in annotations:
        class_id = annotation["class_id"]
        bbox = annotation["bbox_coordinates"]
        # One row per object: (class_id, x, y, x, y)
        labels.append((class_id, *bbox))
    return torch.Tensor(labels)


DetectionAnalysisManager(
    ...,
    labels_extractor=labels_extractor
)
```

## Community

Click here to join our Discord Community

## License

This project is released under the [Apache 2.0 license](LICENSE.md).