# sotabench-eval
**Repository Path**: skyarn/sotabench-eval
## Basic Information
- **Project Name**: sotabench-eval
- **Description**: Easily evaluate machine learning models on public benchmarks
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-03-28
- **Last Updated**: 2020-12-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README

[PyPI version](https://badge.fury.io/py/sotabencheval) · [Documentation](https://paperswithcode.github.io/sotabench-eval/)
`sotabencheval` is a framework-agnostic library containing a collection of deep learning benchmarks you can use to evaluate your models. Used together with the [sotabench](https://www.sotabench.com) service, it records results for models so the community can compare performance on different tasks, and it acts as a continuous-integration-style service that benchmarks the models in your repository on each commit.
## Benchmarks Supported
- [ADE20K](https://paperswithcode.github.io/sotabench-eval/ade20k/) (Semantic Segmentation)
- [COCO](https://paperswithcode.github.io/sotabench-eval/coco/) (Object Detection)
- [ImageNet](https://paperswithcode.github.io/sotabench-eval/imagenet/) (Image Classification)
- [SQuAD](https://paperswithcode.github.io/sotabench-eval/squad/) (Question Answering)
- [WikiText-103](https://paperswithcode.github.io/sotabench-eval/wikitext-103/) (Language Modelling)
- [WMT](https://paperswithcode.github.io/sotabench-eval/wmt/) (Machine Translation)
PRs welcome for further benchmarks!
## Installation
Requires Python 3.6+.
```bash
pip install sotabencheval
```
## Get Benching! 🏋️
You should read the [full documentation here](https://paperswithcode.github.io/sotabench-eval/index.html), which contains guidance on getting started and connecting to [sotabench](https://www.sotabench.com).
Integration is lightweight. For example, to evaluate an ImageNet model, you initialize an `ImageNetEvaluator` object and (optionally) link it to the paper whose results you want to compare against:
```python
from sotabencheval.image_classification import ImageNetEvaluator

evaluator = ImageNetEvaluator(
    model_name='FixResNeXt-101 32x48d',
    paper_arxiv_id='1906.06423')
```
Then, for each batch of predictions your model makes on ImageNet, pass a dictionary mapping image IDs to `np.ndarray`s of logits to the `evaluator.add` method:
```python
evaluator.add(output_dict=dict(zip(image_ids, batch_output)))
```
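For instance, if your model returns a NumPy array of logits with one row per image, the dictionary can be assembled with `zip`. In this minimal sketch, the image IDs and random logits are made-up placeholders standing in for real model output:

```python
import numpy as np

# Hypothetical IDs for two ImageNet validation images.
image_ids = ['ILSVRC2012_val_00000001', 'ILSVRC2012_val_00000002']

# Placeholder logits: one row of 1000 class scores per image.
batch_output = np.random.rand(2, 1000).astype(np.float32)

# zip pairs each image ID with its row of logits; dict builds the mapping.
evaluator.add(output_dict=dict(zip(image_ids, batch_output)))
```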
The evaluation logic just needs to live in a `sotabench.py` file in your repository; sotabench runs this file on each commit and records the results.
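A minimal `sotabench.py` could look like the sketch below. The `model` and `load_imagenet_batches` names are hypothetical placeholders for your own code; only the evaluator calls come from `sotabencheval`, with `evaluator.save()` assumed (per the library's documentation) to finalize and record the run:

```python
# sotabench.py -- minimal sketch of an evaluation script.
from sotabencheval.image_classification import ImageNetEvaluator

from my_project import model, load_imagenet_batches  # hypothetical imports

evaluator = ImageNetEvaluator(
    model_name='FixResNeXt-101 32x48d',
    paper_arxiv_id='1906.06423')

# Evaluate batch by batch, feeding logits to the evaluator.
for image_ids, batch in load_imagenet_batches():
    logits = model(batch)  # expected shape: (batch_size, 1000)
    evaluator.add(output_dict=dict(zip(image_ids, logits)))

# Finalize the evaluation and record the results.
evaluator.save()
```

The same file can typically be run locally with `python sotabench.py` to check the evaluation logic before committing.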
## Contributing
All contributions welcome!