# MedMNIST

**Repository Path**: sjw917/MedMNIST

## Basic Information

- **Project Name**: MedMNIST
- **Description**: MedMNIST: an MNIST for medical imaging, released by Shanghai Jiao Tong University
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 0
- **Created**: 2020-10-31
- **Last Updated**: 2023-03-11

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# MedMNIST

We present *MedMNIST*, a collection of 10 pre-processed open medical datasets. MedMNIST is standardized to perform classification tasks on lightweight 28 × 28 images, requiring no background knowledge. Covering the primary data modalities in medical image analysis, it is diverse in data scale (from 100 to 100,000 samples) and tasks (binary/multi-class, ordinal regression, and multi-label). MedMNIST can be used for educational purposes, rapid prototyping, multi-modal machine learning, or AutoML in medical image analysis. Moreover, the MedMNIST Classification Decathlon is designed to benchmark AutoML algorithms on all 10 datasets.

![MedMNIST_Decathlon](MedMNIST_Decathlon.png)

For more details, please refer to our paper:

**MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis**

Jiancheng Yang, Rui Shi, Bingbing Ni

[arXiv preprint](https://arxiv.org/abs/2010.14925), 2020.
([project page](https://medmnist.github.io/))

# Code Structure
* [`medmnist/`](medmnist/):
    * [`dataset.py`](medmnist/dataset.py): dataloaders for MedMNIST.
    * [`models.py`](medmnist/models.py): *ResNet-18* and *ResNet-50* models.
    * [`evaluator.py`](medmnist/evaluator.py): evaluation metrics.
    * [`environ.py`](medmnist/environ.py): root paths (`dataroot` and `outputroot`).
* [`train.py`](train.py): the training script.
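The README does not say which metrics `evaluator.py` computes. As a rough illustration of how a metric such as accuracy differs across the task types MedMNIST covers, here is a NumPy-only sketch. The function name `accuracy`, the `task` strings, and the 0.5 threshold for multi-label outputs are assumptions for illustration, not the repository's API.

```python
import numpy as np


def accuracy(y_true, y_score, task, threshold=0.5):
    """Hypothetical accuracy helper (not the repository's evaluator).

    y_true:  (N, 1) integer labels for single-label tasks,
             or an (N, C) 0/1 matrix for multi-label tasks.
    y_score: (N, C) model scores (e.g., softmax or sigmoid outputs).
    task:    "binary", "multi-class", "ordinal-regression" or "multi-label"
             (assumed names).
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    if task == "multi-label":
        # Threshold every label independently, then average over all
        # (sample, label) entries: one common definition of multi-label accuracy.
        y_pred = (y_score > threshold).astype(int)
        return float(np.mean(y_pred == y_true))
    # Single-label tasks (ordinal regression treated as classification here):
    # predict the highest-scoring class and compare with the stored label.
    y_pred = np.argmax(y_score, axis=1)
    return float(np.mean(y_pred == y_true.reshape(-1)))
```

For a three-class problem with labels `[[0], [2], [1]]` and scores whose row-wise maxima fall on classes 0, 2 and 0, this returns 2/3. Treating ordinal regression as plain classification is a simplification chosen for the sketch.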

# Requirements
* Python 3 (Anaconda 3.6.3 specifically)
* PyTorch\==0.3.1
* numpy\==1.18.5, pandas\==0.25.3, scikit-learn\==0.22.2

Higher versions should also work (perhaps with minor modifications).


# Dataset

Our MedMNIST dataset is available on [Dropbox](https://www.dropbox.com/sh/upxrsyb5v8jxbso/AADOV0_6pC9Tb3cIACro1uUPa?dl=0).

The dataset contains ten subsets, and each subset (e.g., `pathmnist.npz`) consists of `train_images`, `train_labels`, `val_images`, `val_labels`, `test_images` and `test_labels`.
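Because each subset is a plain NumPy `.npz` archive with the keys listed above, one split can be read with `np.load` alone. The class below is a hypothetical minimal sketch, not the repository's `dataset.py`: the name `NpzSplit` is made up, and it assumes the images are stored as 8-bit pixel values, which it rescales to floats in [0, 1].

```python
import numpy as np


class NpzSplit:
    """Hypothetical map-style dataset over one split of a MedMNIST .npz file.

    Assumes the keys `<split>_images` and `<split>_labels` described above.
    It implements __len__ and __getitem__, so it should also be usable as a
    map-style dataset with torch.utils.data.DataLoader.
    """

    def __init__(self, npz_path, split="train"):
        if split not in ("train", "val", "test"):
            raise ValueError(f"unknown split: {split}")
        # Copy the arrays out while the archive is open; the file is closed afterwards.
        with np.load(npz_path) as data:
            self.images = data[f"{split}_images"]
            self.labels = data[f"{split}_labels"]

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Assumes uint8 pixels; scale them to float32 values in [0, 1].
        img = self.images[idx].astype(np.float32) / 255.0
        return img, self.labels[idx]
```

For example, `NpzSplit("pathmnist.npz", split="val")` would expose the validation images and labels of PathMNIST, assuming the file sits in the current directory.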

# How to run the experiments

* Download the dataset: [MedMNIST](https://www.dropbox.com/sh/upxrsyb5v8jxbso/AADOV0_6pC9Tb3cIACro1uUPa?dl=0).

* Modify the paths

  Specify `dataroot` and `outputroot` in [./medmnist/environ.py](./medmnist/environ.py):

  `dataroot` is the directory where you saved the `npz` datasets.

  `outputroot` is the directory where you want to save the test results.

* Run the [`train.py`](./train.py) script in a terminal.

  First, change to the directory containing `train.py`. Then run `python train.py xxxmnist`, where `xxxmnist` is the name of a MedMNIST subset (e.g., `pathmnist`).
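To run every subset in turn, the single-subset command above can be wrapped in a small driver. This is a hypothetical sketch, not a script shipped with the repository: the functions `build_commands` and `run_all` are made up, and it assumes each subset name equals its `.npz` file name without the extension (e.g., `pathmnist.npz` gives `pathmnist`), following the `python train.py xxxmnist` convention.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(dataroot, script="train.py"):
    """Return one `python train.py <subset>` command per .npz file in dataroot.

    Assumption: the subset name passed to train.py is the file name stem.
    """
    subsets = sorted(p.stem for p in Path(dataroot).glob("*.npz"))
    # sys.executable reuses the interpreter that runs this driver.
    return [[sys.executable, script, name] for name in subsets]


def run_all(dataroot, script="train.py", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(dataroot, script):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop as soon as one experiment fails.
            subprocess.run(cmd, check=True)
```

Call `run_all(dataroot, dry_run=False)` from the directory containing `train.py`, passing the same `dataroot` you configured in `environ.py`. The default dry run only prints the commands, so you can check the subset names before starting long training jobs.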

# LICENSE
The code is released under the Apache-2.0 License.