In this example, we provide scripts and tools to perform reproducible experiments on training neural networks on the ImageNet dataset.
Features:
There are two possible options: 1) experiments tracking with MLflow or 2) experiments tracking with Polyaxon.
Experiments tracking with MLflow is more suitable for a local machine with GPUs. For experiments tracking with Polyaxon,
the user needs to have Polyaxon installed on a machine/cluster/cloud and can schedule experiments with polyaxon-cli.
The user can choose one option and skip the description of the other.
Files tree description:
code
|___ dataflow : module provides data loaders and various transformers
|___ scripts : executable training scripts
|___ utils : other helper modules
configs
|___ train : training Python configuration files
experiments
|___ mlflow : MLflow related files
|___ plx : Polyaxon related files
notebooks : Jupyter notebooks to check specific parts of the code modules
We use the py_config_runner package to execute Python scripts with Python configuration files.
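For context, a configuration file is just a Python module whose top-level variables define the experiment. The sketch below is a hypothetical illustration only, not one of the actual files in configs/train; the file name, model choice, data path, and hyperparameter names are assumptions.

```python
# configs/train/baseline_resnet50.py -- hypothetical config sketch, names are illustrative
from torchvision import models

# reproducibility
seed = 42

# data (placeholder path; point it to a local ImageNet copy)
data_path = "/path/to/imagenet"
batch_size = 64
num_workers = 8

# model and optimization
model = models.resnet50(num_classes=1000)
num_epochs = 90
learning_rate = 0.1
weight_decay = 1e-4
```

A training script is then launched by passing both the script and the configuration file to the py_config_runner CLI, typically something like `py_config_runner code/scripts/mlflow_training.py configs/train/baseline_resnet50.py` (the exact paths depend on the repository layout).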
Training scripts are located in code/scripts and contain:

- mlflow_training.py : training script with MLflow experiments tracking
- plx_training.py : training script with Polyaxon experiments tracking
- common_training.py : common training code used by the above files

Training scripts contain a run method required by py_config_runner to run a script with a configuration. The training logic is set up inside the training method, which configures a distributed trainer, 2 evaluators and various logging handlers to TensorBoard, the MLflow/Polyaxon logger and tqdm.
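As an illustration of this structure only (not the repository's actual code), a run entry point compatible with py_config_runner could look roughly like the sketch below. It omits the distributed setup and TensorBoard logging for brevity, and the config attributes (model, learning_rate, num_epochs, data loaders) are assumptions matching the hypothetical config above.

```python
# Hypothetical sketch of a run() entry point; config fields and loader names are assumptions.
import torch
from ignite.engine import Events, create_supervised_trainer, create_supervised_evaluator
from ignite.metrics import Accuracy, TopKCategoricalAccuracy
from ignite.contrib.handlers import ProgressBar, MLflowLogger
from ignite.contrib.handlers.mlflow_logger import OutputHandler


def run(config, **kwargs):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = config.model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=config.learning_rate)
    criterion = torch.nn.CrossEntropyLoss()

    # Trainer and two evaluators (train set and test set), as described above
    trainer = create_supervised_trainer(model, optimizer, criterion, device=device)
    metrics = {"top1": Accuracy(), "top5": TopKCategoricalAccuracy(k=5)}
    train_evaluator = create_supervised_evaluator(model, metrics=metrics, device=device)
    test_evaluator = create_supervised_evaluator(model, metrics=metrics, device=device)

    # tqdm progress bar on the trainer
    ProgressBar().attach(trainer)

    # MLflow experiments tracking: log the training loss every iteration
    mlflow_logger = MLflowLogger()
    mlflow_logger.attach(
        trainer,
        log_handler=OutputHandler(tag="training", output_transform=lambda loss: {"loss": loss}),
        event_name=Events.ITERATION_COMPLETED,
    )

    @trainer.on(Events.EPOCH_COMPLETED)
    def run_validation(engine):
        train_evaluator.run(config.train_eval_loader)
        test_evaluator.run(config.val_loader)

    trainer.run(config.train_loader, max_epochs=config.num_epochs)
```

For Polyaxon tracking, the same structure applies with ignite's PolyaxonLogger in place of MLflowLogger.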
| Model | Training Top-1 Accuracy | Training Top-5 Accuracy | Test Top-1 Accuracy | Test Top-5 Accuracy |
|---|---|---|---|---|
| ResNet-50 | 78% | 92% | 77% | 94% |
Part of the trainings was done within the Tesla GPU Test Drive on 2 Nvidia V100 GPUs.