In this example, we provide scripts and tools to perform reproducible experiments on training neural networks on the ImageNet dataset.
Features:
There are two possible options: 1) experiments tracking with MLflow or 2) experiments tracking with Polyaxon.
Experiments tracking with MLflow is more suitable for a local machine with GPUs. For experiments tracking with Polyaxon,
the user needs to have Polyaxon installed on a machine/cluster/cloud and can schedule experiments with polyaxon-cli.
The user can choose one option and skip the description of the other.
Files tree description:
code
|___ dataflow : module provides data loaders and various transformers
|___ scripts : executable training scripts
|___ utils : other helper modules
configs
|___ train : training Python configuration files
experiments
|___ mlflow : MLflow related files
|___ plx : Polyaxon related files
notebooks : Jupyter notebooks to check specific parts of the code modules
We use the py_config_runner package to execute Python scripts with Python configuration files.
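For context, a configuration file is just a Python module whose top-level variables define the experiment. The sketch below is a hypothetical illustration only, not one of the actual files in configs/train; the file name, model choice, data path, and hyperparameter names are assumptions.

```python
# configs/train/baseline_resnet50.py -- hypothetical config sketch, names are illustrative
from torchvision import models

# reproducibility
seed = 42

# data (placeholder path; point it to a local ImageNet copy)
data_path = "/path/to/imagenet"
batch_size = 64
num_workers = 8

# model and optimization
model = models.resnet50(num_classes=1000)
num_epochs = 90
learning_rate = 0.1
weight_decay = 1e-4
```

A training script is then launched by passing both the script and the configuration file to the py_config_runner CLI, typically something like `py_config_runner code/scripts/mlflow_training.py configs/train/baseline_resnet50.py` (the exact paths depend on the repository layout).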
Training scripts are located in code/scripts and contain:

- mlflow_training.py : training script with MLflow experiments tracking
- plx_training.py : training script with Polyaxon experiments tracking
- common_training.py : common training code used by the above files

Training scripts contain a run method required by py_config_runner to run a script with a configuration. The training logic is set up inside the training method, which configures a distributed trainer, 2 evaluators and various logging handlers to TensorBoard, the MLflow/Polyaxon logger and tqdm.
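As an illustration of this structure only (not the repository's actual code), a run entry point compatible with py_config_runner could look roughly like the sketch below. It omits the distributed setup and TensorBoard logging for brevity, and the config attributes (model, learning_rate, num_epochs, data loaders) are assumptions matching the hypothetical config above.

```python
# Hypothetical sketch of a run() entry point; config fields and loader names are assumptions.
import torch
from ignite.engine import Events, create_supervised_trainer, create_supervised_evaluator
from ignite.metrics import Accuracy, TopKCategoricalAccuracy
from ignite.contrib.handlers import ProgressBar, MLflowLogger
from ignite.contrib.handlers.mlflow_logger import OutputHandler


def run(config, **kwargs):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = config.model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=config.learning_rate)
    criterion = torch.nn.CrossEntropyLoss()

    # Trainer and two evaluators (train set and test set), as described above
    trainer = create_supervised_trainer(model, optimizer, criterion, device=device)
    metrics = {"top1": Accuracy(), "top5": TopKCategoricalAccuracy(k=5)}
    train_evaluator = create_supervised_evaluator(model, metrics=metrics, device=device)
    test_evaluator = create_supervised_evaluator(model, metrics=metrics, device=device)

    # tqdm progress bar on the trainer
    ProgressBar().attach(trainer)

    # MLflow experiments tracking: log the training loss every iteration
    mlflow_logger = MLflowLogger()
    mlflow_logger.attach(
        trainer,
        log_handler=OutputHandler(tag="training", output_transform=lambda loss: {"loss": loss}),
        event_name=Events.ITERATION_COMPLETED,
    )

    @trainer.on(Events.EPOCH_COMPLETED)
    def run_validation(engine):
        train_evaluator.run(config.train_eval_loader)
        test_evaluator.run(config.val_loader)

    trainer.run(config.train_loader, max_epochs=config.num_epochs)
```

For Polyaxon tracking, the same structure applies with ignite's PolyaxonLogger in place of MLflowLogger.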
| Model | Training Top-1 Accuracy | Training Top-5 Accuracy | Test Top-1 Accuracy | Test Top-5 Accuracy |
|---|---|---|---|---|
| ResNet-50 | 78% | 92% | 77% | 94% |
Part of the trainings was done within the Tesla GPU Test Drive on 2 Nvidia V100 GPUs.