# on-policy

**Repository Path**: cicoa/on-policy

## Basic Information

- **Project Name**: on-policy
- **Description**: No description available
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-01-21
- **Last Updated**: 2022-01-21

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# MAPPO

Chao Yu*, Akash Velu*, Eugene Vinitsky, Yu Wang, Alexandre Bayen, and Yi Wu.

Website: https://sites.google.com/view/mappo

This repository implements MAPPO, a multi-agent variant of PPO. The implementation in this repository is used in the paper "The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games" (https://arxiv.org/abs/2103.01955). This repository is heavily based on https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail.

## Environments supported:

- [StarCraftII (SMAC)](https://github.com/oxwhirl/smac)
- [Hanabi](https://github.com/deepmind/hanabi-learning-environment)
- [Multiagent Particle-World Environments (MPEs)](https://github.com/openai/multiagent-particle-envs)

## 1. Usage

**WARNING: by default, all experiments assume a policy shared by all agents, i.e. there is one neural network shared by all agents.**

All core code is located within the onpolicy folder. The algorithms/ subfolder contains algorithm-specific code for MAPPO.

* The envs/ subfolder contains environment wrapper implementations for the MPEs, SMAC, and Hanabi.
* Code to perform training rollouts and policy updates is contained within the runner/ folder; there is a runner for each environment.
* Executable scripts for training with default hyperparameters can be found in the scripts/ folder. The files are named in the following manner: train_algo_environment.sh. Within each file, the map name (in the case of SMAC and the MPEs) can be altered.
* Python training scripts for each environment can be found in the scripts/train/ folder.
* The config.py file contains relevant hyperparameter and env settings. Most hyperparameters default to the values used in the paper; however, please refer to the paper's appendix for a full list of hyperparameters used.

## 2. Installation

Here we give an example installation for CUDA 10.1. For non-GPU and other CUDA version installations, please refer to the [PyTorch website](https://pytorch.org/get-started/locally/).

``` Bash
# create conda environment
conda create -n marl python==3.6.1
conda activate marl
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
```

```
# install on-policy package
cd on-policy
pip install -e .
```

Even though we provide requirement.txt, it may contain redundant entries. We recommend installing the remaining required packages by running the code and checking which packages are reported as missing.

### 2.1 Install StarCraftII [4.10](http://blzdistsc2-a.akamaihd.net/Linux/SC2.4.10.zip)

``` Bash
unzip SC2.4.10.zip
# password is iagreetotheeula
# append (>>) so the existing ~/.bashrc is not overwritten
echo "export SC2PATH=~/StarCraftII/" >> ~/.bashrc
```

* Download the SMAC maps and move them to `~/StarCraftII/Maps/`.
* To use a stableid, copy `stableid.json` from https://github.com/Blizzard/s2client-proto.git to `~/StarCraftII/`.

### 2.2 Hanabi

Environment code for Hanabi is developed from the open-source environment code, but has been slightly modified to fit the algorithms used here. To install, execute the following:

``` Bash
pip install cffi
cd envs/hanabi
# && enters build/ only after mkdir succeeds
mkdir build && cd build
cmake ..
make -j
```

Here are all hanabi [models](https://drive.google.com/drive/folders/1RIcP_rG9NY9UzaWfFsIncDcjASk5h4Nx?usp=sharing).
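
The installation steps so far suggest finding missing packages by running the code and rely on `SC2PATH` being exported. A minimal sketch of a pre-flight check is given below. It is not part of this repository; the helper names `find_missing_packages` and `sc2_path_problem` are hypothetical, and the package list simply mirrors the `pip install` commands shown above.

```python
import importlib.util
import os


def find_missing_packages(names):
    """Return the top-level package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def sc2_path_problem(environ=None):
    """Return a message describing an SC2PATH problem, or None if it looks fine."""
    environ = os.environ if environ is None else environ
    path = environ.get("SC2PATH")
    if not path:
        return "SC2PATH is not set"
    # Expand a leading ~ in case the variable was stored unexpanded.
    if not os.path.isdir(os.path.expanduser(path)):
        return f"SC2PATH points to a missing directory: {path}"
    return None


if __name__ == "__main__":
    # Packages taken from the pip commands in this section (an assumption,
    # not an exhaustive list of what the code needs).
    missing = find_missing_packages(["torch", "torchvision", "cffi"])
    print("missing packages:", missing or "none")
    print("SC2PATH:", sc2_path_problem() or "ok")
```

Running the script inside the `marl` conda environment, in a new shell so that `~/.bashrc` has been re-read, reports anything that still needs attention before training is launched.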
### 2.3 Install MPE

``` Bash
# install this package first
pip install seaborn
```

There are 3 cooperative scenarios in MPE:

* simple_spread
* simple_speaker_listener, which is the 'Comm' scenario in the paper
* simple_reference

## 3. Train

Here we use train_mpe.sh as an example:

```
cd onpolicy/scripts
chmod +x ./train_mpe.sh
./train_mpe.sh
```

Local results are stored in the subfolder scripts/results. Note that we use Weights & Biases as the default visualization platform; to use Weights & Biases, please register and log in to the platform first. More instructions for using Weights & Biases can be found in the official [documentation](https://docs.wandb.ai/). Adding the `--use_wandb` flag on the command line or in the .sh file will use Tensorboard instead of Weights & Biases.

We additionally provide `./eval_hanabi_forward.sh` for evaluating the hanabi score over 100k trials.

## 4. Publication

If you find this repository useful, please cite our [paper](https://arxiv.org/abs/2103.01955):

```
@misc{yu2021surprising,
      title={The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games},
      author={Chao Yu and Akash Velu and Eugene Vinitsky and Yu Wang and Alexandre Bayen and Yi Wu},
      year={2021},
      eprint={2103.01955},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```