# The 37 Implementation Details of Proximal Policy Optimization

This repo contains the source code for the blog post *The 37 Implementation Details of Proximal Policy Optimization*.

* Blog post: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
* Tracked Weights and Biases experiments: https://wandb.ai/vwxyzjn/ppo-details

If you like this repo, consider checking out [CleanRL](https://github.com/vwxyzjn/cleanrl), the RL library that we used to build this repo.

## Get started

Prerequisites:

* Python 3.8+
* [Poetry](https://python-poetry.org)

Install dependencies:

```
poetry install
```

Train agents:

```
poetry run python ppo.py
```

Train agents with experiment tracking:

```
poetry run python ppo.py --track --capture-video
```

### Atari

Install dependencies:

```
poetry install -E atari
```

Train agents:

```
poetry run python ppo_atari.py
```

Train agents with experiment tracking:

```
poetry run python ppo_atari.py --track --capture-video
```

### Pybullet

Install dependencies:

```
poetry install -E pybullet
```

Train agents:

```
poetry run python ppo_continuous_action.py
```

Train agents with experiment tracking:

```
poetry run python ppo_continuous_action.py --track --capture-video
```

### Gym-microrts (MultiDiscrete)

Install dependencies:

```
poetry install -E gym-microrts
```

Train agents:

```
poetry run python ppo_multidiscrete.py
```

Train agents with experiment tracking:

```
poetry run python ppo_multidiscrete.py --track --capture-video
```

Train agents with invalid action masking:

```
poetry run python ppo_multidiscrete_mask.py
```

Train agents with invalid action masking and experiment tracking:

```
poetry run python ppo_multidiscrete_mask.py --track --capture-video
```

### Atari with Envpool

Install dependencies:

```
poetry install -E envpool
```

Train agents:

```
poetry run python ppo_atari_envpool.py
```

Train agents with experiment tracking:

```
poetry run python ppo_atari_envpool.py --track
```

Solve `Pong-v5` in 5 minutes:

```
poetry run python ppo_atari_envpool.py --clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3
```

Reach a 400 game score in `Breakout-v5` with PPO in ~1 hour (a side-effect-free 3-4x speedup over `ppo_atari.py` with `SyncVectorEnv`):

```
poetry run python ppo_atari_envpool.py --gym-id Breakout-v5
```

### Procgen

Install dependencies:

```
poetry install -E procgen
```

Train agents:

```
poetry run python ppo_procgen.py
```

Train agents with experiment tracking:

```
poetry run python ppo_procgen.py --track
```

## Reproduction of all of our results

To reproduce the results run with `openai/baselines`, install our fork at [https://github.com/vwxyzjn/baselines](https://github.com/vwxyzjn/baselines), then follow the scripts in `scripts/baselines`. To reproduce our results, follow the scripts in `scripts/ours`.
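## Core techniques at a glance

Every `ppo_*.py` variant above optimizes the same core objective: PPO's clipped surrogate loss. As a quick reference, here is a minimal PyTorch sketch of that loss; the function name and argument layout are illustrative rather than copied from this repo (see `ppo.py` and the blog post for the actual implementation details):

```python
import torch


def ppo_clip_loss(new_logprob, old_logprob, advantages, clip_coef=0.2):
    """Minimal, illustrative sketch of PPO's clipped surrogate policy loss."""
    # Probability ratio between the current policy and the policy that
    # collected the data, computed in log space for numerical stability.
    ratio = (new_logprob - old_logprob).exp()
    # Unclipped and clipped surrogate objectives, negated for gradient descent.
    pg_loss1 = -advantages * ratio
    pg_loss2 = -advantages * torch.clamp(ratio, 1 - clip_coef, 1 + clip_coef)
    # PPO takes the element-wise maximum of the two, i.e. the pessimistic bound.
    return torch.max(pg_loss1, pg_loss2).mean()
```

`ppo_multidiscrete_mask.py` additionally uses invalid action masking. A common way to implement it, sketched below under the assumption of a boolean mask with one entry per action, is to overwrite the logits of invalid actions with a large negative number before sampling, so that their post-softmax probabilities are effectively zero:

```python
import torch
from torch.distributions import Categorical


def masked_categorical(logits, action_mask):
    """Illustrative sketch of invalid action masking for a Categorical policy."""
    # Invalid actions receive a large negative logit, so the softmax assigns
    # them (effectively) zero probability and they are never sampled.
    masked_logits = torch.where(
        action_mask.bool(), logits, torch.full_like(logits, -1e8)
    )
    return Categorical(logits=masked_logits)
```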
## Citation

```bibtex
@inproceedings{shengyi2022the37implementation,
  author    = {Huang, Shengyi and Dossa, Rousslan Fernand Julien and Raffin, Antonin and Kanervisto, Anssi and Wang, Weixun},
  title     = {The 37 Implementation Details of Proximal Policy Optimization},
  booktitle = {ICLR Blog Track},
  year      = {2022},
  note      = {https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/},
  url       = {https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/}
}
```