# TF2-RL
**Repository Path**: TOUtheeng/TF2-RL
## Basic Information
- **Project Name**: TF2-RL
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-12-04
- **Last Updated**: 2020-12-19
## README
# Reinforcement Learning Agents
Implemented for TensorFlow 2.0+
## New Updates!
- DDPG with prioritized replay
- Primal-Dual DDPG for CMDP
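
For readers unfamiliar with prioritized replay, the idea can be sketched in plain Python as follows. This is a minimal illustration of proportional prioritization, not the buffer used in this repo: the class name, method names, and the `alpha`, `beta`, and `eps` defaults are all assumptions, and sampling is O(n) rather than using a sum tree.

```python
import random


class PrioritizedReplayBuffer:
    """Hypothetical list-based proportional prioritized replay buffer."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # 0 gives uniform sampling, 1 gives fully proportional
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current max priority so they are likely
        # to be replayed at least once before their TD error is known.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idxs = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the batch maximum
        # (a simplification; some implementations normalize over the buffer).
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.data[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, priorities are reset to |TD error| + eps.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a DDPG-style loop, the returned `weights` would typically scale the per-sample critic loss, and `update_priorities` would be called with the new TD errors after each update.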
## Future Plans
- SAC Discrete
## Usage
- Install the imported dependencies ([my tf2 conda env as reference](https://github.com/anita-hu/TF2-RL/blob/master/mytf2env.txt))
- Each file contains example code that trains an agent on the CartPole environment
- Training: `python3 TF2_DDPG_LSTM.py`
- TensorBoard: `tensorboard --logdir=DDPG/logs`
## Hyperparameter tuning
- Install [hyperopt](https://github.com/hyperopt/hyperopt)
- Optional: switch the agent and configure the parameter space in `hyperparam_tune.py`
- Run: `python3 hyperparam_tune.py`
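
To illustrate what "configuring the parameter space" amounts to, here is a rough standard-library sketch of a search loop. It deliberately uses plain random search as a stand-in for hyperopt's optimizer, so it does not show hyperopt's API or the contents of `hyperparam_tune.py`. The parameter names, ranges, and the placeholder `objective` are all assumptions made for the example.

```python
import math
import random

# Hypothetical search space: each entry maps a name to a sampler
# that draws one value from a random.Random instance.
PARAM_SPACE = {
    "actor_lr": lambda rng: 10 ** rng.uniform(-5, -2),  # log-uniform in [1e-5, 1e-2]
    "gamma": lambda rng: rng.uniform(0.9, 0.999),
    "batch_size": lambda rng: rng.choice([32, 64, 128]),
}


def sample_params(space, rng):
    """Draw one configuration from the search space."""
    return {name: draw(rng) for name, draw in space.items()}


def objective(params):
    """Placeholder loss. A real objective would train an agent with
    `params` and return, for example, the negative mean episode reward."""
    return (math.log10(params["actor_lr"]) + 3.5) ** 2 + (params["gamma"] - 0.99) ** 2


def random_search(space, objective_fn, max_evals=50, seed=0):
    """Evaluate `max_evals` random configurations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(max_evals):
        params = sample_params(space, rng)
        loss = objective_fn(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


if __name__ == "__main__":
    best, loss = random_search(PARAM_SPACE, objective)
    print("best params:", best, "loss:", loss)
```

The structure is the same one a hyperopt-based script follows: a space definition, an objective that returns a value to minimize, and a driver that runs a fixed number of evaluations.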
## Agents
All agents were tested on the CartPole environment.
| Name | On/off policy | Model | Action space support |
| --- | --- | --- | --- |
| [DQN](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf) | off-policy | Dense, LSTM | discrete |
| [DDPG](https://arxiv.org/pdf/1509.02971.pdf) | off-policy | Dense, LSTM | discrete, continuous |
| [AE-DDPG](https://arxiv.org/pdf/1903.00827.pdf) | off-policy | Dense | discrete, continuous |
| [SAC:bug:](https://arxiv.org/pdf/1812.05905.pdf) | off-policy | Dense | continuous |
| [PPO](https://arxiv.org/pdf/1707.06347.pdf) | on-policy | Dense | discrete, continuous |
#### Constrained MDP
| Name | On/off policy | Model | Action space support |
| --- | --- | --- | --- |
| [Primal-Dual DDPG](https://arxiv.org/pdf/1802.06480.pdf) | off-policy | Dense | discrete, continuous |
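
As background on the primal-dual idea, a constrained MDP is commonly handled by maximizing a Lagrangian of reward minus a multiplier times cost, while the multiplier is raised whenever the cost constraint is violated. The sketch below shows that generic update in plain Python. The function names, the learning rate, and the use of an average episode cost are assumptions for illustration, not a description of this repo's code.

```python
def dual_update(lam, avg_cost, cost_limit, lr=0.01):
    """Projected gradient ascent on the Lagrange multiplier.

    The multiplier grows while the measured cost exceeds the limit,
    shrinks otherwise, and is clipped so it never goes negative.
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))


def lagrangian_reward(reward, cost, lam):
    """Reward signal seen by the policy: task reward penalized by weighted cost."""
    return reward - lam * cost


if __name__ == "__main__":
    lam = 0.0
    # Hypothetical average costs over successive training iterations.
    for avg_cost in [5.0, 4.0, 3.0, 1.5, 1.0]:
        lam = dual_update(lam, avg_cost, cost_limit=2.0, lr=0.1)
        print(f"avg_cost={avg_cost:.1f} -> lambda={lam:.2f}")
```

With this kind of scheme, the actor and critic are trained on the penalized reward (the primal step) and the multiplier is adjusted between updates (the dual step).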
## Models
The models used to generate the demos are included in the repo, along with Q-value, reward, and/or loss graphs.
## Demos
- DQN Basic, time step = 4, 500 reward
- DQN LSTM, time step = 4, 500 reward
- DDPG Basic, 500 reward
- DDPG LSTM, time step = 5, 500 reward
- AE-DDPG Basic, 500 reward
- PPO Basic, 500 reward