# tf2rl

**Repository Path**: zhangtning/tf2rl

## Basic Information

- **Project Name**: tf2rl
- **Description**: TensorFlow2.0 Reinforcement Learning
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-02-13
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

[![Build Status](https://travis-ci.org/keiohta/tf2rl.svg?branch=master)](https://travis-ci.org/keiohta/tf2rl)
[![Coverage Status](https://coveralls.io/repos/github/keiohta/tf2rl/badge.svg?branch=master)](https://coveralls.io/github/keiohta/tf2rl?branch=master)
[![MIT License](http://img.shields.io/badge/license-MIT-blue.svg?style=flat)](LICENSE)
[![GitHub issues open](https://img.shields.io/github/issues/keiohta/tf2rl.svg)]()
[![PyPI version](https://badge.fury.io/py/tf2rl.svg)](https://badge.fury.io/py/tf2rl)

# TF2RL

TF2RL is a deep reinforcement learning library that implements various deep reinforcement learning algorithms using TensorFlow 2.0.

## Algorithms

The following algorithms are supported:

| Algorithm | Discrete action | Continuous action | Support | Category |
| :--------: | :------------: | :---------------: | :-----: | :------- |
| [VPG](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf), [PPO]() | ✓ | ✓ | [GAE](https://arxiv.org/abs/1506.02438) | Model-free On-policy RL |
| [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) (including [DDQN](https://arxiv.org/abs/1509.06461), [Prior. DQN](https://arxiv.org/abs/1511.05952), [Duel. DQN](https://arxiv.org/abs/1511.06581), [Distrib. DQN](), [Noisy DQN]()) | ✓ | - | [ApeX]() | Model-free Off-policy RL |
| [DDPG](https://arxiv.org/abs/1509.02971) (including [TD3](), [BiResDDPG]()) | - | ✓ | [ApeX]() | Model-free Off-policy RL |
| [SAC]() | ✓ | ✓ | [ApeX]() | Model-free Off-policy RL |
| [GAIL](), [GAIfO](), [VAIL]() (including [Spectral Normalization]()) | ✓ | ✓ | - | Imitation Learning |
The following papers have been implemented in tf2rl:

- Model-free On-policy RL
  - [Policy Gradient Methods for Reinforcement Learning with Function Approximation](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf), [code]()
  - [High-Dimensional Continuous Control Using Generalized Advantage Estimation](https://arxiv.org/abs/1506.02438), [code]()
  - [Proximal Policy Optimization Algorithms](), [code]()
- Model-free Off-policy RL
  - [Playing Atari with Deep Reinforcement Learning](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf), [code]()
  - [Human-level control through Deep Reinforcement Learning](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf), [code]()
  - [Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/abs/1509.06461), [code]()
  - [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952), [code]()
  - [Dueling Network Architectures for Deep Reinforcement Learning](https://arxiv.org/abs/1511.06581), [code]()
  - [A Distributional Perspective on Reinforcement Learning](), [code]()
  - [Noisy Networks for Exploration](), [code]()
  - [Distributed Prioritized Experience Replay](), [code]()
  - [Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971), [code]()
  - [Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](), [Soft Actor-Critic Algorithms and Applications](https://arxiv.org/abs/1812.05905), [code]()
  - [Addressing Function Approximation Error in Actor-Critic Methods](), [code]()
  - [Deep Residual Reinforcement Learning](), [code]()
  - [Soft Actor-Critic for Discrete Action Settings](https://arxiv.org/abs/1910.07207v1), [code]()
- Imitation Learning
  - [Generative Adversarial Imitation Learning](), [code]()
  - [Spectral Normalization for Generative Adversarial Networks](), [code]()
  - [Generative Adversarial Imitation from Observation](), [code]()
  - [Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow](), [code]()

## Installation

You can install `tf2rl` from PyPI:

```bash
$ pip install tf2rl
```

Alternatively, you can install it from source:

```bash
$ git clone https://github.com/keiohta/tf2rl.git tf2rl
$ cd tf2rl
$ pip install .
```

## Getting started

Here is a quick example of how to train a DDPG agent on the Pendulum environment:

```python
import gym

from tf2rl.algos.ddpg import DDPG
from tf2rl.experiments.trainer import Trainer

parser = Trainer.get_argument()
parser = DDPG.get_argument(parser)
args = parser.parse_args()

env = gym.make("Pendulum-v0")
test_env = gym.make("Pendulum-v0")

policy = DDPG(
    state_shape=env.observation_space.shape,
    action_dim=env.action_space.high.size,
    gpu=-1,  # Run on CPU. To run on GPU, specify the GPU index instead.
    memory_capacity=10000,
    max_action=env.action_space.high[0],
    batch_size=32,
    n_warmup=500)

trainer = Trainer(policy, env, args, test_env=test_env)
trainer()
```

You can check the implemented algorithms in [examples](https://github.com/keiohta/tf2rl/tree/master/examples).
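For discrete-action tasks such as CartPole, a DQN agent can be set up in the same way. The sketch below is not taken from the repository; it assumes that `tf2rl.algos.dqn.DQN` accepts the same core constructor arguments as the DDPG quick-start above (`state_shape`, `action_dim`, `gpu`, `memory_capacity`, `batch_size`, `n_warmup`), so check the DQN example under `examples/` for the exact signature.

```python
import gym

from tf2rl.algos.dqn import DQN
from tf2rl.experiments.trainer import Trainer

# Build the shared argument parser, mirroring the DDPG quick-start.
parser = Trainer.get_argument()
parser = DQN.get_argument(parser)
args = parser.parse_args()

env = gym.make("CartPole-v0")
test_env = gym.make("CartPole-v0")

# Constructor arguments below mirror the DDPG example and are assumptions;
# see the DQN script in examples/ for the exact signature.
policy = DQN(
    state_shape=env.observation_space.shape,
    action_dim=env.action_space.n,  # discrete action space: number of actions
    gpu=-1,                         # run on CPU
    memory_capacity=10000,
    batch_size=32,
    n_warmup=500)

trainer = Trainer(policy, env, args, test_env=test_env)
trainer()
```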
For example, if you want to train a DDPG agent:

```bash
# You must change directory to avoid importing local files
$ cd examples
# For the available options, pass --help or read the source code
$ python run_ddpg.py [options]
```

You can follow the training progress and results in TensorBoard as follows:

```bash
# When executing `run_**.py`, its logs are automatically generated under `./results`
$ tensorboard --logdir results
```
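The same workflow should apply to the other algorithms. The script names below are assumptions based on the `run_*.py` naming pattern of `run_ddpg.py`, so list the `examples` directory to confirm which scripts actually exist.

```bash
$ cd examples
# Script names assumed from the run_*.py pattern; confirm with `ls`
$ python run_dqn.py --help   # DQN on a discrete-action environment
$ python run_sac.py --help   # Soft Actor-Critic on a continuous-action environment
# Logs are written under ./results, so the same TensorBoard command applies:
$ tensorboard --logdir results
```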