# Coach

**Repository Path**: mirrors/Coach

## Basic Information

- **Project Name**: Coach
- **Description**: Coach is an open-source reinforcement learning research framework from Intel Nervana, containing implementations of many state-of-the-art algorithms
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: https://www.oschina.net/p/intel-coach
- **GVP Project**: No

## Statistics

- **Stars**: 7
- **Forks**: 2
- **Created**: 2017-10-23
- **Last Updated**: 2025-12-06

## Categories & Tags

**Categories**: machine-learning

**Tags**: None

## README

> :warning: **DISCONTINUATION OF PROJECT** -
> *This project will no longer be maintained by Intel.
> Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.*
> **Intel no longer accepts patches to this project.**
> *If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.*

# Coach

[License](https://github.com/IntelLabs/coach/blob/master/LICENSE)
[Documentation](https://intellabs.github.io/coach/)
[DOI](https://doi.org/10.5281/zenodo.1134898)
[Downloads: rl-coach](https://pepy.tech/project/rl-coach)
[Downloads: rl-coach-slim](https://pepy.tech/project/rl-coach-slim)

### Distributed Multi-Node Coach
As of release 0.11.0, Coach supports horizontal scale-out for training RL agents on multiple nodes; in that release this was tested with the ClippedPPO and DQN agents.
For usage instructions please refer to the documentation [here](https://intellabs.github.io/coach/dist_usage.html).
### Batch Reinforcement Learning
Coach supports training and evaluating an agent from a dataset of experience, without requiring a simulator.
There are [example](https://github.com/IntelLabs/coach/blob/master/rl_coach/presets/CartPole_DDQN_BatchRL.py) [presets](https://github.com/IntelLabs/coach/blob/master/rl_coach/presets/Acrobot_DDQN_BCQ_BatchRL.py) and a [tutorial](https://github.com/IntelLabs/coach/blob/master/tutorials/4.%20Batch%20Reinforcement%20Learning.ipynb).
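For example, the CartPole batch-RL preset linked above can be launched programmatically. The following is only a minimal sketch: it assumes Coach is installed from source so the preset module is importable, and the experiment path is a hypothetical placeholder.

```python
# Minimal sketch: run the CartPole batch-RL preset programmatically.
from rl_coach.base_parameters import TaskParameters
from rl_coach.presets.CartPole_DDQN_BatchRL import graph_manager

# Hypothetical output directory for logs and checkpoints.
task_parameters = TaskParameters(experiment_path='./experiments/cartpole_batch_rl')

# Build the training graph and run the preset's improve loop.
graph_manager.create_graph(task_parameters)
graph_manager.improve()
```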
## Supported Environments
* *OpenAI Gym:*
Installed by default by Coach's installer (see the note on MuJoCo version [below](#note-on-mujoco-version)); a minimal usage sketch with a Gym environment appears after this list.
* *ViZDoom:*
Follow the instructions described in the ViZDoom repository -
https://github.com/mwydmuch/ViZDoom
Additionally, Coach assumes that the environment variable VIZDOOM_ROOT points to the ViZDoom installation directory.
* *Roboschool:*
Follow the instructions described in the roboschool repository -
https://github.com/openai/roboschool
* *GymExtensions:*
Follow the instructions described in the GymExtensions repository -
https://github.com/Breakend/gym-extensions
Additionally, add the installation directory to the PYTHONPATH environment variable.
* *PyBullet:*
Follow the instructions in the [Quick Start Guide](https://docs.google.com/document/d/10sXEhzFRSnvFcl3XxNGhnD4N2SedqwdAvK3dsihxVUA) (essentially just `pip install pybullet`).
* *CARLA:*
Download release 0.8.4 from the CARLA repository -
https://github.com/carla-simulator/carla/releases
Install the python client and dependencies from the release tarball:
```
pip3 install -r PythonClient/requirements.txt
pip3 install PythonClient
```
Create a new CARLA_ROOT environment variable pointing to CARLA's installation directory.
A simple CARLA settings file (`CarlaSettings.ini`) is supplied with Coach, and is located in the `environments` directory.
* *Starcraft:*
Follow the instructions described in the PySC2 repository -
https://github.com/deepmind/pysc2
* *DeepMind Control Suite:*
Follow the instructions described in the DeepMind Control Suite repository -
https://github.com/deepmind/dm_control
* *Robosuite:*
**Note:** To use Robosuite-based environments, please install Coach from a clone of the latest repository; it is not yet available as part of the `rl_coach` package on PyPI.
Follow the instructions described in the [robosuite documentation](https://robosuite.ai/docs/installation.html) (see note on MuJoCo version [below](#note-on-mujoco-version)).
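To show how an environment from the list above plugs into an agent, here is a minimal library-style sketch along the lines of Coach's documented usage. It builds a DQN agent on Gym's CartPole; the schedule and all parameter values are left at their defaults and are assumptions, not a tuned configuration.

```python
# Minimal sketch: wire a Gym environment to a Coach agent via a graph manager.
from rl_coach.agents.dqn_agent import DQNAgentParameters
from rl_coach.environments.gym_environment import GymVectorEnvironment
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import SimpleSchedule

# Combine agent, environment, and schedule into a single training graph.
graph_manager = BasicRLGraphManager(
    agent_params=DQNAgentParameters(),
    env_params=GymVectorEnvironment(level='CartPole-v0'),
    schedule_params=SimpleSchedule()
)

# Run the default train/evaluate loop.
graph_manager.improve()
```

The same pattern applies to the other environments: swap in the matching environment parameters class and level name in the preset.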
### Note on MuJoCo version
OpenAI Gym supports MuJoCo only up to version 1.5 (and corresponding mujoco-py version 1.50.x.x). The Robosuite simulation framework, however, requires MuJoCo version 2.0 (and corresponding mujoco-py version 2.0.2.9, as of robosuite version 1.2). Therefore, if you wish to run both Gym-based MuJoCo environments and Robosuite environments, it's recommended to have a separate virtual environment for each.
Please note that all Gym-based MuJoCo presets in Coach (`rl_coach/presets/Mujoco_*.py`) have been validated _**only**_ with MuJoCo 1.5 (including the reported [benchmark results](benchmarks)).
## Supported Algorithms
### Value Optimization Agents
* [Deep Q Network (DQN)](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf) ([code](rl_coach/agents/dqn_agent.py))
* [Double Deep Q Network (DDQN)](https://arxiv.org/pdf/1509.06461.pdf) ([code](rl_coach/agents/ddqn_agent.py))
* [Dueling Q Network](https://arxiv.org/abs/1511.06581)
* [Mixed Monte Carlo (MMC)](https://arxiv.org/abs/1703.01310) ([code](rl_coach/agents/mmc_agent.py))
* [Persistent Advantage Learning (PAL)](https://arxiv.org/abs/1512.04860) ([code](rl_coach/agents/pal_agent.py))
* [Categorical Deep Q Network (C51)](https://arxiv.org/abs/1707.06887) ([code](rl_coach/agents/categorical_dqn_agent.py))
* [Quantile Regression Deep Q Network (QR-DQN)](https://arxiv.org/pdf/1710.10044v1.pdf) ([code](rl_coach/agents/qr_dqn_agent.py))
* [N-Step Q Learning](https://arxiv.org/abs/1602.01783) | **Multi Worker Single Node** ([code](rl_coach/agents/n_step_q_agent.py))
* [Neural Episodic Control (NEC)](https://arxiv.org/abs/1703.01988) ([code](rl_coach/agents/nec_agent.py))
* [Normalized Advantage Functions (NAF)](https://arxiv.org/abs/1603.00748) | **Multi Worker Single Node** ([code](rl_coach/agents/naf_agent.py))
* [Rainbow](https://arxiv.org/abs/1710.02298) ([code](rl_coach/agents/rainbow_dqn_agent.py))
### Policy Optimization Agents
* [Policy Gradients (PG)](http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf) | **Multi Worker Single Node** ([code](rl_coach/agents/policy_gradients_agent.py))
* [Asynchronous Advantage Actor-Critic (A3C)](https://arxiv.org/abs/1602.01783) | **Multi Worker Single Node** ([code](rl_coach/agents/actor_critic_agent.py))
* [Deep Deterministic Policy Gradients (DDPG)](https://arxiv.org/abs/1509.02971) | **Multi Worker Single Node** ([code](rl_coach/agents/ddpg_agent.py))
* [Proximal Policy Optimization (PPO)](https://arxiv.org/pdf/1707.06347.pdf) ([code](rl_coach/agents/ppo_agent.py))
* [Clipped Proximal Policy Optimization (CPPO)](https://arxiv.org/pdf/1707.06347.pdf) | **Multi Worker Single Node** ([code](rl_coach/agents/clipped_ppo_agent.py))
* [Generalized Advantage Estimation (GAE)](https://arxiv.org/abs/1506.02438) ([code](rl_coach/agents/actor_critic_agent.py#L86))
* [Sample Efficient Actor-Critic with Experience Replay (ACER)](https://arxiv.org/abs/1611.01224) | **Multi Worker Single Node** ([code](rl_coach/agents/acer_agent.py))
* [Soft Actor-Critic (SAC)](https://arxiv.org/abs/1801.01290) ([code](rl_coach/agents/soft_actor_critic_agent.py))
* [Twin Delayed Deep Deterministic Policy Gradient (TD3)](https://arxiv.org/pdf/1802.09477.pdf) ([code](rl_coach/agents/td3_agent.py))
### General Agents
* [Direct Future Prediction (DFP)](https://arxiv.org/abs/1611.01779) | **Multi Worker Single Node** ([code](rl_coach/agents/dfp_agent.py))
### Imitation Learning Agents
* Behavioral Cloning (BC) ([code](rl_coach/agents/bc_agent.py))
* [Conditional Imitation Learning](https://arxiv.org/abs/1710.02410) ([code](rl_coach/agents/cil_agent.py))
### Hierarchical Reinforcement Learning Agents
* [Hierarchical Actor Critic (HAC)](https://arxiv.org/abs/1712.00948) ([code](rl_coach/agents/hac_ddpg_agent.py))
### Memory Types
* [Hindsight Experience Replay (HER)](https://arxiv.org/abs/1707.01495) ([code](rl_coach/memories/episodic/episodic_hindsight_experience_replay.py))
* [Prioritized Experience Replay (PER)](https://arxiv.org/abs/1511.05952) ([code](rl_coach/memories/non_episodic/prioritized_experience_replay.py))
### Exploration Techniques
* E-Greedy ([code](rl_coach/exploration_policies/e_greedy.py)) - see the configuration sketch after this list
* Boltzmann ([code](rl_coach/exploration_policies/boltzmann.py))
* Ornstein–Uhlenbeck process ([code](rl_coach/exploration_policies/ou_process.py))
* Normal Noise ([code](rl_coach/exploration_policies/additive_noise.py))
* Truncated Normal Noise ([code](rl_coach/exploration_policies/truncated_normal.py))
* [Bootstrapped Deep Q Network](https://arxiv.org/abs/1602.04621) ([code](rl_coach/agents/bootstrapped_dqn_agent.py))
* [UCB Exploration via Q-Ensembles (UCB)](https://arxiv.org/abs/1706.01502) ([code](rl_coach/exploration_policies/ucb.py))
* [Noisy Networks for Exploration](https://arxiv.org/abs/1706.10295) ([code](rl_coach/exploration_policies/parameter_noise.py))
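As an illustration of how these policies are attached to an agent, here is a minimal sketch assuming the parameter and schedule names used in Coach's DQN presets; the epsilon schedule values are illustrative only.

```python
# Minimal sketch: attach an epsilon-greedy exploration schedule to a DQN agent.
from rl_coach.agents.dqn_agent import DQNAgentParameters
from rl_coach.exploration_policies.e_greedy import EGreedyParameters
from rl_coach.schedules import LinearSchedule

agent_params = DQNAgentParameters()
agent_params.exploration = EGreedyParameters()
# Decay epsilon linearly from 1.0 to 0.01 over 10,000 steps (illustrative values).
agent_params.exploration.epsilon_schedule = LinearSchedule(1.0, 0.01, 10000)
```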
## Citation
If you use Coach in your work, please use the following citation:
```
@misc{caspi_itai_2017_1134899,
  author = {Caspi, Itai and
            Leibovich, Gal and
            Novik, Gal and
            Endrawis, Shadi},
  title  = {Reinforcement Learning Coach},
  month  = dec,
  year   = 2017,
  doi    = {10.5281/zenodo.1134899},
  url    = {https://doi.org/10.5281/zenodo.1134899}
}
```
## Contact
We'd be happy to receive questions and contributions through GitHub issues and PRs.
Please make sure to take a look [here](CONTRIBUTING.md) before filing an issue or proposing a PR.
The Coach development team can also be contacted via [email](mailto:coach@intel.com).
## Disclaimer
Coach is released as reference code for research purposes. It is not an official Intel product, and the level of quality and support may not be as expected from an official product.
Additional algorithms and environments are planned to be added to the framework. Feedback and contributions from the open source and RL research communities are more than welcome.