# DI-engine **Repository Path**: gujinjiang/DI-engine ## Basic Information - **Project Name**: DI-engine - **Description**: OpenDILab决策智能引擎 - **Primary Language**: Python - **License**: Apache-2.0 - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 2 - **Forks**: 24 - **Created**: 2021-10-19 - **Last Updated**: 2023-01-07 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README
--- [](https://pypi.org/project/DI-engine/)            [](https://codecov.io/gh/opendilab/DI-engine)  [](https://github.com/opendilab/DI-engine/stargazers) [](https://github.com/opendilab/DI-engine/network)  [](https://github.com/opendilab/DI-engine/issues) [](https://github.com/opendilab/DI-engine/pulls) [](https://github.com/opendilab/DI-engine/graphs/contributors) [](https://github.com/opendilab/DI-engine/blob/master/LICENSE) Updated on 2021.09.30 DI-engine-v0.2.0 (beta) ## Introduction to DI-engine (beta) DI-engine is a generalized Decision Intelligence engine. It supports most basic deep reinforcement learning (DRL) algorithms, such as DQN, PPO, SAC, and domain-specific algorithms like QMIX in multi-agent RL, GAIL in inverse RL, and RND in exploration problems. Various training pipelines and customized decision AI applications are also supported. Have fun with exploration and exploitation. ### Application - [DI-star](https://github.com/opendilab/DI-star) - [DI-drive](https://github.com/opendilab/DI-drive) ### Environment - [GoBigger](https://github.com/opendilab/GoBigger) ### System Optimization and Design - [DI-orchestrator](https://github.com/opendilab/DI-orchestrator) - [DI-hpc](https://github.com/opendilab/DI-hpc) - [DI-store](https://github.com/opendilab/DI-store) ### Other - [DI-engine-docs](https://github.com/opendilab/DI-engine-docs) - [treevalue](https://github.com/opendilab/treevalue) - [DI-treetensor](https://github.com/opendilab/DI-treetensor) (preview) ## Installation You can simply install DI-engine from PyPI with the following command: ```bash pip install DI-engine ``` If you use Anaconda or Miniconda, you can install DI-engine from conda-forge through the following command: ```bash conda install -c opendilab di-engine ``` For more information about installation, you can refer to [installation](https://opendilab.github.io/DI-engine/installation/index.html). And our dockerhub repo can be found [here](https://hub.docker.com/repository/docker/opendilab/ding),we prepare `base image` and `env image` with common RL environments. - base: opendilab/ding:nightly - atari: opendilab/ding:nightly-atari - mujoco: opendilab/ding:nightly-mujoco - smac: opendilab/ding:nightly-smac ## Documentation The detailed documentation are hosted on [doc](https://opendilab.github.io/DI-engine/)([中文文档](https://di-engine-docs.readthedocs.io/en/main-zh/)). ## Quick Start [3 Minutes Kickoff](https://opendilab.github.io/DI-engine/quick_start/index.html) [3 Minutes Kickoff(colab)](https://colab.research.google.com/drive/1J29voOD2v9_FXjW-EyTVfRxY_Op_ygef#scrollTo=MIaKQqaZCpGz) [3 分钟上手中文版(kaggle)](https://www.kaggle.com/shenzhenperson/di-engine) **Bonus: Train RL agent in one line code:** ```bash ding -m serial -e cartpole -p dqn -s 0 ``` ## Feature ### Algorithm Versatility | No | Algorithm | Label | Implementation | Runnable Demo | | :--: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | | 1 | [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) |  | [policy/dqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py) | python3 -u cartpole_dqn_main.py / ding -m serial -c cartpole_dqn_config.py -s 0 | | 2 | [C51](https://arxiv.org/pdf/1707.06887.pdf) |  | [policy/c51](https://github.com/opendilab/DI-engine/blob/main/ding/policy/c51.py) | ding -m serial -c cartpole_c51_config.py -s 0 | | 3 | [QRDQN](https://arxiv.org/pdf/1710.10044.pdf) |  | [policy/qrdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qrdqn.py) | ding -m serial -c cartpole_qrdqn_config.py -s 0 | | 4 | [IQN](https://arxiv.org/pdf/1806.06923.pdf) |  | [policy/iqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/iqn.py) | ding -m serial -c cartpole_iqn_config.py -s 0 | | 5 | [Rainbow](https://arxiv.org/pdf/1710.02298.pdf) |  | [policy/rainbow](https://github.com/opendilab/DI-engine/blob/main/ding/policy/rainbow.py) | ding -m serial -c cartpole_rainbow_config.py -s 0 | | 6 | [SQL](https://arxiv.org/pdf/1702.08165.pdf) |  | [policy/sql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sql.py) | ding -m serial -c cartpole_sql_config.py -s 0 | | 7 | [R2D2](https://openreview.net/forum?id=r1lyTjAqYX) |  | [policy/r2d2](https://github.com/opendilab/DI-engine/blob/main/ding/policy/r2d2.py) | ding -m serial -c cartpole_r2d2_config.py -s 0 | | 8 | [A2C](https://arxiv.org/pdf/1602.01783.pdf) |  | [policy/a2c](https://github.com/opendilab/DI-engine/blob/main/ding/policy/a2c.py) | ding -m serial -c cartpole_a2c_config.py -s 0 | | 9 | [PPO](https://arxiv.org/abs/1707.06347) |  | [policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py) | python3 -u cartpole_ppo_main.py / ding -m serial_onpolicy -c cartpole_ppo_config.py -s 0 | | 10 | [PPG](https://arxiv.org/pdf/2009.04416.pdf) |  | [policy/ppg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppg.py) | python3 -u cartpole_ppg_main.py | | 11 | [ACER](https://arxiv.org/pdf/1611.01224.pdf) |  | [policy/acer](https://github.com/opendilab/DI-engine/blob/main/ding/policy/acer.py) | ding -m serial -c cartpole_acer_config.py -s 0 | | 12 | [IMPALA](https://arxiv.org/abs/1802.01561) |  | [policy/impala](https://github.com/opendilab/DI-engine/blob/main/ding/policy/impala.py) | ding -m serial -c cartpole_impala_config.py -s 0 | | 13 | [DDPG](https://arxiv.org/pdf/1509.02971.pdf) |  | [policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py) | ding -m serial -c pendulum_ddpg_config.py -s 0 | | 14 | [TD3](https://arxiv.org/pdf/1802.09477.pdf) |  | [policy/td3](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3.py) | python3 -u pendulum_td3_main.py / ding -m serial -c pendulum_td3_config.py -s 0 | | 15 | [SAC](https://arxiv.org/abs/1801.01290) |  | [policy/sac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sac.py) | ding -m serial -c pendulum_sac_config.py -s 0 | | 16 | [QMIX](https://arxiv.org/pdf/1803.11485.pdf) |  | [policy/qmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qmix.py) | ding -m serial -c smac_3s5z_qmix_config.py -s 0 | | 17 | [COMA](https://arxiv.org/pdf/1705.08926.pdf) |  | [policy/coma](https://github.com/opendilab/DI-engine/blob/main/ding/policy/coma.py) | ding -m serial -c smac_3s5z_coma_config.py -s 0 | | 18 | [QTran](https://arxiv.org/abs/1905.05408) |  | [policy/qtran](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qtran.py) | ding -m serial -c smac_3s5z_qtran_config.py -s 0 | | 19 | [WQMIX](https://arxiv.org/abs/2006.10800) |  | [policy/wqmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/wqmix.py) | ding -m serial -c smac_3s5z_wqmix_config.py -s 0 | | 20 | [CollaQ](https://arxiv.org/pdf/2010.08531.pdf) |  | [policy/collaq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/collaq.py) | ding -m serial -c smac_3s5z_collaq_config.py -s 0 | | 21 | [GAIL](https://arxiv.org/pdf/1606.03476.pdf) |  | [reward_model/gail](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/gail_irl_model.py) | ding -m serial_reward_model -c cartpole_dqn_config.py -s 0 | | 22 | [SQIL](https://arxiv.org/pdf/1905.11108.pdf) |  | [entry/sqil](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_sqil.py) | ding -m serial_sqil -c cartpole_sqil_config.py -s 0 | | 23 | [HER](https://arxiv.org/pdf/1707.01495.pdf) |  | [reward_model/her](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/her_reward_model.py) | python3 -u bitflip_her_dqn.py | | 24 | [RND](https://arxiv.org/abs/1810.12894) |  | [reward_model/rnd](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/rnd_reward_model.py) | python3 -u cartpole_ppo_rnd_main.py | | 25 | [CQL](https://arxiv.org/pdf/2006.04779.pdf) |  | [policy/cql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/cql.py) | python3 -u d4rl_cql_main.py | | 26 | [PER](https://arxiv.org/pdf/1511.05952.pdf) |  | [worker/replay_buffer](https://github.com/opendilab/DI-engine/blob/main/ding/worker/replay_buffer/advanced_buffer.py) | `rainbow demo` | | 27 | [GAE](https://arxiv.org/pdf/1506.02438.pdf) |  | [rl_utils/gae](https://github.com/opendilab/DI-engine/blob/main/ding/rl_utils/gae.py) | `ppo demo` | | 28 | [D4PG](https://arxiv.org/pdf/1804.08617.pdf) |  | [policy/d4pg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/d4pg.py) | python3 -u pendulum_d4pg_config.py |  means discrete action space, which is only label in normal DRL algorithms(1-15)  means continuous action space, which is only label in normal DRL algorithms(1-15)  means distributed training (collector-learner parallel) RL algorithm  means multi-agent RL algorithm  means RL algorithm which is related to exploration and sparse reward  means Imitation Learning, including Behaviour Cloning, Inverse RL, Adversarial Structured IL  means offline RL algorithm  means other sub-direction algorithm, usually as plugin-in in the whole pipeline P.S: The `.py` file in `Runnable Demo` can be found in `dizoo` ### Environment Versatility | No | Environment | Label | Visualization | dizoo link | | :--: | :--------------------------------------: | :---------------------------------: | :--------------------------------:|:---------------------------------------------------------: | | 1 | [atari](https://github.com/openai/gym/tree/master/gym/envs/atari) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/atari/envs) | | 2 | [box2d/bipedalwalker](https://github.com/openai/gym/tree/master/gym/envs/box2d) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/bipedalwalker/envs) | | 3 | [box2d/lunarlander](https://github.com/openai/gym/tree/master/gym/envs/box2d) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/box2d/lunarlander/envs) | | 4 | [classic_control/cartpole](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/cartpole/envs) | | 5 | [classic_control/pendulum](https://github.com/openai/gym/tree/master/gym/envs/classic_control) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/classic_control/pendulum/envs) | | 6 | [competitive_rl](https://github.com/cuhkrlcourse/competitive-rl) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo.classic_control) | | 7 | [gfootball](https://github.com/google-research/football) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo.gfootball/envs) | | 8 | [minigrid](https://github.com/maximecb/gym-minigrid) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/minigrid/envs) | | 9 | [mujoco](https://github.com/openai/gym/tree/master/gym/envs/mujoco) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/majoco/envs) | | 10 | [multiagent_particle](https://github.com/openai/multiagent-particle-envs) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/multiagent_particle/envs) | | 11 | [overcooked](https://github.com/HumanCompatibleAI/overcooked-demo) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/overcooded/envs) | | 12 | [procgen](https://github.com/openai/procgen) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/procgen) | | 13 | [pybullet](https://github.com/benelot/pybullet-gym) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pybullet/envs) | | 14 | [smac](https://github.com/oxwhirl/smac) |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/smac/envs) | | 15 | [d4rl](https://github.com/rail-berkeley/d4rl) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/d4rl) | | 16 | league_demo |   |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/league_demo/envs) | | 17 | pomdp atari |  | | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/pomdp/envs) | | 18 | [bsuite](https://github.com/deepmind/bsuite) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/bsuite/envs) | | 19 | [ImageNet](https://www.image-net.org/) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/image_classification) | | 20 | [slime_volleyball](https://github.com/hardmaru/slimevolleygym) |  |  | [dizoo link](https://github.com/opendilab/DI-engine/tree/main/dizoo/slime_volley) |  means discrete action space  means continuous action space  means multi-agent RL environment  means environment which is related to exploration and sparse reward  means offline RL environment  means Imitation Learning or Supervised Learning Dataset  means environment that allows agent VS agent battle P.S. some enviroments in Atari, such as **MontezumaRevenge**, are also sparse reward type ## Contribution We appreciate all contributions to improve DI-engine, both algorithms and system designs. Please refer to CONTRIBUTING.md for more guides. And our roadmap can be accessed by [this link](https://github.com/opendilab/DI-engine/projects). And users can join our [slack communication channel](https://join.slack.com/t/opendilab/shared_invite/zt-v9tmv4fp-nUBAQEH1_Kuyu_q4plBssQ) or our [forum](https://github.com/opendilab/DI-engine/discussions) for more detailed discussion. For future plans or milestones, please refer to our [GitHub Projects](https://github.com/opendilab/DI-engine/projects). ## Citation ```latex @misc{ding, title={{DI-engine: OpenDILab} Decision Intelligence Engine}, author={DI-engine Contributors}, publisher = {GitHub}, howpublished = {\url{https://github.com/opendilab/DI-engine}}, year={2021}, } ``` ## License DI-engine released under the Apache 2.0 license.