# rl_delay_basic

This repository contains the implementation of the delayed-RL agent from the paper: "Acting in Delayed Environments with Non-Stationary Markov Policies", Esther Derman\*, Gal Dalal\*, Shie Mannor (\*equal contribution).

The agent here supports the Cartpole and Acrobot environments by OpenAI. The Atari-supported agent will be released in a separate repository.

**Installation instructions:**

1. Tested with Python 3.7. A conda virtual env is encouraged; other Python versions and/or environments should also work.
2. Clone the project and cd to the project dir.
3. Create a virtual env:\
   Option 1 -- TensorFlow 2.2: run `pip install -r requirements.py` (other versions of the packages in requirements.py should also be fine).\
   Option 2 -- TensorFlow 1.14: run `conda env create -f environment.yml` to directly create a virtual env called `tf_14`.
4. To enable support of the noisy Cartpole and Acrobot experiments, modify the original gym cartpole.py and acrobot.py:\
   Option 1 -- via pip install:

    ```bash
    cd third_party
    git submodule sync && git submodule update --init --recursive
    cd gym
    git apply ../gym.patch
    pip install -e .
    ```

   Option 2 -- manually (a sketch of these commands appears at the end of this README):\
   4a. Find the location in site-packages, e.g., `/home/username/anaconda3/envs/rl_delay_env/lib/python3.7/site-packages/gym/envs/classic_control/cartpole.py`.\
   4b. Overwrite the above file with `rl_delay_basic/gym_modifications/cartpole.py`. Repeat the same process for `rl_delay_basic/gym_modifications/acrobot.py`.

**Hyperparameters:**

The parameters used for the experiments in the paper are the defaults appearing in init_main.py. They are the same for all agent types (delayed, augmented, oblivious), both noisy and non-noisy, and all delay values. The only exception is epsilon_decay: 0.999 for Cartpole versus 0.9999 for Acrobot.

**Wandb sweep:**

Using wandb, you can easily run multiple experiments for different agents, delay values, hyperparameters, etc. An example sweep file is included in the project: example_sweep.yml. A sweep can be created via `wandb sweep example_sweep.yml`, and multiple workers can be started with `wandb agent your-sweep-id` (see the sketch at the end of this README). For more details see https://docs.wandb.ai/guides/sweeps/quickstart.

Feel free to leave questions and raise issues. Happy delaying!
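For step 4, Option 2, here is a minimal sketch of the manual overwrite. Run it from the project dir; it assumes gym is already installed in the active env so its location can be queried instead of hard-coding the site-packages path from step 4a, and the backup copies are an optional precaution, not part of the original instructions:

```bash
# Locate gym's classic_control directory in the active environment, rather
# than hard-coding the site-packages path from step 4a.
GYM_DIR="$(python -c 'import os, gym.envs.classic_control as m; print(os.path.dirname(m.__file__))')"

# Back up the originals before overwriting them (optional).
cp "$GYM_DIR/cartpole.py" "$GYM_DIR/cartpole.py.bak"
cp "$GYM_DIR/acrobot.py"  "$GYM_DIR/acrobot.py.bak"

# Overwrite with the modified versions shipped in this repo.
cp gym_modifications/cartpole.py "$GYM_DIR/cartpole.py"
cp gym_modifications/acrobot.py  "$GYM_DIR/acrobot.py"
```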
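A note on the two epsilon_decay values, assuming the common multiplicative schedule `epsilon <- epsilon * epsilon_decay` applied once per step (an assumption about the schedule; check init_main.py for the exact rule): the exploration half-life is `ln(2) / ln(1/epsilon_decay)`, which comes to roughly 693 steps for 0.999 (Cartpole) versus roughly 6,931 steps for 0.9999 (Acrobot). In other words, exploration is annealed about ten times more slowly on Acrobot.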
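The sweep commands above, collected into a runnable sketch (the sweep id is printed by `wandb sweep`; the worker count of 4 is an arbitrary example):

```bash
wandb sweep example_sweep.yml   # prints a sweep id of the form entity/project/abc123
wandb agent your-sweep-id       # start a single worker

# Multiple workers can run in parallel (or be launched on separate machines):
for i in 1 2 3 4; do
  wandb agent your-sweep-id &
done
wait
```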