# ESCP

**Repository Path**: wenb11/ESCP

## Basic Information

- **Project Name**: ESCP
- **Description**: ESCP: lets a policy quickly sense and adapt to environment changes
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-10-17
- **Last Updated**: 2024-10-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ESCP
Code for [Adapt to Environment Sudden Changes by Learning a Context Sensitive Policy](https://www.aaai.org/AAAI22Papers/AAAI-6573.LuoF.pdf).
![image](assets/poster.jpg)
## Installation
### Install with pip
Install the required Python packages listed in `requirement.txt`:
```bash
pip install -r ./requirement.txt
```
Note: follow the instructions in the [mujoco-py repository](https://github.com/openai/mujoco-py) to install `mujoco-py` properly.

### Use a Docker image
We have built a Docker image, which we used to run all the experiments in the paper. It can be pulled from [DockerHub](https://hub.docker.com/repository/docker/sanluosizhou/selfdl):
```bash
docker pull sanluosizhou/selfdl:ml
```
## Run

Run the experiment in `HalfCheetah-v2` with the following command:
```bash
python main.py --env_name HalfCheetah-v2 --rnn_fix_length 16 --seed 5  --task_num 40 --max_iter_num 2000 --varying_params  dof_damping_1_dim  --test_task_num 40 --ep_dim 2 --name_suffix RMDM --rbf_radius 3000  --use_rmdm  --stop_pg_for_ep  --bottle_neck
```

The same experiment can be run inside the Docker container:
```bash
docker run --rm -it --shm-size 50gb --gpus all -v $PWD:/root/policy_adaptation sanluosizhou/selfdl:ml -c "cd /root/policy_adaptation && python main.py --env_name HalfCheetah-v2 --rnn_fix_length 16 --seed 5  --task_num 40 --max_iter_num 2000 --varying_params  dof_damping_1_dim  --test_task_num 40 --ep_dim 2 --name_suffix RMDM --rbf_radius 3000  --use_rmdm  --stop_pg_for_ep  --bottle_neck"
```

The key parameters are:
- `--env_name`: the environment to run the experiment in. Supported environments: `GridWorldPlat-v2`, `Hopper-v2`, `HalfCheetah-v2`, `Walker2d-v`, `Ant-v2`, `Humanoid-v2`.
- `--rnn_fix_length`: the memory length (H in the paper).
- `--seed`: the random seed.
- `--task_num`: the number of environments used for policy training (set it to `12` for `GridWorldPlat-v2`).
- `--test_task_num`: the number of environments used for policy testing (set it to `12` for `GridWorldPlat-v2`).
- `--varying_params`: the kinds of environment change to apply; see [envs/nonstationary_env.py](envs/nonstationary_env.py) for all supported changes.
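
To show how these parameters fit together, here is a minimal sketch of a helper that assembles the `main.py` command line. The function `build_command` and its defaults are hypothetical and not part of this repository; the only rule it enforces is the one stated in the list, that `GridWorldPlat-v2` expects 12 training and 12 testing environments.

```python
import shlex

GRIDWORLD = "GridWorldPlat-v2"


def build_command(env_name, varying_params, seed=5, task_num=40,
                  test_task_num=40, rnn_fix_length=16, extra_flags=()):
    """Return the argument list for one run of main.py (hypothetical helper)."""
    # The README asks for 12 training and 12 testing tasks in GridWorldPlat-v2.
    if env_name == GRIDWORLD and (task_num != 12 or test_task_num != 12):
        raise ValueError(
            "GridWorldPlat-v2 expects task_num and test_task_num to be 12")
    cmd = [
        "python", "main.py",
        "--env_name", env_name,
        "--rnn_fix_length", str(rnn_fix_length),
        "--seed", str(seed),
        "--task_num", str(task_num),
        "--test_task_num", str(test_task_num),
        # --varying_params accepts one or more names, as in the commands here.
        "--varying_params", *varying_params,
    ]
    # Any remaining flags (e.g. --use_rmdm) are passed through unchanged.
    cmd.extend(extra_flags)
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "HalfCheetah-v2", ["dof_damping_1_dim"],
        extra_flags=["--use_rmdm", "--stop_pg_for_ep", "--bottle_neck"])
    # Print a shell-ready string that can be pasted into a terminal.
    print(shlex.join(cmd))
```

Building the command as a list rather than a string keeps each argument separate, so it can be handed directly to `subprocess.run` without shell quoting issues.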

To run the experiment in `HalfCheetah-v2` with both `gravity` and `dof_damping` varying, use:

```bash
python main.py --env_name HalfCheetah-v2 --rnn_fix_length 16 --seed 5  --task_num 40 --max_iter_num 2000 --varying_params dof_damping_1_dim  gravity  --test_task_num 40 --ep_dim 2 --name_suffix RMDM_more_change --kernel_type rbf --rbf_radius 80 --diversity_loss_weight 1.0  --use_rmdm  --stop_pg_for_ep  --bottle_neck 
```
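
If you want to repeat this experiment over several random seeds, one possible approach is sketched below. The helpers `commands_for_seeds` and `run_all` are hypothetical and not part of this repository, and the seed values are arbitrary examples. All other flags are copied from the command above. By default the script only prints the commands; pass `dry_run=False` to execute them one after another.

```python
import shlex
import subprocess

# Flags copied from the command above, split around the --seed position.
HEAD = [
    "python", "main.py",
    "--env_name", "HalfCheetah-v2",
    "--rnn_fix_length", "16",
]
TAIL = [
    "--task_num", "40",
    "--max_iter_num", "2000",
    "--varying_params", "dof_damping_1_dim", "gravity",
    "--test_task_num", "40",
    "--ep_dim", "2",
    "--name_suffix", "RMDM_more_change",
    "--kernel_type", "rbf",
    "--rbf_radius", "80",
    "--diversity_loss_weight", "1.0",
    "--use_rmdm", "--stop_pg_for_ep", "--bottle_neck",
]


def commands_for_seeds(seeds):
    """Return one argument list per seed; only --seed differs between runs."""
    return [HEAD + ["--seed", str(seed)] + TAIL for seed in seeds]


def run_all(seeds, dry_run=True):
    """Print each command and, unless dry_run is set, run it sequentially."""
    for cmd in commands_for_seeds(seeds):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sweep as soon as one run fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all([1, 2, 3])  # example seeds, chosen arbitrarily
```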