# Hypernetworks in Meta-RL

This repository contains code for the papers [*Hypernetworks in Meta-Reinforcement Learning* (Beck et al., 2022)](https://arxiv.org/abs/2210.11348), published at [CoRL](https://proceedings.mlr.press/v205/beck23a.html); [*Recurrent Hypernetworks are Surprisingly Strong in Meta-RL* (Beck et al., 2023)](https://arxiv.org/abs/2309.14970), published at [NeurIPS](https://neurips.cc/virtual/2023/poster/70399); and [*SplAgger: Split Aggregation for Meta-Reinforcement Learning* (Beck et al., 2024)](https://arxiv.org/abs/2403.03020), published at RLC.

```
@inproceedings{beck2022hyper,
  author = {Jacob Beck and Matthew Jackson and Risto Vuorio and Shimon Whiteson},
  title = {Hypernetworks in Meta-Reinforcement Learning},
  booktitle = {Conference on Robot Learning},
  year = {2022},
  url = {https://openreview.net/forum?id=N-HtsQkRotI}
}

@inproceedings{beck2023recurrent,
  author = {Jacob Beck and Risto Vuorio and Zheng Xiong and Shimon Whiteson},
  title = {Recurrent Hypernetworks are Surprisingly Strong in Meta-RL},
  booktitle = {Thirty-seventh Conference on Neural Information Processing Systems},
  year = {2023},
  url = {https://openreview.net/forum?id=pefAAzu8an}
}

@inproceedings{beck2024splagger,
  author = {Jacob Beck and Matthew Jackson and Risto Vuorio and Zheng Xiong and Shimon Whiteson},
  title = {SplAgger: Split Aggregation for Meta-Reinforcement Learning},
  booktitle = {Reinforcement Learning Conference},
  eprint = {2403.03020},
  url = {https://openreview.net/forum?id=O1Vmua4RVW},
  year = {2024}
}
```

This repository is based on [code](https://github.com/lmzintgraf/varibad) from *VariBAD: A very good method for Bayes-Adaptive Deep RL via Meta-Learning* (Zintgraf et al., 2020). If you use this code, please additionally cite this paper:

```
@inproceedings{zintgraf2020varibad,
  author = {Zintgraf, Luisa and Shiarlis, Kyriacos and Igl, Maximilian and Schulze, Sebastian and Gal, Yarin and Hofmann, Katja and Whiteson, Shimon},
  title = {VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2020}
}
```

Finally, the T-Maze environments, Minecraft environments, aggregators in `aggregator.py`, and analysis in `visualize_analysis.py` are all reproduced from [*AMRL: Aggregated Memory For Reinforcement Learning* (Beck et al., 2020)](https://iclr.cc/virtual_2020/poster_Bkl7bREtDr.html). These files were adapted to PyTorch from the [original code](https://github.com/jacooba/AMRL-ICLR2020) in TensorFlow. If you use any of those modules, please cite this paper:

```
@inproceedings{beck2020AMRL,
  author = {Jacob Beck and Kamil Ciosek and Sam Devlin and Sebastian Tschiatschek and Cheng Zhang and Katja Hofmann},
  title = {AMRL: Aggregated Memory For Reinforcement Learning},
  booktitle = {International Conference on Learning Representations},
  year = {2020},
  url = {https://openreview.net/forum?id=Bkl7bREtDr}
}
```

### Usage

The experiments can be found in `experiment_sets/`. The models themselves are defined in `models.py`.
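For readers new to the approach, the sketch below illustrates the core idea shared by all three papers: a hypernetwork that generates the parameters of the policy from a latent task embedding. It is a minimal conceptual example in PyTorch with assumed dimensions, not the code in this repository (the actual models live in `models.py` and `policy.py`).

```python
# Minimal conceptual sketch of a hypernetwork, for illustration only; the
# models used in this repository are defined in models.py and policy.py.
import torch
import torch.nn as nn


class LinearHypernetwork(nn.Module):
    """Generates the weights of a single linear policy layer from a task latent."""

    def __init__(self, latent_dim, obs_dim, action_dim):
        super().__init__()
        self.obs_dim = obs_dim
        self.action_dim = action_dim
        # The hypernetwork maps the task embedding to a flat vector of
        # policy parameters (one weight matrix and one bias vector).
        self.weight_head = nn.Linear(latent_dim, obs_dim * action_dim)
        self.bias_head = nn.Linear(latent_dim, action_dim)

    def forward(self, latent, obs):
        # Generate per-task parameters, then apply them to the observation.
        weight = self.weight_head(latent).view(self.action_dim, self.obs_dim)
        bias = self.bias_head(latent)
        return obs @ weight.T + bias  # action scores for this task


# Example with assumed dimensions: a 12-dimensional task latent conditioning
# a policy over 8-dimensional observations and 4 actions.
hyper = LinearHypernetwork(latent_dim=12, obs_dim=8, action_dim=4)
scores = hyper(torch.randn(12), torch.randn(8))
```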
Main results on initialization methods (Beck et al., 2022) can be found in `init_main_results.py`. Main results on supervision (Beck et al., 2023) and an example usage of SplAgger (Beck et al., 2024) can be found in `main_results.py`. Analysis and the remaining environments can be found in `analysis.py` and `all_envs.py`, respectively.

`run_experiments.py` can be used to build the dockers, launch experiments, and start new experiments when there is sufficient space. Results will be saved in `hyper/data/` by default.

*Example usage:*
```
python3 run_experiments.py main_results --shuffle --gpu_free 0-7 --experiments_per_gpu 1 |& tee log.txt
```

The `run_experiments.py` script automatically runs commands using the docker files, e.g., executing `run_cpu.sh mujoco150 0 python ~/MetaMem/main.py --env-type gridworld_varibad` to run gridworld on CPU 0. Within a docker, this command could be run with `python main.py --env-type gridworld_varibad`. The main training loop itself can be found in `metalearner.py`, the hypernetwork is in `policy.py`, and added supervision for task inference is in `ppo.py`.

After training, `visualize_runs.py` can be used for plotting. To automatically plot all results for a set of experiments, you can also use the `run_experiments.py` script. Plots will be saved in `hyper/data/plts/` by default.

*Example usage:*
```
python3 run_experiments.py main_results --plot
```

To measure the different types of gradient decay for different aggregators in the SplAgger analysis, you can use `visualize_analysis.py`. The plot will be saved as `hyper/data/analysis.png` by default. (Currently set up for CPU usage.)

*Example usage:*
```
/home/jaceck/hyper/run_cpu.sh mujoco150 0 python /home/jaceck/hyper/visualize_analysis.py --grad --noise --no_log
/home/jaceck/hyper/run_cpu.sh mujoco150 0 python /home/jaceck/hyper/visualize_analysis.py --param_grad --noise --no_log
/home/jaceck/hyper/run_cpu.sh mujoco150 0 python /home/jaceck/hyper/visualize_analysis.py --inputs_grad --noise
/home/jaceck/hyper/run_cpu.sh mujoco150 0 python /home/jaceck/hyper/visualize_analysis.py --perm_diff --no_log
```

### Comments

- The *env-type* argument refers to a config in `config/`; it provides a list of default arguments common to an environment, which can be overridden in the experiment set.
- Different environments require one of three different dockers, specifying different MuJoCo versions, as documented in the respective experiment sets. The dockerfiles can be built automatically with `run_experiments.py`, or manually with, e.g., `bash build.sh Dockerfile_mj150`.
- To recreate the SplAgger experiments, use the environments in `all_envs.py` and the models in `models.py`, but note that you need to additionally set `"latent_dim": 12` and `"full_transitions": True` in `"shared_arguments"`, if not done already (see the sketch below). Additionally, setting `"policy_entropy_coef": 0.0` on PlanningGame is done for you and is very important!
- `requirements.txt` is legacy from VariBAD, and likely out of date.
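To make the SplAgger overrides above concrete, the snippet below shows the key/value pairs mentioned in the third bullet. The surrounding dictionary layout is an assumption for illustration only; the actual experiment definitions and their schema live in `experiment_sets/` (e.g. `all_envs.py` and `main_results.py`).

```python
# Illustration only: the key names are those referenced above, but the exact
# experiment-set schema is defined by the files in experiment_sets/.
splagger_overrides = {
    "shared_arguments": {
        "latent_dim": 12,          # required for the SplAgger experiments
        "full_transitions": True,  # required for the SplAgger experiments
        # On PlanningGame this is already set for you, and it is very important:
        # "policy_entropy_coef": 0.0,
    },
}
```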