# active_ops

**Repository Path**: mirrors_deepmind/active_ops

## Basic Information

- **Project Name**: active_ops
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-12-05
- **Last Updated**: 2025-10-12

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Active Offline Policy Selection

This is supporting example code for the NeurIPS 2021 paper [Active Offline Policy Selection](https://arxiv.org/abs/2106.10251) by Ksenia Konyushkova*, Yutian Chen*, Tom Le Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, and Nando de Freitas.

To simulate active offline policy selection for a set of policies, one needs to provide a number of files. We provide the files for 76 policies on the `cartpole_swingup` environment.

1. Sampled episodic returns for all policies on a number of evaluation episodes (`full-reward-samples-dict.pkl`), or a way of sampling a new evaluation episode upon request for any policy. The file `full-reward-samples-dict.pkl` contains a dictionary that maps a policy, identified by its string representation, to a `numpy.ndarray` of shape `(5000,)` (the number of reward samples).
2. Off-policy evaluation (OPE) scores, such as fitted Q-evaluation (FQE), for all policies (`ope_values.pkl`). The file `ope_values.pkl` contains a dictionary that maps policy info to OPE estimates. We provide FQE scores for the policies.
3. Actions that the policies take on 1000 randomly sampled states from the offline dataset (`actions.pkl`). The file `actions.pkl` contains a dictionary with keys `actions` and `policy_keys`. `actions` is a list of 1000 elements (the number of states used to compute the kernel), each a `numpy.ndarray` of shape 76x1 (the number of policies by the dimensionality of the actions). `policy_keys` is a dictionary mapping the string representation of a policy to the index of that policy in `actions`. A sketch of how these files might be loaded is given at the end of this README.

## Installation

To set up the virtual environment, run the following commands from within the `active_ops` directory:

```
python3 -m venv active_ops_env
source active_ops_env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

To run the demo with Colab, enable the `jupyter_http_over_ws` extension:

```
jupyter serverextension enable --py jupyter_http_over_ws
```

Finally, start a server:

```
jupyter notebook \
  --NotebookApp.allow_origin='https://colab.research.google.com' \
  --port=8888 \
  --NotebookApp.port_retries=0
```

## Usage

To run the code, refer to the `Active_ops_experiment.ipynb` Colab notebook. Execute the blocks of code one by one to reproduce the final plot. You can modify the various parameters marked by `@param` to test the baselines in modified settings. The notebook loads the example data for the cartpole environment provided in the `data` folder. Using this data, we reproduce the results of Figure 14 of the paper.

## Citing this work

```
@inproceedings{konyushkovachen2021aops,
    title = "Active Offline Policy Selection",
    author = "Ksenia Konyushkova, Yutian Chen, Tom Le Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas",
    booktitle = NeurIPS,
    year = 2021
}
```

## Disclaimer

This is not an official Google product.

The datasets in this work are licensed under the Creative Commons Attribution 4.0 International License.
To view a copy of this license, visit [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/).
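## Data format example

The sketch below is not part of the repository; it is a minimal illustration of how the three data files described above might be loaded and inspected, assuming they sit in the repository's `data` folder and follow the structure described in this README. The path and variable names are hypothetical.

```python
import pickle

import numpy as np

# Episodic return samples: {policy string: np.ndarray of shape (5000,)}.
with open('data/full-reward-samples-dict.pkl', 'rb') as f:
    reward_samples = pickle.load(f)

# OPE (FQE) estimates: {policy info: estimate}.
with open('data/ope_values.pkl', 'rb') as f:
    ope_values = pickle.load(f)

# Policy actions on 1000 sampled states: {'actions': [...], 'policy_keys': {...}}.
with open('data/actions.pkl', 'rb') as f:
    actions_data = pickle.load(f)

# One array of reward samples per policy.
some_policy = next(iter(reward_samples))
print(some_policy, reward_samples[some_policy].shape)  # expected: (5000,)

# 1000 states, each with a (num_policies x action_dim) array of actions.
actions = actions_data['actions']
policy_keys = actions_data['policy_keys']
print(len(actions), np.asarray(actions[0]).shape)  # expected: 1000 and (76, 1)

# Row of actions[i] corresponding to a particular policy.
some_key = next(iter(policy_keys))
print(actions[0][policy_keys[some_key]])
```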