# REDQ
**Repository Path**: printffox/REDQ
## Basic Information
- **Project Name**: REDQ
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-01-17
- **Last Updated**: 2025-01-17
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# REDQ Implementation
This repository provides a clean implementation of the Randomized Ensembled Double Q-learning (REDQ) [[paper]](https://arxiv.org/abs/2101.05982) algorithm, a model-free reinforcement learning algorithm that achieves high sample efficiency in continuous action space domains. The implementation is compatible with OpenAI Gym environments and includes comparison benchmarks against Soft Actor-Critic (SAC).
## Overview
REDQ is an enhancement over traditional off-policy algorithms, employing three key mechanisms:
- High Update-To-Data (UTD) ratio for improved sample efficiency
- Ensemble of Q-functions to reduce variance in Q-function estimates
- In-target minimization to reduce over-estimation bias
The pseudocode of the REDQ algorithm is as follows:
The main hyperparameters of the REDQ algorithm are:
- G: Number of gradient steps per interaction
- N: Number of critic networks
- M: Size of the random subset over N critics
## Results
For the final conclusions see the report [here](docs/REDQ_report.pdf). Here you can see a summary of the results:
## Usage example
In this example, we train the REDQ algorithm on the LunarLander-v2 environment with the following hyperparameters: N=5, G=5, M=2
```bash
python main.py --kwargs N=5 G=5 M=2 alpha=0.05 --exp_name REDQ_alpha0.05_N5_G5_M2 --total_timesteps 200000 --seed 1 --env LunarLander-v2
```
Furthermore, since Soft-Actor Critic (SAC) is subset of REDQ, we can also train the SAC algorithm with the following hyperparameters: N=2, G=1, M=2
```bash
python main.py --kwargs N=2 G=1 M=2 alpha=0.2 --exp_name SAC_alpha0.2 --total_timesteps 200000 --seed 42 --env LunarLander-v2
```
See the file `run_experiments.sh` for more examples.
If you want to plot the results see the notebook `show_plots.ipynb`.
## Installation
```bash
conda env create -f environment.yml
conda activate RL
```
## References
1. Chen, X., et al. (2021). "Randomized ensembled double q-learning: Learning fast without a model". arXiv preprint arXiv:2101.05982.
2. Haarnoja, T., et al. (2018). "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor".
## Others
This repository was part of the course ATCI at UPC Barcelona.
Authors:
- Lukas Meggle
- Alberto Maté