# RL4RedTeam

**Repository Path**: LordRayleigh/RL4RedTeam

## Basic Information

- **Project Name**: RL4RedTeam
- **Description**: A PPO reinforcement learning agent that performs penetration testing in a simulated computer network environment. The agent is trained to scan for vulnerabilities in the network and exploit them to gain access to various network resources.
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: clap-enhanced
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-12-07
- **Last Updated**: 2023-11-14

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# CLAP: **C**uriosity-Driven Reinforcement **L**earning **A**utomatic **P**enetration Testing Agent

`CLAP` is a reinforcement learning [PPO agent](https://arxiv.org/abs/1707.06347) that performs [**Penetration Testing**](https://en.wikipedia.org/wiki/Penetration_test) in a simulated computer network environment (we use the [Network Attack Simulator (NASim)](https://github.com/Jjschwartz/NetworkAttackSimulator)). The agent is trained to scan for vulnerabilities in the network and exploit them to gain access to various network resources.

`CLAP` was initially proposed in our paper [*Behaviour-Diverse Automatic Penetration Testing: A Curiosity-Driven Multi-Objective Deep Reinforcement Learning Approach*](https://arxiv.org/abs/2202.10630).

![](https://files.catbox.moe/784yxg.jpg)

## Simulated Network Environment: [Network Attack Simulator (NASim)](https://github.com/Jjschwartz/NetworkAttackSimulator)

Network Attack Simulator (NASim) is a simulated computer network, complete with vulnerabilities, scans and exploits, designed as a testing environment for AI agents and planning techniques applied to network penetration testing.

Compared to the original paper, this repo makes the following changes:

- Built on top of [CleanRL](https://github.com/vwxyzjn/cleanrl)
- Adds an LSTM for POMDP scenarios
  - Following [Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs](https://proceedings.mlr.press/v162/ni22a.html)
- To support NASim's [2D observation space](https://networkattacksimulator.readthedocs.io/en/latest/reference/envs/environment.html), a `Transformer` encoder is implemented as the perception module (see the sketch after the training section below)
  - However, it is extremely unstable to train
  - To learn more about the transformer encoder: [check Yekun's note](https://ychai.uk/notes/2019/07/21/RL/DRL/Decipher-AlphaStar-on-StarCraft-II/#Encoders)

## Prerequisites

To run this code, you will need the following installed on your system:

- Python 3.5 or later
- PyTorch 2.0 or later
- OpenAI Gym 0.21.0 (the API changed significantly after 0.25.0)
- NASim 0.9.1

> Be aware that OpenAI Gym underwent a significant update after version 0.25.0, which introduced a new `step` API (a minimal example using the older 4-tuple API is sketched after the training section below).

## Get Started

Use `Conda` to manage the Python environment and `Poetry` to manage packages.

Clone this repo:

```bash
git clone https://github.com/yyzpiero/RL4RedTeam.git
```

Create a conda environment:

```bash
conda create -p ./venv python==X.X
```

and use Poetry to install all Python packages:

```bash
poetry install
```

## Train the agent

To train the agent, use the following commands:

```bash
cd ./algo
python clap.py
```

This starts the training process, which runs until the agent reaches a satisfactory level of performance. The agent's performance is printed to the console at regular intervals, so you can monitor its progress.
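The training script follows CleanRL's PPO recipe. For reference, below is a minimal sketch of the clipped surrogate policy loss at the core of PPO; the function and argument names are illustrative, and `clap.py` adds curiosity-driven exploration and the recurrent/transformer components described above, so this is not the repository's exact code.

```python
import torch

def ppo_clip_loss(new_logprob, old_logprob, advantage, clip_coef=0.2):
    """Clipped surrogate policy loss from the PPO paper (illustrative sketch)."""
    # Probability ratio between the updated policy and the policy that collected the data.
    ratio = (new_logprob - old_logprob).exp()
    # Unclipped and clipped surrogate objectives (negated, since we minimise).
    pg_loss1 = -advantage * ratio
    pg_loss2 = -advantage * torch.clamp(ratio, 1.0 - clip_coef, 1.0 + clip_coef)
    # Element-wise maximum gives the pessimistic (clipped) bound used by PPO.
    return torch.max(pg_loss1, pg_loss2).mean()
```

Clipping the probability ratio keeps each policy update close to the policy that collected the rollout, which is the main stabilising trick in PPO.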
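If you want to explore the simulated environment on its own, here is a minimal interaction sketch. It assumes NASim's `make_benchmark` helper and the pre-0.26 Gym 4-tuple `step` API noted in the Prerequisites; exact keyword arguments and return values may differ across NASim/Gym versions, so treat it as a rough guide rather than the repository's code.

```python
import nasim

# Load one of NASim's built-in benchmark scenarios. With flat observations and
# actions, the environment exposes a 1D observation vector and a discrete action
# space; fully_obs=False gives the partially observable (POMDP) setting.
env = nasim.make_benchmark("tiny", fully_obs=False, flat_actions=True, flat_obs=True)

obs = env.reset()
done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()          # a trained agent would choose the action here
    obs, reward, done, info = env.step(action)  # pre-0.26 Gym API: 4-tuple return
    episode_return += reward

print(f"episode return: {episode_return}")
```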
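As for the 2D observation space, the sketch below shows one possible way to build a transformer-based perception module: each host's feature row is treated as a token, encoded with a standard `nn.TransformerEncoder`, and mean-pooled into a single state embedding for the policy and value heads. The class name, hyperparameters, and pooling choice are illustrative assumptions, not the repository's actual encoder.

```python
import torch
import torch.nn as nn

class HostTransformerEncoder(nn.Module):
    """Encodes a NASim 2D observation of shape (num_hosts, host_features)
    by treating each host's feature vector as one token (illustrative sketch)."""

    def __init__(self, host_features: int, d_model: int = 64, nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(host_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, num_hosts, host_features)
        tokens = self.embed(obs)        # (batch, num_hosts, d_model)
        encoded = self.encoder(tokens)  # (batch, num_hosts, d_model)
        return encoded.mean(dim=1)      # (batch, d_model) pooled state embedding
```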
## Contributing

> The PPO implementation is heavily based on [Costa Huang's](https://costa.sh/) fantastic library [CleanRL](https://github.com/vwxyzjn/cleanrl).

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

## Citing `CLAP`

```latex
@article{yang2022behaviour,
  title={Behaviour-Diverse Automatic Penetration Testing: A Curiosity-Driven Multi-Objective Deep Reinforcement Learning Approach},
  author={Yang, Yizhou and Liu, Xin},
  journal={arXiv preprint arXiv:2202.10630},
  year={2022}
}
```

## TODOs

- [ ] Add the original code for `CLAP`
- [ ] Add Random Network Distillation (RND)
- [ ] Include figures of the training results

## Limitations

This implementation of the PPO algorithm is not intended for real-world penetration testing. It is only meant for use in a simulated environment and should not be used to perform actual penetration testing on real networks.