# kGym-Kernel-Gym
**Repository Path**: wangpengabc/kGym-Kernel-Gym
## Basic Information
- **Project Name**: kGym-Kernel-Gym
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-03
- **Last Updated**: 2025-11-03
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# :scroll: Kernel Gym (kGym)
# Updates
We have created an agent, **CrashFixer**, that leverages kGym to resolve kernel crashes. Please find our paper below.
## kGym Details
Kernel Gym is a platform that enables scalable execution of hundreds to thousands of Linux kernels within the span of a day.
:star: With kGym, we aim to democratize research at the intersection of machine learning and systems software.
:star: Using kGym, researchers can run LLM experiments on a massive **20-million**-line codebase like Linux with a few simple API invocations.
## :building_construction: Architecture of kGym

## :fire: Architecture Components
:star: Client-facing functionality for researchers:
- **Clerk**: a simple Python client library for issuing kernel jobs that compile and execute kernels. Standard users only need to get familiar with this library.
:star: Backend functionality running behind the scenes:
- **Scheduler**: a backend scheduler that schedules jobs across the different virtual machines and also provides information to the frontend. The `Scheduler` is internally referred to as `kscheduler` in the code.
- **Web UI**: a web UI based on Next.js, internally referred to as `kdashboard` in the code.
- **RabbitMQ**: the message broker.
- **Kbuilder**: a containerized worker for building kernels.
- **Kreproducer**: a containerized worker for monitoring executions, internally referred to as `kvmmanager` in the code.
- **messager**: a low-level library for communication between **Kreproducer** and **RabbitMQ**, internally referred to as `kworker` in the code.
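The job flow between these components can be illustrated with a minimal, stdlib-only sketch. This is purely conceptual: the real system uses RabbitMQ and containerized workers, and the `Job`, `scheduler`, and `worker` names below are hypothetical stand-ins, not kGym APIs.

```python
import queue
from dataclasses import dataclass, field

@dataclass
class Job:
    """Illustrative stand-in for a kGym job message."""
    job_id: int
    worker_type: str          # e.g. 'kbuilder' or 'kvmmanager'
    payload: dict = field(default_factory=dict)

def scheduler(jobs, broker):
    """Scheduler-like role: enqueue jobs onto the broker (RabbitMQ in kGym)."""
    for job in jobs:
        broker.put(job)

def worker(broker, results):
    """Worker-like role: drain the broker and record completions."""
    while not broker.empty():
        job = broker.get()
        results.append((job.job_id, job.worker_type, 'done'))

broker = queue.Queue()  # stands in for RabbitMQ
results = []
scheduler([Job(1, 'kbuilder'), Job(2, 'kvmmanager')], broker)
worker(broker, results)
```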
## Testing Details
We have tested kGym on an Ubuntu desktop (with specific details below):
```
~$ uname -a
Linux ********* 6.5.0-35-generic #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 09:00:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
```
## Deployment
**kGym currently has a hard requirement on the Google Cloud Platform (GCP)**.
Note: we understand that running **kGym** on a local server would be the most cost-efficient option for most researchers, and we intend to provide this support in the future.
For the remainder of this tutorial you will need to work with `gcloud`, Google's command-line tool for interfacing with GCP.
### Install `gcloud` command line tool
Based on your terminal OS, follow the "Installation instructions" section [here](https://cloud.google.com/sdk/docs/install#installation_instructions).
### Log in and create a new project
After you have installed the `gcloud` CLI, log in to your Google account and create a new project. Type in:
```
demo@kbdr ~ % gcloud init
Welcome! This command will take you through the configuration of gcloud.
Your current configuration has been set to: [default]
You can skip diagnostics next time by using the following flag:
gcloud init --skip-diagnostics
Network diagnostic detects and fixes local network connection issues.
Checking network connection...done.
Reachability Check passed.
Network diagnostic passed (1/1 checks passed).
You must log in to continue. Would you like to log in (Y/n)?
```
Type in `y`:
```
Your browser has been opened to visit:
https://accounts.google.com/o/oauth2/auth?xxxxxxxxxxxxxxxx
You are logged in as: [xxxxxxxxx@gmail.com].
Pick cloud project to use:
[1] Enter a project ID
[2] Create a new project
Please enter numeric choice or text value (must exactly match list item):
```
Type in `2` since we need a new project for KBDr-Runner deployment.
```
Enter a Project ID. Note that a Project ID CANNOT be changed later.
Project IDs must be 6-30 characters (lowercase ASCII, digits, or
hyphens) in length and start with a lowercase letter.
```
Type in the name of the project ID you want. In this example, the project ID is `kbdr-runner-1789`.
```
Waiting for [operations/cp.xxxxxxxxxxxxxxx] to finish...done.
Your current project has been set to: [kbdr-runner-1789].
Not setting default zone/region (this feature makes it easier to use
[gcloud compute] by setting an appropriate default value for the
--zone and --region flag).
See https://cloud.google.com/compute/docs/gcloud-compute section on how to set
default compute region and zone manually. If you would like [gcloud init] to be
able to do this for you the next time you run it, make sure the
Compute Engine API is enabled for your project on the
https://console.developers.google.com/apis page.
Created a default .boto configuration file at [/Users/xxxxxxxx/.boto]. See this file and
[https://cloud.google.com/storage/docs/gsutil/commands/config] for more
information about configuring Google Cloud Storage.
Your Google Cloud SDK is configured and ready to use!
* Commands that require authentication will use xxxxxxxxx@gmail.com by default
* Commands will reference project `kbdr-runner-1789` by default
Run `gcloud help config` to learn how to change individual settings
This gcloud configuration is called [default]. You can create additional configurations if you work with multiple accounts and/or projects.
Run `gcloud topic configurations` to learn more.
```
Now you have logged in and created a new project for KBDr-Runner.
### Setup billing information
Link your credit card to Google Cloud Platform. More resources can be found [here](https://cloud.google.com/billing/docs/how-to/modify-project).
At this point, we can now request resources and start the deployment.
### Create VMs
In this step, we are going to create computing nodes for the deployment:
- kbdr-main (for kscheduler and kmq)
- kbdr-worker-0 (for kbuilder)
- kbdr-worker-1 (for kvmmanager)
First, let's enable the GCP Compute Engine API:
```
gcloud services enable compute.googleapis.com
```
Once it's enabled, create VMs:
```
gcloud compute instances create kbdr-main \
--boot-disk-size=100GB \
--enable-nested-virtualization \
--machine-type=c2-standard-16 \
--zone=us-central1-a \
--image-family=debian-11 \
--image-project=debian-cloud
gcloud compute instances create kbdr-worker-0 \
--boot-disk-size=100GB \
--enable-nested-virtualization \
--machine-type=c2-standard-30 \
--zone=us-central1-a \
--image-family=debian-11 \
--image-project=debian-cloud
gcloud compute instances create kbdr-worker-1 \
--boot-disk-size=100GB \
--enable-nested-virtualization \
--machine-type=c2-standard-16 \
--zone=us-central1-a \
--image-family=debian-11 \
--image-project=debian-cloud
```
If you encounter a quota error, follow the on-screen instructions to increase the quota.
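The three near-identical `create` commands above can also be generated programmatically. A hedged sketch: the node names and machine types come from this tutorial, while `build_create_cmd` is a hypothetical helper (not part of kGym) that builds each `gcloud` argument list, which you could then pass to `subprocess.run`.

```python
# Which machine type each node in this tutorial uses.
NODES = {
    'kbdr-main': 'c2-standard-16',      # kscheduler + kmq
    'kbdr-worker-0': 'c2-standard-30',  # kbuilder
    'kbdr-worker-1': 'c2-standard-16',  # kvmmanager
}

def build_create_cmd(name, machine_type, zone='us-central1-a'):
    """Hypothetical helper: argv for `gcloud compute instances create`."""
    return [
        'gcloud', 'compute', 'instances', 'create', name,
        '--boot-disk-size=100GB',
        '--enable-nested-virtualization',
        f'--machine-type={machine_type}',
        f'--zone={zone}',
        '--image-family=debian-11',
        '--image-project=debian-cloud',
    ]

cmds = [build_create_cmd(n, mt) for n, mt in NODES.items()]
# e.g. subprocess.run(cmds[0], check=True) to actually create kbdr-main
```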
### Storage Bucket
First, let's enable the GCP storage API:
```
gcloud services enable storage.googleapis.com
```
Then, create a storage bucket. In this example, we created a bucket named `kbdr-runner-1789`:
```
gcloud storage buckets create gs://kbdr-runner-1789
```
Once it's created, upload the Linux distro images to the bucket. Download `userspace-images.zip` from [here]() and place it in the root folder. Then:
```
unzip userspace-images.zip
gcloud storage cp --recursive ./userspace-images gs://kbdr-runner-1789/
```
### Service Accounts
For simplicity, in this step we create one service account with all privileges. **It is not advisable to use a single service account with all privileges in a production environment**.
```
ACCOUNT_NAME=kbdr-runner-sa
PROJECT_ID=kbdr-runner-1789
gcloud iam service-accounts create $ACCOUNT_NAME \
--project $PROJECT_ID
SA_EMAIL=$ACCOUNT_NAME@$PROJECT_ID.iam.gserviceaccount.com
# Object user permission;
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:$SA_EMAIL" \
--role="roles/storage.objectUser"
# Compute admin permission;
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:$SA_EMAIL" \
--role="roles/compute.admin"
# Create key;
gcloud iam service-accounts keys create ./assets/sa-credential.json \
--iam-account="$SA_EMAIL" \
--key-file-type=json --project $PROJECT_ID
# Place keys for all roles;
cp ./assets/sa-credential.json ./assets/kscheduler-credential.json
cp ./assets/sa-credential.json ./assets/kbuilder-credential.json
cp ./assets/sa-credential.json ./assets/kvmmanager-credential.json
```
### Configure KBDr-Runner Assets
Type in
```
gcloud compute instances list
```
And then you can see the instances we just created and their IP addresses:
```
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
kbdr-main us-central1-a c2-standard-16 10.128.0.1 RUNNING
kbdr-worker-0 us-central1-a c2-standard-30 10.128.0.2 RUNNING
kbdr-worker-1 us-central1-a c2-standard-16 10.128.0.3 RUNNING
```
We will deploy kscheduler and kmq on kbdr-main. For the other workers to connect to kmq, we need a RabbitMQ connection string, constructed as follows (insert kbdr-main's internal IP between `@` and `:5672`):
```
amqp://kbdr:ey4lai1the7peeGh@:5672/?heartbeat=60
```
For simplicity, we will use the username `kbdr` and the passphrase `ey4lai1the7peeGh`.
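Rather than hand-editing the string, you can assemble it programmatically. A sketch using the throwaway credentials above; `amqp_url` is a hypothetical helper, not part of kGym, and the IP is the internal address of kbdr-main from `gcloud compute instances list`:

```python
def amqp_url(user, password, host, port=5672, heartbeat=60):
    """Build the RabbitMQ (AMQP) connection URL used by the workers."""
    return f'amqp://{user}:{password}@{host}:{port}/?heartbeat={heartbeat}'

# kbdr-main's internal IP, as listed by `gcloud compute instances list`
conn = amqp_url('kbdr', 'ey4lai1the7peeGh', '10.128.0.1')
```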
Once we have the connection string, navigate to the `assets` folder and look at the configuration files ending with `.env.template`. For example, copy `kbuilder.env.template` to `kbuilder.env`, then modify `kbuilder.env`:
```
# required: RabbitMQ connection URL
KBDR_KBUILDER_RABBITMQ_CONN_URL=amqp://kbdr:ey4lai1the7peeGh@:5672/?heartbeat=60
# required: Bucket name for storage
GCS_BUCKET_NAME=
```
In this example, the configured `env` file would be:
```
# required: RabbitMQ connection URL
KBDR_KBUILDER_RABBITMQ_CONN_URL=amqp://kbdr:ey4lai1the7peeGh@10.128.0.1:5672/?heartbeat=60
# required: Bucket name for storage
GCS_BUCKET_NAME=kbdr-runner-1789
```
Configure the following files in the same way:
- `kscheduler.env`
- `kvmmanager.env`
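Filling the templates can also be scripted. A minimal sketch, assuming the templates use `KEY=` lines as shown above; `render_env` is a hypothetical helper, not part of kGym:

```python
def render_env(template_text, values):
    """Fill `KEY=` lines of a .env.template; leave comment lines intact."""
    out = []
    for line in template_text.splitlines():
        key, sep, _ = line.partition('=')
        if sep and key in values:
            line = f'{key}={values[key]}'
        out.append(line)
    return '\n'.join(out)

template = (
    '# required: RabbitMQ connection URL\n'
    'KBDR_KBUILDER_RABBITMQ_CONN_URL=\n'
    '# required: Bucket name for storage\n'
    'GCS_BUCKET_NAME=\n'
)
env = render_env(template, {
    'KBDR_KBUILDER_RABBITMQ_CONN_URL':
        'amqp://kbdr:ey4lai1the7peeGh@10.128.0.1:5672/?heartbeat=60',
    'GCS_BUCKET_NAME': 'kbdr-runner-1789',
})
```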
### Configure kdashboard
First make the configuration file:
```
cp ./kdashboard/src/app/config.json.template ./kdashboard/src/app/config.json
```
Edit it to:
```json
{
"KBDR_API_URL": "http://localhost:8000",
"KBDR_CLIENT_URL": "http://localhost:3000"
}
```
### Deploy KBDr-Runner on GCE VMs
Use the deployment scripts to set up the runner nodes. First, `cd` to the KBDr-Runner root folder and upload everything in it to the remote machines. You can delete the `userspace-images` folder and the zip file before uploading.
```
gcloud compute scp --recurse ./* kbdr-main:
gcloud compute scp --recurse ./* kbdr-worker-0:
gcloud compute scp --recurse ./* kbdr-worker-1:
```
Then, run the pre-installation scripts on remote machines:
```
gcloud compute ssh kbdr-main --command="./deployment/deploy-new.sh"
gcloud compute ssh kbdr-worker-0 --command="./deployment/deploy-new.sh"
gcloud compute ssh kbdr-worker-1 --command="./deployment/deploy-new.sh"
```
After the pre-installation scripts, build Docker images on each machine and install them as services:
```
gcloud compute ssh kbdr-main --command="echo \"SERVICE_TAG=latest\" > .env && sudo docker compose build kmq kscheduler kdashboard"
gcloud compute ssh kbdr-worker-0 --command="echo \"SERVICE_TAG=latest\" > .env && sudo docker compose build kbuilder"
gcloud compute ssh kbdr-worker-1 --command="echo \"SERVICE_TAG=latest\" > .env && sudo docker compose build kvmmanager"
```
Start services:
```
gcloud compute ssh kbdr-main --command="sudo docker compose up -d kmq kscheduler kdashboard"
gcloud compute ssh kbdr-worker-0 --command="sudo docker compose up -d kbuilder"
gcloud compute ssh kbdr-worker-1 --command="sudo docker compose up -d kvmmanager"
```
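The per-node build and start commands all follow one pattern. A sketch that derives the start commands from a node-to-services map (node and service names come from this tutorial; `up_command` is a hypothetical helper, not part of kGym):

```python
# Which services run on which node in this tutorial's deployment.
SERVICES = {
    'kbdr-main': ['kmq', 'kscheduler', 'kdashboard'],
    'kbdr-worker-0': ['kbuilder'],
    'kbdr-worker-1': ['kvmmanager'],
}

def up_command(node, services):
    """Remote `docker compose up` for one node, run over gcloud ssh."""
    remote = 'sudo docker compose up -d ' + ' '.join(services)
    return ['gcloud', 'compute', 'ssh', node, f'--command={remote}']

cmds = [up_command(n, s) for n, s in SERVICES.items()]
# e.g. subprocess.run(cmds[0], check=True) to start services on kbdr-main
```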
### Accessing the `kscheduler` API
Use SSH for port forwarding:
```
gcloud compute ssh kbdr-main \
-- -NL 8000:localhost:8000 -NL 3000:localhost:3000
```
Port `8000` exposes the API; port `3000` exposes the dashboard.
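Once the tunnel is up, you can verify the forwarded ports are reachable before pointing `clerk` at them. A stdlib-only sketch; `wait_for_port` is a hypothetical helper, not part of kGym:

```python
import socket
import time

def wait_for_port(host, port, timeout=10.0):
    """Poll until a TCP connect to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)
    return False

# e.g. wait_for_port('localhost', 8000) for the API,
#      wait_for_port('localhost', 3000) for the dashboard
```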
## Examples of using `clerk`
#### (1) Kernel Building Job
Important parameters for a **kernel building job** using `clerk`:
1. `kernel_git_url` - a URL to the git tree of the kernel implementation.
2. `kernel_commit_id` - a git commit-id that specifies the exact version of the kernel.
3. `kernel_config` - the `.config` file containing all the kernel config options.
4. `userspace_img_name` - the userspace image name (usually `buildroot.raw`).
5. `arch` - we currently support only `amd64`.
6. `compiler` - the compiler to use (one of `gcc` or `clang`).
7. `linker` - the linker used to link all the modules (`ld` for `gcc`, `ld.lld` for `clang`).
8. `patch` - an optional patch, e.g. provided by an LLM, to fix the buggy kernel.
A sample piece of code that runs a **kernel building** job (the positional argument values are placeholders you supply):
```python
import os
import KBDr.kcomposer as kcomp

session = kcomp.KBDrSession(os.environ['KBDR_RUNNER_API_BASE_URL'])
workers, args, kv = kcomp.compose_kernel_build(kernel_git_url,      # str
                                               kernel_commit_id,    # str
                                               kernel_config,       # str
                                               userspace_img_name,  # str
                                               arch,                # str
                                               compiler='gcc',
                                               linker='ld',
                                               patch='')
session.create_job(workers, args, kv)
```
#### (2) Bug Reproduction Job
In addition to the parameters used in the kernel building job, we also need the following extra parameters.
9. `reproducer_type` - one of `c` or `log`, indicating whether the reproducer file is a C code snippet or a `syz.log` snippet containing `syz` DSL statements.
10. `reproducer_text` - the actual text content of the reproducer.
11. `syzkaller_checkout (default='master')` - the exact commit-id of syzkaller to check out and build. By default it is the latest syzkaller version.
12. `syzkaller_rollback (default=True)` - when `True`, instructs kGym to build syzkaller at the latest commit-id if it cannot build syzkaller at the specified commit-id.
13. `nproc (default=8)` - create a pool of `nproc` processes, where each process runs the reproducer file independently (within a GCE instance).
14. `ninstance (default=5)` - the number of GCE instances to bring up inside the `kreproducer` VM to perform parallel execution across multiple instances.
15. `restart_time (default='10m')` - continue to run (i.e., restart) the reproducer file for up to `restart_time` inside the GCE instance.
16. `machine_type (default='gce:e2-standard-2')` - the exact specification of the GCE instance.
A sample piece of code that runs a bug-reproduction job, which includes a **kernel building** job as well as a **kernel reproducer** job; i.e., this job builds the kernel and then runs the reproducer file on the built kernel for a specified amount of time (the positional argument values are placeholders you supply):
```python
import os
import KBDr.kcomposer as kcomp

session = kcomp.KBDrSession(os.environ['KBDR_RUNNER_API_BASE_URL'])
workers, args, kv = kcomp.compose_bug_reproduction(kernel_git_url,      # str
                                                   kernel_commit_id,    # str
                                                   kernel_config,       # str
                                                   userspace_img_name,  # str
                                                   arch,                # str
                                                   reproducer_type,     # str
                                                   reproducer_text,     # str
                                                   syzkaller_checkout='master',
                                                   syzkaller_rollback=True,
                                                   compiler='gcc',
                                                   linker='ld',
                                                   patch='',
                                                   nproc=8,
                                                   ninstance=4,
                                                   restart_time='10m',
                                                   machine_type='gce:e2-standard-2')
session.create_job(workers, args, kv)
```
```