# trl-jobs

**Repository Path**: mirrors_huggingface/trl-jobs

## Basic Information

- **Project Name**: trl-jobs
- **Description**: Train LLM on Hugging Face infra
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-11
- **Last Updated**: 2026-05-02

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# 🏭 TRL Jobs

**TRL Jobs** is a simple wrapper around [Hugging Face Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs) that makes it easy to run [TRL](https://huggingface.co/docs/trl/) (Transformer Reinforcement Learning) workflows directly on 🤗 Hugging Face infrastructure.

Think of it as the quickest way to kick off **Supervised Fine-Tuning (SFT)** and more, without worrying about all the boilerplate setup.

## 📦 Installation

Get started with a single command:

```bash
pip install trl-jobs
```

## ⚡ Quick Start

Run your first supervised fine-tuning job in just one line:

```bash
trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name trl-lib/Capybara
```

The training is tracked with [Trackio](https://huggingface.co/docs/trackio/index) and the fine-tuned model is automatically pushed to the 🤗 Hub.

![trackio_sft](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trackio_sft.gif)
![trained_model_sft](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trained_model_sft.png)

## 🛠 Available Commands

Right now, only **SFT (Supervised Fine-Tuning)** is supported. More workflows will be added soon!

### 🔹 SFT (Supervised Fine-Tuning)

```bash
trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name trl-lib/Capybara
```

#### Required arguments

* `--model_name` → Model to fine-tune (e.g. `Qwen/Qwen3-0.6B`)
* `--dataset_name` → Dataset to train on (e.g. `trl-lib/Capybara`)

#### Optional arguments

* `--peft` → Use [PEFT (LoRA)](https://huggingface.co/docs/peft) (default: `False`)
* `--flavor` → Hardware flavor (default: `a100-large`, only option for now)
* `--timeout` → Max runtime (default: `1h`). Accepts a number followed by one of the unit suffixes `s`, `m`, `h`, or `d` (e.g. `30m`)
* `-d, --detach` → Run in background and print job ID
* `--namespace` → Namespace where the job will run (default: your user namespace)
* `--token` → Hugging Face token (only needed if not logged in)

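To make the `--timeout` format concrete, here is a minimal Python sketch that converts such a value to seconds. `timeout_to_seconds` is a hypothetical helper, not part of trl-jobs. It assumes the value is a whole number followed by exactly one of the four documented suffixes; the real parser may accept other forms (decimals or bare numbers, for example).

```python
import re

# Seconds per unit for the documented suffixes s, m, h, d
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def timeout_to_seconds(timeout: str) -> int:
    """Convert a duration such as "90m" or "1h" to seconds.

    Hypothetical helper: illustrates the documented suffixes only,
    it is not trl-jobs' own parser.
    """
    match = re.fullmatch(r"(\d+)([smhd])", timeout.strip())
    if match is None:
        raise ValueError(f"invalid timeout {timeout!r}; expected e.g. '30m' or '2h'")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]
```
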
➡️ You can also pass **any arguments supported by `trl sft`**. For example:

```bash
trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name trl-lib/Capybara --learning_rate 3e-5
```

For the full list, see the [TRL CLI docs](https://huggingface.co/docs/trl/en/clis).

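If you launch jobs from a Python script, a small helper can assemble the command line. `build_sft_command` below is hypothetical (it is not part of trl-jobs): it only uses the flags documented above and forwards any extra keyword arguments as `--key value` pairs, so the names you pass must be valid `trl sft` arguments.

```python
def build_sft_command(
    model_name: str, dataset_name: str, peft: bool = False, **trl_args: object
) -> list[str]:
    """Assemble a `trl-jobs sft` argv list (hypothetical helper, not part of trl-jobs)."""
    cmd = ["trl-jobs", "sft", "--model_name", model_name, "--dataset_name", dataset_name]
    if peft:
        cmd.append("--peft")  # boolean flag, takes no value
    for key, value in trl_args.items():
        # Extra arguments are forwarded unchanged, e.g. learning_rate="3e-5" -> --learning_rate 3e-5
        cmd += [f"--{key}", str(value)]
    return cmd
```

Passing the returned list to `subprocess.run(cmd, check=True)` launches the job, assuming `trl-jobs` is installed and you are authenticated.
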
#### Dataset format

SFT supports four dataset formats:

* Standard language modeling

  ```python
  example = {"text": "The sky is blue."}
  ```

* Standard prompt-completion

  ```python
  example = {"prompt": "The sky is", "completion": " blue."}
  ```

* Conversational language modeling

  ```python
  example = {"messages": [
      {"role": "user", "content": "What color is the sky?"},
      {"role": "assistant", "content": "It is blue."}
  ]}
  ```

* Conversational prompt-completion

  ```python
  example = {"prompt": [{"role": "user", "content": "What color is the sky?"}],
             "completion": [{"role": "assistant", "content": "It is blue."}]}
  ```

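The four formats differ only in their keys and in whether the values are plain strings or lists of `{"role", "content"}` messages. The sketch below, a hypothetical `detect_sft_format` helper, checks a single example against those rules. It is a pre-flight sanity check under those assumptions, not TRL's own format detection.

```python
def detect_sft_format(example: dict) -> str:
    """Classify one example into one of the four SFT dataset formats (sketch).

    Assumption: a field is conversational when it holds a list of
    {"role": ..., "content": ...} dicts, and standard when it holds a string.
    """

    def is_conversation(value) -> bool:
        return isinstance(value, list) and all(
            isinstance(m, dict) and "role" in m and "content" in m for m in value
        )

    if "messages" in example and is_conversation(example["messages"]):
        return "conversational language modeling"
    if "text" in example and isinstance(example["text"], str):
        return "standard language modeling"
    if "prompt" in example and "completion" in example:
        prompt, completion = example["prompt"], example["completion"]
        if isinstance(prompt, str) and isinstance(completion, str):
            return "standard prompt-completion"
        if is_conversation(prompt) and is_conversation(completion):
            return "conversational prompt-completion"
    raise ValueError(f"unrecognized SFT example keys: {sorted(example)}")
```
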
> [!IMPORTANT]
> When using a conversational dataset, make sure the model has a chat template.

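One way to check this before launching a job is to inspect the tokenizer's `chat_template` attribute, which 🤗 Transformers tokenizers expose and which is `None` when no template is set. The helper below is a minimal, duck-typed sketch; `has_chat_template` is a hypothetical name, not a trl-jobs function.

```python
def has_chat_template(tokenizer) -> bool:
    """Return True if the tokenizer defines a chat template (duck-typed sketch)."""
    return getattr(tokenizer, "chat_template", None) is not None


# Usage with a real tokenizer (requires `transformers` and network access):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
# assert has_chat_template(tokenizer), "pick a model that ships a chat template"
```
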
> [!NOTE]
> When using a prompt-completion dataset, the loss is computed only on the completion part.

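Concretely, computing the loss only on the completion usually means the prompt tokens are excluded from the labels. The sketch below uses the common PyTorch / 🤗 Transformers convention of setting ignored label positions to `-100` (the default `ignore_index` of `torch.nn.CrossEntropyLoss`). The token IDs are made up, `completion_only_labels` is a hypothetical helper, and TRL may implement the masking differently internally (for example with a separate completion mask).

```python
IGNORE_INDEX = -100  # label value that PyTorch's CrossEntropyLoss skips by default


def completion_only_labels(
    prompt_ids: list[int], completion_ids: list[int]
) -> tuple[list[int], list[int]]:
    """Concatenate prompt and completion, masking the prompt out of the labels."""
    input_ids = prompt_ids + completion_ids
    # Prompt positions get IGNORE_INDEX, so only completion tokens contribute to the loss
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels
```
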
For more details, see the [TRL docs - Dataset formats](https://huggingface.co/docs/trl/en/dataset_formats#language-modeling).


## 📊 Supported Configurations

Here are some ready-to-go setups you can use out of the box.

### 🦙 Meta LLaMA 3

| Model | Max context length | Tokens / batch | Example command |
| --- | --- | --- | --- |
| [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | 4,096 | 262,144 | `trl-jobs sft --model_name meta-llama/Meta-Llama-3-8B --dataset_name ...` |
| [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | 4,096 | 262,144 | `trl-jobs sft --model_name meta-llama/Meta-Llama-3-8B-Instruct --dataset_name ...` |

### 🦙 Meta LLaMA 3 with PEFT

| Model | Max context length | Tokens / batch | Example command |
| --- | --- | --- | --- |
| [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | 24,576 | 196,608 | `trl-jobs sft --model_name meta-llama/Meta-Llama-3-8B --peft --dataset_name ...` |
| [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | 24,576 | 196,608 | `trl-jobs sft --model_name meta-llama/Meta-Llama-3-8B-Instruct --peft --dataset_name ...` |

### 🐧 Qwen3

| Model | Max context length | Tokens / batch | Example command |
| --- | --- | --- | --- |
| [Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) | 32,768 | 65,536 | `trl-jobs sft --model_name Qwen/Qwen3-0.6B-Base --dataset_name ...` |
| [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) | 32,768 | 65,536 | `trl-jobs sft --model_name Qwen/Qwen3-0.6B --dataset_name ...` |
| [Qwen3-1.7B-Base](https://huggingface.co/Qwen/Qwen3-1.7B-Base) | 24,576 | 98,304 | `trl-jobs sft --model_name Qwen/Qwen3-1.7B-Base --dataset_name ...` |
| [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | 24,576 | 98,304 | `trl-jobs sft --model_name Qwen/Qwen3-1.7B --dataset_name ...` |
| [Qwen3-4B-Base](https://huggingface.co/Qwen/Qwen3-4B-Base) | 20,480 | 163,840 | `trl-jobs sft --model_name Qwen/Qwen3-4B-Base --dataset_name ...` |
| [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) | 20,480 | 163,840 | `trl-jobs sft --model_name Qwen/Qwen3-4B --dataset_name ...` |
| [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base) | 4,096 | 262,144 | `trl-jobs sft --model_name Qwen/Qwen3-8B-Base --dataset_name ...` |
| [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | 4,096 | 262,144 | `trl-jobs sft --model_name Qwen/Qwen3-8B --dataset_name ...` |

### 🐧 Qwen3 with PEFT

| Model | Max context length | Tokens / batch | Example command |
| --- | --- | --- | --- |
| [Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base) | 24,576 | 196,608 | `trl-jobs sft --model_name Qwen/Qwen3-8B-Base --peft --dataset_name ...` |
| [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | 24,576 | 196,608 | `trl-jobs sft --model_name Qwen/Qwen3-8B --peft --dataset_name ...` |
| [Qwen3-14B-Base](https://huggingface.co/Qwen/Qwen3-14B-Base) | 20,480 | 163,840 | `trl-jobs sft --model_name Qwen/Qwen3-14B-Base --peft --dataset_name ...` |
| [Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B) | 20,480 | 163,840 | `trl-jobs sft --model_name Qwen/Qwen3-14B --peft --dataset_name ...` |
| [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) | 4,096 | 131,072 | `trl-jobs sft --model_name Qwen/Qwen3-32B --peft --dataset_name ...` |

### SmolLM3

| Model | Max context length | Tokens / batch | Example command |
| --- | --- | --- | --- |
| [HuggingFaceTB/SmolLM3-3B-Base](https://huggingface.co/HuggingFaceTB/SmolLM3-3B-Base) | 28,672 | 114,688 | `trl-jobs sft --model_name HuggingFaceTB/SmolLM3-3B-Base --dataset_name ...` |
| [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) | 28,672 | 114,688 | `trl-jobs sft --model_name HuggingFaceTB/SmolLM3-3B --dataset_name ...` |

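The tables do not define how the two numeric columns relate. Assuming *Tokens / batch* means the max context length multiplied by the number of full-length sequences per batch (an assumption: whether this counts per device or across gradient accumulation is not stated), dividing one column by the other gives the implied batch size. The pairs below are copied from the tables above, and every row divides evenly:

```python
# (max context length, tokens / batch) pairs copied from the tables above
configs = {
    "Meta-Llama-3-8B": (4_096, 262_144),
    "Meta-Llama-3-8B (PEFT)": (24_576, 196_608),
    "Qwen3-0.6B": (32_768, 65_536),
    "Qwen3-1.7B": (24_576, 98_304),
    "Qwen3-4B": (20_480, 163_840),
    "Qwen3-8B": (4_096, 262_144),
    "Qwen3-8B (PEFT)": (24_576, 196_608),
    "Qwen3-14B (PEFT)": (20_480, 163_840),
    "Qwen3-32B (PEFT)": (4_096, 131_072),
    "SmolLM3-3B": (28_672, 114_688),
}


def sequences_per_batch(max_length: int, tokens_per_batch: int) -> int:
    """Implied number of full-length sequences per batch (assumed interpretation)."""
    if tokens_per_batch % max_length:
        raise ValueError("tokens per batch is not a whole multiple of the context length")
    return tokens_per_batch // max_length


for name, (max_length, tokens) in configs.items():
    print(f"{name}: {sequences_per_batch(max_length, tokens)} sequences of {max_length:,} tokens")
```

Under that reading, the PEFT setups trade batch size for context: Qwen3-8B goes from 64 sequences of 4,096 tokens to 8 sequences of 24,576 tokens.
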
### 🤖 OpenAI GPT-OSS (with PEFT)

🚧 Coming soon!

### 💡 Want support for another model?

Open an issue or submit a PR. We’d love to hear from you!

## 🔑 Authentication

You’ll need a Hugging Face token to run jobs. You can provide it in any of these ways:

1. Log in with `huggingface-cli login`
2. Set the environment variable `HF_TOKEN`
3. Pass it directly with `--token`

## 📜 License

This project is under the **MIT License**. See the [LICENSE](./LICENSE) file for details.

## 🤝 Contributing

We welcome contributions!
Please open an issue or a PR on GitHub.

Before committing, run formatting checks:

```bash
ruff check . --fix && ruff format . --line-length 119
```