# vllm-plugin-FL

**Repository Path**: flagos-ai/vllm-plugin-FL

## Basic Information

- **Project Name**: vllm-plugin-FL
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-23
- **Last Updated**: 2026-01-23

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# vllm-FL
A vLLM plugin built on the FlagOS unified multi-chip backend.

## Quick Start

### Setup

0. Install vLLM [v0.13.0](https://github.com/vllm-project/vllm/tree/v0.13.0) from the official repository (skip this step if that version is already installed), or install the fork [vllm-FL](https://github.com/flagos-ai/vllm-FL).


1. Install [FlagGems](https://github.com/flagos-ai/FlagGems/blob/master/docs/getting-started.md#quick-installation)

    1.1 Install Build Dependencies

    ```sh
    pip install -U scikit-build-core==0.11 pybind11 ninja cmake
    ```

    1.2 Install FlagGems

    ```sh
    git clone https://github.com/flagos-ai/FlagGems
    cd FlagGems
    pip install --no-build-isolation .
    # or editable install
    pip install --no-build-isolation -e .
    ```

2. Install [FlagCX](https://github.com/flagos-ai/FlagCX/blob/main/docs/getting_started.md#build-and-installation)

    2.1 Clone the repository:
    ```sh
    git clone https://github.com/flagos-ai/FlagCX.git
    cd FlagCX
    git checkout -b v0.7.0
    git submodule update --init --recursive
    ```

    2.2 Build the library with the flag for your target platform (NVIDIA shown here):
    ```sh
    make USE_NVIDIA=1
    ```

    2.3 Set the environment variable (from the FlagCX repository root):
    ```sh
    export FLAGCX_PATH="$PWD"
    ```

    2.4 Install FlagCX
    ```sh
    cd plugin/torch/
    # pick the adaptor for your platform: nvidia or ascend
    python setup.py develop --adaptor nvidia
    ```

3. Install vllm-plugin-fl

    3.1 Clone the repository:

    ```sh
    git clone https://github.com/flagos-ai/vllm-plugin-FL
    ```

    3.2 Install the plugin:
    ```sh
    cd vllm-plugin-FL
    pip install --no-build-isolation .
    # or editable install
    pip install --no-build-isolation -e .
    ```
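
The setup steps above can be sanity-checked before running anything. The following is a minimal sketch, not part of the plugin: `check_setup` is a hypothetical helper, and the distribution name `flag_gems` is an assumption about how FlagGems registers itself with pip (adjust it if `pip list` shows a different name). It only reports installed versions and whether `FLAGCX_PATH` points at an existing directory.

```python
import os
from importlib import metadata


def check_setup(packages=("vllm", "flag_gems"), env_var="FLAGCX_PATH"):
    """Return a dict describing which setup steps appear to be complete."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed under this distribution name
    path = os.environ.get(env_var)
    # Only checks that the variable points at an existing directory,
    # not that the library inside it was built successfully.
    report[env_var] = path if path and os.path.isdir(path) else None
    return report


if __name__ == "__main__":
    for key, value in check_setup().items():
        print(f"{key}: {value if value is not None else 'MISSING'}")
```

A `MISSING` entry points at the step to revisit: steps 0 and 1 for the packages, step 2.3 for `FLAGCX_PATH`.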

If multiple vLLM plugins are installed in the current environment, select vllm-plugin-fl explicitly by setting `VLLM_PLUGINS='fl'`.

### Run a Task

#### Offline Batched Inference
With vLLM and vllm-plugin-fl installed, you can generate text for a list of input prompts (offline batched inference). See the example script [offline_inference](./examples/offline_inference.py), or use the Python script below directly.
```python
from vllm import LLM, SamplingParams


if __name__ == '__main__':
    prompts = [
        "Hello, my name is",
    ]
    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=10, temperature=0.0)
    # Create an LLM.
    llm = LLM(model="Qwen/Qwen3-4B", max_num_batched_tokens=16384, max_num_seqs=2048)
    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

## Advanced use

### Using CudaCommunication library
To fall back to the original CudaCommunication library, unset the following environment variable:
```sh
unset FLAGCX_PATH
```

### Using native CUDA operators
To fall back to the original CUDA operators, unset the following environment variable:
```sh
unset USE_FLAGGEMS
```
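
When comparing the FlagOS path against the native one, it can be convenient to drop these variables for a single run instead of changing the shell. The sketch below assumes the only toggles involved are the two variables named in this section. `env_without` is a hypothetical helper, not something the plugin provides.

```python
import os

# Variables named in this section; removing one falls back to the stock path.
FL_TOGGLES = ("FLAGCX_PATH", "USE_FLAGGEMS")


def env_without(names=FL_TOGGLES, base=None):
    """Return a copy of the environment with the given variables removed."""
    env = dict(os.environ if base is None else base)
    for name in names:
        env.pop(name, None)  # ignore variables that are not set
    return env
```

The result can be passed to a child process, for example `subprocess.run([sys.executable, "examples/offline_inference.py"], env=env_without())`, which leaves the parent shell's settings untouched.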