# transformers_bloom_parallel

**Repository Path**: mirrors_huggingface/transformers_bloom_parallel

## Basic Information

- **Project Name**: transformers_bloom_parallel
- **Description**: Techniques used to run BLOOM at inference in parallel
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-10-24
- **Last Updated**: 2026-06-13

## README

# BLOOM parallel test

## DIRTY solution

Install the `transformers` branch `thomas/dirty_bloom_tp`:

```
pip install -e git+https://github.com/huggingface/transformers.git@thomas/add_custom_kernels#egg=transformers
```

Alternatively, clone the repository and build the custom kernel:
```
git clone https://github.com/huggingface/transformers.git
cd transformers
git checkout thomas/add_custom_kernels
python setup.py build_ext --inplace # Might have to edit `setup.py` to remove the torch import
pip install -e .
```


### RUN

Running this requires `redis` to be installed on the machine.
Redis pub/sub is the easiest way to communicate with all the various processes without causing too many issues for NCCL
or for the web server's threading/circuit-breaking model.
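
To make the pub/sub idea concrete, here is a minimal sketch of how a web server could broadcast a request to several worker processes with the `redis-py` client. It is not the code from this repository: the channel name `bloom-generate`, the helper names and the JSON message layout are all assumptions made for illustration.

```python
import json

CHANNEL = "bloom-generate"  # hypothetical channel name, not taken from the repository


def encode_request(inputs, parameters):
    """Serialize a generation request so every worker receives the same payload."""
    return json.dumps({"inputs": inputs, "parameters": parameters})


def decode_request(raw):
    """Parse a message read from the channel (redis-py delivers bytes by default)."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)


def publish_request(inputs, parameters, host="localhost", port=6379):
    """Web server side: broadcast one request to all subscribed worker processes."""
    import redis  # imported lazily so the helpers above work without redis-py

    client = redis.Redis(host=host, port=port)
    # publish() returns the number of subscribers that received the message
    return client.publish(CHANNEL, encode_request(inputs, parameters))


def worker_loop(handle, host="localhost", port=6379):
    """Worker side: block on the channel and pass each decoded request to `handle`."""
    import redis

    pubsub = redis.Redis(host=host, port=port).pubsub()
    pubsub.subscribe(CHANNEL)
    for message in pubsub.listen():
        # listen() also yields subscribe confirmations; only "message" carries data
        if message["type"] == "message":
            handle(decode_request(message["data"]))
```

Because every worker subscribes to the same channel, a single `publish_request(...)` call reaches all of them, and the web server never has to touch NCCL or the workers' process group directly.
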

```
python -m torch.distributed.run --nproc_per_node=8 generate.py --name bigscience/bloom --max-input-tokens=1000 --save-path=/data/models/
```

Start the web server with:

```
python server.py
```


### USE

```
curl -X POST -d '{"inputs": "This is a test", "parameters": {"max_new_tokens": 20, "temperature": 0.4}}' http://localhost:8000/generate -H "content-type: application/json"
```
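
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the `curl` call above; the function names are hypothetical, and because the README does not document the response format, the body is returned as raw text.

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=20, temperature=0.4,
                           url="http://localhost:8000/generate"):
    """Build the POST request that the curl example sends (nothing is sent yet)."""
    body = json.dumps({
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": temperature},
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request to a running server and return the response body as text."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(generate("This is a test"))` prints whatever the `/generate` endpoint returns.
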