# kitten-tts

**Repository Path**: liuhuayun/kitten-tts

## Basic Information

- **Project Name**: kitten-tts
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-05
- **Last Updated**: 2025-12-05

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# KittenTTS Server

A FastAPI-based TTS (Text-to-Speech) server that provides OpenAI-compatible API endpoints using [KittenTTS](https://github.com/KittenML/KittenTTS). This server can be easily integrated with Open WebUI and other applications that support OpenAI's TTS API format.

**note: you will need to have KittenTTS separately installed on your system**

## Features

- 🔌 OpenAI-compatible TTS API endpoints
- 🗣️ Multiple voice options with voice mapping
- ⚡ Fast and efficient speech synthesis using KittenTTS
- ⚡ GPU Accelleration for Apple Silicon and Cuda
- 🎛️ Configurable speech speed (0.25x to 4.0x)
- 📊 Health check and model status endpoints
- 🔧 Easy integration with Open WebUI

## Quick Start


### Prerequisites

- Python 3.8 or higher
- pip package manager

### Installation

1. **Clone the repository**
   ```bash
   git clone https://github.com/drivenfast/kitten-tts-server
   cd kittentts-server
   ```

2. **Create and activate a virtual environment**
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

4. **Install KittenTTS**
pip install https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl
   ```bash
   # Follow KittenTTS installation instructions from their repository
   # This typically involves installing from source or using their provided wheels
   ```

5. **Start the server**
   ```bash
   python server.py
   ```

   Or use the startup script:
   ```bash
   chmod +x start_server.sh
   ./start_server.sh
   ```

The server will be available at `http://localhost:8001`

## API Endpoints

### Generate Speech
```bash
POST /v1/audio/speech
```

**Request Body:**
```json
{
  "model": "tts-1-hd",
  "input": "Hello, this is a test of the KittenTTS server!",
  "voice": "alloy",
  "response_format": "mp3",
  "speed": 2.0
}
```

**Example using curl:**
```bash
curl -X POST "http://localhost:8001/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1-hd",
    "input": "Hello world!",
    "voice": "alloy",
    "speed": 1.0
  }' \
  --output speech.wav
```

### List Models
```bash
GET /v1/models
```

### List Voices
```bash
GET /v1/audio/voices
```

### Health Check
```bash
GET /health
```

## Voice Mapping

The server maps OpenAI-compatible voice names to KittenTTS voices:

| OpenAI Voice | KittenTTS Voice | Description |
|--------------|-----------------|-------------|
| alloy        | expr-voice-5-m  | Male voice  |
| echo         | expr-voice-2-m  | Male voice  |
| fable        | expr-voice-3-f  | Female voice|
| onyx         | expr-voice-4-m  | Male voice  |
| nova         | expr-voice-5-f  | Female voice|
| shimmer      | expr-voice-2-f  | Female voice|

## Integration with Open WebUI

1. **Start the KittenTTS server:**
   ```bash
   python server.py
   ```

2. **Configure Open WebUI:**
   - Go to Settings → Audio
   - Set TTS Engine to "OpenAI"
   - Set API Base URL to: `http://localhost:8001/v1`
   - Leave API Key empty (not required)
   - Input one of the voices mapped to OpenAI Voice (e.g. shimmer) in the TTS Voice Field
   - Leave TTS model field as tts-1-hd

3. **Test the integration:**
   - Try using TTS in Open WebUI chat
   - The server logs will show generation requests

## Configuration

### Environment Variables

- `KITTENTTS_HOST`: Server host (default: "0.0.0.0")
- `KITTENTTS_PORT`: Server port (default: 8001)
- `KITTENTTS_LOG_LEVEL`: Logging level (default: "info")
- `KITTENTTS_USE_GPU`: Enable GPU acceleration (default: "true")
- `KITTENTTS_GPU_PROVIDER`: GPU provider preference (default: "auto")
- `KITTENTTS_ONNX_THREADS`: ONNX Runtime threads (default: 0 = auto)

### GPU Acceleration

The server automatically detects and uses GPU acceleration when available:

**Apple Silicon (M1/M2/M3/M4):**
- Uses CoreML execution provider for GPU/Neural Engine acceleration
- Automatically enabled on macOS with Apple Silicon

**NVIDIA CUDA:**
- Uses CUDA execution provider when CUDA is available
- Requires CUDA runtime and ONNX Runtime GPU package

**Intel/AMD Systems:**
- Falls back to CPU execution with optimized threading
- Can use OpenVINO if available

**Configuration Options:**
```bash
# Enable/disable GPU acceleration
export KITTENTTS_USE_GPU=true

# Force specific provider (auto, coreml, cuda, cpu)
export KITTENTTS_GPU_PROVIDER=auto

# Set number of CPU threads (0 = auto-detect)
export KITTENTTS_ONNX_THREADS=4
```

**Check GPU Status:**
```bash
curl http://localhost:8001/gpu/status
```

### Custom Configuration

You can modify the voice mapping and other settings by editing the `config.py` file.

## Docker Support

### Build and Run with Docker

```bash
# Build the Docker image
docker build -t kittentts-server .

# Run the container
docker run -p 8001:8001 kittentts-server
```

### Using Docker Compose

```bash
docker-compose up -d
```

## Development

### Project Structure

```
kittentts-server/
├── server.py              # Main FastAPI server
├── config.py              # Configuration settings
├── requirements.txt       # Python dependencies
├── start_server.sh        # Startup script
├── Dockerfile             # Docker configuration
├── docker-compose.yml     # Docker Compose configuration
├── tests/                 # Test files
│   ├── test_api.py
│   └── test_integration.py
└── docs/                  # Additional documentation
    ├── api.md
    └── deployment.md
```

### Running Tests

```bash
# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Run with coverage
pytest tests/ --cov=. --cov-report=html
```

### Code Formatting

```bash
# Format code with black
black .

# Sort imports
isort .

# Lint with flake8
flake8 .
```

## Troubleshooting

### Common Issues

1. **KittenTTS not found:**
   - Ensure KittenTTS is properly installed in your environment
   - Check that all dependencies are installed

2. **Audio format issues:**
   - The server currently supports WAV and MP3 formats
   - MP3 support may require additional audio codecs

3. **Port already in use:**
   - Change the port in `config.py` or set the `KITTENTTS_PORT` environment variable

### Logs

Server logs are output to the console. For production deployments, consider using a proper logging configuration.

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## Acknowledgments

- [KittenTTS](https://github.com/KittenML/KittenTTS) for the excellent TTS engine
- [FastAPI](https://fastapi.tiangolo.com/) for the web framework
- [Open WebUI](https://github.com/open-webui/open-webui) for TTS integration support

---

**Note:** This server requires KittenTTS to be installed separately. Please refer to the KittenTTS documentation for installation instructions specific to your system.