# Speech-AI-Forge
**Repository Path**: ai-aigc/Speech-AI-Forge
## Basic Information
- **Project Name**: Speech-AI-Forge
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: AGPL-3.0
- **Default Branch**: dev_tts_pipeline
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-10-14
- **Last Updated**: 2024-10-14
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
[cn](./README.md) | [en](./README.en.md) | [Discord Server](https://discord.gg/9XnXUhAy3t)
# 🍦 ChatTTS-Forge
ChatTTS-Forge is a project developed around the TTS generation model ChatTTS, implementing an API Server and a Gradio-based WebUI.

You can experience and deploy ChatTTS-Forge through the following methods:
| - | Description | Link |
| ------------------------ | --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Online Demo** | Deployed on HuggingFace | [HuggingFace Spaces](https://huggingface.co/spaces/lenML/ChatTTS-Forge) |
| **One-Click Start** | Click the link to start Colab | [Open in Colab](https://colab.research.google.com/github/lenML/ChatTTS-Forge/blob/main/colab.en.ipynb) |
| **Container Deployment** | See the docker section | [Docker](#docker) |
| **Local Deployment** | See the environment preparation section | [Local Deployment](#InstallationandRunning) |
## 1. INDEX
- 1. [INDEX](#INDEX)
- 2. [GPU Memory Requirements](#GPUMemoryRequirements)
  - 2.1. [Model Loading Memory Requirements](#ModelLoadingMemoryRequirements)
  - 2.2. [Inference Process Memory Requirements](#BatchSizeMemoryRequirements)
- 3. [Installation and Running](#InstallationandRunning)
  - 3.1. [WebUI Features](#WebUIFeatures)
  - 3.2. [`launch.py`: API Server](#launch.py:APIServer)
    - 3.2.1. [How to link to SillyTavern?](#HowtolinktoSillyTavern)
- 4. [demo](#demo)
  - 4.1. [Style Control](#)
  - 4.2. [Long Text Generation](#-1)
- 5. [Docker](#Docker)
  - 5.1. [Image](#Image)
  - 5.2. [Manual build](#Manualbuild)
- 6. [Roadmap](#Roadmap)
- 7. [FAQ](#FAQ)
  - 7.1. [What are Prompt1 and Prompt2?](#WhatarePrompt1andPrompt2)
  - 7.2. [What is Prefix?](#WhatisPrefix)
  - 7.3. [What is the difference in Style with `_p`?](#WhatisthedifferenceinStylewith_p)
  - 7.4. [Why is it slow when `--compile` is enabled?](#Whyisitslowwhen--compileisenabled)
  - 7.5. [Why is Colab very slow with only 2 it/s?](#WhyisColabveryslowwithonly2its)
## 2. GPU Memory Requirements
### 2.1. Model Loading Memory Requirements
| Precision | ChatTTS Model | Enhancer Model |
| --------- | ------------- | -------------- |
| Full | 2GB | 3GB |
| Half | 1GB | 1.5GB |
Note: Half precision is the default setting. Full precision can be enabled using the `--no_half` parameter.
### 2.2. Inference Process Memory Requirements
| Precision | Batch Size | Without Enhancer | With Enhancer |
| --------- | ---------- | ---------------- | ------------- |
| Full | ≤ 4 | 2GB | 4GB |
| Full | 8 | 4-10GB | 6-14GB |
| Half | ≤ 4 | 1GB | 2GB |
| Half | 8 | 2-6GB | 4-8GB |
Important notes:
1. Memory requirements are context-dependent, hence presented as a range.
2. Half precision (default) typically requires about half the memory of full precision.
3. For Batch Size ≤ 4, 4GB of VRAM is usually sufficient for inference.
4. For Batch Size 8, requirements range from 2-6GB (half precision, no Enhancer) up to 6-14GB (full precision with Enhancer).
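
The inference table above can be encoded as a small lookup, which may help when choosing a precision and batch size for a given GPU. This is an illustrative sketch only: `VRAM_GB` and `worst_case_vram_gb` are hypothetical names that are not part of ChatTTS-Forge, the values are the upper bound of each range in the table, and treating batch sizes 5 to 7 like batch size 8 is a conservative assumption.

```python
# Upper-bound inference VRAM in GB, copied from the table above.
# Keys: (precision, batch-size bucket, enhancer enabled)
VRAM_GB = {
    ("full", "<=4", False): 2,
    ("full", "<=4", True): 4,
    ("full", "8", False): 10,
    ("full", "8", True): 14,
    ("half", "<=4", False): 1,
    ("half", "<=4", True): 2,
    ("half", "8", False): 6,
    ("half", "8", True): 8,
}


def worst_case_vram_gb(precision: str, batch_size: int, enhancer: bool) -> int:
    """Return the upper-bound VRAM estimate (GB) for one configuration."""
    if not 1 <= batch_size <= 8:
        raise ValueError("the table only covers batch sizes 1 to 8")
    # Assumption: sizes 5-7 are not listed, so they use the batch-size-8 row.
    bucket = "<=4" if batch_size <= 4 else "8"
    return VRAM_GB[(precision, bucket, enhancer)]
```

For example, `worst_case_vram_gb("half", 4, True)` looks up the "Half, ≤ 4, With Enhancer" cell.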
## 3. Installation and Running
1. Ensure that the [related dependencies](./docs/dependencies.md) are correctly installed.
2. Start the required services according to your needs.
   - webui: `python webui.py`
   - api: `python launch.py`
### 3.1. WebUI Features
[Click here for a detailed introduction with images](./docs/webui_features.md)
- Native functions of the ChatTTS model: Refiner/Generate
- Native batch synthesis for efficient long text synthesis
- Style control
- SSML
  - Editor: Simple SSML editing, used in conjunction with other features
  - Splitter: Preprocessing for long text segmentation
  - Podcast: Support for creating and editing podcast scripts
- Speaker
  - Built-in voices: A variety of built-in speakers available
  - Speaker creator: Supports voice testing and creation of new speakers
  - Embedding: Supports uploading speaker embeddings to reuse saved speakers
  - Speaker merge: Supports merging speakers and fine-tuning
- Prompt Slot
- Text normalization
- Audio quality enhancement:
  - Enhance: Improves output quality
  - Denoise: Removes noise
- Experimental features:
  - Fine-tuning
    - speaker embedding
    - [WIP] GPT lora
    - [WIP] AE
  - [WIP] ASR
  - [WIP] Inpainting
### 3.2. `launch.py`: API Server
`launch.py` is the startup script for ChatTTS-Forge; it configures and launches the API server.
Once `launch.py` has started successfully, you can confirm that the API is running by opening `/docs`.
[Detailed API documentation](./docs/api.md)
#### 3.2.1. How to link to SillyTavern?
The `/v1/xtts_v2` family of endpoints lets you connect ChatTTS-Forge to SillyTavern.
Here's a simple configuration guide:
1. Open the plugin extension.
2. Open the `TTS` plugin configuration section.
3. Switch `TTS Provider` to `XTTSv2`.
4. Check `Enabled`.
5. Select/configure `Voice`.
6. **[Key Step]** Set the `Provider Endpoint` to `http://localhost:7870/v1/xtts_v2`.
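
The URLs mentioned in this section can all be derived from one base address. The sketch below assumes the server listens on `http://localhost:7870` (taken from the `Provider Endpoint` in step 6); `forge_urls` and `api_is_up` are hypothetical helper names, and `api_is_up` only checks that `/docs` answers with HTTP 200.

```python
from urllib.error import URLError
from urllib.request import urlopen


def forge_urls(base: str = "http://localhost:7870") -> dict:
    """Build the documentation URL and the SillyTavern provider endpoint."""
    base = base.rstrip("/")
    return {"docs": f"{base}/docs", "xtts_v2": f"{base}/v1/xtts_v2"}


def api_is_up(base: str = "http://localhost:7870", timeout: float = 3.0) -> bool:
    """Return True if the /docs page responds with HTTP 200."""
    try:
        with urlopen(forge_urls(base)["docs"], timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

If `api_is_up()` returns `False`, check that `python launch.py` is still running and that the port matches your configuration.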

## 4. demo
### 4.1. Style Control
input
```xml
下面是一个 ChatTTS 用于合成多角色多情感的有声书示例[lbreak]
黛玉冷笑道:[lbreak]
我说呢 [uv_break] ,亏了绊住,不然,早就飞起来了[lbreak]
宝玉道:[lbreak]
“只许和你玩 [uv_break] ,替你解闷。不过偶然到他那里,就说这些闲话。”[lbreak]
“好没意思的话![uv_break] 去不去,关我什么事儿? 又没叫你替我解闷儿 [uv_break],还许你不理我呢” [lbreak]
说着,便赌气回房去了 [lbreak]
```
output
[多角色.webm](https://github.com/lenML/ChatTTS-Forge/assets/37396659/82d91409-ad71-42ac-a4cd-d9c9340e3a07)
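
The input above mixes plain text with bracketed control tokens such as `[lbreak]` and `[uv_break]`. As a rough illustration of how this markup is structured (not the parser that ChatTTS or ChatTTS-Forge actually uses), the hypothetical helper below splits an input on `[lbreak]` and lists the control tokens found in each segment.

```python
import re

# Matches bracketed tokens such as [uv_break], [lbreak] or [laugh_0].
TOKEN_RE = re.compile(r"\[[A-Za-z]+(?:_[A-Za-z0-9]+)*\]")


def split_segments(text: str) -> list[tuple[str, list[str]]]:
    """Split on [lbreak]; return (plain_text, control_tokens) per segment."""
    segments = []
    for raw in text.split("[lbreak]"):
        tokens = TOKEN_RE.findall(raw)
        # Remove the tokens and collapse the whitespace they leave behind.
        plain = " ".join(TOKEN_RE.sub("", raw).split())
        if plain or tokens:
            segments.append((plain, tokens))
    return segments
```

Each returned pair holds the spoken text of one line and the tokens that were embedded in it.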
### 4.2. Long Text Generation
input
中华美食,作为世界饮食文化的瑰宝,以其丰富的种类、独特的风味和精湛的烹饪技艺而闻名于世。中国地大物博,各地区的饮食习惯和烹饪方法各具特色,形成了独树一帜的美食体系。从北方的京鲁菜、东北菜,到南方的粤菜、闽菜,无不展现出中华美食的多样性。
在中华美食的世界里,五味调和,色香味俱全。无论是辣味浓郁的川菜,还是清淡鲜美的淮扬菜,都能够满足不同人的口味需求。除了味道上的独特,中华美食还注重色彩的搭配和形态的美感,让每一道菜品不仅是味觉的享受,更是一场视觉的盛宴。
中华美食不仅仅是食物,更是一种文化的传承。每一道菜背后都有着深厚的历史背景和文化故事。比如,北京的烤鸭,代表着皇家气派;而西安的羊肉泡馍,则体现了浓郁的地方风情。中华美食的精髓在于它追求的“天人合一”,讲究食材的自然性和烹饪过程中的和谐。
总之,中华美食博大精深,其丰富的口感和多样的烹饪技艺,构成了一个充满魅力和无限可能的美食世界。无论你来自哪里,都会被这独特的美食文化所吸引和感动。
output
[long_text_demo.webm](https://github.com/lenML/ChatTTS-Forge/assets/37396659/fe18b0f1-a85f-4255-8e25-3c953480b881)
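
Long inputs like the one above are generally cut into sentence-sized pieces before synthesis so that they can be processed in batches. The following is a minimal sketch of that idea and not the project's Splitter: `split_sentences` is a hypothetical function that breaks text after Chinese or Western sentence-ending punctuation and merges pieces up to an assumed `max_chars` limit.

```python
import re


def split_sentences(text: str, max_chars: int = 100) -> list[str]:
    """Split after sentence-ending punctuation, then merge short pieces."""
    # The lookbehind keeps the punctuation attached to its sentence.
    pieces = [p for p in re.split(r"(?<=[。!?!?])", text) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current.strip())
            current = piece
        else:
            current += piece
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

A single sentence longer than `max_chars` is kept whole, so the limit is a target rather than a hard cap.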
## 5. Docker
### 5.1. Image
WIP
### 5.2. Manual build
Download the models:
```bash
python -m scripts.download_models --source huggingface
```
- webui: `docker-compose -f ./docker-compose.webui.yml up -d`
- api: `docker-compose -f ./docker-compose.api.yml up -d`
Environment variable configuration
- webui: [.env.webui](./.env.webui)
- api: [.env.api](./.env.api)
## 6. Roadmap
WIP
## 7. FAQ
### 7.1. What are Prompt1 and Prompt2?
Prompt1 and Prompt2 are system prompts with different insertion points. The current model is very sensitive to the first [Stts] token, hence the need for two prompts.
- Prompt1 is inserted before the first [Stts].
- Prompt2 is inserted after the first [Stts].
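
To make the two insertion points concrete, the sketch below places two strings around the first `[Stts]` token of a template. It is purely illustrative: `insert_prompts` and the template layout are hypothetical, and the real token sequence is assembled inside the model pipeline rather than by string manipulation.

```python
def insert_prompts(template: str, prompt1: str = "", prompt2: str = "") -> str:
    """Place prompt1 before and prompt2 after the first [Stts] token."""
    marker = "[Stts]"
    # partition() splits at the first occurrence only.
    head, sep, tail = template.partition(marker)
    if not sep:
        raise ValueError("template contains no [Stts] token")
    return f"{head}{prompt1}{marker}{prompt2}{tail}"
```

Only the first `[Stts]` is affected; any later occurrence in the template is left untouched.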
### 7.2. What is Prefix?
The prefix is primarily used to control the model's generation capabilities, similar to the refine prompt in the official examples. This prefix should only contain special non-lexical tokens, such as `[laugh_0]`, `[oral_0]`, `[speed_0]`, `[break_0]`, etc.
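
Because a prefix should consist only of bracketed control tokens, a quick check can catch stray words before they reach the model. The sketch below rests on assumptions: `is_valid_prefix` is a hypothetical helper, and the pattern only encodes the `[name_N]` shape of the examples above, not the model's actual token vocabulary.

```python
import re

# Zero or more tokens shaped like [laugh_0] or [speed_5], optionally
# separated by whitespace, and nothing else.
PREFIX_RE = re.compile(r"(?:\s*\[[a-z]+(?:_[a-z]+)*_\d+\])*\s*")


def is_valid_prefix(prefix: str) -> bool:
    """Return True if the prefix contains only bracketed control tokens."""
    return PREFIX_RE.fullmatch(prefix) is not None
```

An empty prefix passes the check, since it contains no lexical content.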
### 7.3. What is the difference in Style with `_p`?
Styles with `_p` use both prompt and prefix, while those without `_p` use only the prefix.
### 7.4. Why is it slow when `--compile` is enabled?
Because inference inputs are not padded, any change in the input shape may trigger torch to recompile.
> It is currently not recommended to enable this.
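
The cost comes from each previously unseen input shape triggering a fresh compilation. A common general mitigation, which this README does not claim the project implements, is to pad inputs up to a small set of bucket lengths so that only a few distinct shapes ever occur. The helper below sketches just the bucketing arithmetic; `bucket_length` and the step of 64 are assumptions.

```python
def bucket_length(n_tokens: int, step: int = 64) -> int:
    """Round a sequence length up to the next multiple of `step`."""
    if n_tokens < 1 or step < 1:
        raise ValueError("n_tokens and step must be positive")
    # Ceiling division without floats, then scale back up.
    return -(-n_tokens // step) * step
```

With this scheme, every length from 1 to 256 maps to one of four padded lengths, so at most four shapes would need compiling in that range.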
### 7.5. Why is Colab very slow with only 2 it/s?
Make sure you are using a GPU instead of a CPU.
- Click on the menu bar **[Edit]**
- Click **[Notebook settings]**
- Select **[Hardware accelerator]** => T4 GPU
# Contributing
To contribute, clone the repository, make your changes, commit and push to your clone, and submit a pull request.
# References
- ChatTTS: https://github.com/2noise/ChatTTS
- PaddleSpeech: https://github.com/PaddlePaddle/PaddleSpeech
- resemble-enhance: https://github.com/resemble-ai/resemble-enhance
- Default speaker: https://github.com/2noise/ChatTTS/issues/238