# pocket-tts **Repository Path**: huakey430/pocket-tts ## Basic Information - **Project Name**: pocket-tts - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2026-01-20 - **Last Updated**: 2026-01-20 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Pocket TTS pocket-tts-logo-v2-transparent A lightweight text-to-speech (TTS) application designed to run efficiently on CPUs. Forget about the hassle of using GPUs and web APIs serving TTS models. With Kyutai's Pocket TTS, generating audio is just a pip install and a function call away. Supports Python 3.10, 3.11, 3.12, 3.13 and 3.14. Requires PyTorch 2.5+. Does not require the gpu version of PyTorch. [🔊 Demo](https://kyutai.org/pocket-tts) | [🐱‍💻GitHub Repository](https://github.com/kyutai-labs/pocket-tts) | [🤗 Hugging Face Model Card](https://huggingface.co/kyutai/pocket-tts) | [⚙️ Tech report](https://kyutai.org/blog/2026-01-13-pocket-tts) | [📄 Paper](https://arxiv.org/abs/2509.06926) | [📚 Documentation](https://github.com/kyutai-labs/pocket-tts/tree/main/docs) ## Main takeaways * Runs on CPU * Small model size, 100M parameters * Audio streaming * Low latency, ~200ms to get the first audio chunk * Faster than real-time, ~6x real-time on a CPU of MacBook Air M4 * Uses only 2 CPU cores * Python API and CLI * Voice cloning * English only at the moment * Can handle infinitely long text inputs ## Trying it from the website, without installing anything Navigate to the [Kyutai website](https://kyutai.org/pocket-tts) to try it out directly in your browser. You can input text, select different voices, and generate speech without any installation. ## Trying it with the CLI ### The `generate` command You can use pocket-tts directly from the command line. We recommend using `uv` as it installs any dependencies on the fly in an isolated environment (uv installation instructions [here](https://docs.astral.sh/uv/getting-started/installation/#standalone-installer)). You can also use `pip install pocket-tts` to install it manually. This will generate a wav file `./tts_output.wav` saying the default text with the default voice, and display some speed statistics. ```bash uvx pocket-tts generate # or if you installed it manually with pip: pocket-tts generate ``` Modify the voice with `--voice` and the text with `--text`. We provide a small catalog of voices. You can take a look at [this page](https://huggingface.co/kyutai/tts-voices) which details the licenses for each voice. * [alba](https://huggingface.co/kyutai/tts-voices/blob/main/alba-mackenna/casual.wav) * [marius](https://huggingface.co/kyutai/tts-voices/blob/main/voice-donations/Selfie.wav) * [javert](https://huggingface.co/kyutai/tts-voices/blob/main/voice-donations/Butter.wav) * [jean](https://huggingface.co/kyutai/tts-voices/blob/main/ears/p010/freeform_speech_01.wav) * [fantine](https://huggingface.co/kyutai/tts-voices/blob/main/vctk/p244_023.wav) * [cosette](https://huggingface.co/kyutai/tts-voices/blob/main/expresso/ex04-ex02_confused_001_channel1_499s.wav) * [eponine](https://huggingface.co/kyutai/tts-voices/blob/main/vctk/p262_023.wav) * [azelma](https://huggingface.co/kyutai/tts-voices/blob/main/vctk/p303_023.wav) The `--voice` argument can also take a plain wav file as input for voice cloning. You can use your own or check out our [voice repository](https://huggingface.co/kyutai/tts-voices). Feel free to check out the [generate documentation](https://github.com/kyutai-labs/pocket-tts/tree/main/docs/generate.md) for more details and examples. For trying multiple voices and prompts quickly, prefer using the `serve` command. ### The `serve` command You can also run a local server to generate audio via HTTP requests. ```bash uvx pocket-tts serve # or if you installed it manually with pip: pocket-tts serve ``` Navigate to `http://localhost:8000` to try the web interface, it's faster than the command line as the model is kept in memory between requests. You can check out the [serve documentation](https://github.com/kyutai-labs/pocket-tts/tree/main/docs/serve.md) for more details and examples. ## Using it as a Python library You can try out the Python library on Colab [here](https://colab.research.google.com/github/kyutai-labs/pocket-tts/blob/main/docs/pocket-tts-example.ipynb). Install the package with ```bash pip install pocket-tts # or uv add pocket-tts ``` You can use this package as a simple Python library to generate audio from text. ```python from pocket_tts import TTSModel import scipy.io.wavfile tts_model = TTSModel.load_model() voice_state = tts_model.get_state_for_audio_prompt( "alba" # One of the pre-made voices, see above # You can also use any voice file you have locally or from Hugging Face: # "./some_audio.wav" # or "hf://kyutai/tts-voices/expresso/ex01-ex02_default_001_channel2_198s.wav" ) audio = tts_model.generate_audio(voice_state, "Hello world, this is a test.") # Audio is a 1D torch tensor containing PCM data. scipy.io.wavfile.write("output.wav", tts_model.sample_rate, audio.numpy()) ``` You can have multiple voice states around if you have multiple voices you want to use. `load_model()` and `get_state_for_audio_prompt()` are relatively slow operations, so we recommend to keep the model and voice states in memory if you can. You can check out the [Python API documentation](https://github.com/kyutai-labs/pocket-tts/tree/main/docs/python-api.md) for more details and examples. ## Unsupported features At the moment, we do not support (but would love pull requests adding): - [Running the TTS inside a web browser (WebAssembly)](https://github.com/kyutai-labs/pocket-tts/issues/1) - [A compiled version with for example `torch.compile()` or `candle`.](https://github.com/kyutai-labs/pocket-tts/issues/2) - [Adding silence in the text input to generate pauses.](https://github.com/kyutai-labs/pocket-tts/issues/6) - [Quantization to run the computation in int8.](https://github.com/kyutai-labs/pocket-tts/issues/7) We tried running this TTS model on the GPU but did not observe a speedup compared to CPU execution, notably because we use a batch size of 1 and a very small model. ## Development and local setup We accept contributions! Feel free to open issues or pull requests on GitHub. You can find development instructions in the [CONTRIBUTING.md](https://github.com/kyutai-labs/pocket-tts/tree/main/CONTRIBUTING.md) file. You'll also find there how to have an editable install of the package for local development. ## Alternative implementations - [babybirdprd/pocket-tts](https://github.com/babybirdprd/pocket-tts) - Candle version (Rust) with WebAssembly and PyO3 bindings. Can run in the browser! ## Projects using pocket-tts - [lukasmwerner/pocket-reader](https://github.com/lukasmwerner/pocket-reader) - Browser screen reader - [ikidd/pocket-tts-wyoming](https://github.com/ikidd/pocket-tts-wyoming) - Docker container for pocket-tts using Wyoming protocol, ready for Home Assistant Voice use. ## Prohibited use Use of our model must comply with all applicable laws and regulations and must not result in, involve, or facilitate any illegal, harmful, deceptive, fraudulent, or unauthorized activity. Prohibited uses include, without limitation, voice impersonation or cloning without explicit and lawful consent; misinformation, disinformation, or deception (including fake news, fraudulent calls, or presenting generated content as genuine recordings of real people or events); and the generation of unlawful, harmful, libelous, abusive, harassing, discriminatory, hateful, or privacy-invasive content. We disclaim all liability for any non-compliant use. ## Authors Manu Orsini*, Simon Rouard*, Gabriel De Marmiesse*, Václav Volhejn, Neil Zeghidour, Alexandre Défossez *equal contribution