# silero-models
**Repository Path**: nauch/silero-models
## Basic Information
- **Project Name**: silero-models
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-14
- **Last Updated**: 2025-11-14
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
[](mailto:hello@silero.ai) [](https://t.me/silero_speech) [](https://github.com/snakers4/silero-models/blob/master/LICENSE)
[](https://badge.fury.io/py/silero)

- [Silero Models](#silero-models)
- [Installation and Basics](#installation-and-basics)
- [Text-To-Speech](#text-to-speech)
- [Models and Speakers](#models-and-speakers)
- [V5](#v5)
- [V4](#v4)
- [V3](#v3)
- [Dependencies](#dependencies)
- [PyTorch](#pytorch)
- [Standalone Use](#standalone-use)
- [SSML](#ssml)
- [Cyrillic languages v4](#cyrillic-languages-v4)
- [Indic languages v4](#indic-languages-v4)
- [Example](#example)
- [Supported languages](#supported-languages)
- [Contact](#contact)
- [Citations](#citations)
- [Further reading](#further-reading)
- [English](#english)
- [Chinese](#chinese)
- [Russian](#russian)
# Silero Models
Our TTS models satisfy the following criteria:
- Fully end-to-end;
- Large library of voices;
- Natural-sounding speech;
- One-line usage, minimal, portable;
- Impressively fast on CPU and GPU;
- For the Russian language - automated stress and homographs;
## Installation and Basics
You can basically use our models in 3 flavours:
- Via PyTorch Hub: `torch.hub.load()`;
- Via pip: `pip install silero` and then `from silero import silero_tts`;
- Via caching the required models and utils manually and modifying if necessary;
Models are downloaded on demand both by pip and PyTorch Hub. If you need caching, do it manually or via invoking a necessary model once (it will be downloaded to a cache folder). Please see these [docs](https://pytorch.org/docs/stable/hub.html#loading-models-from-hub) for more information.
PyTorch Hub and pip package are based on the same code. All of the `torch.hub.load` examples can be used with the pip package via this basic change:
```python
from silero import silero_tts
model, example_text = silero_tts(language='ru',
speaker='v5_ru')
audio = model.apply_tts(text=example_text)
```
## Text-To-Speech
### Models and Speakers
All of the provided models are listed in the [models.yml](https://github.com/snakers4/silero-models/blob/master/models.yml) file. Any metadata and newer versions will be added there.
#### V5
V5 models support [SSML](https://github.com/snakers4/silero-models/wiki/SSML). Also see Colab examples for main SSML tag usage.
Russian-only models support automated stress and homographs.
| ID | Speakers | Auto-stress / Homographs | Language | SR | Colab |
| ------- | --------------------------------------------- | ----------- | -------------- | ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `v5_ru` | `aidar`, `baya`, `kseniya`, `xenia`, `eugene` | yes / yes | `ru` (Russian) | `8000`, `24000`, `48000` | [](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
#### V4
V4 models support [SSML](https://github.com/snakers4/silero-models/wiki/SSML). Also see Colab examples for main SSML tag usage.
V4 models: v4_ru, v4_cyrillic, v4_ua, v4_uz, v4_indic
| ID | Speakers |Auto-stress | Language | SR | Colab |
| ------------- | ----------- | ----------- |---------------------------------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `v4_ru` | `aidar`, `baya`, `kseniya`, `xenia`, `eugene`, `random` | yes | `ru` (Russian) | `8000`, `24000`, `48000` | [](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
| [`v4_cyrillic`](#cyrillic-languages) | `b_ava`, `marat_tt`, `kalmyk_erdni`... | no | `cyrillic` [(Avar, Tatar, Kalmyk, ...)](#cyrillic-languages) | `8000`, `24000`, `48000` | [](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
| `v4_ua` | `mykyta`, `random` | no | `ua` (Ukrainian) | `8000`, `24000`, `48000` | [](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
| `v4_uz` | `dilnavoz` | no | `uz` (Uzbek) | `8000`, `24000`, `48000` | [](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
| [`v4_indic`](#indic-languages) | `hindi_male`, `hindi_female`, ..., `random` | no | `indic` [(Hindi, Telugu, ...)](#indic-languages) | `8000`, `24000`, `48000` | [](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
#### V3
V3 models support [SSML](https://github.com/snakers4/silero-models/wiki/SSML). Also see Colab examples for main SSML tag usage.
V3 models: v3_en, v3_en_indic, v3_de, v3_es, v3_fr, v3_indic
| ID | Speakers |Auto-stress | Language | SR | Colab |
| ------------- | ----------- | ----------- |---------------------------------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `v3_en` | `en_0`, `en_1`, ..., `en_117`, `random` | no | `en` (English) | `8000`, `24000`, `48000` | [](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
| `v3_en_indic` | `tamil_female`, ..., `assamese_male`, `random` | no | `en` (English) | `8000`, `24000`, `48000` | [](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
| `v3_de` | `eva_k`, ..., `karlsson`, `random` | no | `de` (German) | `8000`, `24000`, `48000` | [](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
| `v3_es` | `es_0`, `es_1`, `es_2`, `random` | no | `es` (Spanish) | `8000`, `24000`, `48000` | [](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
| `v3_fr` | `fr_0`, ..., `fr_5`, `random` | no | `fr` (French) | `8000`, `24000`, `48000` | [](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
| [`v3_indic`](#indic-languages) | `hindi_male`, `hindi_female`, ..., `random` | no | `indic` [(Hindi, Telugu, ...)](#indic-languages) | `8000`, `24000`, `48000` | [](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
### Dependencies
Basic dependencies for Colab examples:
- `torch`, 1.10+ for v3 models/ 2.0+ for v4 and v5 models;
- `torchaudio`, latest version bound to PyTorch should work (required only because models are hosted together with STT, not required for work);
- `omegaconf`, latest (can be removed as well, if you do not load all of the configs);
### PyTorch
[](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb)
[](https://pytorch.org/hub/snakers4_silero-models_tts/)
```python
# V5
import torch
language = 'ru'
model_id = 'v5_ru'
sample_rate = 48000
speaker = 'xenia'
device = torch.device('cpu')
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
model='silero_tts',
language=language,
speaker=model_id)
model.to(device) # gpu or cpu
audio = model.apply_tts(text=example_text,
speaker=speaker,
sample_rate=sample_rate)
```
### Standalone Use
- Standalone usage only requires PyTorch 1.12+ and the Python Standard Library;
- Please see the detailed examples in Colab;
```python
# V5
import os
import torch
device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'
if not os.path.isfile(local_file):
torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v5_ru.pt',
local_file)
model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)
example_text = 'Меня зовут Лева Королев. Я из готов. И я уже готов открыть все ваши замки любой сложности!'
sample_rate = 48000
speaker='baya'
audio_paths = model.save_wav(text=example_text,
speaker=speaker,
sample_rate=sample_rate)
```
### SSML
Check out our [TTS Wiki page.](https://github.com/snakers4/silero-models/wiki/SSML)
### Cyrillic languages v4
> To be superseded with v5 model(s) soon.
Supported tokenset:
`!,-.:?iµöабвгдежзийклмнопрстуфхцчшщъыьэюяёђѓєіјњћќўѳғҕҗҙқҡңҥҫүұҳҷһӏӑӓӕӗәӝӟӥӧөӱӳӵӹ `
| Speaker_ID | Language | Gender |
| ------------ | --------------- | ------ |
| b_ava | Avar | F |
| b_bashkir | Bashkir | M |
| b_bulb | Bulgarian | M |
| b_bulc | Bulgarian | M |
| b_che | Chechen | M |
| b_cv | Chuvash | M |
| cv_ekaterina | Chuvash | F |
| b_myv | Erzya | M |
| b_kalmyk | Kalmyk | M |
| b_krc | Karachay-Balkar | M |
| kz_M1 | Kazakh | M |
| kz_M2 | Kazakh | M |
| kz_F3 | Kazakh | F |
| kz_F1 | Kazakh | F |
| kz_F2 | Kazakh | F |
| b_kjh | Khakas | F |
| b_kpv | Komi-Ziryan | M |
| b_lez | Lezghian | M |
| b_mhr | Mari | F |
| b_mrj | Mari High | M |
| b_nog | Nogai | F |
| b_oss | Ossetic | M |
| b_ru | Russian | M |
| b_tat | Tatar | M |
| marat_tt | Tatar | M |
| b_tyv | Tuvinian | M |
| b_udm | Udmurt | M |
| b_uzb | Uzbek | M |
| b_sah | Yakut | M |
| kalmyk_erdni | Kalmyk | M |
| kalmyk_delghir | Kalmyk | F |
### Indic languages v4
#### Example
(!!!) All input sentences should be romanized to ISO format using [`aksharamukha`](https://aksharamukha.appspot.com/python). An example for `hindi`:
```python
# V3
import torch
from aksharamukha import transliterate
# Loading model
model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
model='silero_tts',
language='indic',
speaker='v4_indic')
orig_text = "प्रसिद्द कबीर अध्येता, पुरुषोत्तम अग्रवाल का यह शोध आलेख, उस रामानंद की खोज करता है"
roman_text = transliterate.process('Devanagari', 'ISO', orig_text)
print(roman_text)
audio = model.apply_tts(roman_text,
speaker='hindi_male')
```
#### Supported languages
| Language | Speakers | Romanization function
-- | -- | --
hindi | `hindi_female`, `hindi_male` | `transliterate.process('Devanagari', 'ISO', orig_text)`
malayalam | `malayalam_female`, `malayalam_male` |`transliterate.process('Malayalam', 'ISO', orig_text)`
manipuri | `manipuri_female` |`transliterate.process('Bengali', 'ISO', orig_text)`
bengali | `bengali_female`, `bengali_male` | `transliterate.process('Bengali', 'ISO', orig_text)`
rajasthani | `rajasthani_female`, `rajasthani_female` | `transliterate.process('Devanagari', 'ISO', orig_text)`
tamil | `tamil_female`, `tamil_male` |`transliterate.process('Tamil', 'ISO', orig_text, pre_options=['TamilTranscribe'])`
telugu | `telugu_female`, `telugu_male` | `transliterate.process('Telugu', 'ISO', orig_text)`
gujarati | `gujarati_female`, `gujarati_male` | `transliterate.process('Gujarati', 'ISO', orig_text)`
kannada | `kannada_female`, `kannada_male` |`transliterate.process('Kannada', 'ISO', orig_text)`
## Contact
Try our models, create an [issue](https://github.com/snakers4/silero-models/issues/new), join our [chat](https://t.me/silero_speech), [email](mailto:hello@silero.ai) us, and read the latest [news](https://t.me/silero_news).
## Citations
```bibtex
@misc{Silero Models,
author = {Silero Team},
title = {Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/snakers4/silero-models}},
commit = {insert_some_commit_here},
email = {hello@silero.ai}
}
```
## Further reading
### English
- STT:
- Towards an Imagenet Moment For Speech-To-Text - [link](https://thegradient.pub/towards-an-imagenet-moment-for-speech-to-text/)
- A Speech-To-Text Practitioners Criticisms of Industry and Academia - [link](https://thegradient.pub/a-speech-to-text-practitioners-criticisms-of-industry-and-academia/)
- Modern Google-level STT Models Released - [link](https://habr.com/ru/post/519562/)
- TTS:
- Multilingual Text-to-Speech Models for Indic Languages - [link](https://www.analyticsvidhya.com/blog/2022/06/multilingual-text-to-speech-models-for-indic-languages/)
- Our new public speech synthesis in super-high quality, 10x faster and more stable - [link](https://habr.com/ru/post/660571/)
- High-Quality Text-to-Speech Made Accessible, Simple and Fast - [link](https://habr.com/ru/post/549482/)
- VAD:
- One Voice Detector to Rule Them All - [link](https://thegradient.pub/one-voice-detector-to-rule-them-all/)
- Modern Portable Voice Activity Detector Released - [link](https://habr.com/ru/post/537276/)
- Text Enhancement:
- We have published a model for text repunctuation and recapitalization for four languages - [link](https://habr.com/ru/post/581960/)
### Chinese
- STT:
- 迈向语音识别领域的 ImageNet 时刻 - [link](https://www.infoq.cn/article/4u58WcFCs0RdpoXev1E2)
- 语音领域学术界和工业界的七宗罪 - [link](https://www.infoq.cn/article/lEe6GCRjF1CNToVITvNw)
### Russian
- STT
- OpenAI решили распознавание речи! Разбираемся так ли это … - [link](https://habr.com/ru/post/689572/)
- Наши сервисы для бесплатного распознавания речи стали лучше и удобнее - [link](https://habr.com/ru/post/654227/)
- Telegram-бот Silero бесплатно переводит речь в текст - [link](https://habr.com/ru/post/591563/)
- Бесплатное распознавание речи для всех желающих - [link](https://habr.com/ru/post/587512/)
- Последние обновления моделей распознавания речи из Silero Models - [link](https://habr.com/ru/post/577630/)
- Сжимаем трансформеры: простые, универсальные и прикладные способы cделать их компактными и быстрыми - [link](https://habr.com/ru/post/563778/)
- Ультимативное сравнение систем распознавания речи: Ashmanov, Google, Sber, Silero, Tinkoff, Yandex - [link](https://habr.com/ru/post/559640/)
- Мы опубликовали современные STT модели сравнимые по качеству с Google - [link](https://habr.com/ru/post/519564/)
- Понижаем барьеры на вход в распознавание речи - [link](https://habr.com/ru/post/494006/)
- Огромный открытый датасет русской речи версия 1.0 - [link](https://habr.com/ru/post/474462/)
- Насколько Быстрой Можно Сделать Систему STT? - [link](https://habr.com/ru/post/531524/)
- Наша система Speech-To-Text - [link](https://www.silero.ai/tag/our-speech-to-text/)
- Speech-To-Text - [link](https://www.silero.ai/tag/speech-to-text/)
- TTS:
- Делаем быстрый, качественный и доступный синтез на языках России — нужно ваше участие - [link](https://habr.com/ru/articles/872474/)
- Мы решили задачу омографов и ударений в русском языке - [link](https://habr.com/ru/articles/955130/)
- Теперь наш синтез также доступен в виде бота в Телеграме - [link](https://habr.com/ru/post/682188/)
- Может ли синтез речи обмануть систему биометрической идентификации? - [link](https://habr.com/ru/post/673996/)
- Теперь наш синтез на 20 языках - [link](https://habr.com/ru/post/669910/)
- Теперь наш публичный синтез в супер-высоком качестве, в 10 раз быстрее и без детских болячек - [link](https://habr.com/ru/post/660565/)
- Синтезируем голос бабушки, дедушки и Ленина + новости нашего публичного синтеза - [link](https://habr.com/ru/post/584750/)
- Мы сделали наш публичный синтез речи еще лучше - [link](https://habr.com/ru/post/563484/)
- Мы Опубликовали Качественный, Простой, Доступный и Быстрый Синтез Речи - [link](https://habr.com/ru/post/549480/)
- VAD:
- Новый релиз публичного детектора голоса Silero VAD v6 - [link](https://habr.com/ru/articles/940750/)
- Наш публичный детектор голоса стал лучше - [link](https://habr.com/ru/post/695738/)
- А ты используешь VAD? Что это такое и зачем он нужен - [link](https://habr.com/ru/post/594745/)
- Модели для Детекции Речи, Чисел и Распознавания Языков - [link](https://www.silero.ai/vad-lang-classifier-number-detector/)
- Мы опубликовали современный Voice Activity Detector и не только -[link](https://habr.com/ru/post/537274/)
- Text Enhancement:
- Восстановление знаков пунктуации и заглавных букв — теперь и на длинных текстах - [link](https://habr.com/ru/post/594565/)
- Мы опубликовали модель, расставляющую знаки препинания и заглавные буквы в тексте на четырех языках - [link](https://habr.com/ru/post/581946/)