# silero-models **Repository Path**: nauch/silero-models ## Basic Information - **Project Name**: silero-models - **Description**: No description available - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-11-14 - **Last Updated**: 2025-11-14 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README [![Mailing list : test](http://img.shields.io/badge/Email-gray.svg?style=for-the-badge&logo=gmail)](mailto:hello@silero.ai) [![Mailing list : test](http://img.shields.io/badge/Telegram-blue.svg?style=for-the-badge&logo=telegram)](https://t.me/silero_speech) [![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg?style=for-the-badge)](https://github.com/snakers4/silero-models/blob/master/LICENSE) [![PyPI version](https://badge.fury.io/py/silero.svg)](https://badge.fury.io/py/silero) ![header](https://user-images.githubusercontent.com/12515440/89997349-b3523080-dc94-11ea-9906-ca2e8bc50535.png) - [Silero Models](#silero-models) - [Installation and Basics](#installation-and-basics) - [Text-To-Speech](#text-to-speech) - [Models and Speakers](#models-and-speakers) - [V5](#v5) - [V4](#v4) - [V3](#v3) - [Dependencies](#dependencies) - [PyTorch](#pytorch) - [Standalone Use](#standalone-use) - [SSML](#ssml) - [Cyrillic languages v4](#cyrillic-languages-v4) - [Indic languages v4](#indic-languages-v4) - [Example](#example) - [Supported languages](#supported-languages) - [Contact](#contact) - [Citations](#citations) - [Further reading](#further-reading) - [English](#english) - [Chinese](#chinese) - [Russian](#russian) # Silero Models Our TTS models satisfy the following criteria: - Fully end-to-end; - Large library of voices; - Natural-sounding speech; - One-line usage, minimal, portable; - Impressively fast on CPU and GPU; - For the Russian language - automated stress and homographs; ## Installation and Basics You can basically use our models in 3 flavours: - Via PyTorch Hub: `torch.hub.load()`; - Via pip: `pip install silero` and then `from silero import silero_tts`; - Via caching the required models and utils manually and modifying if necessary; Models are downloaded on demand both by pip and PyTorch Hub. If you need caching, do it manually or via invoking a necessary model once (it will be downloaded to a cache folder). Please see these [docs](https://pytorch.org/docs/stable/hub.html#loading-models-from-hub) for more information. PyTorch Hub and pip package are based on the same code. All of the `torch.hub.load` examples can be used with the pip package via this basic change: ```python from silero import silero_tts model, example_text = silero_tts(language='ru', speaker='v5_ru') audio = model.apply_tts(text=example_text) ``` ## Text-To-Speech ### Models and Speakers All of the provided models are listed in the [models.yml](https://github.com/snakers4/silero-models/blob/master/models.yml) file. Any metadata and newer versions will be added there. #### V5 V5 models support [SSML](https://github.com/snakers4/silero-models/wiki/SSML). Also see Colab examples for main SSML tag usage. Russian-only models support automated stress and homographs. | ID | Speakers | Auto-stress / Homographs | Language | SR | Colab | | ------- | --------------------------------------------- | ----------- | -------------- | ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `v5_ru` | `aidar`, `baya`, `kseniya`, `xenia`, `eugene` | yes / yes | `ru` (Russian) | `8000`, `24000`, `48000` | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) | #### V4 V4 models support [SSML](https://github.com/snakers4/silero-models/wiki/SSML). Also see Colab examples for main SSML tag usage.
V4 models: v4_ru, v4_cyrillic, v4_ua, v4_uz, v4_indic | ID | Speakers |Auto-stress | Language | SR | Colab | | ------------- | ----------- | ----------- |---------------------------------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `v4_ru` | `aidar`, `baya`, `kseniya`, `xenia`, `eugene`, `random` | yes | `ru` (Russian) | `8000`, `24000`, `48000` | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) | | [`v4_cyrillic`](#cyrillic-languages) | `b_ava`, `marat_tt`, `kalmyk_erdni`... | no | `cyrillic` [(Avar, Tatar, Kalmyk, ...)](#cyrillic-languages) | `8000`, `24000`, `48000` | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) | | `v4_ua` | `mykyta`, `random` | no | `ua` (Ukrainian) | `8000`, `24000`, `48000` | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) | | `v4_uz` | `dilnavoz` | no | `uz` (Uzbek) | `8000`, `24000`, `48000` | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) | | [`v4_indic`](#indic-languages) | `hindi_male`, `hindi_female`, ..., `random` | no | `indic` [(Hindi, Telugu, ...)](#indic-languages) | `8000`, `24000`, `48000` | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
#### V3 V3 models support [SSML](https://github.com/snakers4/silero-models/wiki/SSML). Also see Colab examples for main SSML tag usage.
V3 models: v3_en, v3_en_indic, v3_de, v3_es, v3_fr, v3_indic | ID | Speakers |Auto-stress | Language | SR | Colab | | ------------- | ----------- | ----------- |---------------------------------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `v3_en` | `en_0`, `en_1`, ..., `en_117`, `random` | no | `en` (English) | `8000`, `24000`, `48000` | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) | | `v3_en_indic` | `tamil_female`, ..., `assamese_male`, `random` | no | `en` (English) | `8000`, `24000`, `48000` | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) | | `v3_de` | `eva_k`, ..., `karlsson`, `random` | no | `de` (German) | `8000`, `24000`, `48000` | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) | | `v3_es` | `es_0`, `es_1`, `es_2`, `random` | no | `es` (Spanish) | `8000`, `24000`, `48000` | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) | | `v3_fr` | `fr_0`, ..., `fr_5`, `random` | no | `fr` (French) | `8000`, `24000`, `48000` | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) | | [`v3_indic`](#indic-languages) | `hindi_male`, `hindi_female`, ..., `random` | no | `indic` [(Hindi, Telugu, ...)](#indic-languages) | `8000`, `24000`, `48000` | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) |
### Dependencies Basic dependencies for Colab examples: - `torch`, 1.10+ for v3 models/ 2.0+ for v4 and v5 models; - `torchaudio`, latest version bound to PyTorch should work (required only because models are hosted together with STT, not required for work); - `omegaconf`, latest (can be removed as well, if you do not load all of the configs); ### PyTorch [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb) [![Open on Torch Hub](https://img.shields.io/badge/Torch-Hub-red?logo=pytorch&style=for-the-badge)](https://pytorch.org/hub/snakers4_silero-models_tts/) ```python # V5 import torch language = 'ru' model_id = 'v5_ru' sample_rate = 48000 speaker = 'xenia' device = torch.device('cpu') model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_tts', language=language, speaker=model_id) model.to(device) # gpu or cpu audio = model.apply_tts(text=example_text, speaker=speaker, sample_rate=sample_rate) ``` ### Standalone Use - Standalone usage only requires PyTorch 1.12+ and the Python Standard Library; - Please see the detailed examples in Colab; ```python # V5 import os import torch device = torch.device('cpu') torch.set_num_threads(4) local_file = 'model.pt' if not os.path.isfile(local_file): torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v5_ru.pt', local_file) model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model") model.to(device) example_text = 'Меня зовут Лева Королев. Я из готов. И я уже готов открыть все ваши замки любой сложности!' sample_rate = 48000 speaker='baya' audio_paths = model.save_wav(text=example_text, speaker=speaker, sample_rate=sample_rate) ``` ### SSML Check out our [TTS Wiki page.](https://github.com/snakers4/silero-models/wiki/SSML) ### Cyrillic languages v4 > To be superseded with v5 model(s) soon. Supported tokenset: `!,-.:?iµöабвгдежзийклмнопрстуфхцчшщъыьэюяёђѓєіјњћќўѳғҕҗҙқҡңҥҫүұҳҷһӏӑӓӕӗәӝӟӥӧөӱӳӵӹ ` | Speaker_ID | Language | Gender | | ------------ | --------------- | ------ | | b_ava | Avar | F | | b_bashkir | Bashkir | M | | b_bulb | Bulgarian | M | | b_bulc | Bulgarian | M | | b_che | Chechen | M | | b_cv | Chuvash | M | | cv_ekaterina | Chuvash | F | | b_myv | Erzya | M | | b_kalmyk | Kalmyk | M | | b_krc | Karachay-Balkar | M | | kz_M1 | Kazakh | M | | kz_M2 | Kazakh | M | | kz_F3 | Kazakh | F | | kz_F1 | Kazakh | F | | kz_F2 | Kazakh | F | | b_kjh | Khakas | F | | b_kpv | Komi-Ziryan | M | | b_lez | Lezghian | M | | b_mhr | Mari | F | | b_mrj | Mari High | M | | b_nog | Nogai | F | | b_oss | Ossetic | M | | b_ru | Russian | M | | b_tat | Tatar | M | | marat_tt | Tatar | M | | b_tyv | Tuvinian | M | | b_udm | Udmurt | M | | b_uzb | Uzbek | M | | b_sah | Yakut | M | | kalmyk_erdni | Kalmyk | M | | kalmyk_delghir | Kalmyk | F | ### Indic languages v4 #### Example (!!!) All input sentences should be romanized to ISO format using [`aksharamukha`](https://aksharamukha.appspot.com/python). An example for `hindi`: ```python # V3 import torch from aksharamukha import transliterate # Loading model model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_tts', language='indic', speaker='v4_indic') orig_text = "प्रसिद्द कबीर अध्येता, पुरुषोत्तम अग्रवाल का यह शोध आलेख, उस रामानंद की खोज करता है" roman_text = transliterate.process('Devanagari', 'ISO', orig_text) print(roman_text) audio = model.apply_tts(roman_text, speaker='hindi_male') ``` #### Supported languages | Language | Speakers | Romanization function -- | -- | -- hindi | `hindi_female`, `hindi_male` | `transliterate.process('Devanagari', 'ISO', orig_text)` malayalam | `malayalam_female`, `malayalam_male` |`transliterate.process('Malayalam', 'ISO', orig_text)` manipuri | `manipuri_female` |`transliterate.process('Bengali', 'ISO', orig_text)` bengali | `bengali_female`, `bengali_male` | `transliterate.process('Bengali', 'ISO', orig_text)` rajasthani | `rajasthani_female`, `rajasthani_female` | `transliterate.process('Devanagari', 'ISO', orig_text)` tamil | `tamil_female`, `tamil_male` |`transliterate.process('Tamil', 'ISO', orig_text, pre_options=['TamilTranscribe'])` telugu | `telugu_female`, `telugu_male` | `transliterate.process('Telugu', 'ISO', orig_text)` gujarati | `gujarati_female`, `gujarati_male` | `transliterate.process('Gujarati', 'ISO', orig_text)` kannada | `kannada_female`, `kannada_male` |`transliterate.process('Kannada', 'ISO', orig_text)` ## Contact Try our models, create an [issue](https://github.com/snakers4/silero-models/issues/new), join our [chat](https://t.me/silero_speech), [email](mailto:hello@silero.ai) us, and read the latest [news](https://t.me/silero_news). ## Citations ```bibtex @misc{Silero Models, author = {Silero Team}, title = {Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks}, year = {2021}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\url{https://github.com/snakers4/silero-models}}, commit = {insert_some_commit_here}, email = {hello@silero.ai} } ``` ## Further reading ### English - STT: - Towards an Imagenet Moment For Speech-To-Text - [link](https://thegradient.pub/towards-an-imagenet-moment-for-speech-to-text/) - A Speech-To-Text Practitioners Criticisms of Industry and Academia - [link](https://thegradient.pub/a-speech-to-text-practitioners-criticisms-of-industry-and-academia/) - Modern Google-level STT Models Released - [link](https://habr.com/ru/post/519562/) - TTS: - Multilingual Text-to-Speech Models for Indic Languages - [link](https://www.analyticsvidhya.com/blog/2022/06/multilingual-text-to-speech-models-for-indic-languages/) - Our new public speech synthesis in super-high quality, 10x faster and more stable - [link](https://habr.com/ru/post/660571/) - High-Quality Text-to-Speech Made Accessible, Simple and Fast - [link](https://habr.com/ru/post/549482/) - VAD: - One Voice Detector to Rule Them All - [link](https://thegradient.pub/one-voice-detector-to-rule-them-all/) - Modern Portable Voice Activity Detector Released - [link](https://habr.com/ru/post/537276/) - Text Enhancement: - We have published a model for text repunctuation and recapitalization for four languages - [link](https://habr.com/ru/post/581960/) ### Chinese - STT: - 迈向语音识别领域的 ImageNet 时刻 - [link](https://www.infoq.cn/article/4u58WcFCs0RdpoXev1E2) - 语音领域学术界和工业界的七宗罪 - [link](https://www.infoq.cn/article/lEe6GCRjF1CNToVITvNw) ### Russian - STT - OpenAI решили распознавание речи! Разбираемся так ли это … - [link](https://habr.com/ru/post/689572/) - Наши сервисы для бесплатного распознавания речи стали лучше и удобнее - [link](https://habr.com/ru/post/654227/) - Telegram-бот Silero бесплатно переводит речь в текст - [link](https://habr.com/ru/post/591563/) - Бесплатное распознавание речи для всех желающих - [link](https://habr.com/ru/post/587512/) - Последние обновления моделей распознавания речи из Silero Models - [link](https://habr.com/ru/post/577630/) - Сжимаем трансформеры: простые, универсальные и прикладные способы cделать их компактными и быстрыми - [link](https://habr.com/ru/post/563778/) - Ультимативное сравнение систем распознавания речи: Ashmanov, Google, Sber, Silero, Tinkoff, Yandex - [link](https://habr.com/ru/post/559640/) - Мы опубликовали современные STT модели сравнимые по качеству с Google - [link](https://habr.com/ru/post/519564/) - Понижаем барьеры на вход в распознавание речи - [link](https://habr.com/ru/post/494006/) - Огромный открытый датасет русской речи версия 1.0 - [link](https://habr.com/ru/post/474462/) - Насколько Быстрой Можно Сделать Систему STT? - [link](https://habr.com/ru/post/531524/) - Наша система Speech-To-Text - [link](https://www.silero.ai/tag/our-speech-to-text/) - Speech-To-Text - [link](https://www.silero.ai/tag/speech-to-text/) - TTS: - Делаем быстрый, качественный и доступный синтез на языках России — нужно ваше участие - [link](https://habr.com/ru/articles/872474/) - Мы решили задачу омографов и ударений в русском языке - [link](https://habr.com/ru/articles/955130/) - Теперь наш синтез также доступен в виде бота в Телеграме - [link](https://habr.com/ru/post/682188/) - Может ли синтез речи обмануть систему биометрической идентификации? - [link](https://habr.com/ru/post/673996/) - Теперь наш синтез на 20 языках - [link](https://habr.com/ru/post/669910/) - Теперь наш публичный синтез в супер-высоком качестве, в 10 раз быстрее и без детских болячек - [link](https://habr.com/ru/post/660565/) - Синтезируем голос бабушки, дедушки и Ленина + новости нашего публичного синтеза - [link](https://habr.com/ru/post/584750/) - Мы сделали наш публичный синтез речи еще лучше - [link](https://habr.com/ru/post/563484/) - Мы Опубликовали Качественный, Простой, Доступный и Быстрый Синтез Речи - [link](https://habr.com/ru/post/549480/) - VAD: - Новый релиз публичного детектора голоса Silero VAD v6 - [link](https://habr.com/ru/articles/940750/) - Наш публичный детектор голоса стал лучше - [link](https://habr.com/ru/post/695738/) - А ты используешь VAD? Что это такое и зачем он нужен - [link](https://habr.com/ru/post/594745/) - Модели для Детекции Речи, Чисел и Распознавания Языков - [link](https://www.silero.ai/vad-lang-classifier-number-detector/) - Мы опубликовали современный Voice Activity Detector и не только -[link](https://habr.com/ru/post/537274/) - Text Enhancement: - Восстановление знаков пунктуации и заглавных букв — теперь и на длинных текстах - [link](https://habr.com/ru/post/594565/) - Мы опубликовали модель, расставляющую знаки препинания и заглавные буквы в тексте на четырех языках - [link](https://habr.com/ru/post/581946/)