# zhtts

**Repository Path**: davylw/zhtts

## Basic Information

- **Project Name**: zhtts
- **Description**: A demo of zh/Chinese Text to Speech system run on CPU in real time.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-11-26
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ZhTTS

A demo of a zh/Chinese Text to Speech system that runs on CPU in real time. (fastspeech2 + mbmelgan)

> *RTF (real-time factor): 0.2 on an Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz*

24 kHz audio.

This repo is **mainly based on** [TensorFlowTTS](https://github.com/TensorSpeech/TensorFlowTTS), with a few small improvements.

* The tflite models come from this [colab](https://colab.research.google.com/drive/1Ma3MIcSdLsOxqOKcN1MlElncYMhrOg3J?usp=sharing), thanks to [@azraelkuan](https://github.com/azraelkuan)
* Pauses are added at punctuation (using the `#3` token)
* TN (Text Normalization) is added from [chinese_text_normalization](https://github.com/speechio/chinese_text_normalization)

## demo wav

text = "2020年,这是一个开源的端到端中文语音合成系统"

[click to play wav](https://gitee.com/jackiegeek/zhtts/raw/master/demo.wav)

## Install

Clone this repo, then install the dependencies:

```shell
pip install "tensorflow>=2.3.0" numpy scipy pypinyin
```

On Windows, use `pip install "tensorflow>=2.4.0rc"` instead; see the [TensorFlow Lite select ops guide](https://www.tensorflow.org/lite/guide/ops_select#python) for the reason.

## Usage

```python
import zhtts

text = "2020年,这是一个开源的端到端中文语音合成系统"
tts = zhtts.TTS()
# Synthesize the text and write the audio to demo.wav
tts.text2wav(text, "demo.wav")
```

```python
# Text frontend: returns (normalized text, phoneme/prosody sequence)
tts.frontend(text)
# ('二零二零年,这是一个开源的端到端中文语音合成系统',
#  'sil ^ er4 #0 l ing2 #0 ^ er4 #0 l ing2 #0 n ian2 #0 #3 zh e4 #0 sh iii4 #0 ^ i2 #0 g e4 #0 k ai1 #0 ^ van2 #0 d e5 #0 d uan1 #0 d ao4 #0 d uan1 #0 zh ong1 #0 ^ uen2 #0 ^ v3 #0 ^ in1 #0 h e2 #0 ch eng2 #0 x i4 #0 t ong3 sil')

# Synthesis: returns the waveform as a float32 array
tts.synthesis(text)
# array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)
```

### web api demo

```
python app.py
```

* Visit http://localhost:5000
* Send an HTTP GET request to http://localhost:5000/api/tts?text=your%20sentence to get WAV audio back:

```sh
$ curl -G --output - \
    --data-urlencode 'text=你好,世界!' \
    'http://localhost:5000/api/tts' | \
  aplay
```

## TODO

- [ ] more accurate g2p by using g2pM?
- [ ] support synthesizing English letters
- [ ] use tflite_runtime without full tensorflow
- [ ] improve naturalness
- [ ] streaming TTS

## known issue

This is just a **demo**. Expect the experience to be rough because of the many TN/g2p/prosody errors.

* When synthesizing long sentences, the audio becomes unnatural
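
One possible workaround for the long-sentence issue, which is not part of this repo, is to split the input at punctuation and synthesize each chunk separately. The sketch below uses only the standard library. `split_text` and `max_chars` are hypothetical names introduced for illustration, and whether chunking actually improves naturalness with these models is an untested assumption.

```python
import re

# Clause and sentence punctuation (full-width and ASCII) used as split points.
# ASCII "." is left out on purpose so decimals such as "3.14" stay intact.
_SPLIT_PATTERN = re.compile(r"([。!?;,!?;,])")


def split_text(text, max_chars=30):
    """Split text into chunks of roughly max_chars, cutting only after punctuation.

    Hypothetical helper, not part of zhtts. A single clause longer than
    max_chars is kept whole rather than cut in the middle.
    """
    # With a capture group, re.split returns [segment, punct, segment, punct, ..., tail].
    pieces = _SPLIT_PATTERN.split(text)
    clauses = []
    for i in range(0, len(pieces), 2):
        clause = pieces[i]
        if i + 1 < len(pieces):
            clause += pieces[i + 1]  # re-attach the punctuation mark
        if clause.strip():
            clauses.append(clause)

    # Greedily pack consecutive clauses into chunks up to max_chars.
    chunks = []
    current = ""
    for clause in clauses:
        if current and len(current) + len(clause) > max_chars:
            chunks.append(current)
            current = clause
        else:
            current += clause
    if current:
        chunks.append(current)
    return chunks


# Possible use with the API shown in the Usage section (assumes zhtts and numpy
# are installed; tts.synthesis returns a float32 array per the example there):
#
#   import numpy as np
#   import zhtts
#   tts = zhtts.TTS()
#   audio = np.concatenate([tts.synthesis(chunk) for chunk in split_text(long_text)])
```

Splitting only at punctuation keeps each chunk aligned with a natural pause, so the joins between separately synthesized chunks fall where a pause would be expected anyway.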