# CTTS

**Repository Path**: atomai/CTTS

## Basic Information

- **Project Name**: CTTS
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-09-08
- **Last Updated**: 2020-12-20

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Cantonese TTS Frontend

Cantonese/Chinese text-to-speech (TTS) based on statistical parametric speech
synthesis, using the Merlin toolkit.

This project is influenced by [MTTS](https://github.com/Jackiexiao/MTTS)

## How To Reproduce
1. Prepare data containing wav audio and matching txt transcripts (prosody marks are optional).
2. Generate HTS labels with this project.
3. Use [merlin/egs/cantonese_voice](https://github.com/mirfan899/merlin/tree/master/egs/cantonese_voice) to train a model and synthesize a Cantonese voice.

## Context-Related Annotation & Question Set
* [Context-related annotation](https://github.com/Jackiexiao/MTTS/blob/master/misc/mandarin_label.md)
* [Question Set](https://github.com/Jackiexiao/MTTS/blob/master/misc/questions-mandarin.hed)
* [Rules to design a Question Set](https://github.com/Jackiexiao/MTTS/blob/master/docs/mddocs/question.md)
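In HTS-style question sets, each `QS` entry pairs a question name with one or more glob-like patterns (`*` matches any substring, `?` matches one character), and the answer is "yes" when any pattern matches the whole full-context label. The minimal sketch below assumes that convention; the label string and the four questions are made-up toy values, not entries from the linked question set, and `hts_pattern_to_regex` / `answer_questions` are hypothetical helpers. It translates patterns into regular expressions rather than using `fnmatch`, so characters such as `[` in a label are matched literally.

```python
import re


def hts_pattern_to_regex(pattern):
    """Compile an HTS-style glob ('*' = any substring, '?' = one char) into a regex."""
    escaped = re.escape(pattern)
    # re.escape turns '*' into '\*' and '?' into '\?'; turn those back into wildcards.
    regex = escaped.replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(regex, re.DOTALL)


def answer_questions(label, questions):
    """Return {question_name: bool}: True if any pattern matches the whole label."""
    return {
        name: any(hts_pattern_to_regex(p).fullmatch(label) for p in patterns)
        for name, patterns in questions.items()
    }


# Toy full-context label: left phone 'sil', current phone 'b', right phone 'a'.
label = "x^sil-b+a=n@1_2"
questions = {
    "C-b": ["*-b+*"],                # is the current phone 'b'?
    "R-a_or_e": ["*+a=*", "*+e=*"],  # is the right phone 'a' or 'e'?
    "L-sil": ["*^sil-*"],            # is the left phone silence?
    "C-a": ["*-a+*"],                # is the current phone 'a'?
}

for name, answer in answer_questions(label, questions).items():
    print(name, answer)
```

In Merlin-style frontends, the answers to the questions typically become binary input features for each label, which is why the question set shapes what the acoustic model sees.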

## Install
Python: 3.6  
System: Linux (tested on Ubuntu 16.04)  
```
sudo apt-get install libatlas3-base
```
Run `bash tools/install_mtts.sh`  
**Or** download the files yourself:
* Download [montreal-forced-aligner](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/releases/download/v1.0.0/montreal-forced-aligner_linux.tar.gz) and extract it into the `tools/` directory.

**Run Demo**
```
bash run_demo.sh
```
## Usage
### 1. Generate HTS Labels from wav and text
* Usage: run `python src/mtts.py txtfile wav_directory_path output_directory_path` (absolute or relative paths) to get HTS labels. If you have your own acoustic model trained with Montreal-Forced-Aligner, add `-a your_acoustic_model.zip`; otherwise, the `thchs30.zip` acoustic model is used by default.
* Attention: only Chinese characters are currently supported; the txt file must not
    contain any Arabic numerals or English letters.
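The restriction above can be checked before running the tool. A minimal sketch, assuming a txtfile whose lines are `utterance_id text` (as in the example below) and treating any Arabic numeral or Latin letter, ASCII or full-width, in the text part as invalid; `find_invalid_lines` is a hypothetical helper, not part of this project. The utterance ID is split off first because IDs such as `A_01` legitimately contain letters and digits.

```python
import re

# Arabic numerals and Latin letters, in both ASCII and full-width forms.
_INVALID = re.compile(r"[0-9A-Za-z\uFF10-\uFF19\uFF21-\uFF3A\uFF41-\uFF5A]")


def find_invalid_lines(lines):
    """Return (line_number, utterance_id, offending_chars) for lines that break the rule."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        parts = raw.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        utt_id = parts[0]  # e.g. A_01; allowed to contain letters and digits
        text = parts[1] if len(parts) > 1 else ""
        bad = sorted(set(_INVALID.findall(text)))
        if bad:
            problems.append((lineno, utt_id, bad))
    return problems


sample = ["A_01 这是一段文本", "A_02 这是第2段文本", "A_03 Hello世界"]
for problem in find_invalid_lines(sample):
    print(problem)
```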

**txtfile example**
```
A_01 这是一段文本
A_02 这是第二段文本
```
**wav_directory example** (the sampling rate should be 16 kHz or higher)
```
A_01.wav  
A_02.wav  
```
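Before running `src/mtts.py`, it may help to confirm that every utterance ID in the txtfile has a matching `<id>.wav`, that no wav file lacks a transcript, and that each file's sampling rate meets the requirement above. A minimal sketch using only the standard-library `wave` module, so it assumes uncompressed PCM wav files; `check_corpus` and `min_rate` are hypothetical names, and treating exactly 16 kHz as acceptable is an assumption.

```python
import wave
from pathlib import Path


def check_corpus(txtfile, wav_dir, min_rate=16000):
    """Return human-readable problems found in a txtfile / wav directory pair."""
    wav_dir = Path(wav_dir)
    with open(txtfile, encoding="utf-8") as f:
        # The first field of each non-empty line is the utterance ID, e.g. A_01.
        utt_ids = [line.split()[0] for line in f if line.strip()]

    problems = []
    for utt_id in utt_ids:
        wav_path = wav_dir / f"{utt_id}.wav"
        if not wav_path.is_file():
            problems.append(f"{utt_id}: missing {wav_path.name}")
            continue
        with wave.open(str(wav_path), "rb") as w:
            rate = w.getframerate()
        # Assumption: 16 kHz itself passes; only strictly lower rates are flagged.
        if rate < min_rate:
            problems.append(f"{utt_id}: sampling rate {rate} Hz < {min_rate} Hz")

    # Also flag wav files that have no line in the txtfile.
    listed = set(utt_ids)
    for wav_path in sorted(wav_dir.glob("*.wav")):
        if wav_path.stem not in listed:
            problems.append(f"{wav_path.name}: no transcript in txtfile")
    return problems
```

Calling `check_corpus("txtfile", "wav_directory_path")` returns an empty list when the txtfile and wav directory look consistent.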

### 2. Generate HTS Labels from text, with or without an alignment file
* Usage: run `python src/mandarin_frontend.py txtfile output_directory_path`
* Or import `mandarin_frontend` in Python:
```
# Run from src/ (or add src/ to PYTHONPATH) so that mandarin_frontend is importable
from mandarin_frontend import txt2label

result = txt2label('向香港特别行政区同胞澳门和台湾同胞海外侨胞')
for line in result:
    print(line)
```
See the [source code](https://github.com/mirfan899/MTTS/blob/master/src/cantonese_frontend.py) for more information. Note the format of the alignment file (sfs file): each line is `endtime phone_type`, not `start_time, phone_type` (this differs from Speech Ocean's data).
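Because each line of the alignment (sfs) file stores only the time at which a phone *ends*, a phone's start time has to be taken from the end time of the previous line. A minimal sketch of that conversion, assuming whitespace-separated `endtime phone_type` lines, numeric times kept in whatever unit the file uses, and a first phone that starts at 0; `read_sfs` and `Segment` are hypothetical names, so check the linked source for how this project actually reads the file. The phone labels in the example are toy values.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float
    end: float
    phone: str


def read_sfs(lines):
    """Turn `endtime phone_type` lines into (start, end, phone) segments."""
    segments = []
    prev_end = 0.0  # assumption: the first phone starts at time 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        end_str, phone = raw.split()[:2]
        end = float(end_str)
        if end < prev_end:
            raise ValueError(f"end time {end} is earlier than previous end {prev_end}")
        # The previous line's end time is this phone's start time.
        segments.append(Segment(prev_end, end, phone))
        prev_end = end
    return segments


sfs_lines = ["0.25 sil", "0.40 h", "0.62 oeng1", "0.90 sil"]
for seg in read_sfs(sfs_lines):
    print(seg)
```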

### 3. Forced Alignment
This project uses [Montreal-Forced-Aligner](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) for forced alignment. For better alignment, train an alignment model on your own data; see [MFA: align using only the data set](https://montreal-forced-aligner.readthedocs.io/en/latest/aligning.html#align-using-only-the-data-set). We trained our acoustic model on our own dataset.

## Prosody Mark
You can generate HTS labels without prosody marks. In that case, we assume that a
segmented word is smaller than a prosodic word (this is adjusted in the code).
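Under that assumption, segmented words have to be merged into larger prosodic words when no marks are given. The sketch below illustrates one *hypothetical* greedy rule: a word is appended to the current group when either the word or the group is a single character and the group stays within `max_chars` characters. It is not the rule implemented in this project's code; `group_prosodic_words` and `max_chars` are invented for illustration, and the word list is a hand-made segmentation of part of the `txt2label` example sentence.

```python
from typing import List


def group_prosodic_words(words: List[str], max_chars: int = 4) -> List[List[str]]:
    """Greedily merge segmented words into prosodic words (hypothetical heuristic)."""
    groups: List[List[str]] = []
    current: List[str] = []
    for word in words:
        current_len = sum(len(w) for w in current)
        # Merge when either side is a single character and the group stays within max_chars.
        short = len(word) == 1 or current_len == 1
        if current and short and current_len + len(word) <= max_chars:
            current.append(word)
        else:
            if current:
                groups.append(current)
            current = [word]
    if current:
        groups.append(current)
    return groups


words = ["向", "香港", "特别", "行政区", "同胞"]
print(["".join(g) for g in group_prosodic_words(words)])
```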

## Future Improvements
* Text normalization
* Better Chinese word segmentation
* G2P: polyphone disambiguation
* Better label format and question set
* Improved prosody analysis
* Better alignment

## Contributor
* mirfan899