# vits-finetuning

**Repository Path**: PrincessSnow/vits-finetuning

## Basic Information

- **Project Name**: vits-finetuning
- **Description**: Mirror of https://github.com/SayaSS/vits-finetuning
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-04-04
- **Last Updated**: 2024-09-13

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Text cleaner from https://github.com/CjangCjengh/vits

Original repo: https://github.com/jaywalnut310/vits

## Online training and inference

### Colab

See [vits-finetuning](https://colab.research.google.com/drive/13FF2pBWxj9rMR1SjI_JpVD6mTRN-kq--?usp=share_link)

# How to use

Python == 3.7 is suggested. Only Japanese datasets can be used for fine-tuning in this repo.

## Clone this repository

```sh
git clone https://github.com/SayaSS/vits-finetuning.git
```

## Install requirements

```sh
pip install -r requirements.txt
```

## Download pre-trained models

- [G_0.pth](https://huggingface.co/spaces/sayashi/vits-uma-genshin-honkai/resolve/main/model/G_0.pth)
- [D_0.pth](https://huggingface.co/spaces/sayashi/vits-uma-genshin-honkai/resolve/main/model/D_0.pth)
- Edit "model_dir" (line 152) in utils.py
- Put the pre-trained models in "model_dir"/checkpoints

### If you need to customize "n_speakers", replace the pre-trained models with these two:

- [G_0-p.pth](https://huggingface.co/spaces/sayashi/vits-uma-genshin-honkai/resolve/main/model/G_0-p.pth)
- [D_0-p.pth](https://huggingface.co/spaces/sayashi/vits-uma-genshin-honkai/resolve/main/model/D_0-p.pth)

## Create datasets

- Speaker IDs should be between 0 and 803.
- About 50 audio-text pairs will suffice, and 100-600 epochs can already give quite good results, but more data may be better.
- Resample all audio to 22050 Hz, 16-bit, mono WAV files (see the resampling sketch at the end of this README).
- Audio files should be >= 1 s and <= 10 s long.

Each filelist line has the format:

```
path/to/XXX.wav|speaker id|transcript
```

- Example:

```
dataset/001.wav|10|こんにちは。
```

For complete examples, see filelists/miyu_train.txt and filelists/miyu_val.txt.

## Preprocess

```sh
python preprocess.py --filelists path/to/filelist_train.txt path/to/filelist_val.txt
```

Then edit "training_files" and "validation_files" in configs/config.json (see the config-patching sketch at the end of this README).

## Train

```sh
# Multiple speakers
python train_ms.py -c configs/config.json -m checkpoints
```
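## Appendix: helper sketches

Neither script below ships with this repo; both are minimal sketches for steps described above, and the file and directory names in them are placeholders, not repo conventions.

### Resampling audio

A sketch of the "resample all audio to 22050 Hz, 16-bit, mono" step in the dataset section, using librosa and soundfile (`pip install librosa soundfile`). It also skips clips outside the 1-10 s range:

```python
# resample.py -- minimal sketch, not part of this repo.
# Assumes librosa and soundfile are installed; "raw_audio"/"dataset" are placeholders.
import os

import librosa
import soundfile as sf

TARGET_SR = 22050
in_dir, out_dir = "raw_audio", "dataset"
os.makedirs(out_dir, exist_ok=True)

for name in sorted(os.listdir(in_dir)):
    if not name.lower().endswith((".wav", ".mp3", ".flac")):
        continue
    # Load as mono and resample to 22050 Hz in one step.
    y, sr = librosa.load(os.path.join(in_dir, name), sr=TARGET_SR, mono=True)
    duration = len(y) / sr
    if not 1.0 <= duration <= 10.0:  # the repo expects 1-10 s clips
        print(f"skipping {name}: {duration:.2f}s")
        continue
    out_path = os.path.join(out_dir, os.path.splitext(name)[0] + ".wav")
    sf.write(out_path, y, TARGET_SR, subtype="PCM_16")  # write 16-bit PCM WAV
```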
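### Patching configs/config.json

A sketch of the "edit training_files and validation_files" step after preprocessing. It assumes the standard VITS config layout, where both keys live under the top-level "data" object, and that preprocess.py wrote `.cleaned` filelists next to the inputs (the usual VITS convention); check the actual key layout and output filenames before running this:

```python
# patch_config.py -- minimal sketch, not part of this repo.
# Assumption: the keys sit under "data" and preprocess.py emitted "*.txt.cleaned".
import json

CONFIG_PATH = "configs/config.json"

with open(CONFIG_PATH, encoding="utf-8") as f:
    config = json.load(f)

config["data"]["training_files"] = "path/to/filelist_train.txt.cleaned"
config["data"]["validation_files"] = "path/to/filelist_val.txt.cleaned"

with open(CONFIG_PATH, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2, ensure_ascii=False)
```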