# StarGAN-Voice-Conversion-2

This is a PyTorch implementation of the paper: [StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion](https://arxiv.org/pdf/1907.12279.pdf).

- The converted voice examples are in the *converted* directory. The sound quality improves with more training iterations (~200,000).
- The VCTK database has been used to train the model with 70 speakers. The converted samples are a bit noisy because of the VCTK recordings, but this can be improved by using other, cleaner databases.
- PS is omitted in the generator network.

## [Dependencies]

- Python 3.5+
- PyTorch 0.4.0+
- librosa
- pyworld
- tensorboardX
- scikit-learn
- tqdm

## [Usage]

### Download dataset

Download and unzip the [VCTK](https://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html) corpus into the designated directories:

```bash
mkdir ./data
wget "https://datashare.is.ed.ac.uk/bitstream/handle/10283/2651/VCTK-Corpus.zip?sequence=2&isAllowed=y" -O VCTK-Corpus.zip
unzip VCTK-Corpus.zip -d ./data
```

If the downloaded VCTK archive is a tar.gz instead, run:

```bash
tar -xzvf VCTK-Corpus.tar.gz -C ./data
```

The data directory now looks like this:

```
data
├── vctk
│   ├── p225
│   ├── p226
│   ├── ...
│   └── p360
```

### Preprocess

Extract features (mcep, f0, ap) from each speech clip; the features are stored as .npy files. We also calculate the statistical characteristics for each speaker. A minimal sketch of this feature extraction is included at the end of this README.

```
python preprocess.py
```

This step may take several minutes. The data directory now looks like this:

```
data
├── vctk (48 kHz data)
│   ├── p225
│   ├── p226
│   ├── ...
│   └── p360
├── vctk_16 (16 kHz data)
│   ├── p225
│   ├── p226
│   ├── ...
│   └── p360
├── mc
│   ├── train
│   ├── test
```

### Train

```
python main.py
```

### Convert

```
python convert.py --src_spk p262 --trg_spk p272 --resume_iters 210000
```

Conversion ends by resynthesising a waveform from the converted features with the WORLD vocoder; a sketch of that step is also included at the end of this README.

## [Network structure]

![network](https://github.com/dipjyoti92/StarGAN-Voice-Conversion-2/blob/master/network.png)

## [Reference]

- [StarGAN-VC2 paper](https://arxiv.org/pdf/1907.12279)
- [StarGAN paper](https://arxiv.org/abs/1711.09020)
- [CycleGAN-VC paper](https://arxiv.org/abs/1711.11293)

## [Acknowledgements]

- https://github.com/liusongxiang/StarGAN-Voice-Conversion
- https://github.com/SamuelBroughton/StarGAN-Voice-Conversion-2
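
## [Feature extraction sketch]

For reference, this is a minimal sketch of the kind of WORLD feature extraction that `preprocess.py` performs (mcep, f0, ap per clip, saved as .npy). The sampling rate, mcep dimension, and file paths below are illustrative assumptions, not values taken from the repository.

```python
# Minimal sketch of WORLD feature extraction (mcep, f0, ap), assuming 16 kHz input
# and a 36-dimensional coded spectral envelope; both values are assumptions.
import librosa
import numpy as np
import pyworld

SAMPLING_RATE = 16000   # assumed: clips are first downsampled to 16 kHz
MCEP_DIM = 36           # assumed mcep dimension

def extract_features(wav_path):
    """Return (f0, mcep, ap) for one utterance."""
    wav, _ = librosa.load(wav_path, sr=SAMPLING_RATE, mono=True)
    wav = wav.astype(np.float64)

    # WORLD analysis: F0 contour, spectral envelope, aperiodicity
    f0, timeaxis = pyworld.harvest(wav, SAMPLING_RATE)
    sp = pyworld.cheaptrick(wav, f0, timeaxis, SAMPLING_RATE)
    ap = pyworld.d4c(wav, f0, timeaxis, SAMPLING_RATE)

    # Compress the spectral envelope to mel-cepstral coefficients (mcep)
    mcep = pyworld.code_spectral_envelope(sp, SAMPLING_RATE, MCEP_DIM)
    return f0, mcep, ap

if __name__ == "__main__":
    # Hypothetical paths for illustration only
    f0, mcep, ap = extract_features("data/vctk_16/p225/p225_001.wav")
    np.save("data/mc/train/p225_001.npy", mcep)  # features are stored as .npy
```

Per-speaker statistics (e.g. mean and standard deviation of the features) can then be computed over all of a speaker's .npy files.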
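
## [Resynthesis sketch]

The sketch below shows one way the final WORLD resynthesis in conversion can look. The FFT size, the log-Gaussian F0 transform, and the variable names are assumptions for illustration; the generator inference itself is omitted.

```python
# Hedged sketch of WORLD resynthesis from converted features.
# FFT_SIZE and the log-Gaussian F0 transform are assumptions, not taken from the repo.
import numpy as np
import pyworld

SAMPLING_RATE = 16000  # assumed
FFT_SIZE = 1024        # assumed CheapTrick FFT size at 16 kHz

def pitch_conversion(f0, src_mean, src_std, trg_mean, trg_std):
    """Log-Gaussian normalised F0 transform on voiced frames (a common choice).

    src_mean/src_std and trg_mean/trg_std are assumed to be per-speaker
    statistics of log F0 from preprocessing.
    """
    converted = np.zeros_like(f0)
    voiced = f0 > 0
    converted[voiced] = np.exp(
        (np.log(f0[voiced]) - src_mean) / src_std * trg_std + trg_mean)
    return converted

def resynthesize(f0_converted, mcep_converted, ap):
    """Decode mcep back to a spectral envelope and synthesise a waveform."""
    sp = pyworld.decode_spectral_envelope(
        np.ascontiguousarray(mcep_converted, dtype=np.float64),
        SAMPLING_RATE, FFT_SIZE)
    wav = pyworld.synthesize(f0_converted, sp, ap, SAMPLING_RATE)
    return wav
```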