This is a PyTorch implementation of the paper: StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks.
Converted voice examples are in the samples and results_2019-06-10 directories.
Download the VCC 2016 dataset to the current directory:
python download.py
The downloaded zip files are extracted to ./data/vcc2016_training and ./data/evaluation_all.

./data/vcc2016_training holds the training set, so move the folders of the chosen speakers (e.g. SF1, SF2, TM1, TM2) to ./data/speakers.
./data/evaluation_all holds the evaluation set, so move the corresponding speaker folders to ./data/speakers_test.
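The folder moves above can be sketched with the standard library; the helper name collect_speakers and the speaker list are illustrative, not part of the repo's scripts:

```python
import shutil
from pathlib import Path

# Speakers used in this README's example setup.
SPEAKERS = ["SF1", "SF2", "TM1", "TM2"]

def collect_speakers(src_root: str, dst_root: str) -> None:
    """Move each speaker folder from an extracted archive into dst_root."""
    dst = Path(dst_root)
    dst.mkdir(parents=True, exist_ok=True)
    for spk in SPEAKERS:
        src = Path(src_root) / spk
        if src.is_dir():
            shutil.move(str(src), str(dst / spk))

# Usage (after extracting the archives):
# collect_speakers("./data/vcc2016_training", "./data/speakers")
# collect_speakers("./data/evaluation_all", "./data/speakers_test")
```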
The data directory now looks like this:
data
├── speakers (training set)
│ ├── SF1
│ ├── SF2
│ ├── TM1
│ └── TM2
├── speakers_test (testing set)
│ ├── SF1
│ ├── SF2
│ ├── TM1
│ └── TM2
├── vcc2016_training (vcc 2016 training set)
│ ├── ...
├── evaluation_all (vcc 2016 evaluation set, we use it as testing set)
│ ├── ...
Extract features (MCEP, F0, AP) from each speech clip. The features are stored as .npy files, and the statistical characteristics of each speaker are also computed.
python preprocess.py
This step may take several minutes.
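As an illustration of the per-speaker statistics mentioned above, a common choice in voice-conversion pipelines is the mean and standard deviation of log F0 over voiced frames; this sketch (function name is illustrative, not the repo's preprocess.py API) shows the idea:

```python
import numpy as np

def logf0_statistics(f0s):
    """Mean/std of log F0 for one speaker.

    f0s: list of per-utterance F0 arrays; unvoiced frames are
    marked with f0 == 0 and excluded from the statistics.
    """
    voiced = np.concatenate([f0[f0 > 0] for f0 in f0s])
    log_f0 = np.log(voiced)
    return log_f0.mean(), log_f0.std()
```

The resulting means and standard deviations are what get saved per speaker and reused at conversion time.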
Train the model:
python main.py
Convert with a trained model, for example:
python main.py --mode test --test_iters 200000 --src_speaker TM1 --trg_speaker "['TM1','SF1']"
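At conversion time, StarGAN-VC pipelines typically transplant the source pitch contour into the target speaker's range via Gaussian normalization of log F0, using the per-speaker statistics computed during preprocessing. This is a sketch of that standard transform (names are illustrative, not the repo's API):

```python
import numpy as np

def convert_f0(f0, mean_src, std_src, mean_trg, std_trg):
    """Map a source F0 contour into the target speaker's log-F0 range."""
    converted = np.zeros_like(f0)
    voiced = f0 > 0  # leave unvoiced frames (f0 == 0) untouched
    converted[voiced] = np.exp(
        (np.log(f0[voiced]) - mean_src) / std_src * std_trg + mean_trg
    )
    return converted
```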
Note: our implementation follows the network structure of the original paper, while the pytorch-StarGAN-VC code uses StarGAN's network; both can generate good audio quality.
The earlier implementation used the network structure of the original paper, but to achieve better conversion results, the following modifications are made in this update:
If you find this repo useful, please star it!
Your encouragement is my biggest motivation!