# speech-to-text

**Repository Path**: yukio233/speech-to-text

## Basic Information

- **Project Name**: speech-to-text
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: GPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2020-02-19
- **Last Updated**: 2022-09-03

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# speech-to-text
This framework provides python scripts to train neural networks for speech recognition.

## Requirements
- Python 3.6+

## Prerequisites
### Installing Dependencies
Install the required python dependencies listed in the `requirements.txt`:
```shell
pip install -r requirements.txt
```

### Providing training data
To run a training, training data is required.
A Training accepts a file, which contains metadata about the training data. 
The file itself is a JSON file consisting of an array.
Each element has the following properties:

- `path`: Absolute path to the audio file
- `text`: Transcription of the audio file

A training data file could look like this:
```json
[
  {
    "path": "path/to/audio/file1.wav",
    "text": "hello world"
  },
  {
    "path": "path/to/audio/file2.wav",
    "text": "goodbye world"
  }
]
```
You can find a downloader for the voxforge corpus at [https://github.com/KevNetG/speech-to-text-voxforge](https://github.com/KevNetG/speech-to-text-voxforge). 
This repo also includes a `generator.py` file, which creates a training data file containing the required metadata for trainings.

Currently, only WAVE files are supported.

## Usage

### Configuring a training
The idea is, that you don't write you training configuration into the command line, 
but instead into a file, which you can modify and reuse for other trainings.
You can find a sample configuration under `examples/training.config.json` and adjust it to your needs.
A configuration file has the following properties:

- `epochs`: Number of epochs to train
- `batchSize`: Batch Size
- `trainingDataQuantity`: Amount of training data that is taken from the provided sources
- `net`: Name of the model. Models are specified in the `models.py` file
- `trainingData`: Absolute Path to a training data file. You can specify multiple sources to simply scale your amount of available training data
- `alphabetPath`: Path to an alphabet file

```json
{
  "epochs": 10,
  "batchSize": 20,
  "trainingDataQuantity": 50000,
  "net": "graves",
  "trainingData": [
    "speech-to-text/training_data.json"
  ],
  "alphabetPath": "speech-to-text/examples/english.json"
}
```

You can use the english alphabet available under `examples/english.json` or create one yourself for any other language.
The alphabet file is a simple JSON file consisting of an array containing the characters from the alphabet:

### Running a training
To run a training execute the `train.py` and provide two arguments:

- `path`: Where to store the training. You don't have to specify a file extension
- `plan`: The path to a training configuration

like this:

```shell
python train.py "trainings/graves" "training_data.json" 
```

Trainings are saved after each epoch.

### Continuing an interrupted training
If you had to stop a training prematurely, you can continue it from the last checkpoint. 
Simply execute the `continuetraining.py` and pass the path to the training save file. 
You don't have to specify the file extension:

```
python continuetraining.py "trainings/graves.json"
```

### Making a prediction
In order to create a transcription of an audio file, use the `predict.py` script.
Pass the following arguments: 

- `path`: Path to a training save file
- `weights`, Path to a weights matrix
- `audio`: Path to the audio file which shall be transcribed

For example:
```shell
python predict.py "trainings/graves.json" "trainings/graves.weights-20-65.68075.h5" "media/audio.wav"
```

### Displaying training statistics
If you want to display the training loss and the validation loss of a training, execute the `statistics.py` script:
```shell
python statistics.py "trainings/graves.statistics.json"
```