# pytorch-asr

**Repository Path**: create_future/pytorch-asr

## Basic Information

- **Project Name**: pytorch-asr
- **Description**: Lattice generation algorithm and code for end-to-end AED models
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-10-22
- **Last Updated**: 2022-07-01

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# pytorch-asr

Speech recognition models in PyTorch, built on the Kaldi data pre-processing toolchain. Made at the University of Wrocław, Poland.

The repository holds the code to replicate the experiments from:

* [Towards Using Context-Dependent Symbols in CTC Without State-Tying Decision Trees](https://arxiv.org/abs/1901.04379)
* *Lattice Generation in Attention-based Speech Recognition Models*

# Setup

## Setting the Environment

1. Download [`miniconda 3`](https://docs.conda.io/en/latest/miniconda.html) for your platform and install it.
2. If you did not add conda to your `.bashrc` during installation, run the command printed by the installer, e.g. `eval "$(/pio/scratch/1/alan/miniconda2/bin/conda shell.bash hook)"`, to populate your current shell with the conda programs.
3. Create the `conda` environment with `conda env create -f environment.yml`.

To update the environment, use `conda env update --file environment.yml`.

## Enabling the Environment

1. If you haven't done so already, run `eval "$(/pio/scratch/1/alan/miniconda2/bin/conda shell.bash hook)"`.
2. Activate the appropriate environment:

```
conda activate pytorch_asr
source set-env.sh
```

## Dependencies

Run the `build_deps.sh` script:

```
cd pytorch-asr
./build_deps.sh
```

# Training & Decoding

Model training:

```
python train.py egs/wsj/deep_speech2.yaml /tmp/experiment
```

Greedy model decoding:

```
python decode.py /tmp/experiment/config_train1.yaml --model /tmp/experiment/checkpoints/best.pkl
```

To decode with an external LM, first build the language models:

```
bash egs/wsj/build_decoding_fst.sh
```

Then decode:

```
bash egs/wsj/ctc_kaldi_decode.sh --min_checkpoint 35000 --pkl ALL --subset test lm/lm_ees_tg_larger/biphone runs/ctc_bi
```

Decoding results will be put in the experiment directory. Consult both scripts for more decoding options.

## Experiments: Towards Using Context-Dependent Symbols in CTC Without State-Tying Decision Trees

The model configurations are located in `egs/wsj/yamls/*.yaml`. To train a particular model:

```
python train.py egs/wsj/yamls/ctcg_bi_cde.yaml runs/ctcg_bi_cde
```

Intermediate checkpoints and the best model will be stored in `runs/ctcg_bi_cde/checkpoints`.

To decode, select the appropriate language model:

| Model | `$lm_path` |
|-----------------------------------|-----------------------------------------------------------|
| mono-char CTC | exp/wsjs5/pydata/lm/lm_ees_tg_larger/monophone |
| bi-char CTC, CTC-G, CTC-G (+ CDE) | exp/wsjs5/pydata/lm/lm_ees_tg_larger/biphone |
| bi-char CTC-GB (+ CDE) | exp/wsjs5/pydata/lm/lm_ees_tg_larger/biphone_contextblank |

```
bash egs/wsj/ctc_kaldi_decode.sh --min-acwt 0.3 --subset dev $lm_path runs/ctcg_bi_cde
```

Consult the decoding script for more options.
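For example, the steps above can be chained to reproduce the bi-char CTC-G (+ CDE) row of the table. This is a sketch built only from the commands and `$lm_path` values shown above; the run directory `runs/ctcg_bi_cde` is just the one used in the examples:

```
# Train the bi-char CTC-G (+ CDE) model.
python train.py egs/wsj/yamls/ctcg_bi_cde.yaml runs/ctcg_bi_cde

# Build the decoding FSTs (needed once before any LM-based decoding).
bash egs/wsj/build_decoding_fst.sh

# Decode the dev subset with the biphone LM from the table.
bash egs/wsj/ctc_kaldi_decode.sh --min-acwt 0.3 --subset dev \
    exp/wsjs5/pydata/lm/lm_ees_tg_larger/biphone runs/ctcg_bi_cde
```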
## Experiments: Lattice Generation in Attention-based Speech Recognition Models

First, train the initial model with CTC:

```
python train.py egs/wsj/yamls/lattice_decoding/ctc.yaml runs/lattice_base
```

Then run the second stage with TCN:

```
python train.py egs/wsj/yamls/lattice_decoding/ctc.yaml runs/lattice_stage2 --initialize-from runs/lattice_base/checkpoints/best.pkl
```

To decode, pick a checkpoint:

```
python decode.py ~/group/mza/recreate/tcn.yaml \
    --model runs/lattice_stage2/checkpoints/best_51853_CER_0.0808282271662.pkl \
    --csv decoded.csv \
    -m Model.decoder.use_graph_search True \
       Model.decoder.length_normalization 0 \
       Model.decoder.coverage_weight 0.8 \
       Model.decoder.min_attention_pos 0 \
       Model.decoder.coverage_tau 0.25 \
       Model.decoder.keep_eos_score False \
       Model.decoder.lm_weight 0.75 \
       Model.decoder.att_force_forward "[-10, 50]" \
       Model.decoder.beam_size 10 \
       Model.decoder.lm_file /net/archive/groups/plggneurony/mza/lm_ees_tg_larger/LG_syms.fst \
       Datasets.test.batch_size 1
```
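For a quick sanity check of a stage-2 checkpoint without the external LM, a reduced set of these overrides can be used. This is only a sketch: the override names are copied from the full command above, the checkpoint filename is illustrative, and turning off graph search with `Model.decoder.use_graph_search False` is an assumption about how the boolean option is parsed.

```
# Beam-search decoding without the external LM (checkpoint name is illustrative).
# Override names follow the full command above; lm_file / lm_weight are simply omitted.
python decode.py ~/group/mza/recreate/tcn.yaml \
    --model runs/lattice_stage2/checkpoints/best.pkl \
    --csv decoded_nolm.csv \
    -m Model.decoder.use_graph_search False \
       Model.decoder.beam_size 10 \
       Datasets.test.batch_size 1
```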