# vi-deberta-v3-large

**Repository Path**: modelee/vi-deberta-v3-large

## Basic Information

- **Project Name**: vi-deberta-v3-large
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 4
- **Forks**: 0
- **Created**: 2023-05-23
- **Last Updated**: 2025-09-02

## Categories & Tags

**Categories**: llm

**Tags**: None

## README

---
license: other
language:
- vi
metrics:
- accuracy
- perplexity
datasets:
- anhdungitvn/vi-general-64g
---

# Vietnamese DebertaV3 Large (vi-deberta-v3-large)

### Todo

```
[x] Corpora collection
[x] Tokenizer training
[x] Model pretraining
[ ] Model finetuning
[ ] Experimental results, comparison, and conclusion
```

## Model Info

```
LAYER NAME                  #PARAMS    RATIO  MEM(MB)
--model:                851,542,017  100.00%  3248.38
  --generator:          284,459,008   33.41%  1085.12
    --deberta:          283,279,360   33.27%  1080.62
    --lm_predictions:     1,179,648    0.14%     4.50
  --discriminator:      567,083,009   66.59%  2163.25
    --deberta:          566,030,336   66.47%  2159.23
    --mask_predictions:   1,052,673    0.12%     4.02
```

## Model Performance

| Metric       | Value                      |
|--------------|----------------------------|
| accuracy     | 0.7113977778702509         |
| eval_loss    | 1.3216993808746338         |
| eval_metric  | 0.7113977778702509         |
| eval_samples | 240310                     |
| perplexity   | 3.749788284301758          |
| best_metric  | 0.7113977778702509@2200000 |
| train_steps  | 2200000                    |
| train_loss   | 1.1960319969044906         |

## TL;DR

| Aspect      | Sub-Aspect    | Description                                      |
|-------------|---------------|--------------------------------------------------|
| Corpus      | Language      | Vietnamese                                       |
|             | Source        | Wiki 2023 (1GB), News 2023 (17.2GB), News (64GB) |
|             | Size          | 1GB, 18GB, 64GB                                  |
|             | Preprocessing | None                                             |
| Tokenizer   | Lib           | SentencePiece                                    |
|             | Algorithm     | BPE                                              |
|             | Type          | spm                                              |
|             | Vocab         | 128000                                           |
|             | Ref           | https://github.com/google/sentencepiece          |
| Model       | Type          | DeBERTaV3                                        |
|             | Ref           | https://openreview.net/forum?id=sE7-XhLxHA       |
|             | Code          | https://github.com/microsoft/DeBERTa             |
| Pretraining | Task          | RTD (replaced token detection)                   |
|             | Config        | model_config.json                                |
|             | Args          | default                                          |
|             | Hardware      | 5x Nvidia A100-SXM4-80G, 2x Nvidia 4090-PCI-24GB |
|             | Phases        | Init, Refining, Enlarging                        |
|             | Status        | Training on hold at step 2200000                 |
| Finetuning  | Status        | Not started (need help)                          |

## Repo

```
📁vi-deberta-v3-large
|---🗎config.json
|---🗎pytorch_model.bin
|---🗎spm.model
|---🗎tl;dr.pdf
|---📁discriminator
|---📁generator
|---📁tokenizer
|---📁metrics
|---📁logs
```

### Pretraining
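The pretraining task listed in the TL;DR table is RTD (replaced token detection): as in ELECTRA and DeBERTaV3, the generator fills masked positions with sampled tokens and the discriminator classifies every token as original or replaced. The framework-free sketch below shows only how the discriminator's labels and binary loss can be computed. The function names and token ids are hypothetical, and this is not the DeBERTa repository's implementation (the full objective also includes the generator's masked-language-modelling loss).

```python
import math
from typing import List, Optional, Sequence


def rtd_labels(original_ids: Sequence[int], corrupted_ids: Sequence[int]) -> List[int]:
    """Label 1 where the generator's output differs from the original token, else 0.

    A position where the generator happens to sample the original token
    counts as "original" (label 0).
    """
    if len(original_ids) != len(corrupted_ids):
        raise ValueError("sequences must have the same length")
    return [int(o != c) for o, c in zip(original_ids, corrupted_ids)]


def rtd_loss(
    logits: Sequence[float],
    labels: Sequence[int],
    attention_mask: Optional[Sequence[int]] = None,
) -> float:
    """Mean binary cross-entropy with logits over non-padding positions."""
    total, count = 0.0, 0
    for i, (z, y) in enumerate(zip(logits, labels)):
        if attention_mask is not None and not attention_mask[i]:
            continue  # skip padding positions
        # Numerically stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        count += 1
    return total / count


# Toy example with hypothetical token ids: the generator replaced position 2.
original = [5, 17, 42, 8, 6]
corrupted = [5, 17, 99, 8, 6]
labels = rtd_labels(original, corrupted)
print(labels)  # [0, 0, 1, 0, 0]
print(rtd_loss([-3.0, -2.5, 4.0, -1.0, -3.0], labels))
```

Every token contributes to the loss, not only the masked ones, which is the usual argument for RTD being more sample-efficient than plain masked-language modelling.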
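The pretraining model summarized in Model Info is the generator + discriminator pair, and its parameter counts can be reproduced from a handful of hyperparameters. The sketch below assumes hidden size 1024, FFN size 4096, a 512-row absolute position table, a 512-row relative-position table, 12 generator layers, 24 discriminator layers, and that the discriminator stores both `weight` and `_weight` for its embeddings (as its state dict keys suggest). These values are inferred from the published counts, not read from `model_config.json`; under these assumptions every total matches. The MEM(MB) column likewise matches 4 bytes per parameter (fp32) with 1 MB = 2^20 bytes, which is also an inference.

```python
# All hyperparameters below are ASSUMED (inferred from the Model Info counts).
VOCAB = 128_000        # vocabulary size, from the TL;DR table
HIDDEN = 1_024         # assumed hidden size
INTERMEDIATE = 4_096   # assumed feed-forward size
MAX_POSITIONS = 512    # assumed absolute position-embedding rows
REL_EMBED_ROWS = 512   # assumed relative-position embedding rows


def linear(n_in: int, n_out: int) -> int:
    """Parameters of a dense layer: weight plus bias."""
    return n_in * n_out + n_out


def layer_norm(n: int) -> int:
    """Parameters of a LayerNorm: weight plus bias."""
    return 2 * n


def encoder_layer() -> int:
    attention = 3 * linear(HIDDEN, HIDDEN)                    # query/key/value_proj
    attention += linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)  # attention.output
    ffn = linear(HIDDEN, INTERMEDIATE) + linear(INTERMEDIATE, HIDDEN) + layer_norm(HIDDEN)
    return attention + ffn


def deberta(num_layers: int, embedding_copies: int) -> int:
    """`embedding_copies` is 2 when both `weight` and `_weight` are stored."""
    embeddings = embedding_copies * (VOCAB + MAX_POSITIONS) * HIDDEN + layer_norm(HIDDEN)
    encoder = num_layers * encoder_layer() + REL_EMBED_ROWS * HIDDEN + layer_norm(HIDDEN)
    return embeddings + encoder


# Heads (assumed structure): lm_head.bias + dense + LayerNorm; dense + LayerNorm + 1-logit classifier.
lm_predictions = VOCAB + linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)
mask_predictions = linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN) + linear(HIDDEN, 1)

generator = deberta(num_layers=12, embedding_copies=1) + lm_predictions
discriminator = deberta(num_layers=24, embedding_copies=2) + mask_predictions
total = generator + discriminator

for name, n in [("model", total), ("generator", generator), ("discriminator", discriminator)]:
    mem_mb = n * 4 / 2**20  # assumes fp32, i.e. 4 bytes per parameter
    print(f"{name:<14}{n:>13,}{100 * n / total:>8.2f}%{mem_mb:>10.2f}")
```

Note that the duplicated discriminator embeddings account for about 131M of the discriminator's parameters, so the discriminator backbone alone is smaller than its raw count suggests once one of the two copies is dropped.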
#### Phase 0: Init

##### Info

- Goal: Init
- Progress: 30.00% ▓▓▓▓░░░░░░
- Status: training interrupted, step 1000000
- Loss: **init loss**

##### Metrics
#### Phase 1: Refining

##### Info

- Goal: refining
- Changes: smaller learning rate (100µ -> 2µ, i.e. 1e-4 -> 2e-6)
- Progress: 100.00% ▓▓▓▓▓▓▓▓▓▓
- Status: training finished, step 1500000
- Loss: **refining loss**

##### Metrics

| Metric       | Value                      |
|--------------|----------------------------|
| accuracy     | 0.7515653334245732         |
| eval_loss    | 1.0692176818847656         |
| eval_metric  | 0.7515653334245732         |
| eval_samples | 29227                      |
| perplexity   | 2.913099527359009          |
| best_metric  | 0.7522154172511719@1450000 |
| train_steps  | 1500000                    |
| train_loss   | 1.1779516744723688         |
#### Phase 2: Enlarging

##### Info

- Goal: enlarging, augmenting, expanding data
- Changes: smaller learning rate (2µ -> 1µ, i.e. 2e-6 -> 1e-6), larger corpus (18GB -> 64GB), eval samples (wiki 29227 -> news 240310)
- Progress: 20.00% ▓▓░░░░░░░░
- Status: training in progress, step 2000000
- Loss: **enlarging loss**

##### Metrics

| Metric       | Value                      |
|--------------|----------------------------|
| accuracy     | 0.7084723898298032         |
| eval_loss    | 1.3221531009674072         |
| eval_metric  | 0.7084723898298032         |
| eval_samples | 240310                     |
| perplexity   | 3.7141621112823486         |
| best_metric  | 0.7084723898298032@2000000 |
| train_steps  | 2000000                    |
| train_loss   | 1.2167873119241372         |
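The reported perplexities can be checked against `eval_loss`, assuming perplexity is defined as exp(eval_loss), the usual convention when `eval_loss` is a mean token-level cross-entropy in nats. This is an assumption about how the numbers were produced, not something the tables state. The sketch below uses values copied from the Phase 1 table and the step-2200000 table.

```python
import math

# (eval_loss, reported perplexity) pairs copied from this README's metric tables.
reported = {
    "phase 1, step 1500000": (1.0692176818847656, 2.913099527359009),
    "phase 2, step 2200000": (1.3216993808746338, 3.749788284301758),
}


def perplexity(mean_nll: float) -> float:
    """Perplexity from a mean per-token negative log-likelihood (natural log)."""
    return math.exp(mean_nll)


for name, (loss, ppl) in reported.items():
    print(f"{name}: exp(eval_loss) = {perplexity(loss):.4f}, reported {ppl:.4f}")
```

Both pairs agree to about four decimal places. The step-2000000 table does not fit the same relation (exp(1.3222) is about 3.7515, versus 3.7142 reported), so that row may have been computed differently.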
#### Phase 2: Enlarging (resume, on hold)

- Goal: enlarging, augmenting, expanding data
- Changes: smaller learning rate (2µ -> 1µ, i.e. 2e-6 -> 1e-6), larger corpus (18GB -> 64GB), eval samples (wiki 29227 -> news 240310)
- Progress: 20.00% ▓▓░░░░░░░░
- Status: phase 2 training intentionally interrupted, step 2200000
- Loss: **enlarging loss**
- Metrics:

| Metric       | Value                      |
|--------------|----------------------------|
| accuracy     | 0.7113977778702509         |
| eval_loss    | 1.3216993808746338         |
| eval_metric  | 0.7113977778702509         |
| eval_samples | 240310                     |
| perplexity   | 3.749788284301758          |
| best_metric  | 0.7113977778702509@2200000 |
| train_steps  | 2200000                    |
| train_loss   | 1.1960319969044906         |

### Finetuning

**NEED HELP**

- Token Classification: tasks, datasets
- Sequence Classification: tasks, datasets

## Experimental Results and Comparison

- Not started

## Usage

![test](https://huggingface.co/anhdungitvn/vi-deberta-v3-large/resolve/main/test.gif)

#### Method 1: Load pretrained vi-deberta-v3-large with Transformers AutoClass

- Install ClickAI

```
# Why install ClickAI?
# HF Transformers does not yet support DebertaV3 (at the time of writing).
# ClickAI registers DebertaV3 with HF Transformers locally.
```

```bash
pip install git+https://gitlab.com/anhdungvo/clickai.git
```

- Tokenizer, Config, Model

```python
import clickai  # imported for its side effect: registers DebertaV3 with the HF Auto* classes
from transformers import AutoTokenizer, AutoConfig, AutoModel

config = AutoConfig.from_pretrained("anhdungitvn/vi-deberta-v3-large")
tokenizer = AutoTokenizer.from_pretrained("anhdungitvn/vi-deberta-v3-large")
model = AutoModel.from_pretrained("anhdungitvn/vi-deberta-v3-large")

# "Xử lý ngôn ngữ tiếng Việt" = "Vietnamese language processing"
tokenizer("Xử lý ngôn ngữ tiếng Việt", return_tensors='pt')
```

- Transfer Pretrained Model Weights

```python
# GET_NEEDED_MODULE_WEIGHTS is a placeholder: pass the subset of weights your model needs (e.g. the discriminator backbone).
your_model.load_state_dict(model.GET_NEEDED_MODULE_WEIGHTS())
```
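Since the checkpoint holds both the generator and the discriminator, transferring weights mostly means selecting one submodule and stripping its key prefix. A minimal sketch follows; `extract_submodule_state_dict` is a hypothetical helper (not part of ClickAI or Transformers), and it works on any mapping, including a PyTorch `state_dict()`.

```python
from typing import Dict, Iterable, Mapping, TypeVar

V = TypeVar("V")


def extract_submodule_state_dict(
    state_dict: Mapping[str, V],
    prefix: str,
    drop_suffixes: Iterable[str] = (),
) -> Dict[str, V]:
    """Keep entries whose key starts with `prefix`, with the prefix removed.

    Keys ending with any of `drop_suffixes` are skipped.
    """
    if not prefix.endswith("."):
        prefix += "."  # avoid matching e.g. "discriminator.debertaX"
    suffixes = tuple(drop_suffixes)
    sub: Dict[str, V] = {}
    for key, value in state_dict.items():
        if not key.startswith(prefix):
            continue  # belongs to another module (e.g. the generator)
        if suffixes and key.endswith(suffixes):
            continue
        sub[key[len(prefix):]] = value
    return sub
```

For this checkpoint, `extract_submodule_state_dict(model.state_dict(), "discriminator.deberta", drop_suffixes=("._weight",))` would yield keys such as `encoder.layer.0.attention.self.query_proj.weight`. Passing the result to `your_model.load_state_dict(..., strict=False)` returns the missing and unexpected keys, which is a quick way to check the mapping against a target such as Hugging Face's `DebertaV2Model`. Dropping the `._weight` entries is an assumption: they appear only in the discriminator embeddings and presumably hold the residual from DeBERTaV3's gradient-disentangled embedding sharing, so inspect them before discarding.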
Model state dict keys:

```python
model.state_dict().keys()
```

```
[
    generator.deberta.embeddings.word_embeddings.weight
    generator.deberta.embeddings.position_embeddings.weight
    generator.deberta.embeddings.LayerNorm.weight
    generator.deberta.embeddings.LayerNorm.bias
    generator.deberta.encoder.layer.0.attention.self.query_proj.weight
    generator.deberta.encoder.layer.0.attention.self.query_proj.bias
    generator.deberta.encoder.layer.0.attention.self.key_proj.weight
    generator.deberta.encoder.layer.0.attention.self.key_proj.bias
    generator.deberta.encoder.layer.0.attention.self.value_proj.weight
    generator.deberta.encoder.layer.0.attention.self.value_proj.bias
    generator.deberta.encoder.layer.0.attention.output.dense.weight
    generator.deberta.encoder.layer.0.attention.output.dense.bias
    generator.deberta.encoder.layer.0.attention.output.LayerNorm.weight
    generator.deberta.encoder.layer.0.attention.output.LayerNorm.bias
    generator.deberta.encoder.layer.0.intermediate.dense.weight
    generator.deberta.encoder.layer.0.intermediate.dense.bias
    generator.deberta.encoder.layer.0.output.dense.weight
    generator.deberta.encoder.layer.0.output.dense.bias
    generator.deberta.encoder.layer.0.output.LayerNorm.weight
    generator.deberta.encoder.layer.0.output.LayerNorm.bias
    generator.deberta.encoder.layer.1.attention.self.query_proj.weight
    generator.deberta.encoder.layer.1.attention.self.query_proj.bias
    generator.deberta.encoder.layer.1.attention.self.key_proj.weight
    generator.deberta.encoder.layer.1.attention.self.key_proj.bias
    generator.deberta.encoder.layer.1.attention.self.value_proj.weight
    generator.deberta.encoder.layer.1.attention.self.value_proj.bias
    generator.deberta.encoder.layer.1.attention.output.dense.weight
    generator.deberta.encoder.layer.1.attention.output.dense.bias
    generator.deberta.encoder.layer.1.attention.output.LayerNorm.weight
    generator.deberta.encoder.layer.1.attention.output.LayerNorm.bias
    generator.deberta.encoder.layer.1.intermediate.dense.weight
    generator.deberta.encoder.layer.1.intermediate.dense.bias
    generator.deberta.encoder.layer.1.output.dense.weight
    generator.deberta.encoder.layer.1.output.dense.bias
    generator.deberta.encoder.layer.1.output.LayerNorm.weight
    generator.deberta.encoder.layer.1.output.LayerNorm.bias
    generator.deberta.encoder.layer.2.attention.self.query_proj.weight
    generator.deberta.encoder.layer.2.attention.self.query_proj.bias
    generator.deberta.encoder.layer.2.attention.self.key_proj.weight
    generator.deberta.encoder.layer.2.attention.self.key_proj.bias
    generator.deberta.encoder.layer.2.attention.self.value_proj.weight
    generator.deberta.encoder.layer.2.attention.self.value_proj.bias
    generator.deberta.encoder.layer.2.attention.output.dense.weight
    generator.deberta.encoder.layer.2.attention.output.dense.bias
    generator.deberta.encoder.layer.2.attention.output.LayerNorm.weight
    generator.deberta.encoder.layer.2.attention.output.LayerNorm.bias
    generator.deberta.encoder.layer.2.intermediate.dense.weight
    generator.deberta.encoder.layer.2.intermediate.dense.bias
    generator.deberta.encoder.layer.2.output.dense.weight
    generator.deberta.encoder.layer.2.output.dense.bias
    generator.deberta.encoder.layer.2.output.LayerNorm.weight
    generator.deberta.encoder.layer.2.output.LayerNorm.bias
    generator.deberta.encoder.layer.3.attention.self.query_proj.weight
    generator.deberta.encoder.layer.3.attention.self.query_proj.bias
    generator.deberta.encoder.layer.3.attention.self.key_proj.weight
    generator.deberta.encoder.layer.3.attention.self.key_proj.bias
    generator.deberta.encoder.layer.3.attention.self.value_proj.weight
    generator.deberta.encoder.layer.3.attention.self.value_proj.bias
    generator.deberta.encoder.layer.3.attention.output.dense.weight
    generator.deberta.encoder.layer.3.attention.output.dense.bias
    generator.deberta.encoder.layer.3.attention.output.LayerNorm.weight
    generator.deberta.encoder.layer.3.attention.output.LayerNorm.bias
    generator.deberta.encoder.layer.3.intermediate.dense.weight
    generator.deberta.encoder.layer.3.intermediate.dense.bias
    generator.deberta.encoder.layer.3.output.dense.weight
    generator.deberta.encoder.layer.3.output.dense.bias
    generator.deberta.encoder.layer.3.output.LayerNorm.weight
    generator.deberta.encoder.layer.3.output.LayerNorm.bias
    generator.deberta.encoder.layer.4.attention.self.query_proj.weight
    generator.deberta.encoder.layer.4.attention.self.query_proj.bias
    generator.deberta.encoder.layer.4.attention.self.key_proj.weight
    generator.deberta.encoder.layer.4.attention.self.key_proj.bias
    generator.deberta.encoder.layer.4.attention.self.value_proj.weight
    generator.deberta.encoder.layer.4.attention.self.value_proj.bias
    generator.deberta.encoder.layer.4.attention.output.dense.weight
    generator.deberta.encoder.layer.4.attention.output.dense.bias
    generator.deberta.encoder.layer.4.attention.output.LayerNorm.weight
    generator.deberta.encoder.layer.4.attention.output.LayerNorm.bias
    generator.deberta.encoder.layer.4.intermediate.dense.weight
    generator.deberta.encoder.layer.4.intermediate.dense.bias
    generator.deberta.encoder.layer.4.output.dense.weight
    generator.deberta.encoder.layer.4.output.dense.bias
    generator.deberta.encoder.layer.4.output.LayerNorm.weight
    generator.deberta.encoder.layer.4.output.LayerNorm.bias
    generator.deberta.encoder.layer.5.attention.self.query_proj.weight
    generator.deberta.encoder.layer.5.attention.self.query_proj.bias
    generator.deberta.encoder.layer.5.attention.self.key_proj.weight
    generator.deberta.encoder.layer.5.attention.self.key_proj.bias
    generator.deberta.encoder.layer.5.attention.self.value_proj.weight
    generator.deberta.encoder.layer.5.attention.self.value_proj.bias
    generator.deberta.encoder.layer.5.attention.output.dense.weight
    generator.deberta.encoder.layer.5.attention.output.dense.bias
    generator.deberta.encoder.layer.5.attention.output.LayerNorm.weight
    generator.deberta.encoder.layer.5.attention.output.LayerNorm.bias
    generator.deberta.encoder.layer.5.intermediate.dense.weight
    generator.deberta.encoder.layer.5.intermediate.dense.bias
    generator.deberta.encoder.layer.5.output.dense.weight
    generator.deberta.encoder.layer.5.output.dense.bias
    generator.deberta.encoder.layer.5.output.LayerNorm.weight
    generator.deberta.encoder.layer.5.output.LayerNorm.bias
    generator.deberta.encoder.layer.6.attention.self.query_proj.weight
    generator.deberta.encoder.layer.6.attention.self.query_proj.bias
    generator.deberta.encoder.layer.6.attention.self.key_proj.weight
    generator.deberta.encoder.layer.6.attention.self.key_proj.bias
    generator.deberta.encoder.layer.6.attention.self.value_proj.weight
    generator.deberta.encoder.layer.6.attention.self.value_proj.bias
    generator.deberta.encoder.layer.6.attention.output.dense.weight
    generator.deberta.encoder.layer.6.attention.output.dense.bias
    generator.deberta.encoder.layer.6.attention.output.LayerNorm.weight
    generator.deberta.encoder.layer.6.attention.output.LayerNorm.bias
    generator.deberta.encoder.layer.6.intermediate.dense.weight
    generator.deberta.encoder.layer.6.intermediate.dense.bias
    generator.deberta.encoder.layer.6.output.dense.weight
    generator.deberta.encoder.layer.6.output.dense.bias
    generator.deberta.encoder.layer.6.output.LayerNorm.weight
    generator.deberta.encoder.layer.6.output.LayerNorm.bias
    generator.deberta.encoder.layer.7.attention.self.query_proj.weight
    generator.deberta.encoder.layer.7.attention.self.query_proj.bias
    generator.deberta.encoder.layer.7.attention.self.key_proj.weight
    generator.deberta.encoder.layer.7.attention.self.key_proj.bias
    generator.deberta.encoder.layer.7.attention.self.value_proj.weight
    generator.deberta.encoder.layer.7.attention.self.value_proj.bias
    generator.deberta.encoder.layer.7.attention.output.dense.weight
    generator.deberta.encoder.layer.7.attention.output.dense.bias
    generator.deberta.encoder.layer.7.attention.output.LayerNorm.weight
    generator.deberta.encoder.layer.7.attention.output.LayerNorm.bias
    generator.deberta.encoder.layer.7.intermediate.dense.weight
    generator.deberta.encoder.layer.7.intermediate.dense.bias
    generator.deberta.encoder.layer.7.output.dense.weight
    generator.deberta.encoder.layer.7.output.dense.bias
    generator.deberta.encoder.layer.7.output.LayerNorm.weight
    generator.deberta.encoder.layer.7.output.LayerNorm.bias
    generator.deberta.encoder.layer.8.attention.self.query_proj.weight
    generator.deberta.encoder.layer.8.attention.self.query_proj.bias
    generator.deberta.encoder.layer.8.attention.self.key_proj.weight
    generator.deberta.encoder.layer.8.attention.self.key_proj.bias
    generator.deberta.encoder.layer.8.attention.self.value_proj.weight
    generator.deberta.encoder.layer.8.attention.self.value_proj.bias
    generator.deberta.encoder.layer.8.attention.output.dense.weight
    generator.deberta.encoder.layer.8.attention.output.dense.bias
    generator.deberta.encoder.layer.8.attention.output.LayerNorm.weight
    generator.deberta.encoder.layer.8.attention.output.LayerNorm.bias
    generator.deberta.encoder.layer.8.intermediate.dense.weight
    generator.deberta.encoder.layer.8.intermediate.dense.bias
    generator.deberta.encoder.layer.8.output.dense.weight
    generator.deberta.encoder.layer.8.output.dense.bias
    generator.deberta.encoder.layer.8.output.LayerNorm.weight
    generator.deberta.encoder.layer.8.output.LayerNorm.bias
    generator.deberta.encoder.layer.9.attention.self.query_proj.weight
    generator.deberta.encoder.layer.9.attention.self.query_proj.bias
    generator.deberta.encoder.layer.9.attention.self.key_proj.weight
    generator.deberta.encoder.layer.9.attention.self.key_proj.bias
    generator.deberta.encoder.layer.9.attention.self.value_proj.weight
    generator.deberta.encoder.layer.9.attention.self.value_proj.bias
    generator.deberta.encoder.layer.9.attention.output.dense.weight
    generator.deberta.encoder.layer.9.attention.output.dense.bias
    generator.deberta.encoder.layer.9.attention.output.LayerNorm.weight
    generator.deberta.encoder.layer.9.attention.output.LayerNorm.bias
    generator.deberta.encoder.layer.9.intermediate.dense.weight
    generator.deberta.encoder.layer.9.intermediate.dense.bias
    generator.deberta.encoder.layer.9.output.dense.weight
    generator.deberta.encoder.layer.9.output.dense.bias
    generator.deberta.encoder.layer.9.output.LayerNorm.weight
    generator.deberta.encoder.layer.9.output.LayerNorm.bias
    generator.deberta.encoder.layer.10.attention.self.query_proj.weight
    generator.deberta.encoder.layer.10.attention.self.query_proj.bias
    generator.deberta.encoder.layer.10.attention.self.key_proj.weight
    generator.deberta.encoder.layer.10.attention.self.key_proj.bias
    generator.deberta.encoder.layer.10.attention.self.value_proj.weight
    generator.deberta.encoder.layer.10.attention.self.value_proj.bias
    generator.deberta.encoder.layer.10.attention.output.dense.weight
    generator.deberta.encoder.layer.10.attention.output.dense.bias
    generator.deberta.encoder.layer.10.attention.output.LayerNorm.weight
    generator.deberta.encoder.layer.10.attention.output.LayerNorm.bias
    generator.deberta.encoder.layer.10.intermediate.dense.weight
    generator.deberta.encoder.layer.10.intermediate.dense.bias
    generator.deberta.encoder.layer.10.output.dense.weight
    generator.deberta.encoder.layer.10.output.dense.bias
    generator.deberta.encoder.layer.10.output.LayerNorm.weight
    generator.deberta.encoder.layer.10.output.LayerNorm.bias
    generator.deberta.encoder.layer.11.attention.self.query_proj.weight
    generator.deberta.encoder.layer.11.attention.self.query_proj.bias
    generator.deberta.encoder.layer.11.attention.self.key_proj.weight
    generator.deberta.encoder.layer.11.attention.self.key_proj.bias
    generator.deberta.encoder.layer.11.attention.self.value_proj.weight
    generator.deberta.encoder.layer.11.attention.self.value_proj.bias
    generator.deberta.encoder.layer.11.attention.output.dense.weight
    generator.deberta.encoder.layer.11.attention.output.dense.bias
    generator.deberta.encoder.layer.11.attention.output.LayerNorm.weight
    generator.deberta.encoder.layer.11.attention.output.LayerNorm.bias
    generator.deberta.encoder.layer.11.intermediate.dense.weight
    generator.deberta.encoder.layer.11.intermediate.dense.bias
    generator.deberta.encoder.layer.11.output.dense.weight
    generator.deberta.encoder.layer.11.output.dense.bias
    generator.deberta.encoder.layer.11.output.LayerNorm.weight
    generator.deberta.encoder.layer.11.output.LayerNorm.bias
    generator.deberta.encoder.rel_embeddings.weight
    generator.deberta.encoder.LayerNorm.weight
    generator.deberta.encoder.LayerNorm.bias
    generator.lm_predictions.lm_head.bias
    generator.lm_predictions.lm_head.dense.weight
    generator.lm_predictions.lm_head.dense.bias
    generator.lm_predictions.lm_head.LayerNorm.weight
    generator.lm_predictions.lm_head.LayerNorm.bias
    discriminator.deberta.embeddings.word_embeddings.weight
    discriminator.deberta.embeddings.word_embeddings._weight
    discriminator.deberta.embeddings.position_embeddings.weight
    discriminator.deberta.embeddings.position_embeddings._weight
    discriminator.deberta.embeddings.LayerNorm.weight
    discriminator.deberta.embeddings.LayerNorm.bias
    discriminator.deberta.encoder.layer.0.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.0.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.0.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.0.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.0.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.0.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.0.attention.output.dense.weight
    discriminator.deberta.encoder.layer.0.attention.output.dense.bias
    discriminator.deberta.encoder.layer.0.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.0.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.0.intermediate.dense.weight
    discriminator.deberta.encoder.layer.0.intermediate.dense.bias
    discriminator.deberta.encoder.layer.0.output.dense.weight
    discriminator.deberta.encoder.layer.0.output.dense.bias
    discriminator.deberta.encoder.layer.0.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.0.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.1.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.1.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.1.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.1.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.1.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.1.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.1.attention.output.dense.weight
    discriminator.deberta.encoder.layer.1.attention.output.dense.bias
    discriminator.deberta.encoder.layer.1.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.1.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.1.intermediate.dense.weight
    discriminator.deberta.encoder.layer.1.intermediate.dense.bias
    discriminator.deberta.encoder.layer.1.output.dense.weight
    discriminator.deberta.encoder.layer.1.output.dense.bias
    discriminator.deberta.encoder.layer.1.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.1.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.2.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.2.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.2.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.2.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.2.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.2.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.2.attention.output.dense.weight
    discriminator.deberta.encoder.layer.2.attention.output.dense.bias
    discriminator.deberta.encoder.layer.2.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.2.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.2.intermediate.dense.weight
    discriminator.deberta.encoder.layer.2.intermediate.dense.bias
    discriminator.deberta.encoder.layer.2.output.dense.weight
    discriminator.deberta.encoder.layer.2.output.dense.bias
    discriminator.deberta.encoder.layer.2.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.2.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.3.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.3.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.3.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.3.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.3.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.3.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.3.attention.output.dense.weight
    discriminator.deberta.encoder.layer.3.attention.output.dense.bias
    discriminator.deberta.encoder.layer.3.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.3.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.3.intermediate.dense.weight
    discriminator.deberta.encoder.layer.3.intermediate.dense.bias
    discriminator.deberta.encoder.layer.3.output.dense.weight
    discriminator.deberta.encoder.layer.3.output.dense.bias
    discriminator.deberta.encoder.layer.3.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.3.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.4.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.4.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.4.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.4.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.4.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.4.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.4.attention.output.dense.weight
    discriminator.deberta.encoder.layer.4.attention.output.dense.bias
    discriminator.deberta.encoder.layer.4.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.4.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.4.intermediate.dense.weight
    discriminator.deberta.encoder.layer.4.intermediate.dense.bias
    discriminator.deberta.encoder.layer.4.output.dense.weight
    discriminator.deberta.encoder.layer.4.output.dense.bias
    discriminator.deberta.encoder.layer.4.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.4.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.5.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.5.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.5.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.5.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.5.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.5.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.5.attention.output.dense.weight
    discriminator.deberta.encoder.layer.5.attention.output.dense.bias
    discriminator.deberta.encoder.layer.5.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.5.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.5.intermediate.dense.weight
    discriminator.deberta.encoder.layer.5.intermediate.dense.bias
    discriminator.deberta.encoder.layer.5.output.dense.weight
    discriminator.deberta.encoder.layer.5.output.dense.bias
    discriminator.deberta.encoder.layer.5.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.5.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.6.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.6.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.6.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.6.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.6.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.6.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.6.attention.output.dense.weight
    discriminator.deberta.encoder.layer.6.attention.output.dense.bias
    discriminator.deberta.encoder.layer.6.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.6.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.6.intermediate.dense.weight
    discriminator.deberta.encoder.layer.6.intermediate.dense.bias
    discriminator.deberta.encoder.layer.6.output.dense.weight
    discriminator.deberta.encoder.layer.6.output.dense.bias
    discriminator.deberta.encoder.layer.6.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.6.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.7.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.7.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.7.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.7.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.7.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.7.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.7.attention.output.dense.weight
    discriminator.deberta.encoder.layer.7.attention.output.dense.bias
    discriminator.deberta.encoder.layer.7.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.7.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.7.intermediate.dense.weight
    discriminator.deberta.encoder.layer.7.intermediate.dense.bias
    discriminator.deberta.encoder.layer.7.output.dense.weight
    discriminator.deberta.encoder.layer.7.output.dense.bias
    discriminator.deberta.encoder.layer.7.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.7.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.8.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.8.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.8.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.8.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.8.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.8.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.8.attention.output.dense.weight
    discriminator.deberta.encoder.layer.8.attention.output.dense.bias
    discriminator.deberta.encoder.layer.8.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.8.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.8.intermediate.dense.weight
    discriminator.deberta.encoder.layer.8.intermediate.dense.bias
    discriminator.deberta.encoder.layer.8.output.dense.weight
    discriminator.deberta.encoder.layer.8.output.dense.bias
    discriminator.deberta.encoder.layer.8.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.8.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.9.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.9.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.9.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.9.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.9.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.9.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.9.attention.output.dense.weight
    discriminator.deberta.encoder.layer.9.attention.output.dense.bias
    discriminator.deberta.encoder.layer.9.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.9.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.9.intermediate.dense.weight
    discriminator.deberta.encoder.layer.9.intermediate.dense.bias
    discriminator.deberta.encoder.layer.9.output.dense.weight
    discriminator.deberta.encoder.layer.9.output.dense.bias
    discriminator.deberta.encoder.layer.9.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.9.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.10.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.10.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.10.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.10.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.10.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.10.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.10.attention.output.dense.weight
    discriminator.deberta.encoder.layer.10.attention.output.dense.bias
    discriminator.deberta.encoder.layer.10.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.10.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.10.intermediate.dense.weight
    discriminator.deberta.encoder.layer.10.intermediate.dense.bias
    discriminator.deberta.encoder.layer.10.output.dense.weight
    discriminator.deberta.encoder.layer.10.output.dense.bias
    discriminator.deberta.encoder.layer.10.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.10.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.11.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.11.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.11.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.11.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.11.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.11.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.11.attention.output.dense.weight
    discriminator.deberta.encoder.layer.11.attention.output.dense.bias
    discriminator.deberta.encoder.layer.11.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.11.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.11.intermediate.dense.weight
    discriminator.deberta.encoder.layer.11.intermediate.dense.bias
    discriminator.deberta.encoder.layer.11.output.dense.weight
    discriminator.deberta.encoder.layer.11.output.dense.bias
    discriminator.deberta.encoder.layer.11.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.11.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.12.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.12.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.12.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.12.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.12.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.12.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.12.attention.output.dense.weight
    discriminator.deberta.encoder.layer.12.attention.output.dense.bias
    discriminator.deberta.encoder.layer.12.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.12.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.12.intermediate.dense.weight
    discriminator.deberta.encoder.layer.12.intermediate.dense.bias
    discriminator.deberta.encoder.layer.12.output.dense.weight
    discriminator.deberta.encoder.layer.12.output.dense.bias
    discriminator.deberta.encoder.layer.12.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.12.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.13.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.13.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.13.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.13.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.13.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.13.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.13.attention.output.dense.weight
    discriminator.deberta.encoder.layer.13.attention.output.dense.bias
    discriminator.deberta.encoder.layer.13.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.13.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.13.intermediate.dense.weight
    discriminator.deberta.encoder.layer.13.intermediate.dense.bias
    discriminator.deberta.encoder.layer.13.output.dense.weight
    discriminator.deberta.encoder.layer.13.output.dense.bias
    discriminator.deberta.encoder.layer.13.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.13.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.14.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.14.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.14.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.14.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.14.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.14.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.14.attention.output.dense.weight
    discriminator.deberta.encoder.layer.14.attention.output.dense.bias
    discriminator.deberta.encoder.layer.14.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.14.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.14.intermediate.dense.weight
    discriminator.deberta.encoder.layer.14.intermediate.dense.bias
    discriminator.deberta.encoder.layer.14.output.dense.weight
    discriminator.deberta.encoder.layer.14.output.dense.bias
    discriminator.deberta.encoder.layer.14.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.14.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.15.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.15.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.15.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.15.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.15.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.15.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.15.attention.output.dense.weight
    discriminator.deberta.encoder.layer.15.attention.output.dense.bias
    discriminator.deberta.encoder.layer.15.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.15.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.15.intermediate.dense.weight
    discriminator.deberta.encoder.layer.15.intermediate.dense.bias
    discriminator.deberta.encoder.layer.15.output.dense.weight
    discriminator.deberta.encoder.layer.15.output.dense.bias
    discriminator.deberta.encoder.layer.15.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.15.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.16.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.16.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.16.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.16.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.16.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.16.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.16.attention.output.dense.weight
    discriminator.deberta.encoder.layer.16.attention.output.dense.bias
    discriminator.deberta.encoder.layer.16.attention.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.16.attention.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.16.intermediate.dense.weight
    discriminator.deberta.encoder.layer.16.intermediate.dense.bias
    discriminator.deberta.encoder.layer.16.output.dense.weight
    discriminator.deberta.encoder.layer.16.output.dense.bias
    discriminator.deberta.encoder.layer.16.output.LayerNorm.weight
    discriminator.deberta.encoder.layer.16.output.LayerNorm.bias
    discriminator.deberta.encoder.layer.17.attention.self.query_proj.weight
    discriminator.deberta.encoder.layer.17.attention.self.query_proj.bias
    discriminator.deberta.encoder.layer.17.attention.self.key_proj.weight
    discriminator.deberta.encoder.layer.17.attention.self.key_proj.bias
    discriminator.deberta.encoder.layer.17.attention.self.value_proj.weight
    discriminator.deberta.encoder.layer.17.attention.self.value_proj.bias
    discriminator.deberta.encoder.layer.17.attention.output.dense.weight
    discriminator.deberta.encoder.layer.17.attention.output.dense.bias
discriminator.deberta.encoder.layer.17.attention.output.LayerNorm.weight discriminator.deberta.encoder.layer.17.attention.output.LayerNorm.bias discriminator.deberta.encoder.layer.17.intermediate.dense.weight discriminator.deberta.encoder.layer.17.intermediate.dense.bias discriminator.deberta.encoder.layer.17.output.dense.weight discriminator.deberta.encoder.layer.17.output.dense.bias discriminator.deberta.encoder.layer.17.output.LayerNorm.weight discriminator.deberta.encoder.layer.17.output.LayerNorm.bias discriminator.deberta.encoder.layer.18.attention.self.query_proj.weight discriminator.deberta.encoder.layer.18.attention.self.query_proj.bias discriminator.deberta.encoder.layer.18.attention.self.key_proj.weight discriminator.deberta.encoder.layer.18.attention.self.key_proj.bias discriminator.deberta.encoder.layer.18.attention.self.value_proj.weight discriminator.deberta.encoder.layer.18.attention.self.value_proj.bias discriminator.deberta.encoder.layer.18.attention.output.dense.weight discriminator.deberta.encoder.layer.18.attention.output.dense.bias discriminator.deberta.encoder.layer.18.attention.output.LayerNorm.weight discriminator.deberta.encoder.layer.18.attention.output.LayerNorm.bias discriminator.deberta.encoder.layer.18.intermediate.dense.weight discriminator.deberta.encoder.layer.18.intermediate.dense.bias discriminator.deberta.encoder.layer.18.output.dense.weight discriminator.deberta.encoder.layer.18.output.dense.bias discriminator.deberta.encoder.layer.18.output.LayerNorm.weight discriminator.deberta.encoder.layer.18.output.LayerNorm.bias discriminator.deberta.encoder.layer.19.attention.self.query_proj.weight discriminator.deberta.encoder.layer.19.attention.self.query_proj.bias discriminator.deberta.encoder.layer.19.attention.self.key_proj.weight discriminator.deberta.encoder.layer.19.attention.self.key_proj.bias discriminator.deberta.encoder.layer.19.attention.self.value_proj.weight 
discriminator.deberta.encoder.layer.19.attention.self.value_proj.bias discriminator.deberta.encoder.layer.19.attention.output.dense.weight discriminator.deberta.encoder.layer.19.attention.output.dense.bias discriminator.deberta.encoder.layer.19.attention.output.LayerNorm.weight discriminator.deberta.encoder.layer.19.attention.output.LayerNorm.bias discriminator.deberta.encoder.layer.19.intermediate.dense.weight discriminator.deberta.encoder.layer.19.intermediate.dense.bias discriminator.deberta.encoder.layer.19.output.dense.weight discriminator.deberta.encoder.layer.19.output.dense.bias discriminator.deberta.encoder.layer.19.output.LayerNorm.weight discriminator.deberta.encoder.layer.19.output.LayerNorm.bias discriminator.deberta.encoder.layer.20.attention.self.query_proj.weight discriminator.deberta.encoder.layer.20.attention.self.query_proj.bias discriminator.deberta.encoder.layer.20.attention.self.key_proj.weight discriminator.deberta.encoder.layer.20.attention.self.key_proj.bias discriminator.deberta.encoder.layer.20.attention.self.value_proj.weight discriminator.deberta.encoder.layer.20.attention.self.value_proj.bias discriminator.deberta.encoder.layer.20.attention.output.dense.weight discriminator.deberta.encoder.layer.20.attention.output.dense.bias discriminator.deberta.encoder.layer.20.attention.output.LayerNorm.weight discriminator.deberta.encoder.layer.20.attention.output.LayerNorm.bias discriminator.deberta.encoder.layer.20.intermediate.dense.weight discriminator.deberta.encoder.layer.20.intermediate.dense.bias discriminator.deberta.encoder.layer.20.output.dense.weight discriminator.deberta.encoder.layer.20.output.dense.bias discriminator.deberta.encoder.layer.20.output.LayerNorm.weight discriminator.deberta.encoder.layer.20.output.LayerNorm.bias discriminator.deberta.encoder.layer.21.attention.self.query_proj.weight discriminator.deberta.encoder.layer.21.attention.self.query_proj.bias discriminator.deberta.encoder.layer.21.attention.self.key_proj.weight 
discriminator.deberta.encoder.layer.21.attention.self.key_proj.bias discriminator.deberta.encoder.layer.21.attention.self.value_proj.weight discriminator.deberta.encoder.layer.21.attention.self.value_proj.bias discriminator.deberta.encoder.layer.21.attention.output.dense.weight discriminator.deberta.encoder.layer.21.attention.output.dense.bias discriminator.deberta.encoder.layer.21.attention.output.LayerNorm.weight discriminator.deberta.encoder.layer.21.attention.output.LayerNorm.bias discriminator.deberta.encoder.layer.21.intermediate.dense.weight discriminator.deberta.encoder.layer.21.intermediate.dense.bias discriminator.deberta.encoder.layer.21.output.dense.weight discriminator.deberta.encoder.layer.21.output.dense.bias discriminator.deberta.encoder.layer.21.output.LayerNorm.weight discriminator.deberta.encoder.layer.21.output.LayerNorm.bias discriminator.deberta.encoder.layer.22.attention.self.query_proj.weight discriminator.deberta.encoder.layer.22.attention.self.query_proj.bias discriminator.deberta.encoder.layer.22.attention.self.key_proj.weight discriminator.deberta.encoder.layer.22.attention.self.key_proj.bias discriminator.deberta.encoder.layer.22.attention.self.value_proj.weight discriminator.deberta.encoder.layer.22.attention.self.value_proj.bias discriminator.deberta.encoder.layer.22.attention.output.dense.weight discriminator.deberta.encoder.layer.22.attention.output.dense.bias discriminator.deberta.encoder.layer.22.attention.output.LayerNorm.weight discriminator.deberta.encoder.layer.22.attention.output.LayerNorm.bias discriminator.deberta.encoder.layer.22.intermediate.dense.weight discriminator.deberta.encoder.layer.22.intermediate.dense.bias discriminator.deberta.encoder.layer.22.output.dense.weight discriminator.deberta.encoder.layer.22.output.dense.bias discriminator.deberta.encoder.layer.22.output.LayerNorm.weight discriminator.deberta.encoder.layer.22.output.LayerNorm.bias discriminator.deberta.encoder.layer.23.attention.self.query_proj.weight 
discriminator.deberta.encoder.layer.23.attention.self.query_proj.bias discriminator.deberta.encoder.layer.23.attention.self.key_proj.weight discriminator.deberta.encoder.layer.23.attention.self.key_proj.bias discriminator.deberta.encoder.layer.23.attention.self.value_proj.weight discriminator.deberta.encoder.layer.23.attention.self.value_proj.bias discriminator.deberta.encoder.layer.23.attention.output.dense.weight discriminator.deberta.encoder.layer.23.attention.output.dense.bias discriminator.deberta.encoder.layer.23.attention.output.LayerNorm.weight discriminator.deberta.encoder.layer.23.attention.output.LayerNorm.bias discriminator.deberta.encoder.layer.23.intermediate.dense.weight discriminator.deberta.encoder.layer.23.intermediate.dense.bias discriminator.deberta.encoder.layer.23.output.dense.weight discriminator.deberta.encoder.layer.23.output.dense.bias discriminator.deberta.encoder.layer.23.output.LayerNorm.weight discriminator.deberta.encoder.layer.23.output.LayerNorm.bias discriminator.deberta.encoder.rel_embeddings.weight discriminator.deberta.encoder.LayerNorm.weight discriminator.deberta.encoder.LayerNorm.bias discriminator.mask_predictions.dense.weight discriminator.mask_predictions.dense.bias discriminator.mask_predictions.LayerNorm.weight discriminator.mask_predictions.LayerNorm.bias discriminator.mask_predictions.classifier.weight discriminator.mask_predictions.classifier.bias ] ```
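The key names listed above suggest that the checkpoint keeps both ELECTRA-style sub-models of DeBERTaV3 in one flat state dict, with each name prefixed by `generator.` or `discriminator.`. Below is a minimal sketch that assumes the checkpoint has already been loaded into a plain dict keyed by these names. The helpers `extract_submodel` and `count_encoder_layers` are hypothetical and not part of this repository. The first helper keeps one component and strips its prefix. The second counts the distinct encoder layer indices, which run from `layer.0` to `layer.23` for the discriminator listed above.

```python
import re
from typing import Any, Dict, Mapping


def extract_submodel(state_dict: Mapping[str, Any], component: str) -> Dict[str, Any]:
    """Keep only the entries of one component (e.g. "discriminator") and
    drop the leading "<component>." prefix from their names.
    Assumption: names follow the "<component>.<...>" pattern shown above."""
    prefix = f"{component}."
    return {
        name[len(prefix):]: tensor
        for name, tensor in state_dict.items()
        if name.startswith(prefix)
    }


def count_encoder_layers(state_dict: Mapping[str, Any], component: str) -> int:
    """Count the distinct encoder layer indices of one component, based on
    names such as "<component>.deberta.encoder.layer.<i>.<...>"."""
    pattern = re.compile(rf"^{re.escape(component)}\.deberta\.encoder\.layer\.(\d+)\.")
    indices = set()
    for name in state_dict:
        match = pattern.match(name)
        if match:
            indices.add(int(match.group(1)))
    return len(indices)
```

With PyTorch, such a dict would typically come from `torch.load(path, map_location="cpu")`. The checkpoint file name is not stated in this README. Applying `extract_submodel` twice, first with `"discriminator"` and then with `"deberta"`, yields names like `encoder.layer.0.attention.self.query_proj.weight`. These resemble the parameter names of `transformers`' `DebertaV2Model`, but that compatibility is an assumption, not something this README confirms.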
#### Method 2: Manually load pretrained vi-deberta-v3-large

- Develop `YourTokenizer` and `YourModel`

```python
class YourTokenizer:
    @classmethod
    def from_pretrained(cls, model_name_or_path, **kwargs):
        # Load the SentencePiece model:
        # https://huggingface.co/anhdungitvn/vi-deberta-v3-large/blob/main/tokenizer/spm.model
        pass


class YourModel:
    @classmethod
    def from_pretrained(cls, model_name_or_path, **kwargs):
        # Load the weights:
        # Discriminator: https://huggingface.co/anhdungitvn/vi-deberta-v3-large/tree/main/discriminator
        # Generator: https://huggingface.co/anhdungitvn/vi-deberta-v3-large/tree/main/generator
        pass
```

- Use

```python
tokenizer = YourTokenizer.from_pretrained("anhdungitvn/vi-deberta-v3-large")
model = YourModel.from_pretrained("anhdungitvn/vi-deberta-v3-large")
```

### Log
Detailed logs:

- 2023-03-29: init, todo list
- 2023-03-30: data preparation
  - vi_wiki_23: latest
  - vi_news_17g: available
- 2023-03-30: tokenizer training
  - algorithm: unigram, bpe
  - size: 8k, 16k, 32k, 64k, 128k, 256k
- 2023-03-31: training trials
  - config: base, large
  - tokenizer: unigram_16k, bpe_16k, bpe_128k
  - args: default, with changed batch_size and grad_acc
  - optimizer: default, customized optimizer
- 2023-03-31: training phase 0 started
  - config: large
  - tokenizer: bpe_128k
  - args: default
  - GPU: 5x A100-SXM4-80G
- 2023-04-04: training interrupted unintentionally at step 300000; no optimizer checkpoint available
- 2023-04-05: training resumed from the last checkpoint (step 300000); learning_rate adjusted 100µ -> 50µ (1e-4 -> 5e-5)
- 2023-04-10: sweet spot detected, step 800000
- 2023-04-11: training in progress, step 900000; loss increases, accuracy increases, regularization working well, overfitting under monitoring
- 2023-04-12: training interrupted intentionally, step 1000000
- 2023-04-12: training phase 1 started, resumed intentionally for refining, learning_rate -> 2µ (2e-6) (diverging)
- 2023-04-24: training phase 1 finished, refining, step 1500000
- 2023-04-26: training phase 2 started, resumed intentionally, step 1500000
- 2023-05-02: training phase 2 interrupted unintentionally, step 1980000
- 2023-05-05: training phase 2 resumed intentionally, step 2000000
- 2023-05-09: training in progress, step 2200000