# vi-deberta-v3-large
**Repository Path**: modelee/vi-deberta-v3-large
## Basic Information
- **Project Name**: vi-deberta-v3-large
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 4
- **Forks**: 0
- **Created**: 2023-05-23
- **Last Updated**: 2025-09-02
## Categories & Tags
**Categories**: llm
**Tags**: None
## README
---
license: other
language:
- vi
metrics:
- accuracy
- perplexity
datasets:
- anhdungitvn/vi-general-64g
---
# Vietnamese DebertaV3 Large (vi-deberta-v3-large)
### Todo
```
[x] Corpora collection
[x] Tokenizer training
[x] Model pretraining
[ ] Model finetuning
[ ] Experimental results, comparison, and conclusion
```
## Model Info
```
LAYER NAME #PARAMS RATIO MEM(MB)
--model: 851,542,017 100.00% 3248.38
--generator: 284,459,008 33.41% 1085.12
--deberta: 283,279,360 33.27% 1080.62
--lm_predictions: 1,179,648 0.14% 4.50
--discriminator: 567,083,009 66.59% 2163.25
--deberta: 566,030,336 66.47% 2159.23
--mask_predictions: 1,052,673 0.12% 4.02
```
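The ratio and memory columns above can be recomputed from the raw parameter counts. The sketch below assumes each parameter is stored as a 4-byte float32 value and that MB means 2^20 bytes; both are inferences from the numbers in the table, not something the table states. The dictionary keys and the `summarize` helper are illustrative names.

```python
# Parameter counts copied from the table above.
PARAMS = {
    "generator.deberta": 283_279_360,
    "generator.lm_predictions": 1_179_648,
    "discriminator.deberta": 566_030_336,
    "discriminator.mask_predictions": 1_052_673,
}

BYTES_PER_PARAM = 4  # assumption: float32 weights
BYTES_PER_MB = 2**20  # assumption: MB means mebibytes


def summarize(params):
    """Return {name: (count, ratio_percent, mem_mb)}, including a 'model' total row."""
    total = sum(params.values())
    rows = {"model": (total, 100.0, total * BYTES_PER_PARAM / BYTES_PER_MB)}
    for name, count in params.items():
        ratio = 100.0 * count / total
        mem_mb = count * BYTES_PER_PARAM / BYTES_PER_MB
        rows[name] = (count, ratio, mem_mb)
    return rows


if __name__ == "__main__":
    for name, (count, ratio, mem_mb) in summarize(PARAMS).items():
        print(f"{name:32s} {count:>13,d} {ratio:6.2f}% {mem_mb:9.2f}")
```

Under these assumptions the four leaf modules sum to the reported total, and the discriminator accounts for roughly two thirds of the parameters.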
## Model Performance
| Metric | Value |
|--------------|----------------------------|
| accuracy | 0.7113977778702509 |
| eval_loss | 1.3216993808746338 |
| eval_metric | 0.7113977778702509 |
| eval_samples | 240310 |
| perplexity | 3.749788284301758 |
| best_metric | 0.7113977778702509@2200000 |
| train_steps | 2200000 |
| train_loss | 1.1960319969044906 |
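
The `perplexity` and `eval_loss` rows above appear to be related by the usual definition, perplexity = exp(loss) with the loss in nats. This is an assumption about how the metric was computed, but the reported values agree with it. A minimal check (the `perplexity` function name is illustrative):

```python
import math


def perplexity(eval_loss: float) -> float:
    """Perplexity as the exponential of a mean cross-entropy loss given in nats."""
    return math.exp(eval_loss)


# Values copied from the table above.
reported_loss = 1.3216993808746338
reported_ppl = 3.749788284301758

computed = perplexity(reported_loss)
# The two numbers should agree up to floating-point precision of the training run.
print(f"computed={computed:.6f} reported={reported_ppl:.6f}")
```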
## TL;DR
| Aspect | Sub-Aspect | Description |
|-------------|--------------|--------------------------------------------------|
| Corpus | Language | Vietnamese |
| | Source | Wiki 2023 (1GB), News 2023 (17.2GB), News (64GB) |
| | Size | 1GB, 18GB, 64GB |
| | Preprocessing | None |
| Tokenizer | Lib | SentencePiece |
| | Algorithm | BPE |
| | Type | spm |
| | Vocab | 128000 |
| | Ref | https://github.com/google/sentencepiece |
| Model | Type | DeBERTaV3 |
| | Ref | https://openreview.net/forum?id=sE7-XhLxHA |
| | Code | https://github.com/microsoft/DeBERTa |
| Pretraining | Task | RTD |
| | Config | model_config.json |
| | Args | default |
| | Hardware | 5x Nvidia A100-SXM4-80G, 2x Nvidia 4090-PCI-24GB |
| | Phases | Init, Refining, Enlarging |
| | Status | Training on hold at step 2200000 |
| Finetuning | Status | Not started (need help) |
## Repo
```
📁vi-deberta-v3-large
|---🗎config.json
|---🗎pytorch_model.bin
|---🗎spm.model
|---tl;dr.pdf
|---📁discriminator
|---📁generator
|---📁tokenizer
|---📁metrics
|---📁logs
```
### Pretraining
Phase 0: Init
### Info
- Goal: Init
- Progress: 30.00% ▓▓▓▓░░░░░░
- Status: training interrupted, step 1000000
- Loss: **init loss**
### Metrics
Phase 1: Refining
### Info
- Goal: refining
- Changes: smaller learning rate (100µ -> 2µ)
- Progress: 100.00% ▓▓▓▓▓▓▓▓▓▓
- Status: training finished, step 1500000
- Loss: **refining loss**
### Metrics
| Metric | Value |
|--------------|----------------------------|
| accuracy | 0.7515653334245732 |
| eval_loss | 1.0692176818847656 |
| eval_metric | 0.7515653334245732 |
| eval_samples | 29227 |
| perplexity | 2.913099527359009 |
| best_metric | 0.7522154172511719@1450000 |
| train_steps | 1500000 |
| train_loss | 1.1779516744723688 |
Phase 2: Enlarging
### Info
- Goal: enlarging, augmenting, expanding data
- Changes: smaller learning rate (2µ -> 1µ), larger corpus (18GB -> 64GB), eval samples (wiki 29227 -> news 240310)
- Progress: 20.00% ▓▓░░░░░░░░
- Status: training in progress, step 2000000
- Loss: **enlarging loss**
### Metrics
| Metric | Value |
|--------------|----------------------------|
| accuracy | 0.7084723898298032 |
| eval_loss | 1.3221531009674072 |
| eval_metric | 0.7084723898298032 |
| eval_samples | 240310 |
| perplexity | 3.7141621112823486 |
| best_metric | 0.7084723898298032@2000000 |
| train_steps | 2000000 |
| train_loss | 1.2167873119241372 |
#### Phase 2: Enlarging (resume, on hold)
- Goal: enlarging, augmenting, expanding data
- Changes: smaller learning rate (2µ -> 1µ), larger corpus (18GB -> 64GB), eval samples (wiki 29227 -> news 240310)
- Progress: 20.00% ▓▓░░░░░░░░
- Status: training phase 2 interrupted intentionally, step 2200000
- Loss: **enlarging loss**
- Metrics:
| Metric | Value |
|--------------|----------------------------|
| accuracy | 0.7113977778702509 |
| eval_loss | 1.3216993808746338 |
| eval_metric | 0.7113977778702509 |
| eval_samples | 240310 |
| perplexity | 3.749788284301758 |
| best_metric | 0.7113977778702509@2200000 |
| train_steps | 2200000 |
| train_loss | 1.1960319969044906 |
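
The phase notes above can be summarized as a small lookup from training step to phase and learning rate. This is only a sketch: it assumes each phase resumed where the previous one stopped, treats the listed learning rates as constant within a phase, and ignores any warmup or decay schedule, none of which this README describes. The `Phase` type and `phase_for_step` function are illustrative names.

```python
from typing import NamedTuple


class Phase(NamedTuple):
    name: str
    end_step: int         # last step reported for the phase in this README
    learning_rate: float  # value from the "Changes" notes (µ read as 1e-6)


# Assumption: phases are contiguous, each starting where the previous one stopped.
PHASES = [
    Phase("init", 1_000_000, 100e-6),
    Phase("refining", 1_500_000, 2e-6),
    Phase("enlarging", 2_200_000, 1e-6),
]


def phase_for_step(step: int) -> Phase:
    """Return the phase whose step range contains `step` (inclusive upper bound)."""
    for phase in PHASES:
        if step <= phase.end_step:
            return phase
    raise ValueError(f"step {step} is beyond the last reported step")


if __name__ == "__main__":
    for step in (500_000, 1_450_000, 2_200_000):
        phase = phase_for_step(step)
        print(step, phase.name, phase.learning_rate)
```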
### Finetuning **NEED HELP**
- Token Classification: tasks, datasets
- Sequence Classification: tasks, datasets
## Experimental Results and Comparison
- Not started
## Usage

#### Method 1: Load pretrained vi-deberta-v3-large with Transformers AutoClass
- Install ClickAI
```
# Why install ClickAI?
# HF Transformers does not yet support DeBERTaV3.
# ClickAI registers DeBERTaV3 locally with HF Transformers.
```
```bash
pip install git+https://gitlab.com/anhdungvo/clickai.git
```
- Tokenizer, Config, Model
```python
import clickai  # side-effect import: registers DeBERTaV3 with the HF Auto classes
from transformers import AutoTokenizer, AutoConfig, AutoModel

config = AutoConfig.from_pretrained("anhdungitvn/vi-deberta-v3-large")
tokenizer = AutoTokenizer.from_pretrained("anhdungitvn/vi-deberta-v3-large")
model = AutoModel.from_pretrained("anhdungitvn/vi-deberta-v3-large")

# Tokenize a sample sentence into PyTorch tensors
inputs = tokenizer("Xử lý ngôn ngữ tiếng Việt", return_tensors='pt')
```
- Transfer Pretrained Model Weights
```python
# Pick the submodule whose weights you need, e.g. the discriminator encoder.
your_model.load_state_dict(model.discriminator.deberta.state_dict())
```
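When the target model expects keys without the `generator.` or `discriminator.` prefix, one option is to filter the full state dict by prefix and strip it before loading. The helper below is a hypothetical sketch, not part of ClickAI or Transformers; it works on any mapping, so the demo uses placeholder values instead of real tensors.

```python
from collections import OrderedDict
from typing import Mapping, TypeVar

T = TypeVar("T")


def extract_submodule_state(state_dict: Mapping[str, T], prefix: str) -> "OrderedDict[str, T]":
    """Keep entries whose key starts with `prefix` and strip that prefix from the key."""
    if not prefix.endswith("."):
        prefix += "."
    return OrderedDict(
        (key[len(prefix):], value)
        for key, value in state_dict.items()
        if key.startswith(prefix)
    )


if __name__ == "__main__":
    # Placeholder values stand in for tensors; the key names follow this README.
    demo = {
        "generator.deberta.embeddings.LayerNorm.weight": 0,
        "discriminator.deberta.embeddings.LayerNorm.weight": 1,
        "discriminator.deberta.embeddings.LayerNorm.bias": 2,
    }
    print(list(extract_submodule_state(demo, "discriminator.deberta")))
```

With the real model this would be called as `extract_submodule_state(model.state_dict(), "discriminator.deberta")`. Whether the result loads cleanly into `your_model` depends on its architecture matching; PyTorch's `load_state_dict(..., strict=False)` reports missing and unexpected keys if it does not.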
Model state dict keys
```python
model.state_dict().keys()
```
```python
[
generator.deberta.embeddings.word_embeddings.weight
generator.deberta.embeddings.position_embeddings.weight
generator.deberta.embeddings.LayerNorm.weight
generator.deberta.embeddings.LayerNorm.bias
generator.deberta.encoder.layer.0.attention.self.query_proj.weight
generator.deberta.encoder.layer.0.attention.self.query_proj.bias
generator.deberta.encoder.layer.0.attention.self.key_proj.weight
generator.deberta.encoder.layer.0.attention.self.key_proj.bias
generator.deberta.encoder.layer.0.attention.self.value_proj.weight
generator.deberta.encoder.layer.0.attention.self.value_proj.bias
generator.deberta.encoder.layer.0.attention.output.dense.weight
generator.deberta.encoder.layer.0.attention.output.dense.bias
generator.deberta.encoder.layer.0.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.0.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.0.intermediate.dense.weight
generator.deberta.encoder.layer.0.intermediate.dense.bias
generator.deberta.encoder.layer.0.output.dense.weight
generator.deberta.encoder.layer.0.output.dense.bias
generator.deberta.encoder.layer.0.output.LayerNorm.weight
generator.deberta.encoder.layer.0.output.LayerNorm.bias
generator.deberta.encoder.layer.1.attention.self.query_proj.weight
generator.deberta.encoder.layer.1.attention.self.query_proj.bias
generator.deberta.encoder.layer.1.attention.self.key_proj.weight
generator.deberta.encoder.layer.1.attention.self.key_proj.bias
generator.deberta.encoder.layer.1.attention.self.value_proj.weight
generator.deberta.encoder.layer.1.attention.self.value_proj.bias
generator.deberta.encoder.layer.1.attention.output.dense.weight
generator.deberta.encoder.layer.1.attention.output.dense.bias
generator.deberta.encoder.layer.1.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.1.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.1.intermediate.dense.weight
generator.deberta.encoder.layer.1.intermediate.dense.bias
generator.deberta.encoder.layer.1.output.dense.weight
generator.deberta.encoder.layer.1.output.dense.bias
generator.deberta.encoder.layer.1.output.LayerNorm.weight
generator.deberta.encoder.layer.1.output.LayerNorm.bias
generator.deberta.encoder.layer.2.attention.self.query_proj.weight
generator.deberta.encoder.layer.2.attention.self.query_proj.bias
generator.deberta.encoder.layer.2.attention.self.key_proj.weight
generator.deberta.encoder.layer.2.attention.self.key_proj.bias
generator.deberta.encoder.layer.2.attention.self.value_proj.weight
generator.deberta.encoder.layer.2.attention.self.value_proj.bias
generator.deberta.encoder.layer.2.attention.output.dense.weight
generator.deberta.encoder.layer.2.attention.output.dense.bias
generator.deberta.encoder.layer.2.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.2.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.2.intermediate.dense.weight
generator.deberta.encoder.layer.2.intermediate.dense.bias
generator.deberta.encoder.layer.2.output.dense.weight
generator.deberta.encoder.layer.2.output.dense.bias
generator.deberta.encoder.layer.2.output.LayerNorm.weight
generator.deberta.encoder.layer.2.output.LayerNorm.bias
generator.deberta.encoder.layer.3.attention.self.query_proj.weight
generator.deberta.encoder.layer.3.attention.self.query_proj.bias
generator.deberta.encoder.layer.3.attention.self.key_proj.weight
generator.deberta.encoder.layer.3.attention.self.key_proj.bias
generator.deberta.encoder.layer.3.attention.self.value_proj.weight
generator.deberta.encoder.layer.3.attention.self.value_proj.bias
generator.deberta.encoder.layer.3.attention.output.dense.weight
generator.deberta.encoder.layer.3.attention.output.dense.bias
generator.deberta.encoder.layer.3.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.3.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.3.intermediate.dense.weight
generator.deberta.encoder.layer.3.intermediate.dense.bias
generator.deberta.encoder.layer.3.output.dense.weight
generator.deberta.encoder.layer.3.output.dense.bias
generator.deberta.encoder.layer.3.output.LayerNorm.weight
generator.deberta.encoder.layer.3.output.LayerNorm.bias
generator.deberta.encoder.layer.4.attention.self.query_proj.weight
generator.deberta.encoder.layer.4.attention.self.query_proj.bias
generator.deberta.encoder.layer.4.attention.self.key_proj.weight
generator.deberta.encoder.layer.4.attention.self.key_proj.bias
generator.deberta.encoder.layer.4.attention.self.value_proj.weight
generator.deberta.encoder.layer.4.attention.self.value_proj.bias
generator.deberta.encoder.layer.4.attention.output.dense.weight
generator.deberta.encoder.layer.4.attention.output.dense.bias
generator.deberta.encoder.layer.4.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.4.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.4.intermediate.dense.weight
generator.deberta.encoder.layer.4.intermediate.dense.bias
generator.deberta.encoder.layer.4.output.dense.weight
generator.deberta.encoder.layer.4.output.dense.bias
generator.deberta.encoder.layer.4.output.LayerNorm.weight
generator.deberta.encoder.layer.4.output.LayerNorm.bias
generator.deberta.encoder.layer.5.attention.self.query_proj.weight
generator.deberta.encoder.layer.5.attention.self.query_proj.bias
generator.deberta.encoder.layer.5.attention.self.key_proj.weight
generator.deberta.encoder.layer.5.attention.self.key_proj.bias
generator.deberta.encoder.layer.5.attention.self.value_proj.weight
generator.deberta.encoder.layer.5.attention.self.value_proj.bias
generator.deberta.encoder.layer.5.attention.output.dense.weight
generator.deberta.encoder.layer.5.attention.output.dense.bias
generator.deberta.encoder.layer.5.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.5.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.5.intermediate.dense.weight
generator.deberta.encoder.layer.5.intermediate.dense.bias
generator.deberta.encoder.layer.5.output.dense.weight
generator.deberta.encoder.layer.5.output.dense.bias
generator.deberta.encoder.layer.5.output.LayerNorm.weight
generator.deberta.encoder.layer.5.output.LayerNorm.bias
generator.deberta.encoder.layer.6.attention.self.query_proj.weight
generator.deberta.encoder.layer.6.attention.self.query_proj.bias
generator.deberta.encoder.layer.6.attention.self.key_proj.weight
generator.deberta.encoder.layer.6.attention.self.key_proj.bias
generator.deberta.encoder.layer.6.attention.self.value_proj.weight
generator.deberta.encoder.layer.6.attention.self.value_proj.bias
generator.deberta.encoder.layer.6.attention.output.dense.weight
generator.deberta.encoder.layer.6.attention.output.dense.bias
generator.deberta.encoder.layer.6.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.6.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.6.intermediate.dense.weight
generator.deberta.encoder.layer.6.intermediate.dense.bias
generator.deberta.encoder.layer.6.output.dense.weight
generator.deberta.encoder.layer.6.output.dense.bias
generator.deberta.encoder.layer.6.output.LayerNorm.weight
generator.deberta.encoder.layer.6.output.LayerNorm.bias
generator.deberta.encoder.layer.7.attention.self.query_proj.weight
generator.deberta.encoder.layer.7.attention.self.query_proj.bias
generator.deberta.encoder.layer.7.attention.self.key_proj.weight
generator.deberta.encoder.layer.7.attention.self.key_proj.bias
generator.deberta.encoder.layer.7.attention.self.value_proj.weight
generator.deberta.encoder.layer.7.attention.self.value_proj.bias
generator.deberta.encoder.layer.7.attention.output.dense.weight
generator.deberta.encoder.layer.7.attention.output.dense.bias
generator.deberta.encoder.layer.7.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.7.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.7.intermediate.dense.weight
generator.deberta.encoder.layer.7.intermediate.dense.bias
generator.deberta.encoder.layer.7.output.dense.weight
generator.deberta.encoder.layer.7.output.dense.bias
generator.deberta.encoder.layer.7.output.LayerNorm.weight
generator.deberta.encoder.layer.7.output.LayerNorm.bias
generator.deberta.encoder.layer.8.attention.self.query_proj.weight
generator.deberta.encoder.layer.8.attention.self.query_proj.bias
generator.deberta.encoder.layer.8.attention.self.key_proj.weight
generator.deberta.encoder.layer.8.attention.self.key_proj.bias
generator.deberta.encoder.layer.8.attention.self.value_proj.weight
generator.deberta.encoder.layer.8.attention.self.value_proj.bias
generator.deberta.encoder.layer.8.attention.output.dense.weight
generator.deberta.encoder.layer.8.attention.output.dense.bias
generator.deberta.encoder.layer.8.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.8.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.8.intermediate.dense.weight
generator.deberta.encoder.layer.8.intermediate.dense.bias
generator.deberta.encoder.layer.8.output.dense.weight
generator.deberta.encoder.layer.8.output.dense.bias
generator.deberta.encoder.layer.8.output.LayerNorm.weight
generator.deberta.encoder.layer.8.output.LayerNorm.bias
generator.deberta.encoder.layer.9.attention.self.query_proj.weight
generator.deberta.encoder.layer.9.attention.self.query_proj.bias
generator.deberta.encoder.layer.9.attention.self.key_proj.weight
generator.deberta.encoder.layer.9.attention.self.key_proj.bias
generator.deberta.encoder.layer.9.attention.self.value_proj.weight
generator.deberta.encoder.layer.9.attention.self.value_proj.bias
generator.deberta.encoder.layer.9.attention.output.dense.weight
generator.deberta.encoder.layer.9.attention.output.dense.bias
generator.deberta.encoder.layer.9.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.9.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.9.intermediate.dense.weight
generator.deberta.encoder.layer.9.intermediate.dense.bias
generator.deberta.encoder.layer.9.output.dense.weight
generator.deberta.encoder.layer.9.output.dense.bias
generator.deberta.encoder.layer.9.output.LayerNorm.weight
generator.deberta.encoder.layer.9.output.LayerNorm.bias
generator.deberta.encoder.layer.10.attention.self.query_proj.weight
generator.deberta.encoder.layer.10.attention.self.query_proj.bias
generator.deberta.encoder.layer.10.attention.self.key_proj.weight
generator.deberta.encoder.layer.10.attention.self.key_proj.bias
generator.deberta.encoder.layer.10.attention.self.value_proj.weight
generator.deberta.encoder.layer.10.attention.self.value_proj.bias
generator.deberta.encoder.layer.10.attention.output.dense.weight
generator.deberta.encoder.layer.10.attention.output.dense.bias
generator.deberta.encoder.layer.10.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.10.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.10.intermediate.dense.weight
generator.deberta.encoder.layer.10.intermediate.dense.bias
generator.deberta.encoder.layer.10.output.dense.weight
generator.deberta.encoder.layer.10.output.dense.bias
generator.deberta.encoder.layer.10.output.LayerNorm.weight
generator.deberta.encoder.layer.10.output.LayerNorm.bias
generator.deberta.encoder.layer.11.attention.self.query_proj.weight
generator.deberta.encoder.layer.11.attention.self.query_proj.bias
generator.deberta.encoder.layer.11.attention.self.key_proj.weight
generator.deberta.encoder.layer.11.attention.self.key_proj.bias
generator.deberta.encoder.layer.11.attention.self.value_proj.weight
generator.deberta.encoder.layer.11.attention.self.value_proj.bias
generator.deberta.encoder.layer.11.attention.output.dense.weight
generator.deberta.encoder.layer.11.attention.output.dense.bias
generator.deberta.encoder.layer.11.attention.output.LayerNorm.weight
generator.deberta.encoder.layer.11.attention.output.LayerNorm.bias
generator.deberta.encoder.layer.11.intermediate.dense.weight
generator.deberta.encoder.layer.11.intermediate.dense.bias
generator.deberta.encoder.layer.11.output.dense.weight
generator.deberta.encoder.layer.11.output.dense.bias
generator.deberta.encoder.layer.11.output.LayerNorm.weight
generator.deberta.encoder.layer.11.output.LayerNorm.bias
generator.deberta.encoder.rel_embeddings.weight
generator.deberta.encoder.LayerNorm.weight
generator.deberta.encoder.LayerNorm.bias
generator.lm_predictions.lm_head.bias
generator.lm_predictions.lm_head.dense.weight
generator.lm_predictions.lm_head.dense.bias
generator.lm_predictions.lm_head.LayerNorm.weight
generator.lm_predictions.lm_head.LayerNorm.bias
discriminator.deberta.embeddings.word_embeddings.weight
discriminator.deberta.embeddings.word_embeddings._weight
discriminator.deberta.embeddings.position_embeddings.weight
discriminator.deberta.embeddings.position_embeddings._weight
discriminator.deberta.embeddings.LayerNorm.weight
discriminator.deberta.embeddings.LayerNorm.bias
discriminator.deberta.encoder.layer.0.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.0.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.0.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.0.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.0.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.0.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.0.attention.output.dense.weight
discriminator.deberta.encoder.layer.0.attention.output.dense.bias
discriminator.deberta.encoder.layer.0.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.0.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.0.intermediate.dense.weight
discriminator.deberta.encoder.layer.0.intermediate.dense.bias
discriminator.deberta.encoder.layer.0.output.dense.weight
discriminator.deberta.encoder.layer.0.output.dense.bias
discriminator.deberta.encoder.layer.0.output.LayerNorm.weight
discriminator.deberta.encoder.layer.0.output.LayerNorm.bias
discriminator.deberta.encoder.layer.1.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.1.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.1.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.1.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.1.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.1.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.1.attention.output.dense.weight
discriminator.deberta.encoder.layer.1.attention.output.dense.bias
discriminator.deberta.encoder.layer.1.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.1.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.1.intermediate.dense.weight
discriminator.deberta.encoder.layer.1.intermediate.dense.bias
discriminator.deberta.encoder.layer.1.output.dense.weight
discriminator.deberta.encoder.layer.1.output.dense.bias
discriminator.deberta.encoder.layer.1.output.LayerNorm.weight
discriminator.deberta.encoder.layer.1.output.LayerNorm.bias
discriminator.deberta.encoder.layer.2.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.2.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.2.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.2.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.2.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.2.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.2.attention.output.dense.weight
discriminator.deberta.encoder.layer.2.attention.output.dense.bias
discriminator.deberta.encoder.layer.2.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.2.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.2.intermediate.dense.weight
discriminator.deberta.encoder.layer.2.intermediate.dense.bias
discriminator.deberta.encoder.layer.2.output.dense.weight
discriminator.deberta.encoder.layer.2.output.dense.bias
discriminator.deberta.encoder.layer.2.output.LayerNorm.weight
discriminator.deberta.encoder.layer.2.output.LayerNorm.bias
discriminator.deberta.encoder.layer.3.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.3.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.3.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.3.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.3.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.3.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.3.attention.output.dense.weight
discriminator.deberta.encoder.layer.3.attention.output.dense.bias
discriminator.deberta.encoder.layer.3.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.3.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.3.intermediate.dense.weight
discriminator.deberta.encoder.layer.3.intermediate.dense.bias
discriminator.deberta.encoder.layer.3.output.dense.weight
discriminator.deberta.encoder.layer.3.output.dense.bias
discriminator.deberta.encoder.layer.3.output.LayerNorm.weight
discriminator.deberta.encoder.layer.3.output.LayerNorm.bias
discriminator.deberta.encoder.layer.4.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.4.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.4.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.4.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.4.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.4.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.4.attention.output.dense.weight
discriminator.deberta.encoder.layer.4.attention.output.dense.bias
discriminator.deberta.encoder.layer.4.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.4.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.4.intermediate.dense.weight
discriminator.deberta.encoder.layer.4.intermediate.dense.bias
discriminator.deberta.encoder.layer.4.output.dense.weight
discriminator.deberta.encoder.layer.4.output.dense.bias
discriminator.deberta.encoder.layer.4.output.LayerNorm.weight
discriminator.deberta.encoder.layer.4.output.LayerNorm.bias
discriminator.deberta.encoder.layer.5.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.5.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.5.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.5.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.5.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.5.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.5.attention.output.dense.weight
discriminator.deberta.encoder.layer.5.attention.output.dense.bias
discriminator.deberta.encoder.layer.5.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.5.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.5.intermediate.dense.weight
discriminator.deberta.encoder.layer.5.intermediate.dense.bias
discriminator.deberta.encoder.layer.5.output.dense.weight
discriminator.deberta.encoder.layer.5.output.dense.bias
discriminator.deberta.encoder.layer.5.output.LayerNorm.weight
discriminator.deberta.encoder.layer.5.output.LayerNorm.bias
discriminator.deberta.encoder.layer.6.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.6.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.6.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.6.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.6.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.6.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.6.attention.output.dense.weight
discriminator.deberta.encoder.layer.6.attention.output.dense.bias
discriminator.deberta.encoder.layer.6.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.6.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.6.intermediate.dense.weight
discriminator.deberta.encoder.layer.6.intermediate.dense.bias
discriminator.deberta.encoder.layer.6.output.dense.weight
discriminator.deberta.encoder.layer.6.output.dense.bias
discriminator.deberta.encoder.layer.6.output.LayerNorm.weight
discriminator.deberta.encoder.layer.6.output.LayerNorm.bias
discriminator.deberta.encoder.layer.7.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.7.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.7.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.7.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.7.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.7.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.7.attention.output.dense.weight
discriminator.deberta.encoder.layer.7.attention.output.dense.bias
discriminator.deberta.encoder.layer.7.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.7.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.7.intermediate.dense.weight
discriminator.deberta.encoder.layer.7.intermediate.dense.bias
discriminator.deberta.encoder.layer.7.output.dense.weight
discriminator.deberta.encoder.layer.7.output.dense.bias
discriminator.deberta.encoder.layer.7.output.LayerNorm.weight
discriminator.deberta.encoder.layer.7.output.LayerNorm.bias
discriminator.deberta.encoder.layer.8.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.8.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.8.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.8.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.8.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.8.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.8.attention.output.dense.weight
discriminator.deberta.encoder.layer.8.attention.output.dense.bias
discriminator.deberta.encoder.layer.8.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.8.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.8.intermediate.dense.weight
discriminator.deberta.encoder.layer.8.intermediate.dense.bias
discriminator.deberta.encoder.layer.8.output.dense.weight
discriminator.deberta.encoder.layer.8.output.dense.bias
discriminator.deberta.encoder.layer.8.output.LayerNorm.weight
discriminator.deberta.encoder.layer.8.output.LayerNorm.bias
discriminator.deberta.encoder.layer.9.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.9.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.9.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.9.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.9.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.9.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.9.attention.output.dense.weight
discriminator.deberta.encoder.layer.9.attention.output.dense.bias
discriminator.deberta.encoder.layer.9.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.9.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.9.intermediate.dense.weight
discriminator.deberta.encoder.layer.9.intermediate.dense.bias
discriminator.deberta.encoder.layer.9.output.dense.weight
discriminator.deberta.encoder.layer.9.output.dense.bias
discriminator.deberta.encoder.layer.9.output.LayerNorm.weight
discriminator.deberta.encoder.layer.9.output.LayerNorm.bias
discriminator.deberta.encoder.layer.10.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.10.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.10.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.10.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.10.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.10.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.10.attention.output.dense.weight
discriminator.deberta.encoder.layer.10.attention.output.dense.bias
discriminator.deberta.encoder.layer.10.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.10.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.10.intermediate.dense.weight
discriminator.deberta.encoder.layer.10.intermediate.dense.bias
discriminator.deberta.encoder.layer.10.output.dense.weight
discriminator.deberta.encoder.layer.10.output.dense.bias
discriminator.deberta.encoder.layer.10.output.LayerNorm.weight
discriminator.deberta.encoder.layer.10.output.LayerNorm.bias
discriminator.deberta.encoder.layer.11.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.11.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.11.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.11.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.11.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.11.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.11.attention.output.dense.weight
discriminator.deberta.encoder.layer.11.attention.output.dense.bias
discriminator.deberta.encoder.layer.11.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.11.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.11.intermediate.dense.weight
discriminator.deberta.encoder.layer.11.intermediate.dense.bias
discriminator.deberta.encoder.layer.11.output.dense.weight
discriminator.deberta.encoder.layer.11.output.dense.bias
discriminator.deberta.encoder.layer.11.output.LayerNorm.weight
discriminator.deberta.encoder.layer.11.output.LayerNorm.bias
discriminator.deberta.encoder.layer.12.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.12.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.12.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.12.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.12.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.12.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.12.attention.output.dense.weight
discriminator.deberta.encoder.layer.12.attention.output.dense.bias
discriminator.deberta.encoder.layer.12.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.12.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.12.intermediate.dense.weight
discriminator.deberta.encoder.layer.12.intermediate.dense.bias
discriminator.deberta.encoder.layer.12.output.dense.weight
discriminator.deberta.encoder.layer.12.output.dense.bias
discriminator.deberta.encoder.layer.12.output.LayerNorm.weight
discriminator.deberta.encoder.layer.12.output.LayerNorm.bias
discriminator.deberta.encoder.layer.13.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.13.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.13.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.13.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.13.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.13.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.13.attention.output.dense.weight
discriminator.deberta.encoder.layer.13.attention.output.dense.bias
discriminator.deberta.encoder.layer.13.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.13.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.13.intermediate.dense.weight
discriminator.deberta.encoder.layer.13.intermediate.dense.bias
discriminator.deberta.encoder.layer.13.output.dense.weight
discriminator.deberta.encoder.layer.13.output.dense.bias
discriminator.deberta.encoder.layer.13.output.LayerNorm.weight
discriminator.deberta.encoder.layer.13.output.LayerNorm.bias
discriminator.deberta.encoder.layer.14.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.14.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.14.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.14.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.14.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.14.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.14.attention.output.dense.weight
discriminator.deberta.encoder.layer.14.attention.output.dense.bias
discriminator.deberta.encoder.layer.14.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.14.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.14.intermediate.dense.weight
discriminator.deberta.encoder.layer.14.intermediate.dense.bias
discriminator.deberta.encoder.layer.14.output.dense.weight
discriminator.deberta.encoder.layer.14.output.dense.bias
discriminator.deberta.encoder.layer.14.output.LayerNorm.weight
discriminator.deberta.encoder.layer.14.output.LayerNorm.bias
discriminator.deberta.encoder.layer.15.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.15.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.15.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.15.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.15.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.15.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.15.attention.output.dense.weight
discriminator.deberta.encoder.layer.15.attention.output.dense.bias
discriminator.deberta.encoder.layer.15.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.15.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.15.intermediate.dense.weight
discriminator.deberta.encoder.layer.15.intermediate.dense.bias
discriminator.deberta.encoder.layer.15.output.dense.weight
discriminator.deberta.encoder.layer.15.output.dense.bias
discriminator.deberta.encoder.layer.15.output.LayerNorm.weight
discriminator.deberta.encoder.layer.15.output.LayerNorm.bias
discriminator.deberta.encoder.layer.16.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.16.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.16.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.16.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.16.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.16.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.16.attention.output.dense.weight
discriminator.deberta.encoder.layer.16.attention.output.dense.bias
discriminator.deberta.encoder.layer.16.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.16.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.16.intermediate.dense.weight
discriminator.deberta.encoder.layer.16.intermediate.dense.bias
discriminator.deberta.encoder.layer.16.output.dense.weight
discriminator.deberta.encoder.layer.16.output.dense.bias
discriminator.deberta.encoder.layer.16.output.LayerNorm.weight
discriminator.deberta.encoder.layer.16.output.LayerNorm.bias
discriminator.deberta.encoder.layer.17.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.17.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.17.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.17.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.17.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.17.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.17.attention.output.dense.weight
discriminator.deberta.encoder.layer.17.attention.output.dense.bias
discriminator.deberta.encoder.layer.17.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.17.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.17.intermediate.dense.weight
discriminator.deberta.encoder.layer.17.intermediate.dense.bias
discriminator.deberta.encoder.layer.17.output.dense.weight
discriminator.deberta.encoder.layer.17.output.dense.bias
discriminator.deberta.encoder.layer.17.output.LayerNorm.weight
discriminator.deberta.encoder.layer.17.output.LayerNorm.bias
discriminator.deberta.encoder.layer.18.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.18.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.18.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.18.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.18.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.18.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.18.attention.output.dense.weight
discriminator.deberta.encoder.layer.18.attention.output.dense.bias
discriminator.deberta.encoder.layer.18.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.18.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.18.intermediate.dense.weight
discriminator.deberta.encoder.layer.18.intermediate.dense.bias
discriminator.deberta.encoder.layer.18.output.dense.weight
discriminator.deberta.encoder.layer.18.output.dense.bias
discriminator.deberta.encoder.layer.18.output.LayerNorm.weight
discriminator.deberta.encoder.layer.18.output.LayerNorm.bias
discriminator.deberta.encoder.layer.19.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.19.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.19.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.19.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.19.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.19.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.19.attention.output.dense.weight
discriminator.deberta.encoder.layer.19.attention.output.dense.bias
discriminator.deberta.encoder.layer.19.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.19.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.19.intermediate.dense.weight
discriminator.deberta.encoder.layer.19.intermediate.dense.bias
discriminator.deberta.encoder.layer.19.output.dense.weight
discriminator.deberta.encoder.layer.19.output.dense.bias
discriminator.deberta.encoder.layer.19.output.LayerNorm.weight
discriminator.deberta.encoder.layer.19.output.LayerNorm.bias
discriminator.deberta.encoder.layer.20.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.20.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.20.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.20.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.20.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.20.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.20.attention.output.dense.weight
discriminator.deberta.encoder.layer.20.attention.output.dense.bias
discriminator.deberta.encoder.layer.20.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.20.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.20.intermediate.dense.weight
discriminator.deberta.encoder.layer.20.intermediate.dense.bias
discriminator.deberta.encoder.layer.20.output.dense.weight
discriminator.deberta.encoder.layer.20.output.dense.bias
discriminator.deberta.encoder.layer.20.output.LayerNorm.weight
discriminator.deberta.encoder.layer.20.output.LayerNorm.bias
discriminator.deberta.encoder.layer.21.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.21.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.21.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.21.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.21.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.21.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.21.attention.output.dense.weight
discriminator.deberta.encoder.layer.21.attention.output.dense.bias
discriminator.deberta.encoder.layer.21.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.21.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.21.intermediate.dense.weight
discriminator.deberta.encoder.layer.21.intermediate.dense.bias
discriminator.deberta.encoder.layer.21.output.dense.weight
discriminator.deberta.encoder.layer.21.output.dense.bias
discriminator.deberta.encoder.layer.21.output.LayerNorm.weight
discriminator.deberta.encoder.layer.21.output.LayerNorm.bias
discriminator.deberta.encoder.layer.22.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.22.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.22.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.22.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.22.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.22.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.22.attention.output.dense.weight
discriminator.deberta.encoder.layer.22.attention.output.dense.bias
discriminator.deberta.encoder.layer.22.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.22.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.22.intermediate.dense.weight
discriminator.deberta.encoder.layer.22.intermediate.dense.bias
discriminator.deberta.encoder.layer.22.output.dense.weight
discriminator.deberta.encoder.layer.22.output.dense.bias
discriminator.deberta.encoder.layer.22.output.LayerNorm.weight
discriminator.deberta.encoder.layer.22.output.LayerNorm.bias
discriminator.deberta.encoder.layer.23.attention.self.query_proj.weight
discriminator.deberta.encoder.layer.23.attention.self.query_proj.bias
discriminator.deberta.encoder.layer.23.attention.self.key_proj.weight
discriminator.deberta.encoder.layer.23.attention.self.key_proj.bias
discriminator.deberta.encoder.layer.23.attention.self.value_proj.weight
discriminator.deberta.encoder.layer.23.attention.self.value_proj.bias
discriminator.deberta.encoder.layer.23.attention.output.dense.weight
discriminator.deberta.encoder.layer.23.attention.output.dense.bias
discriminator.deberta.encoder.layer.23.attention.output.LayerNorm.weight
discriminator.deberta.encoder.layer.23.attention.output.LayerNorm.bias
discriminator.deberta.encoder.layer.23.intermediate.dense.weight
discriminator.deberta.encoder.layer.23.intermediate.dense.bias
discriminator.deberta.encoder.layer.23.output.dense.weight
discriminator.deberta.encoder.layer.23.output.dense.bias
discriminator.deberta.encoder.layer.23.output.LayerNorm.weight
discriminator.deberta.encoder.layer.23.output.LayerNorm.bias
discriminator.deberta.encoder.rel_embeddings.weight
discriminator.deberta.encoder.LayerNorm.weight
discriminator.deberta.encoder.LayerNorm.bias
discriminator.mask_predictions.dense.weight
discriminator.mask_predictions.dense.bias
discriminator.mask_predictions.LayerNorm.weight
discriminator.mask_predictions.LayerNorm.bias
discriminator.mask_predictions.classifier.weight
discriminator.mask_predictions.classifier.bias
]
```
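The names listed above follow a regular `discriminator.deberta.encoder.layer.<N>.` pattern, which makes them easy to inspect or remap before loading weights into another model class. The following is a minimal sketch, not part of the original tooling: the helper names `strip_prefix` and `tensors_per_layer` are hypothetical, and whether the target model expects the prefix to be removed is an assumption to verify against its own parameter names.

```python
import re
from collections import Counter

# Matches names such as
# "discriminator.deberta.encoder.layer.11.attention.self.query_proj.weight".
LAYER_PATTERN = re.compile(r"^discriminator\.deberta\.encoder\.layer\.(\d+)\.")


def strip_prefix(names, prefix="discriminator."):
    """Return names with `prefix` removed; names without the prefix are dropped."""
    return [name[len(prefix):] for name in names if name.startswith(prefix)]


def tensors_per_layer(names):
    """Count how many parameter tensors belong to each encoder layer index."""
    counts = Counter()
    for name in names:
        match = LAYER_PATTERN.match(name)
        if match:
            counts[int(match.group(1))] += 1
    return dict(counts)


# A few names copied from the listing above.
example_names = [
    "discriminator.deberta.encoder.layer.23.output.LayerNorm.weight",
    "discriminator.deberta.encoder.layer.23.output.LayerNorm.bias",
    "discriminator.deberta.encoder.rel_embeddings.weight",
    "discriminator.mask_predictions.classifier.bias",
]

print(strip_prefix(example_names)[0])
print(tensors_per_layer(example_names))
```

Run over the full listing, `tensors_per_layer` should report 16 tensors for each complete encoder layer (six attention projections, two attention output tensors, four LayerNorm tensors, and four feed-forward tensors).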
#### Method 2: Manually load pretrained vi-deberta-v3-large
- Implement `YourTokenizer` and `YourModel`
```python
class YourTokenizer:
    @classmethod
    def from_pretrained(cls, model_name_or_path, **kwargs):
        # Load the SentencePiece model from:
        # https://huggingface.co/anhdungitvn/vi-deberta-v3-large/blob/main/tokenizer/spm.model
        pass


class YourModel:
    @classmethod
    def from_pretrained(cls, model_name_or_path, **kwargs):
        # Load the weights from one of:
        # Discriminator: https://huggingface.co/anhdungitvn/vi-deberta-v3-large/tree/main/discriminator
        # Generator: https://huggingface.co/anhdungitvn/vi-deberta-v3-large/tree/main/generator
        pass
```
- Use
```python
tokenizer = YourTokenizer.from_pretrained("anhdungitvn/vi-deberta-v3-large")
model = YourModel.from_pretrained("anhdungitvn/vi-deberta-v3-large")
```
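As one possible starting point for the two `from_pretrained` stubs, the sketch below downloads only the needed subfolders and loads the tokenizer. It assumes the `huggingface_hub` and `sentencepiece` packages are installed; the helper names (`allow_patterns`, `download_components`, `load_tokenizer`) are hypothetical. Only the `tokenizer/spm.model` path and the `discriminator`/`generator` folder names come from the links above; the file names inside the model folders are not assumed here.

```python
from pathlib import Path

REPO_ID = "anhdungitvn/vi-deberta-v3-large"


def allow_patterns(components=("tokenizer", "discriminator")):
    """Build glob patterns that restrict a download to the given subfolders."""
    return [f"{name}/*" for name in components]


def download_components(components=("tokenizer", "discriminator")):
    """Download the selected subfolders and return the local snapshot directory."""
    # Imported lazily so the pure helpers work without the package installed.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=allow_patterns(components),
    )
    return Path(local_dir)


def load_tokenizer(snapshot_dir):
    """Load the SentencePiece model stored at tokenizer/spm.model."""
    import sentencepiece as spm

    model_file = Path(snapshot_dir) / "tokenizer" / "spm.model"
    return spm.SentencePieceProcessor(model_file=str(model_file))
```

With network access, `load_tokenizer(download_components())` returns a SentencePiece processor whose `encode(text, out_type=str)` method yields subword pieces. Loading the discriminator or generator weights is left to `YourModel`, since it depends on the checkpoint files found in those folders.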
### Log
Logs:
- 2023-03-29: init, todolist
- 2023-03-30: data preparation
  - vi_wiki_23: latest
  - vi_news_17g: available
- 2023-03-30: tokenizer training
  - algorithm: unigram, bpe
  - size: 8k, 16k, 32k, 64k, 128k, 256k
- 2023-03-31: training trials
  - config: base, large
  - tokenizer: unigram_16k, bpe_16k, bpe_128k
  - args: default, changed batch_size, grad_acc
  - optimizer: default, customized optimizer
- 2023-03-31: training phase 0 started
  - config: large
  - tokenizer: bpe_128k
  - args: default
  - GPU: 5x A100-SXM4-80G
- 2023-04-04: training interrupted unintentionally, optimizer checkpoint: none, step 300000
- 2023-04-05: training resumed from last checkpoint 300000, learning_rate adjusted 100µ -> 50µ
- 2023-04-10: sweet spot detected, step 800000
- 2023-04-11: training in progress, step 900000, loss increases, accuracy increases, regularization working well, overfitting problem under monitoring
- 2023-04-12: training interrupted intentionally, step 1000000
- 2023-04-12: training phase 1 started, resumed intentionally, refining, learning_rate -> 2µ (diverging)
- 2023-04-24: training phase 1 finished, refining, step 1500000
- 2023-04-26: training phase 2 started, resumed intentionally, step 1500000
- 2023-05-02: training phase 2 interrupted unintentionally, step 1980000
- 2023-05-05: training phase 2 resumed intentionally, step 2000000
- 2023-05-09: training in progress, step 2200000