# BERT exploration

Workspace to explore classification and other tasks based on the [PyTorch implementation](https://github.com/huggingface/pytorch-pretrained-BERT) of the original BERT paper (https://arxiv.org/abs/1810.04805).

# GLUE

See [this notebook](https://colab.research.google.com/drive/1Qc0JOJ3x4vUU3nNTtBoD5GONrrrOtNdr) for an implementation of the GLUE tasks.

# SWAG

The Situations With Adversarial Generations (SWAG) dataset contains 113k sentence-pair completion examples that evaluate grounded commonsense inference (Zellers et al., 2018). Given a sentence from a video captioning dataset, the task is to decide which of four choices is the most plausible continuation. (A sketch of how an example is encoded for BERT follows at the end of this README.)

Running

```bash
export SWAG_DIR=/home/pfecht/thesis/swagaf

python run_swag.py \
  --bert_model bert-base-uncased \
  --do_train \
  --do_eval \
  --data_dir $SWAG_DIR/data \
  --train_batch_size 16 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --max_seq_length 80 \
  --output_dir /home/pfecht/tmp/swag_output/ \
  --gradient_accumulation_steps 4
```

results in

* **Accuracy**: **78.58** (BERT paper: **81.6**)

```
12/14/2018 18:42:18 - INFO - __main__ - eval_accuracy = 0.7858642407277817
12/14/2018 18:42:18 - INFO - __main__ - eval_loss = 0.6655298910721517
12/14/2018 18:42:18 - INFO - __main__ - global_step = 13788
12/14/2018 18:42:18 - INFO - __main__ - loss = 0.07108418613090857
```

with a fine-tuning time of **around 4 hours** on a single GPU (GeForce GTX TITAN X).

# SQuAD

> see https://github.com/huggingface/pytorch-pretrained-BERT#fine-tuning-with-bert-running-the-examples

Running

```bash
python run_squad.py \
  --bert_model bert-base-uncased \
  --do_train \
  --do_predict \
  --train_file $SQUAD_DIR/train-v1.1.json \
  --predict_file $SQUAD_DIR/dev-v1.1.json \
  --train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir $OUT_DIR \
  --optimize_on_cpu
```

* `--optimize_on_cpu` is important to leave enough memory on the GPU for training. The `BertAdam` optimizer stores two moving averages for each model weight, which means that without moving the optimizer state to the CPU we have to keep roughly three times the size of the model on the GPU (for BERT-base, about 110M parameters × 4 bytes × 3 ≈ 1.3 GB for weights and optimizer state alone).
* GPU memory usage also grows with `train_batch_size` and `max_seq_length`, so reduce those first when hitting OOM errors.

results in

* **F1 score**: 88.28 (BERT paper: 88.5)
* **EM (exact match)**: 81.05 (BERT paper: 80.8)

with a fine-tuning time of **around 8 hours** on a single GPU (GeForce GTX TITAN X).

Running the official evaluation script on the predictions:

```bash
$ python evaluate-v1.1.py /home/pfecht/res/SQUAD/dev-v1.1.json predictions.json
{"f1": 88.28409344840951, "exact_match": 81.05014191106906}
```
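For reference, the two numbers above follow the SQuAD v1.1 metric definitions: EM checks for string equality after normalization, and F1 measures token overlap between prediction and reference. Below is a minimal sketch of that logic, not the official `evaluate-v1.1.py` script; the function names are my own, and the real script additionally takes the maximum score over all reference answers for each question.

```python
import re
import string
from collections import Counter


def normalize_answer(s):
    """Lowercase, drop punctuation and articles, collapse whitespace,
    mirroring the normalization of the SQuAD v1.1 metric."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction, ground_truth):
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction, ground_truth):
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("the Norman conquest", "Norman conquest"))  # 1.0 after normalization
print(f1_score("the Norman conquest of England", "Norman conquest"))  # partial overlap
```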
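As mentioned in the SWAG section above, `run_swag.py` casts the task as multiple choice: each of the four endings is paired with the context sentence, and all four sequences are scored jointly with a shared encoder and a linear head over the `[CLS]` vectors. The following is a rough sketch of that encoding, assuming the `BertTokenizer` and `BertForMultipleChoice` classes from pytorch-pretrained-BERT (API details may differ across versions, and the example sentences are made up):

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMultipleChoice

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMultipleChoice.from_pretrained("bert-base-uncased", num_choices=4)
model.eval()

context = "The girl steps onto the balance beam."
endings = [
    "She does a handstand on the beam.",
    "She throws the beam across the gym.",
    "She eats the balance beam.",
    "She swims underneath the beam.",
]

max_seq_length = 80
input_ids, segment_ids, input_mask = [], [], []
for ending in endings:
    # One "[CLS] context [SEP] ending [SEP]" sequence per candidate ending.
    tokens_a = tokenizer.tokenize(context)
    tokens_b = tokenizer.tokenize(ending)
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    ids = tokenizer.convert_tokens_to_ids(tokens)
    segments = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    mask = [1] * len(ids)
    padding = [0] * (max_seq_length - len(ids))
    input_ids.append(ids + padding)
    segment_ids.append(segments + padding)
    input_mask.append(mask + padding)

# Shape [batch=1, num_choices=4, seq_len]: the model sees all four
# endings at once and produces one logit per choice.
input_ids = torch.tensor([input_ids])
segment_ids = torch.tensor([segment_ids])
input_mask = torch.tensor([input_mask])

with torch.no_grad():
    logits = model(input_ids, segment_ids, input_mask)
print(logits.argmax(dim=-1))  # index of the most plausible ending
```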
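Finally, the `--gradient_accumulation_steps 4` flag in the SWAG command is what makes the effective batch size of 16 fit on a single GPU: the script splits each batch into smaller forward/backward passes and sums their gradients before taking one optimizer step. A generic sketch of the pattern, with hypothetical model and data-loader names rather than the actual code from `run_swag.py`:

```python
import torch

accumulation_steps = 4  # matches --gradient_accumulation_steps above


def train_epoch(model, data_loader, optimizer, loss_fn):
    model.train()
    optimizer.zero_grad()
    for step, (inputs, labels) in enumerate(data_loader):
        loss = loss_fn(model(inputs), labels)
        # Scale each partial loss so the accumulated gradient equals
        # the mean over the full effective batch.
        (loss / accumulation_steps).backward()
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```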