# ELMo-keras

**Repository Path**: greitzmann/ELMo-keras

## Basic Information

- **Project Name**: ELMo-keras
- **Description**: Re-implementation of ELMo on Keras
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-11-22
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# ELMo-keras

A re-implementation of ELMo in Keras, based on the TensorFlow implementation released by AllenNLP (https://github.com/allenai/bilm-tf), which accompanies the Peters et al. paper at NAACL 2018 (https://arxiv.org/abs/1802.05365):

_Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer. 2018. Deep contextualized word representations_

Notice: The project includes the WikiText-2 datasets for experimentation, as published at https://einstein.ai/research/blog/the-wikitext-long-term-dependency-language-modeling-dataset and presented in Merity et al. 2016 (https://arxiv.org/abs/1609.07843):

_Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. 2016. Pointer Sentinel Mixture Models_

## Why in the heck did you do that?

- This was the easiest way to understand ELMo deeply, identify its pros and cons, and consider improvements (e.g., making it more computationally efficient).
- I also consider Keras the most user-friendly and industry-ready library to work with.
- We are now able to integrate ELMo for practical use at Cognitiv+, where we rely on Keras for our NLP engine.
- It was really fun! This took me more than a month, during which I had to learn many things and vastly improve my understanding of and skills around Keras and TensorFlow, so be kind.

## How to use it?
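The end-to-end example below sets up data generators over WikiText-2, compiles and trains the bidirectional language model, evaluates it, and finally extracts ELMo embeddings to feed into a downstream task (a hedged sketch of such a downstream model is given at the end of this README).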
```
import os

import keras.backend as K

from data import DATA_SET_DIR
from elmo.lm_generator import LMDataGenerator
from elmo.model import ELMo

parameters = {
    'multi_processing': False,
    'n_threads': 4,
    'cuDNN': True if len(K.tensorflow_backend._get_available_gpus()) else False,
    'train_dataset': 'wikitext-2/wiki.train.tokens',
    'valid_dataset': 'wikitext-2/wiki.valid.tokens',
    'test_dataset': 'wikitext-2/wiki.test.tokens',
    'vocab': 'wikitext-2/wiki.vocab',
    'vocab_size': 28914,
    'num_sampled': 1000,
    'charset_size': 262,
    'sentence_maxlen': 100,
    'token_maxlen': 50,
    'token_encoding': 'word',
    'epochs': 10,
    'patience': 2,
    'batch_size': 1,
    'clip_value': 5,
    'cell_clip': 5,
    'proj_clip': 5,
    'lr': 0.2,
    'shuffle': True,
    'n_lstm_layers': 2,
    'n_highway_layers': 2,
    'cnn_filters': [[1, 32], [2, 32], [3, 64], [4, 128], [5, 256], [6, 512], [7, 512]],
    'lstm_units_size': 400,
    'hidden_units_size': 200,
    'char_embedding_size': 16,
    'dropout_rate': 0.1,
    'word_dropout_rate': 0.05,
    'weight_tying': True,
}

# Set-up Generators
train_generator = LMDataGenerator(os.path.join(DATA_SET_DIR, parameters['train_dataset']),
                                  os.path.join(DATA_SET_DIR, parameters['vocab']),
                                  sentence_maxlen=parameters['sentence_maxlen'],
                                  token_maxlen=parameters['token_maxlen'],
                                  batch_size=parameters['batch_size'],
                                  shuffle=parameters['shuffle'],
                                  token_encoding=parameters['token_encoding'])

val_generator = LMDataGenerator(os.path.join(DATA_SET_DIR, parameters['valid_dataset']),
                                os.path.join(DATA_SET_DIR, parameters['vocab']),
                                sentence_maxlen=parameters['sentence_maxlen'],
                                token_maxlen=parameters['token_maxlen'],
                                batch_size=parameters['batch_size'],
                                shuffle=parameters['shuffle'],
                                token_encoding=parameters['token_encoding'])

test_generator = LMDataGenerator(os.path.join(DATA_SET_DIR, parameters['test_dataset']),
                                 os.path.join(DATA_SET_DIR, parameters['vocab']),
                                 sentence_maxlen=parameters['sentence_maxlen'],
                                 token_maxlen=parameters['token_maxlen'],
                                 batch_size=parameters['batch_size'],
                                 shuffle=parameters['shuffle'],
                                 token_encoding=parameters['token_encoding'])

# Compile ELMo
elmo_model = ELMo(parameters)
elmo_model.compile_elmo(print_summary=True)

# Train ELMo
elmo_model.train(train_data=train_generator, valid_data=val_generator)

# Persist ELMo Bidirectional Language Model to disk
elmo_model.save(sampled_softmax=False)

# Evaluate Bidirectional Language Model
elmo_model.evaluate(test_generator)

# Build ELMo meta-model to deploy for production and persist to disk
elmo_model.wrap_multi_elmo_encoder(print_summary=True, save=True)

# Load ELMo encoder
elmo_model.load_elmo_encoder()

# Get ELMo embeddings to feed as inputs for downstream tasks
elmo_embeddings = elmo_model.get_outputs(test_generator, output_type='word', state='mean')

# BUILD & TRAIN NEW KERAS MODEL FOR DOWNSTREAM TASK (E.G., TEXT CLASSIFICATION)
# (see the sketch at the end of this README)
```

## What is missing?

- Turn sampled softmax into full softmax dynamically in evaluation mode (TODO) ([Read comment](https://github.com/iliaschalkidis/ELMo-keras/commit/35fa4f9b3245a9c1078d4c7975064b19bd9742f4#commitcomment-31314484)); a sketch of the general idea appears at the end of this README
- More testing (TODO)
- Options to build a unidirectional LM (TODO)
- Proof-reading; you're all welcome!

## Credits for proof-reading and reporting so far...

[@seolhokim](https://github.com/seolhokim)
[@geneva0901](https://github.com/geneva0901)
[@masepehr](https://github.com/masepehr)
[@dilaratorunoglu](https://github.com/dilaratorunoglu)
[@Adherer](https://github.com/Adherer)
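As a concrete illustration of the final step in the usage example above, here is a minimal sketch of a downstream Keras classifier trained on top of the extracted ELMo embeddings. It is only a sketch: it assumes `elmo_embeddings` is a NumPy array of shape `(num_samples, sentence_maxlen, embedding_dim)` as returned by `get_outputs`, and the `labels` array is a hypothetical placeholder you would replace with real task labels.

```
import numpy as np
from keras.models import Model
from keras.layers import Input, Bidirectional, LSTM, Dense, GlobalMaxPooling1D

# Hypothetical shapes: adjust to whatever get_outputs() actually returns.
num_samples, maxlen, emb_dim = elmo_embeddings.shape

# Placeholder binary labels; replace with your real task labels.
labels = np.random.randint(0, 2, size=(num_samples,))

# A small sentence classifier over the pre-computed ELMo embeddings.
inputs = Input(shape=(maxlen, emb_dim))
x = Bidirectional(LSTM(64, return_sequences=True))(inputs)
x = GlobalMaxPooling1D()(x)
outputs = Dense(1, activation='sigmoid')(x)

clf = Model(inputs, outputs)
clf.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
clf.fit(elmo_embeddings, labels, epochs=3, batch_size=32)
```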
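On the first TODO under "What is missing?": one possible way to switch between sampled softmax in training and full softmax in evaluation is Keras's learning-phase mechanism. The following is only a rough sketch of that idea, not the repository's actual implementation; `softmax_weights`, `softmax_biases`, and the assumed tensor shapes are hypothetical.

```
import tensorflow as tf
import keras.backend as K

def sampled_or_full_softmax_loss(softmax_weights, softmax_biases,
                                 num_classes, num_sampled):
    """Sampled softmax while training, full softmax cross-entropy otherwise."""
    def loss(y_true, y_pred):
        # y_true: target token ids, shape (batch, 1); y_pred: LM projections, shape (batch, dim).
        sampled = tf.nn.sampled_softmax_loss(
            weights=softmax_weights, biases=softmax_biases,
            labels=tf.cast(y_true, tf.int64), inputs=y_pred,
            num_sampled=num_sampled, num_classes=num_classes)
        # Full softmax over the entire vocabulary, using the same output weights.
        logits = tf.matmul(y_pred, tf.transpose(softmax_weights)) + softmax_biases
        full = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=tf.cast(K.flatten(y_true), tf.int64), logits=logits)
        # K.in_train_phase picks the first branch when the Keras learning phase is 1.
        return K.in_train_phase(sampled, full)
    return loss
```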