# Machine-Reading-Comprehension-Neural-Question-Answer

**Repository Path**: coracoding/Machine-Reading-Comprehension-Neural-Question-Answer

## Basic Information

- **Project Name**: Machine-Reading-Comprehension-Neural-Question-Answer
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-03-26
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Machine Reading Comprehension using SQuAD v1.1

![Reading Comprehension](/images/reading_comprehension.jpg)

## About Dataset

The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. You can download the dataset here: https://rajpurkar.github.io/SQuAD-explorer/

![Data Structure](/images/dataset.PNG)

**SQuAD 1.1:** The previous version of the SQuAD dataset contains 100,000+ question-answer pairs on 500+ articles. (A sketch of flattening its JSON into training examples appears after the model section below.)

## Problem Statement

Predict the correct answer span for a given question and context.

## Stanford Attentive Reader

Implemented the Stanford Attentive Reader model using Keras; a minimal sketch follows the figure below. Please refer to this [paper](https://arxiv.org/pdf/1704.00051.pdf).

![Stanford Attentive Reader](/images/model.JPG)
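At its core, the model encodes the passage and the question with BiLSTMs, summarizes the question into one vector via self-attention, and scores every passage position for span start and end with bilinear terms. The Keras sketch below is a minimal re-creation under assumed hyperparameters (`VOCAB_SIZE`, `EMBED_DIM`, `HIDDEN`, `C_LEN`, `Q_LEN` are illustrative), not this repository's exact code:

```python
# Minimal sketch of the Stanford Attentive Reader (illustrative, not this
# repo's code). Labels for training are integer start/end token indices
# into the context.
from tensorflow.keras import layers, Model

VOCAB_SIZE, EMBED_DIM, HIDDEN, C_LEN, Q_LEN = 50000, 100, 64, 300, 30

context = layers.Input(shape=(C_LEN,), dtype="int32", name="context_ids")
question = layers.Input(shape=(Q_LEN,), dtype="int32", name="question_ids")

embed = layers.Embedding(VOCAB_SIZE, EMBED_DIM)
c_enc = layers.Bidirectional(layers.LSTM(HIDDEN, return_sequences=True))(embed(context))
q_enc = layers.Bidirectional(layers.LSTM(HIDDEN, return_sequences=True))(embed(question))

# Question summary q = sum_j b_j q_j, with b_j = softmax_j(w . q_j)
q_scores = layers.Dense(1)(q_enc)               # (B, Q_LEN, 1)
q_weights = layers.Softmax(axis=1)(q_scores)    # attention over question words
q_vec = layers.Dot(axes=1)([q_weights, q_enc])  # (B, 1, 2*HIDDEN)

# Bilinear span scores: P_start(i) ~ exp(c_i . W_s q), P_end(i) ~ exp(c_i . W_e q)
start_query = layers.Dense(2 * HIDDEN, use_bias=False)(q_vec)  # W_s q
start_scores = layers.Dot(axes=2)([c_enc, start_query])        # (B, C_LEN, 1)
start_probs = layers.Softmax(name="start")(layers.Reshape((C_LEN,))(start_scores))

end_query = layers.Dense(2 * HIDDEN, use_bias=False)(q_vec)    # W_e q
end_scores = layers.Dot(axes=2)([c_enc, end_query])
end_probs = layers.Softmax(name="end")(layers.Reshape((C_LEN,))(end_scores))

model = Model([context, question], [start_probs, end_probs])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

The bilinear form `c_i . (W q)` is what distinguishes the Attentive Reader's span scoring from a plain dot product between passage and question encodings.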
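The same paper also feeds each passage word an *aligned question embedding* f_align(p_i) = Σ_j a_{i,j} E(q_j), where a_{i,j} is a softmax over dot products of ReLU-projected word embeddings α(E(p_i)) · α(E(q_j)). This feature comes up in the Observations below; a minimal NumPy sketch (parameter names and shapes are my own, illustrative choices):

```python
# Sketch of the aligned question embedding f_align from the DrQA paper
# (https://arxiv.org/pdf/1704.00051.pdf). Shapes and names are assumptions.
import numpy as np

def aligned_question_embedding(p_emb, q_emb, W, b):
    """p_emb: (c_len, d) passage word embeddings
       q_emb: (q_len, d) question word embeddings
       W, b:  parameters of the shared projection alpha(x) = relu(x @ W + b)"""
    relu = lambda x: np.maximum(x, 0.0)
    ap = relu(p_emb @ W + b)                     # alpha(E(p_i)), (c_len, h)
    aq = relu(q_emb @ W + b)                     # alpha(E(q_j)), (q_len, h)
    scores = ap @ aq.T                           # dot products, (c_len, q_len)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(scores)
    a /= a.sum(axis=1, keepdims=True)            # softmax over question words
    return a @ q_emb                             # f_align(p_i), (c_len, d)

rng = np.random.default_rng(0)
d, h = 100, 128                                  # illustrative sizes
W, b = 0.01 * rng.normal(size=(d, h)), np.zeros(h)
f_align = aligned_question_embedding(rng.normal(size=(300, d)),
                                     rng.normal(size=(30, d)), W, b)
print(f_align.shape)  # (300, 100)
```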
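For reference, feeding either model starts from flattening SQuAD's nested JSON (pictured in the dataset figure above) into one row per question-answer pair. A minimal sketch; the standard download file name `train-v1.1.json` is assumed:

```python
# Sketch: flatten SQuAD v1.1 JSON into (context, question, answer) rows.
import json

def load_squad(path):
    with open(path, encoding="utf-8") as f:
        squad = json.load(f)
    rows = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    rows.append({
                        "context": context,
                        "question": qa["question"],
                        "answer_text": answer["text"],
                        "answer_start": answer["answer_start"],  # char offset
                    })
    return rows

rows = load_squad("train-v1.1.json")
print(len(rows), rows[0]["question"])
```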
## BERT on SQuAD

BERT, or Bidirectional Encoder Representations from Transformers, is a method of pre-training language representations that obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. Please refer to the research paper: https://arxiv.org/abs/1810.04805 (a simplified sketch of the span-prediction head used for SQuAD appears at the end of this README).

**Disclaimer**

- Most of the code is taken from the google-research GitHub account.
- The BERT model is only fine-tuned, not pre-trained from scratch.
- The code was modified as necessary.
- Used the BERT base model with 110M parameters.
- All references are listed in the References section.

**For the ipynb notebook, please check the bert folder.**

## Blog

**I have written a detailed post about this on Medium. You can read it here: https://medium.com/@raman.shinde15/neural-question-and-answering-using-sqad-dataset-and-attention-983d3a1dd42c**

## Observations

* Obtained a micro F1 score of 40.33% on test data (the token-overlap F1 metric is sketched at the end of this README).
* The aligned question embedding and f_exact (exact match) features were found to be the most effective, as mentioned in the paper.
* The F1 score could be improved further by adding the aligned question embedding feature (sketched after the model section above) to the context representation.
* The aligned question embedding was nonetheless omitted due to compute limits: training one epoch took around an hour without it and more than 5 hours with it.
* Performance could be improved further by considering:
  * All data points.
  * 128 units and 3 layers of Bi-LSTM, as mentioned in the paper.
  * The aligned question embedding and f_exact features together.
* Fine-tuned the BERT uncased state-of-the-art model to get the results.
* The BERT model results were obtained using a TPU provided by Google.

## Summary

![Summary](/images/summary.PNG)

## References

* The Stanford Question Answering Dataset, by Rajpurkar et al.: https://rajpurkar.github.io/mlx/qa-and-squad/
* Reading Wikipedia to Answer Open-Domain Questions: https://arxiv.org/pdf/1704.00051.pdf
* https://hanxiao.github.io/2018/04/21/Teach-Machine-to-Comprehend-Text-and-Answer-Question-with-Tensorflow/
* https://github.com/kellywzhang/reading-comprehension
* https://github.com/Shuang0420/Fast-Reading-Comprehension
* https://github.com/google-research/bert/blob/master/run_squad.py
* https://github.com/google-research/bert
* https://www.kaggle.com/lapolonio/bert-squad-forked-from-sergeykalutsky/code
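As a closing note on the BERT section: the fine-tuning recipe in `run_squad.py` adds a small span-prediction head on top of BERT's per-token output. Conceptually it reduces to the simplified sketch below (an illustration, not google-research's exact code; loading BERT itself is omitted, and the dummy tensor stands in for its sequence output):

```python
# Simplified sketch of a SQuAD span head over BERT's sequence output:
# one dense layer maps each token's final hidden state to a
# (start, end) logit pair.
import tensorflow as tf

HIDDEN = 768                     # BERT-base hidden size
dense = tf.keras.layers.Dense(2)

def span_logits(sequence_output):
    """sequence_output: (batch, seq_len, HIDDEN) tensor from BERT."""
    logits = dense(sequence_output)                          # (batch, seq_len, 2)
    start_logits, end_logits = tf.unstack(logits, axis=-1)   # each (batch, seq_len)
    return start_logits, end_logits

# Dummy input standing in for BERT's output on one 384-token example.
start, end = span_logits(tf.random.normal([1, 384, HIDDEN]))
answer_start = int(tf.argmax(start, axis=-1)[0])  # predicted span start index
answer_end = int(tf.argmax(end, axis=-1)[0])      # predicted span end index
```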
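And for the F1 number quoted in the Observations: SQuAD scores predictions with token-overlap F1 between the predicted and gold answer strings. A minimal sketch (the official evaluation script additionally strips articles and punctuation; this version only lowercases and splits on whitespace):

```python
# Sketch of SQuAD-style token-overlap F1 (simplified normalization).
from collections import Counter

def squad_f1(prediction: str, ground_truth: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(squad_f1("in the 10th century", "10th century"))  # ~0.667
```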