# structured-neural-summarization-replication

**Repository Path**: wxwxzhang/structured-neural-summarization-replication

## Basic Information

- **Project Name**: structured-neural-summarization-replication
- **Description**: Replication of the paper "Structured Neural Summarization" which uses Graph Neural Networks and Seq2Seq models to summarize natural language and source code.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2019-11-04
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Structured Neural Summarization

## Extracting the Dataset
To extract the features from the corpus proto files, run:

`python data_processing/data_generation.py`

The command requires a directory _corpus/r252-corpus-features_ containing the protos of the corpus.
Alternatively, the extracted dataset can be downloaded from
https://drive.google.com/file/d/14k4AgOVws4_TfPtDGefXzPn3x2Ph083h/view?usp=sharing. Once the
downloaded file is placed under the _data/_ directory (which needs to be created), the model can be
trained and evaluated.
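The directory requirements above can be checked before running anything. The following is a minimal sketch, not part of the repository: the helper name `check_data_layout` is hypothetical, and it assumes only the two directory names mentioned above, resolved relative to the repository root.

```python
from pathlib import Path


def check_data_layout(repo_root="."):
    """Report which dataset workflows are possible from repo_root.

    Hypothetical helper, not part of the repository. It relies only on the
    two directory names mentioned in this README.
    """
    root = Path(repo_root)
    corpus_dir = root / "corpus" / "r252-corpus-features"
    data_dir = root / "data"

    # The README states that data/ needs to be created, so create it if missing.
    data_dir.mkdir(parents=True, exist_ok=True)

    return {
        # True if the corpus directory expected by data_generation.py exists.
        "can_extract": corpus_dir.is_dir(),
        # True if data/ already holds at least one entry (e.g. the download).
        "has_data": any(data_dir.iterdir()),
    }
```

Calling `check_data_layout()` from the repository root returns a small dictionary; if `can_extract` is false and `has_data` is false, neither the extraction step nor training has the input it needs yet.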

## Running the Models
To train and evaluate a model, run:

`python training/train.py --model_name="lstm_gcn_to_lstm_attention" --print_every=10000 --attention=True --graph=True --iterations=500000`

To list all available options, run:

`python training/train.py --help`
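The training command above passes booleans in the form `--attention=True`. How `train.py` parses them is not shown in this README; the sketch below only illustrates one way such flags can be handled with `argparse`. The `str2bool` helper, the `build_parser` function, and all default values are assumptions made for illustration. An explicit converter is used because `type=bool` would treat any non-empty string, including `"False"`, as true.

```python
import argparse


def str2bool(value):
    """Convert 'True'/'False'-style strings to bool (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Parser for the flags used in this README; the defaults are assumptions."""
    parser = argparse.ArgumentParser(description="Train and evaluate a model")
    parser.add_argument("--model_name", type=str, default="model")
    parser.add_argument("--print_every", type=int, default=1000)
    parser.add_argument("--iterations", type=int, default=1000)
    parser.add_argument("--attention", type=str2bool, default=False)
    parser.add_argument("--graph", type=str2bool, default=False)
    return parser


# Parse the same arguments as the training command shown in this README.
args = build_parser().parse_args([
    "--model_name=lstm_gcn_to_lstm_attention",
    "--print_every=10000",
    "--attention=True",
    "--graph=True",
    "--iterations=500000",
])
print(args.model_name, args.attention, args.graph, args.iterations)
```

With a converter like this, `--graph=False` disables the graph encoder flag instead of silently enabling it.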

## Pretrained Models
A pretrained version of the best-performing model (as a state dictionary) can be downloaded from
https://drive.google.com/file/d/1fm7hGzr-tziNhUMh8duc8s4j5gWW3uKm/view?usp=sharing

## High-Level Code Structure
- data_processing/: contains the code for extracting, storing, analysing and processing data
    - data_analysis.ipynb: notebook containing analysis of the extracted data
    - data_extraction.py: contains the logic to extract the feature data from the proto files of the corpus
    - data_generation.py: script to run to generate the feature data
    - data_util.py: contains utilities to work with data
    - text_util.py: contains utilities to work with text
- models/: contains all the code for the different models
    - full_model.py: class for the complete methodNaming model
    - gat_encoder.py: class for the Graph Attention Network encoder
    - gcn_encoder.py: class for the Graph Convolutional Network encoder
    - graph_attention_layer.py: class for the Graph Attention Layer used by the Graph Attention Network
    - graph_convolutional_layer.py: class for the Graph Convolutional Layer used by the Graph Convolutional Network
    - lstm_decoder.py: class for the LSTM sequence decoder
    - lstm_encoder.py: class for the LSTM sequence encoder
- training/: contains code to train and evaluate the models
    - evaluation_util.py: contains utilities to compute evaluation metrics
    - train.py: entry-point for training the models
    - train_model.py: contains logic to train the models