# mta-lstm-pytorch

**Repository Path**: yimou2021/mta-lstm-pytorch

## Basic Information

- **Project Name**: mta-lstm-pytorch
- **Description**: PyTorch implementation of MTA-LSTM
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2023-10-18
- **Last Updated**: 2023-10-18

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# MTA-LSTM-PyTorch

This is a PyTorch implementation of the paper [Topic-to-Essay Generation with Neural Networks](http://ir.hit.edu.cn/~xcfeng/xiaocheng%20Feng's%20Homepage_files/final-topic-essay-generation.pdf) of IJCAI2018.

MTA-LSTM stands for Multi-Topic-Aware LSTM, ultilizing multi-topic coverage vector which learns the weight of each topic during the decoding process, and is sequentially updated. The original implementation written in TensorFlow can be found [here](https://github.com/hit-computer/MTA-LSTM), but it's out-of-date and is no longer maintained. Therefore I decided to use PyTorch, which is easier and more straight forward than TensorFlow in my opinion, to reimplement the paper.

## Dataset

The first link are 2 datasets provided by the author of the paper, and the second link, the news dataset, is prepared by myself. Simply download the files and put them into ```data``` folder.

- [Composition and Zhihu (Chinese)](https://drive.google.com/drive/folders/1oK9i0ukV5T0QoPQkHsxa3OdhQ1dVsNnL?usp=sharing)
- [News (English)](https://drive.google.com/drive/folders/1RpBMMBvgnMPjRdQaM46pyncu__QsZBZS?usp=sharing)

## Prerequisites

- Python3
- PyTorch >= 1.2.0
- Gensim

## Implementation Notes

- Full vocabs were used instead of using only 50000 common words as the paper did.
- Adaptive softmax were adopted instead of cross entropy in order to speed up training process.
- The model was trained on one 1080 ti, and it took 2 days for 100 epochs.
- Beam Search method is not parallel.

## Usage

1. Run ```data/word2vec.ipynb``` to create pretrained word embedding files.
2. Run ```mta-lstm.ipynb``` to train the model.

## Generated Examples

- Topics: 現在 未來 夢想 科學 文化
    ```
    我的夢想是長大後成為一名科學家，為實現自己的理想努力奮鬥。我要好好學習科學文化知識，長大後成為國家的棟樑之才。我堅信，只要我們努力學習科學文化知識，長大後，我們的未來一定會更加美好。
    ```
- Topics: 美麗 夏天 人們 玩耍 來臨
    ```
    夏天，是一個美麗的季節。夏天，樹木長得蔥蔥蘢，枝繁葉茂。夏天，池塘裡的水清了，孩子們也可以在水裡自由字在地玩耍了。小朋友們在這裡捉迷藏、嬉戲、玩耍、玩耍。
    ```