# MMN **Repository Path**: d754406193/MMN ## Basic Information - **Project Name**: MMN - **Description**: No description available - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2020-12-04 - **Last Updated**: 2025-02-04 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # MMN ![mmn model](./assets/mmn_model.png) This project hosts the code and dataset for our paper. - [Byeongchang Kim](https://bckim92.github.io/), [Hyunwoo Kim](https://skywalker023.github.io/) and [Gunhee Kim](http://vision.snu.ac.kr/~gunhee/). Abstractive Summarization of Reddit Posts with Multi-level Memory Networks. In *NAACL-HLT* (oral), 2019. [[arxiv]](https://arxiv.org/abs/1811.00783) [[slide]](https://drive.google.com/open?id=17nGtwNewII9Uqxmq4Rp_xmQtCJPAZRnz) We address the problem of abstractive summarization in two directions: proposing a novel dataset and a new model. First, we collected Reddit TIFU dataset, consisting of 120K posts from the online discussion forum Reddit. Second, we propose a novel abstractive summarization model named *multi-level memory networks* (MMN), equipped with multi-level memory to store the information of text from different levels of abstraction. ## Reference If you use this code or dataset as part of any published research, please refer following paper. ``` @inproceedings{Kim:2019:NAACL-HLT, author = {Kim, Byeongchang and Kim, Hyunwoo and Kim, Gunhee}, title = "{Abstractive Summarization of Reddit Posts with Multi-level Memory Networks}", booktitle = {NAACL-HLT}, year = 2019 } ``` ## Running Code TBU ## *Reddit TIFU* Dataset *Reddit TIFU* dataset is our newly collected Reddit dataset, where TIFU denotes the name of subbreddit [/r/tifu](https://www.reddit.com/r/tifu/). Key statistics of *Reddit TIFU* dataset are outlined below. We also show average and median (in parentheses) values. The total text-summary pairs are 122,933. | Dataset | #posts | #words/post | #words/summ | |:------------:|:---------:|:-----------:|:-----------:| | TIFU-short | 79,949 | 342.4 (269) | 9.33 (8) | | TIFU-long | 42,984 | 432.6 (351) | 23.0 (21) | You can download data from the links below. This file includes raw text and tokenized text. [[Download json]](https://drive.google.com/open?id=1ffWfITKFMJeqjT8loC8aiCLRNJpc_XnF) You can read and explore our dataset as follows ```python import json # Read entire file posts = [] with open('tifu_tokenized_and_filtered.json', 'r') as fp: for line in fp: posts.append(json.loads(line)) # Json entries print(posts[50000].keys()) # [u'title_tokenized', # u'permalink', # u'title', # u'url', # u'num_comments', # u'tldr', # (optional) # u'created_utc', # u'trimmed_title_tokenized', # u'ups', # u'selftext_html', # u'score', # u'upvote_ratio', # u'tldr_tokenized', # (optional) # u'selftext', # u'trimmed_title', # u'selftext_without_tldr_tokenized', # u'id', # u'selftext_without_tldr'] ``` ## Acknowledgement We thank [PRAW](https://praw.readthedocs.io/en/latest/) developers for their API and Reddit users for their valuable posts. We also appreciate [Chris Dongjoo Kim](http://vision.snu.ac.kr/people/dongjookim.html) and [Yunseok Jang](https://yunseokjang.github.io) for helpful comments and discussions. This work was supported by Kakao and Kakao Brain corporations, and Creative-Pioneering Researchers Program through Seoul National University. ## Authors [Byeongchang Kim](https://bckim92.github.io/), [Hyunwoo Kim](https://skywalker023.github.io/) and [Gunhee Kim](http://vision.snu.ac.kr/~gunhee/) [Vision and Learning Lab](http://vision.snu.ac.kr/) @ Computer Science and Engineering, Seoul National University, Seoul, Korea ## License MIT license