# sentence-completion

**Repository Path**: d754406193/sentence-completion

## Basic Information

- **Project Name**: sentence-completion
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-12-04
- **Last Updated**: 2020-12-20

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Word RNN for Sentence Completion

A PyTorch implementation of a word-level recurrent neural network for sentence completion.
The code is based on [Word-level language modeling RNN](https://github.com/pytorch/examples/tree/master/word_language_model), and the importance sampling module is from [PyTorch Large-Scale Language Model](https://github.com/rdspring1/PyTorch_GBW_LM).

## Requirements

- torchvision >= 0.2.0
- torch >= 0.3.0.post4
- numpy >= 1.13.3
- pandas >= 0.21.0
- nltk >= 3.2.5
- tqdm >= 4.19.5
- Cython >= 0.27.3

> pip3 install -r requirements.txt

## Setup

- Build the Log_Uniform Sampler following the instructions at [Link](https://github.com/rdspring1/PyTorch_GBW_LM).
- Download the `punkt` package for `nltk`.

## Datasets

- Microsoft Research Sentence Completion Challenge
  - The training and test datasets can be downloaded from [Link](https://drive.google.com/open?id=0B5eGOMdyHn2mWDYtQzlQeGNKa2s). Store the downloaded test data in `./data/completion/`.
- Scholastic Aptitude Test sentence completion questions
  - The collected questions are provided in `./data/completion/SAT_set_filled.csv`.
- Nineteenth century novels (19C novels)
  - Extract the preprocessed files from `./data/prepro/guten.tgz`.
- One Billion Word Benchmark (1B word)
  - [Link](http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz)

## Run

### Training

> python3 train.py --cuda --save_dir mynet

Default arguments are set for training with 19C novels.
Argument settings for training with the 1B word benchmark are presented in the following table.

| Argument      | 19C novels    | 1B word   |
|---------------|---------------|-----------|
| corpus        | guten         | gbw       |
| emsize        | 200           | 500       |
| nhid          | 600           | 2000      |
| outsize       | 400           | 500       |
| lr            | 0.5           | 1.0       |
| decay_after   | 5             | 1         |
| decay_rate    | 0.5           | 0.8       |
| batch_size    | 20            | 100       |
| nsampled      | -1            | 8192      |

### Sentence completion

> python3 sent_cmplt.py --cuda --save_dir mynet

## Results

| corpus    | bidirec   | MSR accuracy  | SAT accuracy  |
|:----------|:----------|:-------------:|:-------------:|
| guten     | False     | 69.4 (0.8)*   | 29.6 (1.5)*   |
| guten     | True      | 72.3 (1.1)*   | 33.3 (2.0)*   |
| gbw       | False     | 63.2          | 66.5          |
| gbw       | True      | 64.1          | 69.1          |

*The mean accuracy of five networks trained with different random initializations is shown with the standard deviation in parentheses.
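
The `mean (std)` entries marked with an asterisk can be reproduced from per-run accuracies with a few lines of Python. The following is only a minimal sketch: the function name `summarize` and the accuracy values are hypothetical placeholders, not results from this repository, and the README does not say whether the sample or the population standard deviation was reported, so the sample version (`statistics.stdev`) is assumed here.

```python
from statistics import mean, stdev


def summarize(accuracies):
    """Format a list of accuracies (in percent) as 'mean (std)' with one decimal.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    Swap in statistics.pstdev for the population version.
    """
    return f"{mean(accuracies):.1f} ({stdev(accuracies):.1f})"


# Hypothetical accuracies of five runs with different random seeds.
# These are placeholder numbers, NOT measurements from this repository.
runs = [70.0, 71.0, 72.0, 73.0, 74.0]
print(summarize(runs))  # prints "72.0 (1.6)"
```

To use it, collect the accuracy of each of the five trained networks into a list and pass that list to `summarize`.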