# self-attentive-emb-tf

**Repository Path**: greitzmann/self-attentive-emb-tf

## Basic Information

- **Project Name**: self-attentive-emb-tf
- **Description**: Simple Tensorflow Implementation of "A Structured Self-attentive Sentence Embedding" (ICLR 2017)
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-02-17
- **Last Updated**: 2021-02-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# A Structured Self-attentive Sentence Embedding

Tensorflow implementation of "[A Structured Self-attentive Sentence Embedding](https://arxiv.org/abs/1703.03130)" (ICLR 2017).

![image](https://user-images.githubusercontent.com/15166794/41864478-21cbf7c8-78e5-11e8-94d2-5aa035a65c8b.png)

## Usage

### Data

* AG's news topic classification dataset.
* The csv files (in my [data directory](https://github.com/roomylee/self-attention-tf/tree/master/data)) were obtained from [here](https://github.com/mhjabreel/CharCNN/tree/master/data/ag_news_csv).

### Train

* "[GoogleNews-vectors-negative300](https://code.google.com/archive/p/word2vec/)" is used as the pre-trained word2vec model.
* Display the help message:

```bash
$ python train.py --help
```

```bash
train.py:
  --[no]allow_soft_placement: Allow device soft device placement (default: 'true')
  --batch_size: Batch Size (default: '64') (an integer)
  --checkpoint_every: Save model after this many steps (default: '100') (an integer)
  --d_a_size: Size of W_s1 embedding (default: '350') (an integer)
  --dev_sample_percentage: Percentage of the training data to use for validation (default: '0.1') (a number)
  --display_every: Number of iterations to display training info. (default: '10') (an integer)
  --embedding_dim: Dimensionality of word embedding (default: '300') (an integer)
  --evaluate_every: Evaluate model on dev set after this many steps (default: '100') (an integer)
  --fc_size: Size of fully connected layer (default: '2000') (an integer)
  --hidden_size: Size of LSTM hidden layer (default: '256') (an integer)
  --learning_rate: Which learning rate to start with. (default: '0.001') (a number)
  --[no]log_device_placement: Log placement of ops on devices (default: 'false')
  --max_sentence_length: Max sentence length in train/test data (default: '50') (an integer)
  --num_checkpoints: Number of checkpoints to store (default: '5') (an integer)
  --num_epochs: Number of training epochs (default: '10') (an integer)
  --p_coef: Coefficient for penalty (default: '1.0') (a number)
  --r_size: Size of W_s2 embedding (default: '30') (an integer)
  --train_dir: Path of train data (default: 'data/train.csv')
  --word2vec: Word2vec file with pre-trained embeddings
```

* **Train Example (with word2vec):**

```bash
$ python train.py --word2vec "GoogleNews-vectors-negative300.bin"
```

### Evaluation

* You must provide the "**checkpoint_dir**" argument, the path of the checkpoint (trained neural model) directory, as in the example below.
* If you don't want to visualize the attention, pass the option `--visualize False`.
* **Evaluation Example:**

```bash
$ python eval.py --checkpoint_dir "runs/1523902663/checkpoints/"
```

## Results

#### 1) Accuracy

test data = 0.920789

#### 2) Visualization of Self Attention

![viz](https://user-images.githubusercontent.com/15166794/41875853-1dea6f28-7907-11e8-94e9-398e2699aca5.png)
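The highlighted-text figure above is what `eval.py` produces when `--visualize` is enabled. As a rough, self-contained illustration of how token-level attention weights of this kind can be rendered, here is a small matplotlib sketch; the tokens, weights, and output file name are made up for illustration, and this is not the script's actual plotting code.

```python
# Rough sketch of rendering token-level attention weights as a heat map,
# similar in spirit to the visualization produced by `eval.py --visualize`.
# The tokens and weights below are made up; eval.py computes the real
# weights from a trained checkpoint.
import numpy as np
import matplotlib.pyplot as plt

tokens = ["the", "stock", "market", "rallied", "after", "the", "earnings", "report"]
weights = np.array([
    [0.02, 0.25, 0.30, 0.05, 0.02, 0.02, 0.20, 0.14],   # attention hop 1
    [0.05, 0.05, 0.05, 0.40, 0.05, 0.05, 0.20, 0.15],   # attention hop 2
])

fig, ax = plt.subplots(figsize=(8, 2))
im = ax.imshow(weights, cmap="Reds", aspect="auto")     # darker = more attention
ax.set_xticks(range(len(tokens)))
ax.set_xticklabels(tokens, rotation=45, ha="right")
ax.set_yticks(range(weights.shape[0]))
ax.set_yticklabels([f"hop {i + 1}" for i in range(weights.shape[0])])
fig.colorbar(im, ax=ax, label="attention weight")
fig.tight_layout()
fig.savefig("attention_heatmap.png")                     # hypothetical output path
```

Each row corresponds to one attention hop (one row of the r x n annotation matrix), so the darker cells mark the tokens that hop focuses on.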
## Reference

* A Structured Self-attentive Sentence Embedding (ICLR 2017), Z. Lin et al. [[paper]](https://arxiv.org/abs/1703.03130)
* flrngel's [Self-Attentive-tensorflow](https://github.com/flrngel/Self-Attentive-tensorflow) GitHub repository
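As a closing note on notation: the flags `--hidden_size`, `--d_a_size`, `--r_size`, and `--p_coef` in `train.py` correspond to the bi-LSTM hidden size u, the matrices W_s1 and W_s2, and the weight on the penalty term in the paper. The minimal NumPy sketch below shows the attention computation A = softmax(W_s2 tanh(W_s1 H^T)), the sentence embedding M = A H, and the penalty ||A A^T - I||_F^2 with the default flag values; it only illustrates the math from the paper, not the repository's TensorFlow implementation, and all variable names are illustrative.

```python
# Minimal NumPy sketch of the structured self-attention from the paper
# (not this repository's TensorFlow code). Shapes follow the paper's notation:
#   H    : (n, 2u)   bi-LSTM hidden states for one sentence
#   W_s1 : (d_a, 2u) -> corresponds to --d_a_size
#   W_s2 : (r, d_a)  -> corresponds to --r_size
#   A    : (r, n)    annotation matrix, one attention distribution per row
#   M    : (r, 2u)   sentence embedding matrix
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # stabilise before exponentiating
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

n, u, d_a, r = 50, 256, 350, 30               # mirrors the default flag values
H = np.random.randn(n, 2 * u)                 # stand-in for the bi-LSTM outputs
W_s1 = np.random.randn(d_a, 2 * u) * 0.1
W_s2 = np.random.randn(r, d_a) * 0.1

A = softmax(W_s2 @ np.tanh(W_s1 @ H.T), axis=1)   # (r, n), each row sums to 1
M = A @ H                                         # (r, 2u) sentence embedding

# Penalisation term P = ||A A^T - I||_F^2, scaled by --p_coef in the loss,
# which pushes the r attention rows to focus on different tokens.
P = np.linalg.norm(A @ A.T - np.eye(r), ord="fro") ** 2
print(M.shape, float(P))
```

In the paper, M is then flattened and passed through a fully connected layer (sized here by `--fc_size`) before the softmax classifier.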