# ConvKB

**Repository Path**: gzupanda/ConvKB

## Basic Information

- **Project Name**: ConvKB
- **Description**: A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network (NAACL 2018)
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-03-09
- **Last Updated**: 2020-12-18

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network

This program provides the implementation of the CNN-based model ConvKB for the knowledge base completion task. ConvKB obtains new state-of-the-art results on two standard datasets, WN18RR and FB15k-237, as described in [the paper](http://www.aclweb.org/anthology/N18-2053):

    @InProceedings{Nguyen2018,
      author={Dai Quoc Nguyen and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung},
      title={{A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network}},
      booktitle={Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)},
      year={2018},
      pages={327--333}
    }

## Usage

### Requirements

- Python 3
- TensorFlow >= 1.6

### Training

To run the program:

    python train.py --embedding_dim <int> --num_filters <int> --learning_rate <float> --name <dataset_name> [--useConstantInit] --model_name <name_of_saved_model>

**Required parameters:**

- `--embedding_dim`: Dimensionality of entity and relation embeddings.
- `--num_filters`: Number of filters.
- `--learning_rate`: Initial learning rate.
- `--name`: Dataset name (WN18RR or FB15k-237).
- `--model_name`: Name of saved models.

**Optional parameters:**

- `--useConstantInit`: Initialize filters by [0.1, 0.1, -0.1]. Otherwise, initialize filters by a truncated normal distribution.
- `--l2_reg_lambda`: L2 regularization lambda (Default: 0.001).
- `--dropout_keep_prob`: Dropout keep probability (Default: 1.0).
- `--num_epochs`: Number of training epochs (Default: 200).
- `--run_folder`: Directory path in which to save trained models.
- `--batch_size`: Batch size.

### Reproduce the ConvKB results

To reproduce the ConvKB results published in the paper:

    $ python train.py --embedding_dim 100 --num_filters 50 --learning_rate 0.000005 --name FB15k-237 --useConstantInit --model_name fb15k237

    $ python train.py --embedding_dim 50 --num_filters 500 --learning_rate 0.0001 --name WN18RR --model_name wn18rr --saveStep 50

### Evaluation metrics

File `eval.py` provides ranking-based scores as evaluation metrics, including the mean rank, the mean reciprocal rank and Hits@10, under the "Filtered" setting protocol.

Files `evalFB15k-237.sh` and `evalWN18RR.sh` contain evaluation commands. Depending on the available memory, change `--num_splits` to a suitable value to speed up evaluation.

To get the results (supposing `num_splits = 8`):

    $ python eval.py --embedding_dim 100 --num_filters 50 --name FB15k-237 --useConstantInit --model_name fb15k237 --num_splits 8 --decode

    $ python eval.py --embedding_dim 50 --num_filters 500 --name WN18RR --model_name wn18rr --num_splits 8 --decode

### Note

Update: a new initialization for WN18RR gives MR: 763, MRR: 0.253 and Hits@10: 56.7. Please check [our new NAACL2019 paper](https://arxiv.org/abs/1808.04122).

    $ python train.py --embedding_dim 100 --num_filters 400 --learning_rate 0.00005 --name WN18RR --num_epochs 101 --saveStep 100 --model_name wn18rr_400_3

## License

Please cite the paper whenever ConvKB is used to produce published results or incorporated into other software. I would highly appreciate your bug reports, comments and suggestions about ConvKB. As a free open-source implementation, ConvKB is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

ConvKB is licensed under the Apache License 2.0.

## Acknowledgments

I would like to thank Denny Britz for implementing a CNN for text classification in TensorFlow.
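
As a supplement to the "Evaluation metrics" section, here is a minimal sketch of how the mean rank (MR), mean reciprocal rank (MRR) and Hits@10 can be computed under a "Filtered" protocol. It is not taken from `eval.py`: the function names `filtered_rank` and `ranking_metrics` are hypothetical, and the sketch assumes that a lower score means a more plausible triple and that candidates are identified by their index in the score list.

```python
def filtered_rank(scores, target, known_true=()):
    """Rank (1 = best) of the candidate at index `target`.

    Assumption: lower score = more plausible triple.
    "Filtered" setting: other candidates already known to be valid
    (indices in `known_true`) are skipped so they cannot push the
    target down the ranking.
    """
    skip = set(known_true) - {target}
    better = sum(
        1
        for i, s in enumerate(scores)
        if i not in skip and s < scores[target]
    )
    return better + 1


def ranking_metrics(ranks, k=10):
    """Aggregate per-triple ranks into MR, MRR and Hits@k (in percent)."""
    n = len(ranks)
    return {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
        "Hits@%d" % k: 100.0 * sum(1 for r in ranks if r <= k) / n,
    }


if __name__ == "__main__":
    # Toy scores for four candidate entities; candidate 3 is the test answer,
    # candidate 1 is another valid answer already known from the dataset.
    scores = [0.5, 0.1, 0.9, 0.3]
    print(filtered_rank(scores, target=3, known_true={1, 3}))
    print(ranking_metrics([1, 2, 20]))
```

In the toy example, filtering out candidate 1 moves the target from rank 2 to rank 1, which is the effect the "Filtered" setting is meant to capture.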