# JointLK
**Repository Path**: xuan-lai/JointLK
## Basic Information
- **Project Name**: JointLK
- **License**: MIT
- **Default Branch**: master
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-10-19
- **Last Updated**: 2023-10-19
## README
# JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering
This repo provides the source code and data for our paper: [JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering](https://arxiv.org/abs/2112.02732) (NAACL 2022).
For convenience, all data, checkpoints, and code can be downloaded from my [Baidu Netdisk](https://pan.baidu.com/s/1WsEukLdrHELu6q9_qj8NBA?pwd=y5sd).
## 1. Dependencies
Run the following commands to create a conda environment (assuming CUDA 11.0):
```bash
conda create -n jointlk python=3.7
source activate jointlk
pip install torch==1.7.1+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip install transformers==3.2.0
pip install nltk spacy==2.1.6
python -m spacy download en
# for torch-geometric
pip install torch-cluster==1.5.9 -f https://pytorch-geometric.com/whl/torch-1.7.1+cu110.html
pip install torch-spline-conv==1.2.1 -f https://pytorch-geometric.com/whl/torch-1.7.1+cu110.html
pip install torch-scatter==2.0.6 -f https://pytorch-geometric.com/whl/torch-1.7.1+cu110.html
pip install torch-sparse==0.6.9 -f https://pytorch-geometric.com/whl/torch-1.7.1+cu110.html
pip install torch-geometric==1.6.3 -f https://pytorch-geometric.com/whl/torch-1.7.1+cu110.html
```
See `env.yaml` for the full list of environment dependencies.
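
As a quick sanity check after installation, the pinned versions from the commands above can be compared against what is actually importable. The following is a minimal sketch and not part of the repository: the helper names `installed_version`, `find_mismatches`, and `check_environment` are hypothetical, and it assumes each package exposes a `__version__` attribute.

```python
import importlib

# Pins copied from the install commands above; keys are Python import names.
PINNED = {
    "torch": "1.7.1+cu110",
    "transformers": "3.2.0",
    "spacy": "2.1.6",
    "torch_cluster": "1.5.9",
    "torch_spline_conv": "1.2.1",
    "torch_scatter": "2.0.6",
    "torch_sparse": "0.6.9",
    "torch_geometric": "1.6.3",
}


def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # A broken binary install can raise more than ImportError.
        return None
    return getattr(module, "__version__", None)


def find_mismatches(pinned, installed):
    """Return {name: (expected, found)} for every pin that is not met."""
    return {
        name: (expected, installed.get(name))
        for name, expected in pinned.items()
        if installed.get(name) != expected
    }


def check_environment(pinned=PINNED):
    """Hypothetical helper: import each pinned package and compare versions."""
    installed = {name: installed_version(name) for name in pinned}
    return find_mismatches(pinned, installed)
```

Calling `check_environment()` inside the activated `jointlk` environment should return an empty dictionary when every pin is satisfied; a value of `None` in the result means the package could not be imported at all.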
## 2. Download Data
We use preprocessed data from the [QA-GNN](https://github.com/michiyasunaga/qagnn) repository, which can also be downloaded from my [Baidu Netdisk](https://pan.baidu.com/s/1haczfYKB1LlgYZ5MYxMDxQ?pwd=x5sl).
The data directory should look like this:
```plain
.
└── data/
    ├── cpnet/                 (preprocessed ConceptNet)
    ├── csqa/
    │   ├── train_rand_split.jsonl
    │   ├── dev_rand_split.jsonl
    │   ├── test_rand_split_no_answers.jsonl
    │   ├── statement/             (converted statements)
    │   ├── grounded/              (grounded entities)
    │   ├── graphs/                (extracted subgraphs)
    │   └── ...
    ├── obqa/
    ├── medqa_usmle/
    └── ddb/
```
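
Before launching a job, it can help to confirm that the download unpacked into the layout shown above. The sketch below is an illustration rather than a script shipped with the repository: `EXPECTED_PATHS` lists only the entries named in this README, and `missing_paths` is a hypothetical helper.

```python
import os

# Entries named in the layout above, relative to the repository root.
EXPECTED_PATHS = [
    "data/cpnet",
    "data/csqa/train_rand_split.jsonl",
    "data/csqa/dev_rand_split.jsonl",
    "data/csqa/test_rand_split_no_answers.jsonl",
    "data/csqa/statement",
    "data/csqa/grounded",
    "data/csqa/graphs",
    "data/obqa",
    "data/medqa_usmle",
    "data/ddb",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root."""
    return [
        path for path in expected
        if not os.path.exists(os.path.join(root, path))
    ]
```

Running `missing_paths(".")` from the repository root returns an empty list when every listed file and directory is present.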
## 3. Training JointLK
(The commands below assume the Slurm job scheduler.)
For CommonsenseQA, run
```
sbatch sbatch_run_jointlk__csqa.sh
```
For OpenBookQA, run
```
sbatch sbatch_run_jointlk__obqa.sh
```
## 4. Pretrained model checkpoints
CommonsenseQA

| Trained model | In-house Dev acc. | In-house Test acc. |
| --- | --- | --- |
| RoBERTa-large + JointLK [link] | 77.6 | 75.3 |
| RoBERTa-large + JointLK [link] | 78.4 | 74.2 |

OpenBookQA

| Trained model | Dev acc. | Test acc. |
| --- | --- | --- |
| RoBERTa-large + JointLK [link] | 68.8 | 70.4 |
| AristoRoBERTa-large + JointLK [link] | 79.2 | 85.6 |

## 5. Evaluating a pretrained model checkpoint
For CommonsenseQA, run
```
sbatch sbatch_run_jointlk__csqa_test.sh
```
For OpenBookQA, run
```
sbatch sbatch_run_jointlk__obqa_test.sh
```
## 6. Acknowledgment
This repo is built upon the following work:
```
QA-GNN: Question Answering using Language Models and Knowledge Graphs
https://github.com/michiyasunaga/qagnn
```
Many thanks to the authors and developers!
## Others
We noticed that the [QA-GNN](https://github.com/michiyasunaga/qagnn) repository added test results on the MedQA dataset. To help future researchers compare models, we also evaluated JointLK on MedQA.
To train on MedQA, run
```
sbatch sbatch_run_jointlk__medqa_usmle.sh
```
To test on MedQA, run
```
sbatch sbatch_run_jointlk__medqa_usmle_test.sh
```
Pretrained model checkpoint:

| Trained model | Dev acc. | Test acc. |
| --- | --- | --- |
| SapBERT-base + JointLK [link] | 38.0 | 39.8 |
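
The six Slurm scripts named in this README share one naming pattern (dataset name, with a `_test` suffix for evaluation). For readers who want to submit them from Python, the sketch below maps a dataset and mode to the matching `sbatch` command. It is an assumption-laden illustration, not part of the repository: `SCRIPTS`, `sbatch_command`, and `submit` are hypothetical names, and the script is assumed to be run from the repository root.

```python
import shutil
import subprocess

# Script names exactly as they appear in this README.
SCRIPTS = {
    ("csqa", "train"): "sbatch_run_jointlk__csqa.sh",
    ("csqa", "test"): "sbatch_run_jointlk__csqa_test.sh",
    ("obqa", "train"): "sbatch_run_jointlk__obqa.sh",
    ("obqa", "test"): "sbatch_run_jointlk__obqa_test.sh",
    ("medqa_usmle", "train"): "sbatch_run_jointlk__medqa_usmle.sh",
    ("medqa_usmle", "test"): "sbatch_run_jointlk__medqa_usmle_test.sh",
}


def sbatch_command(dataset, mode="train"):
    """Return the argv list that submits the job for dataset and mode."""
    key = (dataset, mode)
    if key not in SCRIPTS:
        raise ValueError("unknown job: dataset=%r mode=%r" % key)
    return ["sbatch", SCRIPTS[key]]


def submit(dataset, mode="train", dry_run=True):
    """Print the command; run it only if dry_run is False and sbatch exists."""
    command = sbatch_command(dataset, mode)
    print(" ".join(command))
    if dry_run:
        return None
    if shutil.which("sbatch") is None:
        raise RuntimeError("sbatch not found on PATH; is Slurm available?")
    return subprocess.run(command, check=True)
```

With the default `dry_run=True`, `submit("obqa", "test")` only prints the command, which makes it easy to check the mapping before submitting anything to the cluster.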