# TTPP
**Repository Path**: hawkwoodd/ttpp
## Basic Information
- **Project Name**: TTPP
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-10-20
- **Last Updated**: 2024-11-01
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# TPLinker
**TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking**
This repository contains all the code of the official implementation for the paper: **[TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking](https://www.aclweb.org/anthology/2020.coling-main.138.pdf).** The paper has been accepted to appear at **COLING 2020**. \[[slides](https://drive.google.com/file/d/1UAIVkuUgs122k02Ijln-AtaX2mHz70N-/view?usp=sharing)\] \[[poster](https://drive.google.com/file/d/1iwFfXZDjwEz1kBK8z1To_YBWhfyswzYU/view?usp=sharing)\]
TPLinker is a joint extraction model that resolves the issues of **relation overlapping** and **nested entities**, is immune to the influence of **exposure bias**, and achieves SOTA performance on NYT (TPLinker: **91.9**, TPLinkerPlus: **92.6 (+3.0)**) and WebNLG (TPLinker: **91.9**, TPLinkerPlus: **92.3 (+0.5)**). Note that the details of TPLinkerPlus will be published in an extended paper, which is still in progress.
**Note: Please check the Q&A and closed issues for your question before opening a new issue.**
- [Model](#model)
- [Results](#results)
- [Usage](#usage)
* [Prerequisites](#prerequisites)
* [Data](#data)
+ [download data](#download-data)
+ [build data](#build-data)
* [Pretrained Model and Word Embeddings](#pretrained-model-and-word-embeddings)
* [Train](#train)
+ [super parameters](#super-parameters)
* [Evaluation](#evaluation)
- [Citation](#citation)
- [Q&A](#frequently-asked-questions)
## Update
* 2020.11.01: Fixed bugs and added comments in BuildData.ipynb and build_data_config.yaml; TPLinkerPlus can support entity classification now, see [build data](#build-data) for the data format; Updated the [datasets](#download-data) (added `entity_list` for TPLinkerPlus).
* 2020.12.04: The original default parameters in `build_data_config.yaml` were all for Chinese datasets, which could be misleading when reproducing the results; I changed them back to the ones for English datasets. **Note that you must set `ignore_subword` to `true` for English datasets**, or it will hurt performance and the scores reported in the paper will not be reached.
* 2020.12.09: We published model states for fast tests. See [Super Parameters](#super-parameters).
* 2021.03.22: Added a Q&A section to the README.
## Model
## Results
## Usage
### Prerequisites
Our experiments are conducted with Python 3.6 and PyTorch 1.6.0.
The main requirements are:
* tqdm
* glove-python-binary==0.2.0
* transformers==3.0.2
* wandb # for logging the results
* yaml # provided by the PyYAML package
In the root directory, run
```bash
pip install -e .
```
### Data
#### download data
Get and preprocess NYT* and WebNLG* following [CasRel](https://github.com/weizhepei/CasRel/tree/master/data) (note: CasRel names them NYT and WebNLG).
Taking NYT* as an example: rename `train_triples.json` and `dev_triples.json` to `train_data.json` and `valid_data.json`, move them to `ori_data/nyt_star`, and put all `test*.json` files under `ori_data/nyt_star/test_data`. The same process applies to WebNLG*.
Get raw NYT from [CopyRE](https://github.com/xiangrongzeng/copy_re): rename `raw_train.json` and `raw_valid.json` to `train_data.json` and `valid_data.json` and move them to `ori_data/nyt`; rename `raw_test.json` to `test_data.json` and put it under `ori_data/nyt/test_data`.
Get WebNLG from [ETL-Span](https://github.com/yubowen-ph/JointER/tree/master/dataset/WebNLG/data): rename `train.json` and `dev.json` to `train_data.json` and `valid_data.json` and move them to `ori_data/webnlg`; rename `test.json` to `test_data.json` and put it under `ori_data/webnlg/test_data`.
If you would rather not prepare the data yourself, you can download our preprocessed [datasets](https://drive.google.com/file/d/1RxBVMSTgBxhGyhaPEWPdtdX1aOmrUPBZ/view?usp=sharing).
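The renaming steps for NYT* can be sketched as a short shell script. `downloads/` is a hypothetical directory holding the files fetched from CasRel; adjust the paths to wherever you saved them.

```shell
# Sketch only: lay out NYT* as this README expects.
# "downloads/" is an assumed location for the CasRel files.
mkdir -p ori_data/nyt_star/test_data

if [ -f downloads/train_triples.json ]; then
    mv downloads/train_triples.json ori_data/nyt_star/train_data.json
fi
if [ -f downloads/dev_triples.json ]; then
    mv downloads/dev_triples.json ori_data/nyt_star/valid_data.json
fi
# All test splits (test_triples.json, test_split_by_num/*.json, ...) go under test_data/.
mv downloads/test*.json ori_data/nyt_star/test_data/ 2>/dev/null || true
```

The same pattern, with the file names from the paragraphs above, covers raw NYT, WebNLG*, and raw WebNLG.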
#### build data
Build data by `preprocess/BuildData.ipynb`.
Set configuration in `preprocess/build_data_config.yaml`.
In the configuration file, set `exp_name` to the dataset's directory name and `ori_data_format` to the name of the project the data came from.
e.g., to build NYT*, set `exp_name` to `nyt_star` and `ori_data_format` to `casrel`. See `build_data_config.yaml` for more details.
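For orientation, a minimal fragment of what such a configuration might look like is sketched below. Only the key names `exp_name`, `ori_data_format`, `ignore_subword`, and `add_char_span` come from this README; the comments are assumptions, so treat `build_data_config.yaml` in the repository as the authoritative reference.

```yaml
# Illustrative fragment only -- see preprocess/build_data_config.yaml for the real options.
exp_name: nyt_star        # must match the directory name under ori_data
ori_data_format: casrel   # source project of the data, or "tplinker" for the normal format
ignore_subword: true      # must be true for English datasets (see Update 2020.12.04)
add_char_span: false      # set to true if your data lacks subj/obj char spans
```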
If you want to run on other datasets, transform them into the normal format for TPLinker, then set `exp_name` to your dataset's directory name and set `ori_data_format` to `tplinker`:
```python
[{
"id": ,
"text": ,
"relation_list": [{
"subject": ,
        "subj_char_span": , # e.g. [3, 10]. Optional; if absent, set "add_char_span" to true in "build_data_config.yaml" when you build the data
"object":