# OmniEvent
# Table of Contents

- [Table of Contents](#table-of-contents)
- [Overview](#overview)
- [Highlights](#highlights)
- [Installation](#installation)
  - [With pip](#with-pip)
  - [From source](#from-source)
- [Easy Start](#easy-start)
- [Train your Own Model with OmniEvent](#train-your-own-model-with-omnievent)
  - [Step 1: Process the dataset into the unified format](#step-1-process-the-dataset-into-the-unified-format)
  - [Step 2: Set up the customized configurations](#step-2-set-up-the-customized-configurations)
  - [Step 3: Initialize the model and tokenizer](#step-3-initialize-the-model-and-tokenizer)
  - [Step 4: Initialize the dataset and evaluation metric](#step-4-initialize-the-dataset-and-evaluation-metric)
  - [Step 5: Define Trainer and train](#step-5-define-trainer-and-train)
  - [Step 6: Unified Evaluation](#step-6-unified-evaluation)
- [Supported Datasets \& Models \& Contests](#supported-datasets--models--contests)
  - [Datasets](#datasets)
  - [Models](#models)
- [Consistent Evaluation](#consistent-evaluation)
  - [1. Consistent data preprocessing](#1-consistent-data-preprocessing)
  - [2. Output Standardization](#2-output-standardization)
  - [3. Pipeline Evaluation](#3-pipeline-evaluation)
- [Experiments](#experiments)
- [Citation](#citation)

# News

❗ [2024.12] In the spring of next year, we will build a new agent system based on LLMs and small models optimized for IE tasks. The system will be more powerful and general than the models in OmniEvent. The OmniEvent repository will only serve as introductory code for EE, and no major updates will be made in the future.

[2024.10] We recently released a series of LLMs (**[ADELIE](https://huggingface.co/THU-KEG/ADELIE-SFT-1.5B)**) trained for information extraction, including event extraction tasks. Although they underperform specialized small models such as BERT, their general capabilities and ability to learn from schemas in context are impressive. Welcome to try! Link: https://huggingface.co/THU-KEG/ADELIE-SFT-1.5B.

# Overview

OmniEvent is a powerful open-source toolkit for **event extraction**, including **event detection** and **event argument extraction**. We comprehensively cover various paradigms and provide fair and unified evaluations on widely-used **English** and **Chinese** datasets. Modular implementations make OmniEvent highly extensible.

## Highlights

- **Comprehensive Capability**
  - Supports ***Event Extraction*** end-to-end, as well as its two subtasks independently: ***Event Detection*** and ***Event Argument Extraction***.
  - Covers various paradigms: ***Token Classification***, ***Sequence Labeling***, ***MRC (QA)*** and ***Seq2Seq***.
  - Implements ***Transformer-based*** ([BERT](https://arxiv.org/pdf/1810.04805.pdf), [T5](https://arxiv.org/pdf/1910.10683.pdf), etc.) and ***classical*** ([DMCNN](https://aclanthology.org/P15-1017.pdf), [CRF](http://www.cs.cmu.edu/afs/cs/Web/People/aladdin/papers/pdfs/y2001/crf.pdf), etc.) models.
  - Both ***Chinese*** and ***English*** are supported for all event extraction subtasks, paradigms and models.
- **Unified Benchmark & Evaluation**
  - Various datasets are processed into a [unified format](https://github.com/THU-KEG/OmniEvent/tree/main/scripts/data_processing#unified-omnievent-format).
  - Predictions of different paradigms are all converted into a [unified candidate set](https://github.com/THU-KEG/OmniEvent/tree/main/OmniEvent/evaluation#convert-the-predictions-of-different-paradigms-to-a-unified-candidate-set) for fair evaluations.
  - Four [evaluation modes](https://github.com/THU-KEG/OmniEvent/tree/main/OmniEvent/evaluation#provide-four-standard-eae-evaluation-modes) (**gold**, **loose**, **default**, **strict**) cover the different evaluation settings used in previous work.
- **Modular Implementation**
  - All models are decomposed into four modules:
    - **Input Engineering**: Prepare inputs and support various input engineering methods, such as prompting.
    - **Backbone**: Encode text into hidden states.
    - **Aggregation**: Fuse hidden states (e.g., select [CLS], pooling, GCN) into the final event representation.
    - **Output Head**: Map the event representation to the final outputs, such as Linear, CRF, or MRC heads.
  - You can combine and reimplement different modules to design and implement your own new models.
- **Big Model Training & Inference**
  - Efficient training and inference of big event extraction models are supported with [BMTrain](https://github.com/OpenBMB/BMTrain).
- **Easy to Use & Highly Extensible**
  - Open datasets can be downloaded and processed with a single command.
  - Fully compatible with 🤗 [Transformers](https://github.com/huggingface/transformers) and its [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer).
  - Users can easily reproduce existing models and build customized models with OmniEvent.

# Installation

## With pip

This repository is tested on Python 3.9+ and PyTorch 1.12.1+. OmniEvent can be installed with pip as follows:

```shell
pip install OmniEvent
```

## From source

If you want to install the repository from a local source, you can do so as follows:

```shell
pip install .
```

If you want to edit the repository, install it in editable mode:

```shell
pip install -e .
```

# Easy Start

OmniEvent provides several off-the-shelf models for users. Examples are shown below.

*Make sure you have installed OmniEvent as instructed above. Note that it may take a few minutes to download the checkpoint the first time.*

```python
>>> from OmniEvent.infer import infer

>>> # Event Extraction (EE) Task
>>> text = "2022年北京市举办了冬奥会"
>>> results = infer(text=text, task="EE")
>>> print(results[0]["events"])
[
    {
        "type": "组织行为开幕", "trigger": "举办", "offset": [8, 10],
        "arguments": [
            { "mention": "2022年", "offset": [9, 16], "role": "时间"},
            { "mention": "北京市", "offset": [81, 89], "role": "地点"},
            { "mention": "冬奥会", "offset": [0, 4], "role": "活动名称"}
        ]
    }
]

>>> text = "U.S. and British troops were moving on the strategic southern port city of Basra \
Saturday after a massive aerial assault pounded Baghdad at dawn"

>>> # Event Detection (ED) Task
>>> results = infer(text=text, task="ED")
>>> print(results[0]["events"])
[
    { "type": "attack", "trigger": "assault", "offset": [113, 120]},
    { "type": "injure", "trigger": "pounded", "offset": [121, 128]}
]

>>> # Event Argument Extraction (EAE) Task
>>> results = infer(text=text, triggers=[("assault", 113, 120), ("pounded", 121, 128)], task="EAE")
>>> print(results[0]["events"])
[
    {
        "type": "attack", "trigger": "assault", "offset": [113, 120],
        "arguments": [
            { "mention": "U.S.", "offset": [0, 4], "role": "attacker"},
            { "mention": "British", "offset": [9, 16], "role": "attacker"},
            { "mention": "Saturday", "offset": [81, 89], "role": "time"}
        ]
    },
    {
        "type": "injure", "trigger": "pounded", "offset": [121, 128],
        "arguments": [
            { "mention": "U.S.", "offset": [0, 4], "role": "attacker"},
            { "mention": "Saturday", "offset": [81, 89], "role": "time"},
            { "mention": "British", "offset": [9, 16], "role": "attacker"}
        ]
    }
]
```

# Train your Own Model with OmniEvent

OmniEvent can help users easily train and evaluate their customized models on specific datasets. We show a step-by-step example of using OmniEvent to train and evaluate an ***Event Detection*** model on the ***ACE-EN*** dataset in the ***Seq2Seq*** paradigm. More examples are shown in [examples](./examples).

## Step 1: Process the dataset into the unified format

We provide standard data processing scripts for several commonly-used datasets. Check out the details in [scripts/data_processing](./scripts/data_processing).

```shell
dataset=ace2005-en  # the dataset name
cd scripts/data_processing/$dataset
bash run.sh
```

## Step 2: Set up the customized configurations

We keep track of the dataset, model, and training configurations in a single `*.yaml` file. See [./configs](./configs) for details.

```python
>>> from OmniEvent.arguments import DataArguments, ModelArguments, TrainingArguments, ArgumentParser
>>> from OmniEvent.input_engineering.seq2seq_processor import type_start, type_end

>>> parser = ArgumentParser((ModelArguments, DataArguments, TrainingArguments))
>>> model_args, data_args, training_args = parser.parse_yaml_file(yaml_file="config/all-datasets/ed/s2s/ace-en.yaml")

>>> training_args.output_dir = 'output/ACE2005-EN/ED/seq2seq/t5-base/'
>>> data_args.markers = ["
```
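As a small, optional extension of Step 2, the sketch below shows how a few more settings could be overridden in code after `parse_yaml_file`, in the same way the snippet above overrides `output_dir`. This is a minimal sketch, not part of the original tutorial: the extra attribute names (`num_train_epochs`, `per_device_train_batch_size`) are standard 🤗 Transformers `TrainingArguments` fields and are assumed to be available here because OmniEvent's training arguments are designed to work with the Transformers `Trainer`.

```python
# Minimal sketch (not from the original tutorial) of overriding parsed settings
# in code. Only the parse_yaml_file() call shown in Step 2 is relied upon; the
# extra attributes are standard Hugging Face TrainingArguments fields and are
# ASSUMED to exist on OmniEvent's TrainingArguments.
from OmniEvent.arguments import DataArguments, ModelArguments, TrainingArguments, ArgumentParser

parser = ArgumentParser((ModelArguments, DataArguments, TrainingArguments))
model_args, data_args, training_args = parser.parse_yaml_file(
    yaml_file="config/all-datasets/ed/s2s/ace-en.yaml"
)

# Override a few settings for the current run without editing the YAML file.
training_args.output_dir = "output/ACE2005-EN/ED/seq2seq/t5-base/"
training_args.num_train_epochs = 5               # assumption: HF-style field name
training_args.per_device_train_batch_size = 16   # assumption: HF-style field name
```

Keeping such in-code overrides minimal and preferring edits to the YAML file itself keeps runs reproducible from the files under [./configs](./configs).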