# LegalPLMs

**Repository Path**: minimum-generated-pig/LegalPLMs

## Basic Information

- **Project Name**: LegalPLMs
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-01-11
- **Last Updated**: 2025-01-11

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

## Lawformer

### Introduction

This repository provides the source code and checkpoints for the paper "Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents". You can download the Lawformer checkpoint from the [Hugging Face model hub](https://huggingface.co/xcjthu/Lawformer) or from [here](https://data.thunlp.org/legal/Lawformer.zip). In addition, the checkpoint of our baseline model, Legal RoBERTa, can be downloaded from [here](https://data.thunlp.org/legal/LegalRoBERTa.zip).

The new judgment prediction dataset, CAIL-Long, can be downloaded from [here](https://data.thunlp.org/legal/CAIL-Long.tar.gz).

### Installation

```
pip install -r requirements.txt
```

### Easy Start

We have uploaded our model to the Hugging Face model hub. Make sure you have installed `transformers`.

```python
>>> from transformers import AutoModel, AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
>>> model = AutoModel.from_pretrained("thunlp/Lawformer")
>>> inputs = tokenizer("任某提起诉讼,请求判令解除婚姻关系并对夫妻共同财产进行分割。", return_tensors="pt")
>>> outputs = model(**inputs)
```

### Pre-training

Lawformer is pre-trained continually from `hfl/chinese-roberta-wwm-ext`. Therefore, we first convert the RoBERTa checkpoint into a Longformer by running the following command:

```
python3 convert_roberta_lfm.py
```

Then run the following command to pre-train the model on 8 GPUs:

```
python3 -m torch.distributed.launch --master_port 10086 --nproc_per_node 8 train.py -c config/Lawformer.config -g 0,1,2,3,4,5,6,7
```

### Cite

If you use the pre-trained models, please cite this paper:

```
@article{xiao2021lawformer,
  title={Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents},
  author={Xiao, Chaojun and Hu, Xueyu and Liu, Zhiyuan and Tu, Cunchao and Sun, Maosong},
  year={2021}
}
```
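Because Lawformer is a Longformer-style model, the Easy Start example above also works for documents far longer than 512 tokens. The sketch below is not part of the repository's scripts; it is a minimal illustration, assuming the checkpoint loads as a Hugging Face Longformer that accepts a `global_attention_mask`, of encoding a long document with global attention on the `[CLS]` token only. The repeated sentence and the 4096-token limit are placeholder assumptions.

```python
# A minimal sketch of long-document encoding, assuming the checkpoint loads
# as a Hugging Face Longformer-style model (as in the Easy Start example).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = AutoModel.from_pretrained("thunlp/Lawformer")

# Stand-in long document: the Easy Start sentence repeated (hypothetical data).
long_text = "任某提起诉讼,请求判令解除婚姻关系并对夫妻共同财产进行分割。" * 100

# Truncate to a length within the model's position-embedding budget
# (assumed to be 4096 here).
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)

# Longformer-style models accept an optional global_attention_mask:
# 1 = global attention, 0 = local sliding-window attention.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1  # let only [CLS] attend globally

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)

doc_embedding = outputs.last_hidden_state[:, 0]  # [CLS] vector for the document
```

Restricting global attention to `[CLS]` keeps memory linear in sequence length; task-specific tokens (e.g., question tokens in QA) are often also marked global.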
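The conversion step in the Pre-training section is implemented by `convert_roberta_lfm.py`, which is the authoritative script. The snippet below is only a rough, hypothetical illustration of the central idea behind such conversions: tiling the 512 learned position embeddings of the RoBERTa checkpoint so that the resulting Longformer can address longer sequences. The target length of 4096 is an assumption, and the real script additionally installs sliding-window attention and writes out the converted checkpoint.

```python
# Hypothetical illustration of the core idea behind convert_roberta_lfm.py:
# tile RoBERTa's 512 learned position embeddings up to a longer window.
# The actual script in this repository is the authoritative implementation.
import torch
from transformers import BertModel

# hfl/chinese-roberta-wwm-ext uses the BERT architecture.
roberta = BertModel.from_pretrained("hfl/chinese-roberta-wwm-ext")
old_pos = roberta.embeddings.position_embeddings.weight.data  # (512, hidden)

max_pos = 4096  # assumed target sequence length
new_pos = old_pos.new_empty(max_pos, old_pos.size(1))
step = old_pos.size(0)
for start in range(0, max_pos, step):
    end = min(start + step, max_pos)
    new_pos[start:end] = old_pos[: end - start]  # copy the 512 embeddings in blocks

# new_pos would then be written into a Longformer checkpoint alongside the
# converted attention weights; see convert_roberta_lfm.py for the real logic.
```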