# glyce
**Repository Path**: li-rr/glyce
## Basic Information
- **Project Name**: glyce
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-10-22
- **Last Updated**: 2020-12-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Glyce: Glyph-vectors for Chinese Character Representations
**Glyce** is an open-source toolkit built on top of PyTorch and is developed by [Shannon.AI](http://www.shannonai.com).
## Citation
To appear in NeurIPS 2019.
[Glyce: Glyph-vectors for Chinese Character Representations](https://arxiv.org/abs/1901.10125)
(Yuxian Meng*, Wei Wu*, Fei Wang*, Xiaoya Li*, Ping Nie, Fan Yin, Muyu Li, Qinghong Han, Xiaofei Sun and Jiwei Li, 2019)
```
@article{meng2019glyce,
title={Glyce: Glyph-vectors for Chinese Character Representations},
author={Meng, Yuxian and Wu, Wei and Wang, Fei and Li, Xiaoya and Nie, Ping and Yin, Fan and Li, Muyu and Han, Qinghong and Sun, Xiaofei and Li, Jiwei},
journal={arXiv preprint arXiv:1901.10125},
year={2019}
}
```
## Table of Contents
- [What is Glyce?](#what-is-glyce)
- [Experiment Results](#experiment-results)
  - [Sequence Labeling Tasks](#1-sequence-labeling-tasks)
  - [Sentence Pair Classification](#2-sentence-pair-classification)
  - [Single Sentence Classification](#3-single-sentence-classification)
  - [Chinese Semantic Role Labeling](#4-chinese-semantic-role-labeling)
  - [Chinese Dependency Parsing](#5-chinese-dependency-parsing)
- [Requirements](#requirements)
- [Installation](#installation)
- [Quick Start of Glyce](#quick-start-of-glyce)
  - [Usage of Glyce Char/Word Embedding](#usage-of-glyce-charword-embedding)
  - [Usage of Glyce-BERT](#usage-of-glyce-bert)
- [Folder Description](#folder-description)
- [Implementation](#implementation)
- [Download Task Data](#download-task-data)
- [Welcome Contributions to Glyce Open Source Project](#welcome-contributions-to-glyce-open-source-project)
- [Acknowledgement](#acknowledgement)
- [Contact](#contact)
## What is Glyce?
Glyce is a Chinese character representation built from glyph information. A Glyce character embedding is composed of two parts: (1) a glyph embedding and (2) a char-ID embedding; the two are combined via concatenation, a highway network, or a fully connected layer. A Glyce word embedding is likewise composed of two parts: (1) a word glyph embedding and (2) a word-ID embedding. The word glyph embedding is obtained by forwarding the glyph embedding of each character in the word through a max-pooling layer; it is then concatenated with the word-ID embedding to form the Glyce word embedding.
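As a minimal sketch of this composition (names and dimensions here are illustrative, not the toolkit's actual API; the real layers are `CharGlyceEmbedding` and `WordGlyceEmbedding`, shown in the Quick Start below):
```python
import torch
import torch.nn as nn

class WordGlyceEmbeddingSketch(nn.Module):
    """Illustrative Glyce word embedding: max-pool the per-char glyph
    embeddings over the word, then concatenate the word-ID embedding."""
    def __init__(self, vocab_size=50000, glyph_dim=64, word_dim=128):
        super().__init__()
        self.word_id_embedding = nn.Embedding(vocab_size, word_dim)

    def forward(self, word_ids, char_glyph_embeddings):
        # word_ids: (batch,); char_glyph_embeddings: (batch, chars_per_word, glyph_dim),
        # produced by a glyph CNN over each character's image
        glyph_word = char_glyph_embeddings.max(dim=1).values  # max-pool over chars
        return torch.cat([glyph_word, self.word_id_embedding(word_ids)], dim=-1)
```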
Here are some of the highlights of Glyce:
#### 1. Utilize Chinese Logographic Information
Glyce exploits logographic information by encoding images of historical and contemporary Chinese scripts, along with scripts in different writing styles.
#### 2. Combine Glyce with Chinese Pre-trained BERT Model
We combine Glyce with the pre-trained Chinese BERT model and add task-specific layers for downstream tasks. The Glyce-BERT model outperforms BERT and sets new SOTA results on tagging (NER, CWS, POS), sentence pair classification, and single sentence classification tasks.
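As a rough sketch of the idea (the concrete fusion layer here is our assumption for illustration; the actual model lives in [./glyce/models/glyce_bert](./glyce/models/glyce_bert)), the glyph embeddings can be concatenated with BERT's output and passed through a fully connected layer before the task-specific head:
```python
import torch
import torch.nn as nn

class GlyphBertFusionSketch(nn.Module):
    """Illustrative fusion of glyph embeddings with BERT output (dimensions assumed)."""
    def __init__(self, bert_dim=768, glyph_dim=64):
        super().__init__()
        self.fusion = nn.Linear(bert_dim + glyph_dim, bert_dim)

    def forward(self, bert_output, glyph_embeddings):
        # bert_output: (batch, seq_len, bert_dim); glyph_embeddings: (batch, seq_len, glyph_dim)
        fused = torch.cat([bert_output, glyph_embeddings], dim=-1)
        return self.fusion(fused)  # feed this into the task-specific output layer
```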
#### 3. Propose Tianzige-CNN (田字格) to Model Chinese Characters
We propose the Tianzige-CNN (田字格) structure, which is tailored to Chinese character modeling. It tackles the comparatively small number of Chinese characters and the small scale of glyph images relative to ImageNet.
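A minimal sketch of such a CNN, assuming a 12×12 glyph image and illustrative kernel/channel sizes (the exact configuration is in the paper and the toolkit):
```python
import torch
import torch.nn as nn

class TianzigeCNNSketch(nn.Module):
    """Illustrative Tianzige-style CNN: reduce a small glyph image to a
    2x2 'tianzige' grid, then apply a group convolution to get one vector."""
    def __init__(self, out_channels=1024, groups=16):
        super().__init__()
        self.conv = nn.Conv2d(1, out_channels, kernel_size=5)   # 12x12 image -> 8x8 feature map
        self.pool = nn.MaxPool2d(kernel_size=4)                 # 8x8 -> 2x2 tianzige grid
        self.group_conv = nn.Conv2d(out_channels, out_channels,
                                    kernel_size=2, groups=groups)  # 2x2 -> 1x1

    def forward(self, glyph_images):
        # glyph_images: (batch, 1, 12, 12)
        x = torch.relu(self.conv(glyph_images))
        x = self.pool(x)
        x = self.group_conv(x)            # (batch, out_channels, 1, 1)
        return x.flatten(1)               # (batch, out_channels) glyph embedding
```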
#### 4. Auxiliary Task as a Regularizer
During training, an image-classification loss serves as an auxiliary objective that prevents overfitting and promotes the model's ability to generalize.
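In the form used in the Quick Start below, the combined objective is `total_loss = (1 - λ) * task_loss + λ * decay^t * glyph_loss`, where `λ` is the glyph-loss ratio, `decay < 1`, and `t` is the epoch index, so the auxiliary glyph loss fades out as training proceeds.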
## Experiment Results
### 1. Sequence Labeling Tasks
#### Named Entity Recognition (NER)
MSRA (Levow, 2006), OntoNotes 4.0 (Weischedel et al., 2011), Resume (Zhang et al., 2018), and Weibo (Peng and Dredze, 2015).
Model | P(onto) | R(onto) | F1(onto) | P(msra) | R(msra) | F1(msra)
---------- | ------ | ------ | ------ | ------ | ------ | ------
CRF-LSTM | 74.36 | 69.43 | 71.81 | 92.97 | 90.62 | 90.95
Lattice-LSTM | 76.35 | 71.56 | 73.88 | 93.57 | 92.79 | 93.18
Lattice-LSTM+Glyce | 82.06 | 68.74 | 74.81 | 93.86 | 93.92 | 93.89
| | | | **(+0.93)** | | | **(+0.71)**
BERT | 78.01 | 80.35 | 79.16 | 94.97 | 94.62 | 94.80
ERNIE | - | - | - | - | - | 93.8
Glyce+BERT | 81.87 | 81.40 | 81.63 | 95.57 | 95.51 | 95.54
| | | | **(+2.47)** | | | **(+0.74)**
Model | P(resume) | R(resume) | F1(resume) | P(weibo) | R(weibo) | F1(weibo)
---------- | ------ | ------ | ------ | ------ | ------ | ------
CRF-LSTM | 94.53 | 94.29 | 94.41 | 51.16 | 51.07 | 50.95
Lattice-LSTM | 94.81 | 94.11 | 94.46 | 52.71 | 53.92 | 53.13
Lattice-LSTM+Glyce | 95.72 | 95.63 | 95.67 | 53.69 | 55.30 | 54.32
| | | | **(+1.21)** | | | **(+1.19)**
BERT | 96.12 | 95.45 | 95.78 | 67.12 | 66.88 | 67.33
Glyce+BERT | 96.62 | 96.48 | 96.54 | 67.68 | 67.71 | 67.60
| | | | **(+0.76)** | | | **(+0.76)**
#### Chinese Part-Of-Speech Tagging (POS)
CTB5, CTB6, CTB9, and UD1.
Model | P(ctb5) | R(ctb5) | F1(ctb5) | P(ctb6) | R(ctb6) | F1(ctb6)
---------- | ------ | ------ | ------ | ------ | ------ | ------
(Shao, 2017) (sig) | 93.68 | 94.47 | 94.07 | - | - | 90.81
(Shao, 2017) (ens) | 93.95 | 94.81 | 94.38 | - | - | -
Lattice-LSTM | 94.77 | 95.51 | 95.14 | 92.00 | 90.86 | 91.43
Glyce+Lattice-LSTM | 95.49 | 95.72 | 95.61 | 92.72 | 91.14 | 91.92
| | | | **(+0.47)** | | | **(+0.49)**
BERT | 95.86 | 96.26 | 96.06 | 94.91 | 94.63 | 94.77
Glyce+BERT | 96.50 | 96.74 | 96.61 | 95.56 | 95.26 | 95.41
| | | | **(+0.55)** | | | **(+0.55)**
Model | P(ctb9) | R(ctb9) | F1(ctb9) | P(ud1) | R(ud1) | F1(ud1)
---------- | ------ | ------ | ------ | ------ | ------ | ------
(Shao, 2017) (sig) | 91.81 | 94.47 | 91.89 | 89.28 | 89.54 | 89.41
(Shao, 2017) (ens) | 92.28 | 92.40 | 92.34 | 89.67 | 89.86 | 89.75
Lattice-LSTM | 92.53 | 91.73 | 92.13 | 90.47 | 89.70 | 90.09
Glyce+Lattice-LSTM | 92.28 | 92.85 | 92.38 | 91.57 | 90.19 | 90.87
| | | | **(+0.25)** | | | **(+0.78)**
BERT | 92.43 | 92.15 | 92.29 | 95.42 | 94.97 | 95.19
Glyce+BERT | 93.49 | 92.84 | 93.15 | 96.19 | 96.10 | 96.14
| | | | **(+0.86)** | | | **(+1.35)**
#### Chinese Word Segmentation (CWS)
PKU, CITYU, MSR and AS.
Model | P(pku) | R(pku) | F1(pku) | P(cityu) | R(cityu) | F1(cityu)
---------- | ------ | ------ | ------ | ------ | ------ | ------
BERT | 96.8 | 96.3 | 96.5 | 97.5 | 97.7 | 97.6
Glyce+BERT | 97.1 | 96.4 | 96.7 | 97.9 | 98.0 | 97.9
| | | | **(+0.2)** | | | **(+0.3)**
Model | P(msr) | R(msr) | F1(msr) | P(as) | R(as) | F1(as)
---------- | ------ | ------ | ------ | ------ | ------ | ------
BERT | 98.1 | 98.2 | 98.1 | 96.7 | 96.4 | 96.5
Glyce+BERT | 98.2 | 98.3 | 98.3 | 96.6 | 96.8 | 96.7
| | | | **(+0.2)** | | | **(+0.2)**
### 2. Sentence Pair Classification
#### Dataset Description
The BQ corpus, XNLI, LCQMC, and NLPCC-DBQA.
Model | P(bq) | R(bq) | F1(bq) | Acc(bq) | P(lcqmc) | R(lcqmc) | F1(lcqmc) | Acc(lcqmc)
-------------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------
BiMPM | 82.3 | 81.2 | 81.7 | 81.9 | 77.6 | 93.9 | 85.0 | 83.4
BiMPM+Glyce | 81.9 | 85.5 | 83.7 | 83.3 | 80.4 | 93.4 | 86.4 | 85.3
| | | | **(+2.0)** | **(+1.4)** | | | **(+1.4)** | **(+1.9)**
BERT | 83.5 | 85.7 | 84.6 | 84.8 | 83.2 | 94.2 | 88.2 | 87.5
ERNIE | - | - | - | - | - | - | - | 87.4
Glyce+BERT | 84.2 | 86.9 | 85.5 | 85.8 | 86.8 | 91.2 | 88.8 | 88.7
| | | | **(+0.9)** | **(+1.0)** | | | **(+0.6)** | **(+1.2)**
Model | Acc(xnli) | P(nlpcc-dbqa) | R(nlpcc-dbqa) | F1(nlpcc-dbqa)
-------- | -------- | ------ | ------ | ------
BiMPM | 67.5 | 78.8 | 56.5 | 65.8
BiMPM+Glyce | 67.7 | 76.3 | 59.9 | 67.1
| | **(+0.2)** | | | **(+1.3)**
BERT | 78.4 | 79.6 | 86.0 | 82.7
ERNIE | 78.4 | - | - | 82.7
Glyce+BERT | 79.2 | 81.1 | 85.8 | 83.4
| | **(+0.8)** | | | **(+0.7)**
### 3. Single Sentence Classification
#### Dataset Description
ChnSentiCorp, Fudan, and iFeng.
Model | ChnSentiCorp | Fudan | iFeng
-------------- | ------ | ------ | ------
LSTM | 91.7 | 95.8 | 84.9
LSTM+Glyce | 93.1 | 96.3 | 85.8
| | **(+1.4)** | **(+0.5)** | **(+0.9)**
BERT | 95.4 | 99.5 | 87.1
ERNIE | 95.4 | - | -
Glyce+BERT | 95.9 | 99.8 | 87.5
| | **(+0.5)** | **(+0.3)** | **(+0.4)**
### 4. Chinese Semantic Role Labeling
#### Dataset Description
CoNLL-2009
#### Model Performance
Model | Precision | Recall | F1
-------------- | ------ | ------ | ------
Roth and Lapata (2016) | 76.9 | 73.8 | 75.3
Marcheggiani and Titov (2017) | 84.6 | 80.4 | 82.5
K-order pruning (He et al., 2018) | 84.2 | 81.5 | 82.8
K-order pruning + Glyce-word | 85.4 | 82.1 | 83.7
| | **(+0.8)** | **(+0.6)** | **(+0.9)**
### 5. Chinese Dependency Parsing
#### Dataset Description
Chinese Penn TreeBank 5.1. The dataset split follows Dozat and Manning (2016).
#### Model Performance
Model | UAS | LAS
-------------- | ------ | ------
Ballesteros et al. (2016) | 87.7 | 86.2
Kiperwasser and Goldberg (2016) | 87.6 | 86.1
Cheng et al. (2016) | 88.1 | 85.7
Biaffine | 89.3 | 88.2
Biaffine+Glyce-word | 90.2 | 89.0
| | **(+0.9)** | **(+0.8)**
## Requirements
- Python Version >= 3.6
- GPU: experiments use an NVIDIA TITAN Xp with 12 GB of memory.
- Chinese scripts can be found on [Google Drive](https://drive.google.com/file/d/1TxY_Z_SdvIW-7BnXmjDE3gpfpVEzu22_/view?usp=sharing). Please refer to the [description](./glyce/fonts/README.md) and download the script files to `glyce/glyce/fonts/`.
**NOTE**: Some experimental results were obtained by training on multi-GPU machines, and reproducing previous open-source SOTA models may require DIFFERENT PyTorch versions. The experiment environment for Glyce-BERT is Python 3.6 and PyTorch 1.1.0.
## Installation
```bash
# Clone glyce
git clone git@github.com:ShannonAI/glyce.git
cd glyce
python3.6 setup.py develop
# Install package dependency
pip install -r requirements.txt
```
## Quick Start of Glyce
### Usage of Glyce Char/Word Embedding
```python
import torch
import torch.nn as nn
from torch.nn import CrossEntropyLoss
from glyce import GlyceConfig
from glyce import CharGlyceEmbedding, WordGlyceEmbedding
# load Glyce hyper-parameters
glyce_config = GlyceConfig()
# if the input ids are char-level
embedding_layer = CharGlyceEmbedding(glyce_config)
# elif the input ids are word-level
embedding_layer = WordGlyceEmbedding(glyce_config)
# forward input_ids (a LongTensor of char/word ids) through the embedding layer
glyce_embedding, glyph_loss = embedding_layer(input_ids)
# use the image-classification loss as an auxiliary loss
glyph_decay = 0.1
glyph_ratio = 0.01  # the proportion of the image-classification loss in the total loss
current_epoch = 3
# compute the total loss; logits and labels come from the downstream task model
loss_fct = CrossEntropyLoss()
task_loss = loss_fct(logits, labels)
total_loss = task_loss * (1 - glyph_ratio) + glyph_ratio * glyph_decay ** (current_epoch + 1) * glyph_loss
```
### Usage of Glyce-BERT
#### 1. Preparation
- Download and unzip [`BERT-Base, Chinese`](https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip) pretrained model.
- Install PyTorch pretrained bert by pip as follows:
```bash
pip install pytorch-pretrained-bert
```
- Convert TF checkpoint into PyTorch
```shell
export BERT_BASE_DIR=/path/to/bert/chinese_L-12_H-768_A-12
pytorch_pretrained_bert convert_tf_checkpoint_to_pytorch \
$BERT_BASE_DIR/bert_model.ckpt \
$BERT_BASE_DIR/bert_config.json \
$BERT_BASE_DIR/pytorch_model.bin
```
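After conversion, you can sanity-check the PyTorch checkpoint (assuming pytorch-pretrained-bert, the directory layout produced above, and the original `vocab.txt` shipped in the zip):
```python
from pytorch_pretrained_bert import BertModel, BertTokenizer

BERT_BASE_DIR = "/path/to/bert/chinese_L-12_H-768_A-12"

# loads bert_config.json + pytorch_model.bin from the converted directory
model = BertModel.from_pretrained(BERT_BASE_DIR)
# loads vocab.txt shipped with the original TF checkpoint
tokenizer = BertTokenizer.from_pretrained(BERT_BASE_DIR)
```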
#### 2. Train/Dev/Test DataFormat
- Sequence Labeling Task
```text
Model Input: [CLS] 我 爱 北 京 天 安 门
Model Output: the NER labels for "我爱北京天安门" ("I love Beijing Tiananmen"): O O B-LOC M-LOC M-LOC M-LOC E-LOC
```
- Sentence Pair Classification
```text
Model Input: [CLS] 我 爱 北 京 天 安 门 [SEP] 北 京 欢 迎 您
Model Output: the prediction at the [CLS] position; for a semantic matching task, label 0 means the two sentences differ in meaning
```
- Single Sentence Classification
```text
Model Input: [CLS] 我 爱 北 京 天 安 门
Model Output: the prediction at the [CLS] position; for a sentiment classification task, label 1 means the sentiment is positive
```
#### 3. Start Train and Evaluate Glyce-BERT
* `scripts/*_bert.sh` are the commands we used to fine-tune BERT.
* `scripts/*_glyce_bert.sh` are the commands we used to obtain the Glyce-BERT results.
* `scripts/ctb5_binaffine.sh` is the command we used to reimplement the previous SOTA result on CTB5 for dependency parsing.
* `scripts/ctb5_glyce_binaffine.sh` is the command we used to obtain the SOTA result on CTB5 for dependency parsing.
For example, the training command of Glyce-BERT for the sentence pair dataset BQ is included in `scripts/glyce_bert/bq_glyce_bert.sh`.
Start training and evaluating Glyce-BERT on BQ with:
```bash
bash scripts/glyce_bert/bq_glyce_bert.sh
```
Notes:
- `repo_path` refers to the working directory of glyce.
- `config_path` refers to the path of the configuration file.
- `bert_model` refers to the directory of the pre-trained Chinese BERT model.
- `task_name` refers to the task signature.
- `data_dir` and `output_dir` refer to the directories of the "raw data" and "intermediate checkpoints" respectively.
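As an illustration only, a training invocation might wire these together as follows; the entry point and flag names here are hypothetical, so check the actual script (e.g. `scripts/glyce_bert/bq_glyce_bert.sh`) for the real ones:
```bash
# Hypothetical sketch -- the real entry point and flags are in the scripts/ folder.
REPO_PATH=/path/to/glyce
BERT_MODEL=/path/to/bert/chinese_L-12_H-768_A-12

python $REPO_PATH/train.py \
  --config_path $REPO_PATH/configs/bq_glyce_bert.json \
  --bert_model $BERT_MODEL \
  --task_name bq \
  --data_dir /path/to/bq_data \
  --output_dir /path/to/checkpoints
```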
## Folder Description
## Implementation
The Glyce toolkit provides implementations of previous SOTA models with Glyce embeddings incorporated.
- [Glyce: Glyph-vectors for Chinese Character Representations](https://arxiv.org/abs/1901.10125).
Refer to [./glyce/models/glyce_bert](./glyce/models/glyce_bert)
- [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
Refer to [./glyce/models/bert](./glyce/models/bert)
- [Chinese NER Using Lattice LSTM](https://arxiv.org/abs/1805.02023).
Refer to [./glyce/models/latticeLSTM](./glyce/models/latticeLSTM)
- [Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF](https://arxiv.org/abs/1704.01314).
Refer to [./glyce/models/latticeLSTM/model](./glyce/models/latticeLSTM)
- [Syntax for Semantic Role Labeling, To Be, Or Not To Be](https://www.aclweb.org/anthology/P18-1192).
Refer to [./glyce/models/srl](./glyce/models/srl)
- [Bilateral Multi-Perspective Matching for Natural Language Sentences](https://arxiv.org/abs/1702.03814).
Refer to [./glyce/models/bimpm](./glyce/models/bimpm)
- [Deep Biaffine Attention for Neural Dependency Parsing](https://arxiv.org/abs/1611.01734).
Refer to [./glyce/models/biaffine](./glyce/models/biaffine)
## Download Task Data
[Download Task Data](./docs/dataset_download.md)
## Welcome Contributions to Glyce Open Source Project
We actively welcome researchers and practitioners to contribute to Glyce open source project. Please read this [Guide](https://help.github.com/en/articles/creating-a-pull-request) and submit your **Pull Request**.
## Acknowledgement
[Acknowledgement](./docs/acknowledge.md). Vanilla Glyce is developed on top of previous SOTA models. Glyce-BERT is built on the [PyTorch implementation by HuggingFace](https://github.com/huggingface/pytorch-pretrained-BERT), and the pretrained BERT model is from [Google's pre-trained models](https://github.com/google-research/bert).
## Contact
Please feel free to discuss the paper/code through issues or email.
### License
[Apache License 2.0](./LICENSE)