modelee/glm-large-chinese


Model description

glm-large-chinese is pretrained on the WuDaoCorpora dataset. It has 24 transformer layers, with hidden size 1024 and 16 attention heads in each layer. The model is pretrained with autoregressive blank filling objectives designed for natural language understanding, seq2seq, and language modeling.
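The stated configuration implies a few derived quantities. The sketch below computes the per-head dimension and a rough weight count for the transformer blocks. It assumes a standard block with a 4x feed-forward expansion, which this card does not state, and it ignores biases, layer norms and embeddings. Treat the result as a back-of-the-envelope estimate, not the model's official parameter count.

```python
# Back-of-the-envelope sizing for the configuration described above.
# Assumption (not stated in this card): each block has a feed-forward layer
# with a 4x hidden-size expansion. Biases, layer norms and embeddings are ignored.
NUM_LAYERS = 24
HIDDEN_SIZE = 1024
NUM_HEADS = 16


def head_dim(hidden_size: int, num_heads: int) -> int:
    """Per-head dimension when the hidden size is split evenly across heads."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden size must be divisible by the number of heads")
    return hidden_size // num_heads


def approx_block_params(num_layers: int, hidden_size: int, ffn_mult: int = 4) -> int:
    """Approximate weight count of the transformer blocks only."""
    attention = 4 * hidden_size * hidden_size                 # Q, K, V and output projections
    feed_forward = 2 * ffn_mult * hidden_size * hidden_size   # up- and down-projection
    return num_layers * (attention + feed_forward)


if __name__ == "__main__":
    print(head_dim(HIDDEN_SIZE, NUM_HEADS))              # 64
    print(approx_block_params(NUM_LAYERS, HIDDEN_SIZE))  # 301989888, i.e. about 0.3B
```

Embedding and output-layer weights come on top of this figure and depend on the vocabulary size, which this card does not give.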

How to use

Please refer to the instructions in our GitHub repo.

We use three different mask tokens for different tasks: [MASK] for short blank filling, [sMASK] for sentence filling, and [gMASK] for left-to-right generation. Examples of the different masks can be found here. The prediction always begins with a special <|startofpiece|> token and ends with an <|endofpiece|> token.
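Because every prediction is delimited by <|startofpiece|> and <|endofpiece|>, the generated span can be recovered from decoded output with plain string handling and written back into the prompt. The helpers below (extract_piece, fill_mask) are hypothetical names for illustration only; they are not part of the GLM codebase, and the decoded string in the usage check is made up rather than real model output. Running the model itself should follow the repo instructions.

```python
# Hypothetical post-processing helpers (not part of the GLM repo).
START = "<|startofpiece|>"
END = "<|endofpiece|>"


def extract_piece(decoded: str) -> str:
    """Return the text between the first <|startofpiece|> and the next <|endofpiece|>.

    If no end token is present (for example, generation stopped at a length
    limit), everything after the start token is returned.
    """
    start = decoded.find(START)
    if start == -1:
        raise ValueError("no <|startofpiece|> token in the decoded output")
    start += len(START)
    end = decoded.find(END, start)
    piece = decoded[start:] if end == -1 else decoded[start:end]
    return piece.strip()


def fill_mask(prompt: str, piece: str, mask_token: str = "[MASK]") -> str:
    """Replace the first occurrence of mask_token ([MASK], [sMASK] or [gMASK]) with piece."""
    if mask_token not in prompt:
        raise ValueError(f"{mask_token} not found in prompt")
    return prompt.replace(mask_token, piece, 1)


if __name__ == "__main__":
    # Illustrative decoded string, not actual model output.
    decoded = "北京是中国的[MASK]。 <|startofpiece|> 首都 <|endofpiece|>"
    piece = extract_piece(decoded)
    print(piece)                              # 首都
    print(fill_mask("北京是中国的[MASK]。", piece))  # 北京是中国的首都。
```

Only the first mask is filled here; a prompt with several masks would need one extracted piece per mask.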

Citation

Please cite our paper if you find this code useful for your research:

@inproceedings{DBLP:conf/acl/DuQLDQY022,
  author    = {Zhengxiao Du and
               Yujie Qian and
               Xiao Liu and
               Ming Ding and
               Jiezhong Qiu and
               Zhilin Yang and
               Jie Tang},
  title     = {{GLM:} General Language Model Pretraining with Autoregressive Blank Infilling},
  booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational
               Linguistics (Volume 1: Long Papers), {ACL} 2022, Dublin, Ireland,
               May 22-27, 2022},
  pages     = {320--335},
  publisher = {Association for Computational Linguistics},
  year      = {2022},
}