# ME-CNER

Code for the CIKM 2019 paper ["Exploiting Multiple Embeddings for Chinese Named Entity Recognition"](https://arxiv.org/abs/1908.10657).

## Citation

If you use this code in your work, please cite our paper:

```bibtex
@inproceedings{cikm19:xu,
  author    = {Canwen Xu and Feiyang Wang and Jialong Han and Chenliang Li},
  title     = {Exploiting Multiple Embeddings for Chinese Named Entity Recognition},
  booktitle = {The 28th ACM International Conference on Information and Knowledge Management, {CIKM} 2019, Beijing, China, November 3-7, 2019},
  publisher = {{ACM}},
  year      = {2019},
  url       = {https://doi.org/10.1145/3357384.3358117},
  doi       = {10.1145/3357384.3358117}
}
```

## Requirement

- Python: 3.6
- Keras: 2.2.2
- Keras-contrib: 2.0.8
- jieba: 0.39

## Dataset

We use a standard Weibo NER dataset provided by [Peng and Dredze, 2015](http://aclweb.org/anthology/D/D15/D15-1064.pdf), and a formal MSRA News dataset provided by [Levow, 2006](https://www.aclweb.org/anthology/W06-0115).

## Pretrained Embeddings

The pretrained character and word embeddings are provided by [Tencent AI Lab](https://ai.tencent.com/ailab/nlp/embedding.html). Download them [here](https://ai.tencent.com/ailab/nlp/data/Tencent_AILab_ChineseEmbedding.tar.gz). The radical embedding is randomly initialized.

## How to Run

1. Install all requirements

   ```shell
   pip install keras==2.2.2 # for Keras
   pip install git+https://www.github.com/keras-team/keras-contrib.git # for CRF layer
   pip install jieba # for word segmentation
   ```

2. Download pretrained embeddings

   Download [Tencent Embeddings](https://ai.tencent.com/ailab/nlp/data/Tencent_AILab_ChineseEmbedding.tar.gz), extract the archive, and put it in `process_data/data_preprocess`.

3. Run the pre-processing code

   ```shell
   python concat_data.py
   ```

4. Run the model (with different config)

   Each `${...}` below is a placeholder, not shell syntax: replace it with one of the values listed after the command.

   ```shell
   python main.py --dataset ${weibo/msra} --with_radical ${1/0} --network ${convgru/cnn/bilstm} --tagger ${bigrucrf/bilstmcrf} --entity_type ${all/nm/ne}
   ```

   ```
   dataset:
       weibo
       msra
   with_radical: # input radical embedding or not
       0 # no radical embedding input, only word embedding and char embedding
       1 # with radical embedding
   network: # for characters
       convgru # Conv-GRU
       bilstm
       cnn
   tagger:
       bigrucrf # Bidirectional GRU-CRF
       bilstmcrf # Bidirectional LSTM-CRF
   entity_type:
       ne # only Named Entity. e.g. 王小明 (Xiaoming Wang), 北京市 (Beijing City)
       nm # only Nominal Mention. e.g. 班长 (class president), 妈妈 (mother)
       all # take both Named Entity and Nominal Mention into account
   ```

   For example, use the following command to run our final ME-CNER model on the Weibo dataset, recognizing only named entities (all nominal mentions are ignored).

   ```shell
   python main.py --dataset weibo --with_radical 1 --network convgru --tagger bigrucrf --entity_type ne
   ```
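
To compare several configurations, the documented flags can be assembled programmatically instead of typed by hand. The following is a minimal sketch, assuming `main.py` accepts exactly the flags and values listed above. The names `OPTIONS`, `build_command`, and `all_commands` are hypothetical helpers for illustration and are not part of this repository. Whether every combination is meaningful for every dataset is not stated here, so treat the full grid as a starting point.

```python
import itertools

# Flag values copied from the option list in this README.
# Assumption: main.py accepts these flags and values and no others.
OPTIONS = {
    "dataset": ["weibo", "msra"],
    "with_radical": ["0", "1"],
    "network": ["convgru", "cnn", "bilstm"],
    "tagger": ["bigrucrf", "bilstmcrf"],
    "entity_type": ["all", "nm", "ne"],
}


def build_command(**choices):
    """Return an argv list for one run, rejecting undocumented flags or values."""
    unknown = set(choices) - set(OPTIONS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    missing = set(OPTIONS) - set(choices)
    if missing:
        raise ValueError(f"missing option(s): {sorted(missing)}")

    command = ["python", "main.py"]
    for name, allowed in OPTIONS.items():
        value = str(choices[name])
        if value not in allowed:
            raise ValueError(f"--{name} must be one of {allowed}, got {value!r}")
        command += [f"--{name}", value]
    return command


def all_commands():
    """Return one argv list per combination of the documented values."""
    names = list(OPTIONS)
    return [
        build_command(**dict(zip(names, values)))
        for values in itertools.product(*(OPTIONS[name] for name in names))
    ]


if __name__ == "__main__":
    # Same configuration as the example command in this README.
    final_model = build_command(
        dataset="weibo",
        with_radical=1,
        network="convgru",
        tagger="bigrucrf",
        entity_type="ne",
    )
    print(" ".join(final_model))
```

Each returned list can be passed to `subprocess.run(command, check=True)` from the directory that contains `main.py`, which runs the configurations one after another and stops at the first failure.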