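# Chinese_NRE

**Repository Path**: jeave/Chinese_NRE

## Basic Information

- **Project Name**: Chinese_NRE
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-07-30
- **Last Updated**: 2021-11-03

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Chinese_NRE

Chinese relation extraction.

# Dataset

The data comes from a person relation extraction competition: https://biendata.com/competition/ccks_2019_ipre/leaderboard/

Backup download link: https://pan.baidu.com/s/1JR7L_pCIXFLLjrbRSOJw9A (extraction code: obn7)

Relations:

![image](https://github.com/Mryangkaitong/Chinese_NRE/blob/master/photo/people_relation.png)

# Models

Three approaches were tried:

- OpenNRE, a toolkit developed at Tsinghua: https://github.com/thunlp/OpenNRE (see the openNRE folder).
- A BGRU+2ATT network: https://github.com/squirrel1982/TensorFlow-NRE
- A simple version, in the bag_sent folder.

OpenNRE currently implements only the single-label bag setting. It has no multi-label support and no sent setting, although these appear to be under development. Anyone interested can follow https://github.com/thunlp/OpenNRE/tree/nrekit

Strictly speaking, then, OpenNRE does not fit this competition. To demonstrate the bag setting (multi-label) and the sent setting, the bag_sent folder builds on the baseline code provided by the competition and adds pcnn, rnn, cnn (currently sentences only) and so on.

baseline: https://github.com/ccks2019-ipre/baseline

The GRU model is also included. According to its authors' comparison it outperforms OpenNRE, so it can be seen as an improved version: https://github.com/squirrel1982/TensorFlow-NRE

These are the main references for this write-up. The first and third approaches do not perform well in this competition. They are covered mainly to show how they are used, in case they are useful elsewhere. For solutions aimed at this competition, go straight to the bag_sent section.

# Getting started

After downloading the dataset, first train the word vectors: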

    python word2vec_train.py

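## OpenNRE

Because OpenNRE can only handle single labels, the sent data is used here.

Step 1: In the OpenNRE folder, create the /data/people_relation/ folder and put the trained word vectors and the unzipped data in it.

Step 2: Convert the txt files to json:

    python txt2json.py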
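The conversion in this step can be sketched roughly as follows. This is only an illustration, not the repository's txt2json.py: the tab-separated input layout (sentence id, head entity, tail entity, sentence), the label file layout, the relation names, and the output field names are all assumptions, chosen to resemble the kind of JSON records that OpenNRE-style loaders read.

```python
import json


def txt_to_records(sent_lines, label_lines, id2rel):
    """Join sentence lines with their labels and build JSON-ready dicts.

    Assumed (hypothetical) layouts:
      sent_lines:  "sent_id<TAB>head<TAB>tail<TAB>sentence"
      label_lines: "sent_id<TAB>relation_id"
    """
    # Map each sentence id to its integer relation id.
    labels = {}
    for line in label_lines:
        sent_id, rel_id = line.rstrip("\n").split("\t")
        labels[sent_id] = int(rel_id)

    records = []
    for line in sent_lines:
        # Split at most 3 times so tabs inside the sentence are preserved.
        sent_id, head, tail, sentence = line.rstrip("\n").split("\t", 3)
        records.append({
            "sentence": sentence,
            "head": {"word": head, "id": head},  # entity text reused as id (assumption)
            "tail": {"word": tail, "id": tail},
            "relation": id2rel[labels[sent_id]],
        })
    return records


# Toy data, invented for illustration only.
sents = ["S1\t张三\t李四\t张三 和 李四 是 同学"]
labs = ["S1\t1"]
id2rel = {0: "NA", 1: "classmate"}

records = txt_to_records(sents, labs, id2rel)
print(json.dumps(records, ensure_ascii=False, indent=2))
```

Here the entity text doubles as the entity id for simplicity. Assigning separate ids to entities would be a small change to the two nested dicts.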

Step 3: Train:

    python train_demo.py people_relation pcnn att

Step 4: Plot the performance:

    python draw_plot.py people_relation_pcnn_att

Step 5: Predict:

    python test_demo.py people_relation pcnn att

Step 6: Convert the results to txt:

    python json2txt.py people_relation_pcnn_att

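Note: the two scripts in the entity folder perform the same conversions but give each entity pair its own id, which is logically cleaner.

## bag_sent

Create a /data folder inside this folder and put the unzipped data and the word vectors in it.

Step 1: Download the BERT model. BERT was also tried here, so a pretrained BERT model is needed. Link: https://pan.baidu.com/s/1ZuiOLCSluMCyVp3HhvCexw (extraction code: rhza). After downloading, unzip it into the bag_sent/bert_model/ folder.

Step 2: Train. To train in sent mode, choosing the encoder with --encoder (rnn in this example):

    python baseline.py --encoder rnn --level sent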

To train in bag mode (again with rnn as the encoder in this example):

    python baseline.py --encoder rnn --level bag
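The difference between the sent and bag levels can be illustrated with a small sketch. In sent mode each sentence is classified on its own. In bag mode all sentences that mention the same entity pair form one bag, which may carry several relation labels. The tuple layout and the toy data below are assumptions made for illustration. The actual dataset may supply bag labels in separate files rather than deriving them from sentence labels as done here.

```python
from collections import defaultdict


def group_into_bags(instances):
    """Group sentence-level instances by (head, tail) entity pair.

    Each instance is a (head, tail, sentence, relation_id) tuple.
    Returns {(head, tail): {"sentences": [...], "labels": [...]}},
    where "labels" is the sorted set of relation ids seen for the pair.
    """
    bags = defaultdict(lambda: {"sentences": [], "labels": set()})
    for head, tail, sentence, rel in instances:
        bag = bags[(head, tail)]
        bag["sentences"].append(sentence)
        bag["labels"].add(rel)
    # Convert label sets to sorted lists so the result is deterministic.
    return {
        pair: {"sentences": bag["sentences"], "labels": sorted(bag["labels"])}
        for pair, bag in bags.items()
    }


# Toy sentence-level data, invented for illustration only.
instances = [
    ("A", "B", "A and B went to school together", 3),
    ("A", "B", "A later married B", 1),
    ("C", "D", "C met D once", 0),
]

bags = group_into_bags(instances)
for pair, bag in bags.items():
    print(pair, len(bag["sentences"]), bag["labels"])
```

The pair ("A", "B") ends up with two sentences and two labels, which is the multi-label case that a single-label bag model cannot represent.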

Step 3: Predict:

    python baseline.py --encoder rnn --level bag --mode test

The results are generated in the current folder.

# BGRU

Step 1: Put the unzipped data in the origin_data directory.

Step 2: Preprocess the data:

    python initial.py

Step 3: Train:

    python train_GRU.py

Training automatically calls test_GRU.py to check performance on the dev set.

Step 4: Predict:

    python predict_GRU.py 2643

Here 2643 selects the model numbered 2643. Other models can be loaded instead, depending on what is available under the model folder.

# Partial results

## OpenNRE

![image](https://github.com/Mryangkaitong/Chinese_NRE/blob/master/photo/OpenNRE_rnn_one.png)

![image](https://github.com/Mryangkaitong/Chinese_NRE/blob/master/photo/OpenNRE_pcnn_att.png)

## bag_sent

![image](https://github.com/Mryangkaitong/Chinese_NRE/blob/master/photo/sent_bag_result.png)

Update (the competition had already ended at this point, so these could not be verified online; the following are offline results):

1. input + two-layer birnn (lstm) + attention_1, sent: 0.2515
2. input + single-layer birnn (gru) + attention_1, sent: 0.2693
3. input + cnn, bag: 0.2819
4. input + two-layer birnn + attention_2, sent: 0.2698
5. input (cnn) + two-layer birnn + attention_2, sent: 0.2088
6. two-layer birnn + attention_1 + level_1, sent: 0.254
7. two-layer birnn + attention_1 + MASK, sent: 0.272969
8. two-layer birnn (gru) + attention_1 + drop, sent: 0.276112
9. two-layer birnn (gru) + attention_1 + drop + MASK, sent: 0.264407
10. two-layer birnn (gru) + attention_1 + drop + shuffle, sent: 0.256494
11. two-layer birnn (gru) + attention_1 + drop + shuffle + MASK, sent: 0.280276
12. two-layer birnn (gru) + attention_1 + shuffle + MASK, sent: 0.271749
13. two-layer birnn (gru) + MultiHeadAttention + drop, sent: 0.273533

## BGRU

![image](https://github.com/Mryangkaitong/Chinese_NRE/blob/master/photo/BGRU.png)

Note that the F1 here is computed over all 35 classes, whereas the F1 scores in the bag_sent section all exclude the NA class.

# Detailed walkthrough

https://blog.csdn.net/weixin_42001089/article/details/95493249

# Other explorations

A BERT-based triple relation extraction: https://blog.csdn.net/weixin_42001089/article/details/97657149

A semi-supervised relation extraction: https://github.com/Mryangkaitong/python-Machine-learning/tree/master/deepdive
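The distinction made in the BGRU results, between F1 computed over all classes and F1 that excludes the NA class, can be illustrated with a small sketch. The function name, the use of micro-averaging, and the toy labels are assumptions for illustration. The repository and the competition scorer may average differently.

```python
def micro_f1(gold, pred, ignore=None):
    """Micro-averaged F1 for single-label predictions.

    If `ignore` is given, that label (for example the NA class) is not
    counted as a positive on either the gold or the predicted side.
    """
    # True positives: correct predictions of a non-ignored label.
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != ignore)
    n_pred = sum(1 for p in pred if p != ignore)  # predicted positives
    n_gold = sum(1 for g in gold if g != ignore)  # gold positives
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration; 0 plays the role of NA.
gold = [0, 0, 0, 0, 1, 2, 2]
pred = [0, 0, 0, 1, 1, 2, 0]

f1_all = micro_f1(gold, pred)
f1_no_na = micro_f1(gold, pred, ignore=0)
print(f"all classes: {f1_all:.4f}, excluding NA: {f1_no_na:.4f}")
```

Because NA usually dominates the data and is comparatively easy to predict, including it tends to raise the score, so the two kinds of F1 are not directly comparable.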