# Chinese_NLU_by_using_RASA_NLU

[README written in English](README.en-US.md)

---

# Building a Chinese Natural Language Understanding (NLU) System with RASA NLU

This repository provides an up-to-date, detailed, and complete guide to building a Chinese natural language understanding (NLU) system.

## Online Demo

TODO

## Features

- Provides a Chinese corpus
- Provides corpus conversion tools that help users migrate corpus data
- Provides multiple Chinese language processing pipelines based on RASA NLU
- Provides model performance evaluation tools that help select and tune models automatically

## Requirements

Python 3 (Python 2 might work, but it has not been well tested)

## Workflow

See [workflow.md](workflow.md) for details.

## Available Pipelines

### MITIE+jieba

#### Description

* jieba provides Chinese word segmentation
* MITIE handles `intent classification` and `slot filling`

#### Install the required packages

```bash
pip install git+https://github.com/mit-nlp/MITIE.git
pip install jieba
```

#### Download the required model data

MITIE needs a model file: download `total_word_feature_extractor.dat.tar.gz` from the [releases](https://github.com/howl-anderson/MITIE_Chinese_Wikipedia_corpus/releases) page of my other project, [MITIE_Chinese_Wikipedia_corpus](https://github.com/howl-anderson/MITIE_Chinese_Wikipedia_corpus). Extract it and put `total_word_feature_extractor.dat` into the `data` directory.

#### Pipeline

```yaml
language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor.dat"
- name: "tokenizer_jieba"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"
```

#### Training script

```bash
trainer/MITIE+jieba.bash
```

#### Evaluation script

```bash
cross_validation/MITIE+jieba.bash
```

### tensorflow_embedding

#### Description

* jieba provides Chinese word segmentation
* tensorflow_embedding handles `intent classification`
* MITIE handles `slot filling`

#### Install the required packages

```bash
pip install git+https://github.com/mit-nlp/MITIE.git
pip install jieba
pip install tensorflow
```

#### Download the required model data

The same MITIE model file as above: download `total_word_feature_extractor.dat.tar.gz` from the [releases](https://github.com/howl-anderson/MITIE_Chinese_Wikipedia_corpus/releases) page of [MITIE_Chinese_Wikipedia_corpus](https://github.com/howl-anderson/MITIE_Chinese_Wikipedia_corpus). Extract it and put `total_word_feature_extractor.dat` into the `data` directory.

#### Pipeline

```yaml
language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor.dat"
- name: "tokenizer_jieba"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
- name: "ner_mitie"
- name: "ner_synonyms"
```

#### Training script

```bash
trainer/tensorflow_embedding.bash
```

#### Evaluation script

```bash
cross_validation/tensorflow_embedding.bash
```

### spacy

#### Description

* [Chinese_models_for_SpaCy](https://github.com/howl-anderson/Chinese_models_for_SpaCy) handles `intent classification` and `slot filling`

#### Install the required packages

```bash
pip install https://github.com/howl-anderson/Chinese_models_for_SpaCy/releases/download/v2.0.3/zh_core_web_sm-2.0.3.tar.gz
./spacy_model_link.bash
```

#### Pipeline

```yaml
language: "zh"

pipeline:
- name: "nlp_spacy"
  model: "zh"
- name: "tokenizer_spacy"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_spacy"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_classifier_sklearn"
```

#### Training script

```bash
trainer/spacy.bash
```

#### Evaluation script

```bash
cross_validation/spacy.bash
```
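### Using a trained model from Python

Whichever pipeline you choose, training produces a standard RASA NLU model directory. The sketch below shows how such a model could be trained and queried programmatically; it assumes the legacy `rasa_nlu` 0.x API (where the component names above live), and the config and data paths are illustrative placeholders, not files guaranteed to exist in this repo.

```python
from rasa_nlu import config
from rasa_nlu.model import Interpreter, Trainer
from rasa_nlu.training_data import load_data

# Hypothetical paths: point these at one of the pipeline configs above
# and at your corpus in RASA NLU training-data format
CONFIG_FILE = "configs/mitie_jieba.yml"
TRAINING_DATA = "data/rasa_nlu_data.json"

# Train the pipeline and persist the resulting model directory
trainer = Trainer(config.load(CONFIG_FILE))
trainer.train(load_data(TRAINING_DATA))
model_directory = trainer.persist("./models")

# Load the trained model and parse a Chinese utterance; the result is
# a dict with "intent" and "entities" keys
interpreter = Interpreter.load(model_directory)
print(interpreter.parse(u"明天北京的天气怎么样"))
```

The bash scripts listed under each pipeline wrap this same train/evaluate workflow, so the command-line route and the Python route should be interchangeable.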
## Performance Test

### DialogFlow

> weather

**Intent**

| No | train ACC | train F1 | train PRC | test ACC | test F1 | test PRC |
|----|-----------|----------|-----------|----------|---------|----------|
| 1  | 0.986     | 0.986    | 0.986     | 0.665    | 0.631   | 0.648    |
| 2  | 0.990     | 0.990    | 0.990     | 0.434    | 0.406   | 0.432    |
| 3  | 0.992     | 0.992    | 0.992     | 0.657    | 0.598   | 0.587    |

**Entity**

| No | train ACC | train F1 | train PRC | test ACC | test F1 | test PRC |
|----|-----------|----------|-----------|----------|---------|----------|
| 1  | 0.987     | 0.987    | 0.988     | 0.967    | 0.968   | 0.973    |
| 2  | 0.987     | 0.987    | 0.988     | 0.968    | 0.970   | 0.975    |
| 3  | 0.987     | 0.987    | 0.988     | 0.939    | 0.934   | 0.947    |

ACC: Accuracy; F1: F1-score; PRC: Precision.
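These are standard classification metrics. Purely for reference, here is a minimal scikit-learn sketch of how such scores are typically computed for intent labels; the gold and predicted labels below are invented for illustration and are not taken from this corpus:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score

# Invented gold / predicted intent labels, purely for illustration
y_true = ["weather", "weather", "greet", "goodbye", "weather", "greet"]
y_pred = ["weather", "greet",   "greet", "goodbye", "weather", "greet"]

# "weighted" averages the per-class scores by class frequency, a common
# choice for multi-class intent evaluation
print("ACC:", accuracy_score(y_true, y_pred))
print("F1: ", f1_score(y_true, y_pred, average="weighted"))
print("PRC:", precision_score(y_true, y_pred, average="weighted"))
```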