3 Star 1 Fork 2

Gitee 极速下载 / elasticsearch-jieba-plugin

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
此仓库是为了提升国内下载速度的镜像仓库,每日同步一次。 原始仓库: https://github.com/sing1ee/elasticsearch-jieba-plugin
克隆/下载
贡献代码
同步代码
取消
提示: 由于 Git 不支持空文件夾,创建文件夹后会生成空的 .keep 文件
Loading...
README
MIT

elasticsearch-jieba-plugin

20221201更新

  1. 新增分支:
  • 7.17.x分支,支持es 7.17.0,JDK版本:11.0.7, gradle版本:7.6
  • 8.4.1分支,支持es 8.4.1,JDK版本:18.0.2.1, gradle版本:7.6
  1. 当适配不同的ES版本,以及JDK版本,需要参考ES和JDK版本的对应关系
  2. 适配不同ES版本,修改以下文件,需要修改的地方已经注明
  • build.gradle
  • src/main/resources/plugin-descriptor.properties
  1. 需要切换不同的gradle版本,7.6是要切换的目标版本
gradle wrapper --gradle-version 7.6

jieba analysis plugin for elasticsearch: 7.7.0, 7.4.2, 7.3.0, 7.0.0, 6.4.0, 6.0.0 , 5.4.0, 5.3.0, 5.2.2, 5.2.1, 5.2.0, 5.1.2, 5.1.1

特点

  • 支持动态添加字典,不重启ES。

简单的修改,即可适配不同版本的ES

戳这里

支持动态添加字典,ES不需要重启

戳这里

有关jieba_index和jieba_search的应用

戳这里

新分词支持

如果是ES6.4.0的版本,请使用6.4.0分支最新的代码,或者master分支最新代码,也可以下载6.4.1的release,强烈推荐升级!

6.4.1的release,解决了PositionIncrement问题。详细说明见ES分词PositionIncrement解析

版本对应

分支 tag elasticsearch版本 Release Link
7.7.0 tag v7.7.1 v7.7.0 Download: v7.7.0
7.4.2 tag v7.4.2 v7.4.2 Download: v7.4.2
7.3.0 tag v7.3.0 v7.3.0 Download: v7.3.0
7.0.0 tag v7.0.0 v7.0.0 Download: v7.0.0
6.4.0 tag v6.4.1 v6.4.0 Download: v6.4.1
6.4.0 tag v6.4.0 v6.4.0 Download: v6.4.0
6.0.0 tag v6.0.0 v6.0.0 Download: v6.0.1
5.4.0 tag v5.4.0 v5.4.0 Download: v5.4.0
5.3.0 tag v5.3.0 v5.3.0 Download: v5.3.0
5.2.2 tag v5.2.2 v5.2.2 Download: v5.2.2
5.2.1 tag v5.2.1 v5.2.1 Download: v5.2.1
5.2 tag v5.2.0 v5.2.0 Download: v5.2.0
5.1.2 tag v5.1.2 v5.1.2 Download: v5.1.2
5.1.1 tag v5.1.1 v5.1.1 Download: v5.1.1

more details

  • choose right version source code.
  • run
git clone https://github.com/sing1ee/elasticsearch-jieba-plugin.git --recursive
./gradlew clean pz
  • copy the zip file to plugin directory
cp build/distributions/elasticsearch-jieba-plugin-5.1.2.zip ${path.home}/plugins
  • unzip and rm zip file
unzip elasticsearch-jieba-plugin-5.1.2.zip
rm elasticsearch-jieba-plugin-5.1.2.zip
  • start elasticsearch
./bin/elasticsearch

Custom User Dict

Just put you dict file with suffix .dict into ${path.home}/plugins/jieba/dic. Your dict file should like this:

小清新 3
百搭 3
显瘦 3
隨身碟 100
your_word word_freq

Using stopwords

  • find stopwords.txt in ${path.home}/plugins/jieba/dic.
  • create folder named stopwords under ${path.home}/config
mkdir -p {path.home}/config/stopwords
  • copy stopwords.txt into the folder just created
cp ${path.home}/plugins/jieba/dic/stopwords.txt {path.home}/config/stopwords
  • create index:
PUT http://localhost:9200/jieba_index
{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type":        "stop",
          "stopwords_path": "stopwords/stopwords.txt"
        },
        "jieba_synonym": {
          "type":        "synonym",
          "synonyms_path": "synonyms/synonyms.txt"
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop",
            "jieba_synonym"
          ]
        }
      }
    }
  }
}
  • test analyzer:
PUT http://localhost:9200/jieba_index/_analyze
{
  "analyzer" : "my_ana",
  "text" : "黄河之水天上来"
}

Response as follow:

{
    "tokens": [
        {
            "token": "黄河",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 0
        },
        {
            "token": "黄河之水天上来",
            "start_offset": 0,
            "end_offset": 7,
            "type": "word",
            "position": 0
        },
        {
            "token": "之水",
            "start_offset": 2,
            "end_offset": 4,
            "type": "word",
            "position": 1
        },
        {
            "token": "天上",
            "start_offset": 4,
            "end_offset": 6,
            "type": "word",
            "position": 2
        },
        {
            "token": "上来",
            "start_offset": 5,
            "end_offset": 7,
            "type": "word",
            "position": 2
        }
    ]
}

NOTE

migrate from jieba-solr

Roadmap

I will add more analyzer support:

  • stanford chinese analyzer
  • fudan nlp analyzer
  • ...

If you have some ideas, you should create an issue. Then, we will do it together.

MIT License Copyright (c) 2017 Cheng Zhang Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

简介

elasticsearch-jieba-plugin 是 Jieba 中文分词插件 展开 收起
Java
MIT
取消

发行版

暂无发行版

贡献者

全部

近期动态

加载更多
不能加载更多了
Java
1
https://gitee.com/mirrors/elasticsearch-jieba-plugin.git
git@gitee.com:mirrors/elasticsearch-jieba-plugin.git
mirrors
elasticsearch-jieba-plugin
elasticsearch-jieba-plugin
master

搜索帮助