# introduction_es **Repository Path**: xiaolixi/introduction_es ## Basic Information - **Project Name**: introduction_es - **Description**: es入门 - **Primary Language**: Java - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2024-09-24 - **Last Updated**: 2024-10-23 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # introduction_es ![es的发展历史](./images/es的发展历史.png) ![elastic_stack](./images/elastic_stack.png) ![vs](./images/vs.png) - [blili视频学习](https://www.bilibili.com/video/BV1bg4y157Vf?p=7&spm_id_from=pageDriver&vd_source=26bb43d70f463acac2b0cce092be2eaa) - [db-engines数据库排行榜](https://db-engines.com/en/ranking/search+engine) - [elasticsearch官网文档](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html#buckets-path-syntax) - [elasticsearch的clients](https://www.elastic.co/guide/en/elasticsearch/client/index.html) ![ES_Clients_landing_page](./images/ES_Clients_landing_page.png) ## 概念 ![es的术语](./images/es_item.png) 1. 一个 索引 就是一个拥有几分相似特征的文档的集合。ES 将数据存储于一个或多个索引中,索引 就相当于 SQL 中的一个数据库。 2. Type(类型)是索引内部的逻辑分区(category/partition),然而其意义完全取决于用户需求。因此,一个索引内部可定义一个或多个类型(type)。一般来说,类型就是为那些拥有相同的域的文档做的预定义。类比传统的关系型数据库领域来说,类型 相当于 表,7.x 版本默认使用 _doc 作为 type 。 3. Document(文档)是 Lucene 索引和搜索的 原子单位,它是包含了一个或多个域的容器,基于 Json 格式进行表示。文档有一个或多个域组成,每个域拥有一个名字及一个或多个值,有多个值的域通常被称为 多值域,每个文档可以存储不同的域集,但同一类型下的文档至应该有某种程度上的相似之处。相当于 mysql 表中的 row 。 4. Field (字段)是相当于数据库中的 Column 5. Mapping(映射)是相当于数据库中的 Schema ## 安装 ![es的安装包目录](./images/es的安装包.png) ### 测试环境8.12.2版本docker-compose部署 ```shell (base)  ~/data/elasticsearch-kibana/ ll total 24 -rw-r--r-- 1 admin staff 686B 9 11 11:15 docker-compose.yml -rw-r--r-- 1 admin staff 959B 9 11 11:00 elasticsearch.yml -rw-r--r-- 1 admin staff 197B 9 11 11:02 kibana.yml (base)  ~/data/elasticsearch-kibana/ (base)  ~/data/elasticsearch-kibana/ (base)  ~/data/elasticsearch-kibana/ (base)  ~/data/elasticsearch-kibana/ (base)  ~/data/elasticsearch-kibana/ pwd /Users/admin/data/elasticsearch-kibana (base)  ~/data/elasticsearch-kibana/ (base)  ~/data/elasticsearch-kibana/ cat docker-compose.yml version: '3' services: elk_elasticsearch: image: docker.elastic.co/elasticsearch/elasticsearch:8.12.2 #镜像 container_name: elk_elasticsearch #定义容器名称 environment: - "discovery.type=single-node" #以单一节点模式启动 volumes: - ./elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml ports: - 9200:9200 - 9300:9300 elk_kibana: image: kibana:8.12.2 container_name: elk_kibana ports: - 5601:5601 depends_on: - elk_elasticsearch volumes: - ./kibana.yml:/usr/share/kibana/config/kibana.yml environment: - "ELASTICSEARCH_HOSTS=http://elk_elasticsearch:9200" (base)  ~/data/elasticsearch-kibana/ cat elasticsearch.yml cluster.name: "docker-cluster" network.host: 0.0.0.0 #----------------------- BEGIN SECURITY AUTO CONFIGURATION ----------------------- # # The following settings, TLS certificates, and keys have been automatically # generated to configure Elasticsearch security features on 06-03-2024 02:11:12 # # -------------------------------------------------------------------------------- # Enable security features xpack.security.enabled: false xpack.security.enrollment.enabled: false # Enable encryption for HTTP API client connections, such as Kibana, Logstash, and Agents xpack.security.http.ssl: enabled: false # keystore.path: certs/http.p12 # Enable encryption and mutual authentication between cluster nodes xpack.security.transport.ssl: enabled: false # verification_mode: certificate # keystore.path: certs/transport.p12 # truststore.path: certs/transport.p12 #----------------------- END SECURITY AUTO CONFIGURATION ------------------------- (base)  ~/data/elasticsearch-kibana/ cat kibana.yml server.host: "0.0.0.0" server.shutdownTimeout: "5s" elasticsearch.hosts: [ "http://elk_elasticsearch:9200" ] elasticsearch.username: "kibana_system" elasticsearch.password: "WGhf-wm6RIOFczaZjh+o" (base)  ~/data/elasticsearch-kibana/ (base)  ~/data/elasticsearch-kibana/ curl http://localhost:9200/ { "name" : "594bb9ef6d01", "cluster_name" : "docker-cluster", "cluster_uuid" : "YFVXW_mhROmDeaJjMNiSNw", "version" : { "number" : "8.12.2", "build_flavor" : "default", "build_type" : "docker", "build_hash" : "48a287ab9497e852de30327444b0809e55d46466", "build_date" : "2024-02-19T10:04:32.774273190Z", "build_snapshot" : false, "lucene_version" : "9.9.2", "minimum_wire_compatibility_version" : "7.17.0", "minimum_index_compatibility_version" : "7.0.0" }, "tagline" : "You Know, for Search" } (base)  ~/data/elasticsearch-kibana/ ``` ### 测试环境7.16.3版本docker部署 ```shell docker pull elasticsearch:7.16.3 docker pull kibana:7.16.3 ``` ```shell mkdir elasticsearch chmod -R 770 elasticsearch cd elasticsearch mkdir config data plugins cat > config/elasticsearch.yml << EOF http.host: 0.0.0.0 EOF docker rm -f elasticsearch docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \ -e "discovery.type=single-node" \ -e ES_JAVA_OPTS="-Xms64m -Xmx512m" \ -v /Users/admin/data/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \ -v /Users/admin/data/elasticsearch/data:/usr/share/elasticsearch/data \ -v /Users/admin/data/elasticsearch/plugins:/usr/share/elasticsearch/plugins \ -d elasticsearch:7.16.3 [23:20:14] xiaoyu:elasticsearch $ curl localhost:9200 { "name" : "45c00d5d62d7", "cluster_name" : "elasticsearch", "cluster_uuid" : "3SShF4zxTfCDroMjWNJ5DQ", "version" : { "number" : "7.16.3", "build_flavor" : "default", "build_type" : "docker", "build_hash" : "4e6e4eab2297e949ec994e688dad46290d018022", "build_date" : "2022-01-06T23:43:02.825887787Z", "build_snapshot" : false, "lucene_version" : "8.10.1", "minimum_wire_compatibility_version" : "6.8.0", "minimum_index_compatibility_version" : "6.0.0-beta1" }, "tagline" : "You Know, for Search" } ``` ```shell mkdir -p /Users/admin/data/kibana docker rm -f kibana docker run --name kibana --link elasticsearch:elasticsearch \ # -v /Users/admin/data/kibana/config:/usr/share/kibana/config \ -e ELASTICSEARCH_HOSTS=http://elasticsearch:9200 \ -p 5601:5601 -d kibana:7.16.3 ``` ### 测试环境7.16.3版本docker-compose部署 ```shell (base)  ~/data/es_7/ ll total 8 -rw-r--r--@ 1 admin staff 534B 10 9 14:07 docker-compose.yml drwxrwx--- 5 admin staff 160B 10 9 13:58 elasticsearch (base)  ~/data/es_7/ ll elasticsearch/ (base)  ~/data/es_7/ tree . . ├── docker-compose.yml └── elasticsearch ├── config │ └── elasticsearch.yml ├── data └── plugins 5 directories, 2 files (base)  ~/data/es_7/ cat elasticsearch/config/elasticsearch.yml http.host: 0.0.0.0 (base)  ~/data/es_7/ (base)  ~/data/es_7/ cat elasticsearch/config/elasticsearch.yml http.host: 0.0.0.0 (base)  ~/data/es_7/ cat docker-compose.yml version: "3" services: elasticsearch_7: image: elasticsearch:7.16.3 container_name: elasticsearch_7 environment: - "discovery.type=single-node" volumes: - ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml ports: - 9200:9200 - 9300:9300 kibana_7: image: kibana:7.16.3 container_name: kibana_7 ports: - 5601:5601 depends_on: - elasticsearch_7 environment: - "ELASTICSEARCH_HOSTS=http://elasticsearch_7:9200" (base)  ~/data/es_7/ (base)  ~/data/es_7/ docker-compose up -d [+] Running 2/2 ✔ Container elasticsearch_7 Started 0.4s ✔ Container kibana_7 Started 0.6s (base)  ~/data/es_7/ ``` ### 客户端 > 注意安装插件访问的时候可能需要允许跨站访问的设置 > #允许 CORS 请求来自 https://app.elasticvue.com > http.cors.enabled: true > #http.cors.allow-origin: "https://app.elasticvue.com" > http.cors.allow-origin: "*" 1. kibana 2. [elasticvue客户端](https://elasticvue.com/) 看上去支持客户端、浏览器插件、docker、在线web app这么多种类型。但是DSL查询语句没有提示。。。那就和kibana相差很多了 ![elasticVUE](./images/elasticVUE.png) ![elasticVUE_search](./images/elasticVUE_search.png) - 使用浏览器插件 ![elasticsearch_vue_chrome_plugin](./images/elasticsearch_vue_chrome_plugin.png) 3. [head客户端](https://github.com/mobz/elasticsearch-head/?tab=readme-ov-file) ![elastic_head_plugin](./images/elastic_head_plugin.png) ## 架构 ### 节点以及节点类型 1. Master Node:主节点,主要是负责一些索引的创建、删除、决定分片要分配到哪个结点、维护整个集群的更新等 2. Master eligible nodes:可以参与选举的合格节点,当主节点挂了该节点就可以参与选举,该节点也是Master主结点的一个从结点 3. Data Node:专门用于存放数据的节点,如索引插入的数据就是存放在这个节点中。这些节点对CPU、内存和IO的要求较高 由master主结点负责如何将分片分发到数据节点上,通过数据节点解决数据的水平扩展和解决数据的单点问题。 4. Coordinating Node:协调节点,用于接收和响应请求,完成数据的接收和分发 ### 分片和副本 创建索引的时候,可以设置分片的数量以及设置副本的数量。 ```json PUT /zhs_db { "settings": { "number_of_shards": 3, //设置分片数,主要是用于解决数据水平扩展的问题 "number_of_replicas": 1 //设置副本数,用于解决数据高可用的问题 } } ``` 分片数和节点数有关,如有一个三个节点组成的集群,那么设三个分片,那么会根据默认的hash算法,一个节点中就会有一个分片。 ### 集群健康状态 可以通过命令 _cluster/health 查看集群的健康状态,如下: - Green 主分片与副本分片都正常 - Yellow 主分片正常,副本分片不正常 - Red 有主分片不正常,可能某个分片容量超过了磁盘大小等 ### es的架构 ![es的4个关键结构](./images/es的4个关键结构.jpg) #### 读写流程 1. 写流程 - 协调节点 - 主分片 - 副本 - 返回主节点 - 返回到协调节点 - 2. 读流程 - search phase - fetch phase ## 分词 ```json POST /_analyze { "text": "你好JAva啦", "analyzer": "standard" } POST /_analyze { "text": "你好JAva啦", "analyzer": "english" } { "tokens": [ { "token": "你", "start_offset": 0, "end_offset": 1, "type": "", "position": 0 }, { "token": "好", "start_offset": 1, "end_offset": 2, "type": "", "position": 1 }, { "token": "java", "start_offset": 2, "end_offset": 6, "type": "", "position": 2 }, { "token": "啦", "start_offset": 6, "end_offset": 7, "type": "", "position": 3 } ] } ``` 中文默认是一个字一个字的拆,显然这是错的。 中文分词使用ik分词器 ik_smart:最少切分,切分的词语最最大长度的 ik_max_word:最细切分,尽量包含最小粒度的词和最大粒度的词,都包含。 ### ik分词器的安装 [github地址 https://github.com/infinilabs/analysis-ik](https://github.com/infinilabs/analysis-ik) [下载对应版本的ik](https://release.infinilabs.com/analysis-ik/stable/) ![ik_download.png](images/ik_download.png) 解压复制到`/usr/share/elasticsearch/plugins/ik`,可以看到目录权限和所属用户和用户组都不对。 ```shell root@c3a25b9bf2e6:/usr/share/elasticsearch/plugins# ll total 16 drwxrwxr-x 1 elasticsearch root 4096 Oct 14 01:35 ./ drwxrwxr-x 1 root root 4096 Sep 10 07:00 ../ d-wx------ 3 501 dialout 4096 Oct 14 01:33 ik/ root@c3a25b9bf2e6:/usr/share/elasticsearch/plugins# ``` 修改所属用户和用户组`chown -R elasticsearch:root ik` ```shell d-wx------ 3 elasticsearch root 4096 Oct 14 01:33 ik/ root@c3a25b9bf2e6:/usr/share/elasticsearch/plugins# ll ik/ total 1556 d-wx------ 3 elasticsearch root 4096 Oct 14 01:33 ./ drwxrwxr-x 1 elasticsearch root 4096 Oct 14 01:35 ../ -rw-r--r-- 1 elasticsearch root 335042 Mar 28 2024 commons-codec-1.11.jar -rw-r--r-- 1 elasticsearch root 61829 Mar 28 2024 commons-logging-1.2.jar drwxr-xr-x 2 elasticsearch root 4096 Mar 12 2024 config/ -rw-r--r-- 1 elasticsearch root 8171 Mar 28 2024 elasticsearch-analysis-ik-8.12.2.jar -rw-r--r-- 1 elasticsearch root 780321 Mar 28 2024 httpclient-4.5.13.jar -rw-r--r-- 1 elasticsearch root 328593 Mar 28 2024 httpcore-4.4.13.jar -rw-r--r-- 1 elasticsearch root 47931 Mar 28 2024 ik-core-1.0.jar -rw-r--r-- 1 elasticsearch root 1802 Mar 28 2024 plugin-descriptor.properties -rw-r--r-- 1 elasticsearch root 125 Mar 28 2024 plugin-security.policy root@c3a25b9bf2e6:/usr/share/elasticsearch/plugins# root@c3a25b9bf2e6:/usr/share/elasticsearch/plugins# ``` 如果不修改目录的权限就重启es会出现下面的错误 ```shell 2024-10-14 09:49:45 CompileCommand: exclude org/apache/lucene/util/MSBRadixSorter.computeCommonPrefixLengthAndBuildHistogram bool exclude = true 2024-10-14 09:49:45 CompileCommand: exclude org/apache/lucene/util/RadixSelector.computeCommonPrefixLengthAndBuildHistogram bool exclude = true 2024-10-14 09:49:45 Oct 14, 2024 1:49:45 AM sun.util.locale.provider.LocaleProviderAdapter 2024-10-14 09:49:45 WARNING: COMPAT locale provider will be removed in a future release 2024-10-14 09:49:46 {"@timestamp":"2024-10-14T01:49:45.998Z", "log.level": "INFO", "message":"Java vector incubator API enabled; uses preferredBitSize=128; FMA enabled", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.apache.lucene.internal.vectorization.PanamaVectorizationProvider","elasticsearch.node.name":"c3a25b9bf2e6","elasticsearch.cluster.name":"docker-cluster"} 2024-10-14 09:49:46 {"@timestamp":"2024-10-14T01:49:46.041Z", "log.level":"ERROR", "message":"fatal exception while booting Elasticsearch", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"c3a25b9bf2e6","elasticsearch.cluster.name":"docker-cluster","error.type":"java.nio.file.AccessDeniedException","error.message":"/usr/share/elasticsearch/plugins/ik","error.stack_trace":"java.nio.file.AccessDeniedException: /usr/share/elasticsearch/plugins/ik\n\tat java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)\n\tat java.base/sun.nio.fs.UnixException.asIOException(UnixException.java:115)\n\tat java.base/sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:477)\n\tat java.base/java.nio.file.Files.newDirectoryStream(Files.java:549)\n\tat org.elasticsearch.server@8.12.2/org.elasticsearch.bootstrap.PolicyUtil.readPolicyInfo(PolicyUtil.java:319)\n\tat org.elasticsearch.server@8.12.2/org.elasticsearch.bootstrap.PolicyUtil.getPluginPolicyInfo(PolicyUtil.java:379)\n\tat org.elasticsearch.server@8.12.2/org.elasticsearch.bootstrap.Security.getPluginAndModulePermissions(Security.java:165)\n\tat org.elasticsearch.server@8.12.2/org.elasticsearch.bootstrap.Security.configure(Security.java:126)\n\tat org.elasticsearch.server@8.12.2/org.elasticsearch.bootstrap.Elasticsearch.initPhase2(Elasticsearch.java:198)\n\tat org.elasticsearch.server@8.12.2/org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:72)\n"} 2024-10-14 09:49:46 ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/docker-cluster.log 2024-10-14 09:49:46 2024-10-14 09:49:46 ERROR: Elasticsearch exited unexpectedly, with exit code 1 ``` ![ik_error.png](images/ik_error.png) 修改目录权限`chmod 777 /usr/share/elasticsearch/plugins/ik` 重启es发现成功 ```shell 2024-10-14 09:50:14 {"@timestamp":"2024-10-14T01:50:14.985Z", "log.level": "INFO", "message":"loaded module [x-pack-eql]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.plugins.PluginsService","elasticsearch.node.name":"c3a25b9bf2e6","elasticsearch.cluster.name":"docker-cluster"} 2024-10-14 09:50:14 {"@timestamp":"2024-10-14T01:50:14.985Z", "log.level": "INFO", "message":"loaded plugin [analysis-ik]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.plugins.PluginsService","elasticsearch.node.name":"c3a25b9bf2e6","elasticsearch.cluster.name":"docker-cluster"} 2024-10-14 09:50:15 {"@timestamp":"2024-10-14T01:50:15.364Z", "log.level": "INFO", "message":"using [1] data paths, mounts [[/ (overlay)]], net usable_space [33.5gb], net total_space [58.3gb], types [overlay]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.env.NodeEnvironment","elasticsearch.node.name":"c3a25b9bf2e6","elasticsearch.cluster.name":"docker-cluster"} ``` ```json POST /_analyze { "text": "你好JAva啦,不许百嫖", "analyzer": "ik_smart" } { "tokens": [ { "token": "你好", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 }, { "token": "java", "start_offset": 2, "end_offset": 6, "type": "ENGLISH", "position": 1 }, { "token": "啦", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 2 }, { "token": "不许", "start_offset": 8, "end_offset": 10, "type": "CN_WORD", "position": 3 }, { "token": "百", "start_offset": 10, "end_offset": 11, "type": "TYPE_CNUM", "position": 4 }, { "token": "嫖", "start_offset": 11, "end_offset": 12, "type": "CN_CHAR", "position": 5 } ] } ``` #### 搜索示例 ```json PUT /index { "mappings": { "properties": { "content_ik": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" }, "content": { "type": "text" } } } } ``` ```json PUT /index/_doc/1 { "content": "美国留给伊拉克的是个烂摊子吗", "content_ik": "美国留给伊拉克的是个烂摊子吗" } ``` ```json GET /index/_search { "query": { "match": { "content_ik": "伊拉克" } }, "highlight": { "pre_tags": [ "

" ], "post_tags": [ "

" ], "fields": { "content_ik": {} } } } { "took": 4, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 1, "relation": "eq" }, "max_score": 0.2876821, "hits": [ { "_index": "index", "_id": "1", "_score": 0.2876821, "_source": { "content": "美国留给伊拉克的是个烂摊子吗", "content_ik": "美国留给伊拉克的是个烂摊子吗" }, "highlight": { "content_ik": [ "美国留给

伊拉克

的是个烂摊子吗" ] } } ] } } ``` ```json GET /index/_search { "query": { "match": { "content": "伊拉克" } }, "highlight": { "pre_tags": [ "

" ], "post_tags": [ "

" ], "fields": { "content": {} } } } { "took": 9, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 1, "relation": "eq" }, "max_score": 0.8630463, "hits": [ { "_index": "index", "_id": "1", "_score": 0.8630463, "_source": { "content": "美国留给伊拉克的是个烂摊子吗", "content_ik": "美国留给伊拉克的是个烂摊子吗" }, "highlight": { "content": [ "美国留给

的是个烂摊子吗" ] } } ] } } ``` ### 添加自定义词组 在配置文件中添加自定义的词和排除的词 Config file `IKAnalyzer.cfg.xml` can be located at `{conf}/analysis-ik/config/IKAnalyzer.cfg.xml` or `{plugins}/elasticsearch-analysis-ik-*/config/IKAnalyzer.cfg.xml` ```xml custom/mydict.dic;custom/single_word_low_freq.dic custom/ext_stopword.dic location http://xxx.com/xxx.dic ``` ## 基础API ### 查看版本 `GET /` ```json GET / { "name": "c3a25b9bf2e6", "cluster_name": "docker-cluster", "cluster_uuid": "a3jKvnL_SBeLrdeYuq6fuA", "version": { "number": "8.12.2", # es的版本 "build_flavor": "default", "build_type": "docker", "build_hash": "48a287ab9497e852de30327444b0809e55d46466", "build_date": "2024-02-19T10:04:32.774273190Z", "build_snapshot": false, "lucene_version": "9.9.2", "minimum_wire_compatibility_version": "7.17.0", "minimum_index_compatibility_version": "7.0.0" }, "tagline": "You Know, for Search" } ``` ### ES查询结果中各个字段意思 ```json GET /tom_es/_search { "took": 1, # 整个搜索请求耗费了多少毫秒 "timed_out": false, # 查询是否超时。默认情况下,搜索请求不会超时。如果低响应时间比完成结果更重要,你可以指定 timeout 为 10 或者 10ms(10毫秒),或者 1s(1秒):GET /_search?timeout=10ms "_shards": { # _shards 部分告诉我们在查询中参与分片的总数,以及这些分片成功了多少个失败了多少个 "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 5, # 文档总数 "relation": "eq" }, "max_score": 1, # 最大得分 "hits": [ { "_index": "tom_es", "_id": "1", "_score": 1, # 文档与查询的匹配程度 "_source": { "name": "Go", "age": 30, "tags": [ "1234" ] } }, ... ] } } ``` ### `/_cat` 获取ElasticSearch的当前的很多信息 >(1)GET /_cat/nodes:查看所有节点 >(2)GET /_cat/health:查看es健康状况 >(3)GET /_cat/master:查看主节点 >(4)GET /_cat/indices:查看所有索引 ,等价于mysql数据库的show databases; ```json GET /_cat =^.^= /_cat/allocation /_cat/shards /_cat/shards/{index} /_cat/master /_cat/nodes /_cat/tasks /_cat/indices /_cat/indices/{index} /_cat/segments /_cat/segments/{index} /_cat/count /_cat/count/{index} /_cat/recovery /_cat/recovery/{index} /_cat/health /_cat/pending_tasks /_cat/aliases /_cat/aliases/{alias} /_cat/thread_pool /_cat/thread_pool/{thread_pools} /_cat/plugins /_cat/fielddata /_cat/fielddata/{fields} /_cat/nodeattrs /_cat/repositories /_cat/snapshots/{repository} /_cat/templates /_cat/component_templates/_cat/ml/anomaly_detectors /_cat/ml/anomaly_detectors/{job_id} /_cat/ml/datafeeds /_cat/ml/datafeeds/{datafeed_id} /_cat/ml/trained_models /_cat/ml/trained_models/{model_id} /_cat/ml/data_frame/analytics /_cat/ml/data_frame/analytics/{id} /_cat/transforms /_cat/transforms/{transform_id} ``` ### `GET /_cat/indices?v`查询所有的索引 ```json GET /_cat/indices?v health status index uuid pri rep docs.count docs.deleted store.size pri.store.size dataset.size yellow open my_index_01 4ipSBo0qQ86Kc1Y43zJUlQ 1 1 1 0 4.7kb 4.7kb 4.7kb yellow open product 0WgR0t-JQpObO0D8s7EuAA 1 1 3 1 18.4kb 18.4kb 18.4kb yellow open my_index ItWutcoSTKm2c4Li1t7CBQ 1 1 302 2 445.5kb 445.5kb 445.5kb yellow open tom_es Q3YFPC1YR2uaiPD2VPn5Lw 1 1 5 0 22.6kb 22.6kb 22.6kb yellow open my-index-000001 o7g1BDgSRaaR1ooxR3LnKg 1 1 5 0 16.3kb 16.3kb 16.3kb yellow open tom_ees Pxn1fXDNTraUD-XoZykc2A 1 1 0 0 249b 249b 249b ``` ## DSL query ### 查询语句语法 GET /index_name/_search ```json { "query": { "查询类型": { "查询条件": "条件值" } } } ``` ### 查询类型分类 - 查询所有 - match_all ```json GET /tom_es/_search { "query": { "match_all": { } } } ``` - 全文查询,可查询字段中所有分词 - match 查询 - multi_match 查询 ```json GET /tom_es/_search { "query": { "match": { "age": "21" } } } GET /tom_es/_search { "query": { "multi_match": { "query": "Go", "fields": ["name", "tags"] } } } ``` - 精确匹配/模糊匹配 - ids - term - range ```json GET /tom_es/_search { "query": { "ids": { "values": ["1", "2","3"] } } } GET /tom_es/_search { "query": { "term": { "age": { "value": "21" } } } } GET /tom_es/_search { "query": { "range": { "age": { "gte": 10, "lte": 200 } } }, "sort": [ { "age": { "order": "asc" } } ] } GET /product/_search { "query": { "range": { "startTime": { "gt": "2024-10-05 06:40:40", "lte": "now" } } } } ``` - terms / terms_set 多值查询,也就是数组查询。terms是精确包含,terms_set是最多包含 ```json GET /tom_es/_search { "query": { "terms": { "tags.keyword": [ # 注意大小写 "Go" ] } } } GET /tom_es/_search { "query": { "terms_set": { "tags": { "terms": ["go", "饭"], # 注意大小写 "minimum_should_match": 1 # 最小匹配一个 } } } } ``` - fuzzy 模糊查询 ```json GET /tom_es/_search { "query": { "fuzzy": { "name": { "value": "lmo", "fuzziness": "2", # 最多编辑次数 "transpositions": false # 例如ad -> da这种转变是否算数,false表示不计算这种变换。 } } } } ``` - exists 文档是否存在字段 ```json GET /tom_es/_search { "query": { "exists": { "field": "age" } } } ``` - prefix/wildcard/regexp 前缀/通配符/正则 - 注意prefix前缀是整个匹配,但是对于es来说由于倒排索引的原因,因此前缀匹配可能匹配不上,通常是使用keyword进行匹配 - wildcard通配符的*是>=0,?是一个 - regexp需要.表示任意,*表示>=0 ```json GET /tom_es/_search { "query": { "prefix": { "name.keyword": { "value": "GoJ" } } } } GET /tom_es/_search { "query": { "wildcard": { "name.keyword": { "value": "*o" } } } } GET /tom_es/_search { "query": { "regexp": { "tags": "1.*" } } } ``` - match_phrase 查询是一种精确匹配短语的查询类型,它通常用于需要确保特定单词按照指定顺序连续出现的场景。与简单的match查询相比,match_phrase更加严格,适合对短语进行精确匹配的需求。 ```json PUT /my-index-000001/_doc/1 { "message": "some arrays in this document...", "tags": [ "elasticsearch", "wow" ], "lists": [ { "name": "prog_list", "description": "programming list" }, { "name": "cool_list", "description": "cool stuff list" } ] } GET /my-index-000001/_search { "query": { "match_phrase": { "message": "some arrays" } } } { "took": 2, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 0, "relation": "eq" }, "max_score": null, "hits": [] } } GET /my-index-000001/_search { "query": { "match": { "message": "some in" } } } { "took": 1, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 1, "relation": "eq" }, "max_score": 0.5753642, "hits": [ { "_index": "my-index-000001", "_id": "1", "_score": 0.5753642, "_source": { "message": "some arrays in this document...", "tags": [ "elasticsearch", "wow" ], "lists": [ { "name": "prog_list", "description": "programming list" }, { "name": "cool_list", "description": "cool stuff list" } ] } } ] } } ``` - 查询高亮 ```json GET /tom_es/_search { "query": { "match": { "name": "tom" } }, "highlight": { "pre_tags": "

", # 前缀 "post_tags": "

", # 后缀 "fields": { "name": {} # 高亮的fields } }, "from": 0, "size": 1 } { "took": 3, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 1, "relation": "eq" }, "max_score": 1.0892314, "hits": [ { "_index": "tom_es", "_id": "3", "_score": 1.0892314, "_source": { "id": 123, "name": "tom lixi", "age": 30, "tags": [ "合肥", "Go", "Java" ] }, "highlight": { "name": [ "

tom

lixi" ] } } ] } } ``` - 地理查询 - geo_distance - geo_bounding_box - 复合查询 - bool - 搜索上下文 must/should - 过滤上下文 filter/must_not ```json GET /tom_es/_search { "query": { "bool": { "must": [ { "terms": { "tags": [ "go", "java" ] } }, { "range": { "age": { "gte": 10, "lte": 50 } } }, { "regexp": { "name": ".*om.*" } } ] } } } GET /tom_es/_search { "query": { "bool": { "should": [ { "exists": { "field": "age" } }, { "range": { "age": { "gte": 22 } } }, { "terms": { "tags.keyword": [ "1234", "Java" ] } }, { "wildcard": { "name": { "value": "*to*" } } } ], "minimum_should_match": 4 } } } GET /tom_es/_search { "query": { "bool": { "filter": [ { "term": { "age": "21" } }, { "prefix": { "name.keyword": "G" } } ] } } } GET /tom_es/_search { "query": { "bool": { "must_not": [ { "term": { "age": { "value": "21" } } }, { "prefix": { "name": { "value": "go" } } } ] } } } ``` - function_score ### 地理空间 ### 向量搜索 es提供一个向量搜索的能力,可以搜索和es库中最相似的向量。 所谓的最相似是按照一定的算法计算距离,从而计算出最相似的向量。 > 例如有一些图片,现在要把这些图片分类。 > 我们可以提取图片的特征值存到es当中,然后使用es的knn搜索出最相似的那个向量,用这个向量的分类作为图片的分类 ```json PUT /my_index { "mappings": { "properties": { "my_vector": { "type": "dense_vector", "dims": 64 }, "name": { "type": "keyword" }, "type": { "type": "keyword" } } } } GET /my_index/_search { "query": { "script_score": { "script": { "source": "cosineSimilarity(params.query_vector, 'my_vector') + 1.0", "params": { "query_vector": [ -0.052465736865997314, 0.06686755269765854, -0.0197349451482296, 0.12447277456521988, 0.1684354841709137, 0.06995005160570145, 0.02436789870262146, -0.04119948297739029, 0.04229716211557388, -0.05350141227245331, 0.02346373163163662, 0.14289121329784393, -0.08461371064186096, -0.01573389209806919, -0.1451040506362915, 0.15288014709949493, 0.027316728606820107, -0.05092369019985199, -0.23252595961093903, 0.011131656356155872, -0.2181379199028015, 0.22075271606445312, -0.3232687711715698, -0.1525573581457138, 0.11929531395435333, -0.1342940479516983, -0.12722180783748627, -0.1243152990937233, -0.061551421880722046, -0.18154969811439514, 0.06307222694158554, -0.1082453653216362, 0.04693552479147911, -0.009701441042125225, -0.040614038705825806, -0.12689073383808136, 0.04515693336725235, 0.2432403564453125, 0.0675167664885521, -0.0683896541595459, 0.05995598062872887, -0.05492861941456795, 0.048851460218429565, 0.04870501905679703, -0.077973373234272, 0.15658307075500488, -0.11271629482507706, 0.09099866449832916, -0.09567387402057648, 0.11415769904851913, 0.22527021169662476, -0.021581288427114487, 0.004927851725369692, 0.15819551050662994, -0.015539818443357944, -0.13765941560268402, 0.35521090030670166, -0.12759709358215332, 0.12144555151462555, -0.04522712156176567, -0.08986946195363998, -0.012624382972717285, 0.09026962518692017, 0.13853789865970612 ] } }, "query": { "match_all": {} } } }, "size": 1 } { "took": 101, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 302, "relation": "eq" }, "max_score": 1.9820285, "hits": [ { "_index": "my_index", "_id": "26", "_score": 1.9820285, "_source": { "type": "mouth", "name": "db1dabc311e04a7eb2a9815ffb8d7c2c.jpg", "my_vector": [ -0.04869825392961502, 0.05390739440917969, -0.02175898104906082, 0.09262216091156006, 0.16843010485172272, 0.12381266057491302, 0.02636176347732544, -0.03789805248379707, 0.009996248409152031, -0.06244434416294098, 0.02190050669014454, 0.136209636926651, -0.06564811617136002, 0.005759200546890497, -0.13960932195186615, 0.1501823514699936, -0.014341948553919792, -0.06037351116538048, -0.2035580277442932, 0.04085583984851837, -0.21931546926498413, 0.2009453922510147, -0.30170938372612, -0.16312004625797272, 0.1384287029504776, -0.08882473409175873, -0.15987977385520935, -0.13311447203159332, -0.06797291338443756, -0.2223595827817917, 0.0797848105430603, -0.11950202286243439, 0.038658056408166885, 0.009725899435579777, -0.014337616972625256, -0.12699589133262634, 0.08310988545417786, 0.25570443272590637, 0.09609447419643402, -0.0361400730907917, 0.06898024678230286, -0.068844273686409, 0.051702797412872314, 0.05947507917881012, -0.060713134706020355, 0.19760574400424957, -0.1156400516629219, 0.08879996836185455, -0.07864942401647568, 0.08734862506389618, 0.22903987765312195, -0.06818096339702606, -0.004230738151818514, 0.16004931926727295, -0.03717225417494774, -0.17137299478054047, 0.3070693016052246, -0.0994674488902092, 0.14519992470741272, -0.04367852583527565, -0.05290397256612778, -0.009954518638551235, 0.13943342864513397, 0.12258586287498474 ] } } ] } } ``` ## 聚合 用于对索引中的文档进行统计分析 ### 语法 ```json { "aggs": { "aggs_custome_name": { "aggs_type": { "field": "XXX" } } } } ``` ### 分类 - 指标聚合 - 类似SQL的min、count、max ```json POST /product/_search { "aggs": { "avg_age_": { "avg": { "field": "price" # 计算price的最小值 } }, "max_age_": { "max": { "field": "price" } } }, "size": 0 } POST /tom_es/_search { "aggs": { "stats_age_": { "stats": { "field": "age" # 统计age【count/min/max/avg/sum】 } } }, "size": 0 } POST /tom_es/_search { "aggs": { "age_distincation": { "cardinality": { "field": "name.keyword" # 去重个数 } } }, "size": 0 } GET /tom_es/_search { "aggs": { "count": { "value_count": { "field": "name.keyword" # 字段个数 } } }, "size": 0 } ``` - 桶聚合 - 类似group by,进行分组。把文档按照规则聚合到一个桶之中,支持嵌套 ```json GET /tom_es/_search { "aggs": { "age_bucket": { "terms": { "field": "age" }, "aggs": { "name_bucket": { "terms": { "field": "name.keyword" } } } } }, "size": 0 } { "took": 3, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 5, "relation": "eq" }, "max_score": null, "hits": [] }, "aggregations": { "age_bucket": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": 30, "doc_count": 3, "name_bucket": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "Go", "doc_count": 1 }, { "key": "GoJava", "doc_count": 1 }, { "key": "tom lixi", "doc_count": 1 } ] } }, { "key": 21, "doc_count": 2, "name_bucket": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "GoGo", "doc_count": 2 } ] } } ] } } } ``` - 管道聚合 - 管道 计算avg价格最大的产品 ```json GET /product/_search { "aggs": { "name_bucket": { "terms": { "field": "name.keyword" }, "aggs": { "avg_price": { "max": { "field": "price" } } } }, "max_price": { "max_bucket": { # 管道类型 "buckets_path": "name_bucket>avg_price" } } }, "size": 0 } { "took": 1, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 3, "relation": "eq" }, "max_score": null, "hits": [] }, "aggregations": { "name_bucket": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "2", "doc_count": 1, "avg_price": { "value": 2.244999885559082 } }, { "key": "3", "doc_count": 1, "avg_price": { "value": 3.244999885559082 } }, { "key": "tom", "doc_count": 1, "avg_price": { "value": 1.2345000505447388 } } ] }, "max_price": { "value": 3.244999885559082, "keys": [ "3" ] } } } ``` ### 使用场景 - 电商 - 社交 - 金融 - 物联网 ## 相关性 文档和查询语句的匹配程度的度量