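# elasticsearch-study

**Repository Path**: wlyfree/elasticsearch-study

## Basic Information

- **Project Name**: elasticsearch-study
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2019-12-12
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# `ElasticSearch`

`Elastic Stack`

- `LogStash`, `Beats`
  - Collect data into `Elasticsearch`
- `Elasticsearch`
  - Indexes and searches data
- `kibana`
  - A friendly graphical client that queries `es` and visualizes the results

## Features

Stores data as `json` documents

Inverted index

Built on `lucene`

Supports structured queries (similar to `SQL`) and full-text search

Distributed

Supports sharding (primary and replica shards); replica shards serve as hot standbys

The number of primary shards is fixed when an index is created, while the number of replica shards can be changed at any time. When nodes are added or removed, `es` automatically relocates shards to rebalance the cluster

Disaster recovery: cross-cluster replication (`CCR`)

Disable `swap`: swapping can cause long `GC` pauses, slow node responses, and even disconnection from the cluster. When swap is in use, memory pages are written to disk and read back, which causes heavy disk thrashing.

Ensure enough virtual memory (`virtual memory`): by default `ES` stores its indices in an `mmapfs` directory. The operating system's default limit on `mmap` counts is likely to be too low, which can lead to out-of-memory exceptions.

```shell
# Update vm.max_map_count in /etc/sysctl.conf
```

Ensure the `es` user is allowed to create at least 4096 threads

```shell
# Set nproc in /etc/security/limits.conf
```

## Comparison with `mysql`

| Concept  | elasticsearch | mysql    |
| -------- | ------------- | -------- |
| Database | index         | database |
| Table    | type          | table    |
| Row      | document      | row      |
| Column   | field         | column   |

Note that mapping types are deprecated in Elasticsearch 7.x, where each index holds a single `_doc` type.

## Installation and deployment

### Download and start

```shell
# Download
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.4.2-linux-x86_64.tar.gz
# Extract the archive and enter the bin directory
...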
```

### Directory layout

```shell
[es@VM_0_9_centos elasticsearch-7.4.2]$ ll
total 572
drwxr-xr-x  2 es root   4096 Oct 29 04:45 bin
drwxr-xr-x  2 es root   4096 Nov 28 11:37 config
drwxrwxr-x  3 es es     4096 Nov 28 11:38 data
drwxr-xr-x  9 es root   4096 Oct 29 04:45 jdk
drwxr-xr-x  3 es root   4096 Oct 29 04:45 lib
-rw-r--r--  1 es root  13675 Oct 29 04:38 LICENSE.txt
drwxr-xr-x  2 es root   4096 Nov 28 11:37 logs
drwxr-xr-x 37 es root   4096 Oct 29 04:45 modules
-rw-r--r--  1 es root 523209 Oct 29 04:45 NOTICE.txt
drwxr-xr-x  2 es root   4096 Oct 29 04:45 plugins
-rw-r--r--  1 es root   8500 Oct 29 04:38 README.textile
```

### Start the service

```shell
# Option 1: command-line parameters
# -d: run as a daemon  -Ecluster.name=<cluster name>  -Enode.name=<node name>
[es@VM_0_9_centos bin]$ ./elasticsearch -d -Ecluster.name=my_cluster -Enode.name=node_1

# Option 2: configuration file
# Edit the configuration file
[es@VM_0_9_centos elasticsearch-7.4.2]$ cat config/elasticsearch.yml
# By default only local access is allowed; 0.0.0.0 removes the IP restriction
network.host: 0.0.0.0
# Cluster name
cluster.name: wly_cluster
# Node name
node.name: wly_node1
# Initial master-eligible nodes
cluster.initial_master_nodes: ["wly_node1"]
# Start
[es@VM_0_9_centos bin]$ ./elasticsearch -d

# Problems encountered and their fixes
# Problem: Elasticsearch cannot be started as root
java.lang.RuntimeException: can not run elasticsearch as root
# Fix: create a dedicated account (e.g. es) and grant it permissions
...
# Problem
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
# Fix: switch to root (the es account lacks permission) and add the line vm.max_map_count=262144 to /etc/sysctl.conf
vim /etc/sysctl.conf
vm.max_map_count=262144
# Apply the change without rebooting
sysctl -p

# Problem
the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
# Fix: specify the initial master node in ${es}/config/elasticsearch.yml
vi config/elasticsearch.yml
cluster.initial_master_nodes: ["wly_node1"]
```

### Configuration files

```shell
# elasticsearch.yml: the main es configuration file
[root@VM_0_9_centos config]# ll
total 40
-rw-rw---- 1 es root   199 Nov 28 11:09 elasticsearch.keystore
-rw-rw---- 1 es root  2944 Dec  2 19:47 elasticsearch.yml
-rw-rw---- 1 es root  3593 Oct 29 04:38 jvm.options
-rw-rw---- 1 es root 17545 Oct 29 04:45 log4j2.properties
-rw-rw---- 1 es root   473 Oct 29 04:45 role_mapping.yml
-rw-rw---- 1 es root   197 Oct 29 04:45 roles.yml
-rw-rw---- 1 es root     0 Oct 29 04:45 users
-rw-rw---- 1 es root     0 Oct 29 04:45 users_roles
```

### Log files

```shell
# List the log files
[es@VM_0_9_centos elasticsearch-7.4.2]$ ll logs | grep wly
-rw-rw-r-- 1 es es     0 Dec  2 19:51 wly_cluster_audit.json
-rw-rw-r-- 1 es es     0 Dec  2 19:51 wly_cluster_deprecation.json
-rw-rw-r-- 1 es es     0 Dec  2 19:51 wly_cluster_deprecation.log
-rw-rw-r-- 1 es es     0 Dec  2 19:51 wly_cluster_index_indexing_slowlog.json
-rw-rw-r-- 1 es es     0 Dec  2 19:51 wly_cluster_index_indexing_slowlog.log
-rw-rw-r-- 1 es es     0 Dec  2 19:51 wly_cluster_index_search_slowlog.json
-rw-rw-r-- 1 es es     0 Dec  2 19:51 wly_cluster_index_search_slowlog.log
-rw-rw-r-- 1 es es 16152 Dec  2 19:54 wly_cluster.log
-rw-rw-r-- 1 es es 30200 Dec  2 19:54 wly_cluster_server.json
```

### Verify that the service is running

```shell
# Check that the service started correctly
[es@VM_0_9_centos elasticsearch-7.4.2]$ curl -X GET "localhost:9200/?pretty"
{
  "name" : "wly_node1",
  "cluster_name" : "wly_cluster",
  "cluster_uuid" : "QAGA9z1zR_en9Y-AH6LItw",
  "version" : {
    "number" : "7.4.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
    "build_date" : "2019-10-28T20:40:44.881551Z",
    "build_snapshot" : false,
    "lucene_version" : "8.2.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
```

### Check cluster health

>The cluster health status has three possible values
>
>red: not all primary shards are active; some index data is unavailable
>
>yellow: all primary shards are active, but some replica shards are not
>
>green: all primary shards and replica shards are active

```shell
# Check cluster health
[es@VM_0_9_centos logs]$ curl -X GET "localhost:9200/_cat/health?v&pretty"
epoch      timestamp cluster     status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1575288284 12:04:44  wly_cluster yellow          1         1      2   2    0    0        2             0                  -                 50.0%
```

## Indices

### Create an index

```shell
# Index a document; the customer index is created automatically if it does not exist
[es@VM_0_9_centos elasticsearch-7.4.2]$ curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'{"name":"John Doe"}'
# Response
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
```

### View the index

```shell
# Retrieve the document that was just indexed
[es@VM_0_9_centos elasticsearch-7.4.2]$ curl -X GET "localhost:9200/customer/_doc/1?pretty"
# Response
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}
```

### Bulk indexing

```shell
# Index documents in bulk from a file
[es@VM_0_9_centos elasticsearch-7.4.2]$ curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@/home/es/accounts.json"
```

### List indices

```shell
[es@VM_0_9_centos elasticsearch-7.4.2]$ curl "localhost:9200/_cat/indices?v"
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank     -fT_wpSfTiCgfXdC9XPTng   1   1       1000            0    428.8kb        428.8kb
yellow open   customer 1_PuHnNcSWq6sNuBNHGkgg   1   1          1            0      3.5kb          3.5kb
```

## Search

### Match all, sort, and paginate

```shell
# match_all: match every document
# sort: sort order; asc means ascending
# from: pagination offset (zero-based)  size: number of hits to return
# Sort all documents by account_number in ascending order, skip the first 10, and return the next 10
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}
'
```

### Match by term

```shell
# match: match individual terms; multiple terms separated by spaces are combined with OR
# Find documents whose address field contains mill or lane
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "address": "mill lane" } }
}
'
```

### Match by phrase

```shell
# match_phrase: match an exact phrase
# Find documents whose address field contains the phrase "mill lane"
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
'
```

### Compound queries

```shell
# bool: combines several query conditions; it is used together with must, should, and must_not clauses
# must, should: contribute to the score; by default es returns hits sorted by relevance score
# must_not: acts as a filter; it excludes documents without affecting the score
# Find documents where age is 40 and state is not ID
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
'
```

### Range filter

```shell
# filter: filter documents  range: match an interval
# Find documents whose balance is between 20000 and 30000 (inclusive)
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
'
```

### Aggregations

```shell
# Group by state
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
'
```

```shell
# Group by state and sort the groups by average balance in descending order
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'
```

## Queries

`Query DSL (Domain Specific Language)`

`Leaf query clauses`: ordinary matching queries such as `match`, `term`, and `range`

`Compound query clauses`: queries that combine other clauses, such as `bool`, `dis_max`, and `constant_score`

`bool query`

`boosting query`

`constant_score query`

`dis_max query`

`function_score query`
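
Of the compound clauses listed above, only `bool` is demonstrated earlier in this README. As a minimal sketch of another one, the snippet below wraps a `range` filter in `constant_score`, so every matching document receives the same score. It assumes the `bank` index from the bulk-loading step and a node on `localhost:9200`; the `ES_URL` and `QUERY` variables, the `run_query` helper, and the `boost` value of 1.2 are illustrative choices, not part of the original notes.

```shell
# Sketch: constant_score wraps a filter clause and gives every matching
# document the same score (equal to boost; 1.2 is an arbitrary example value).
# Assumption: the node address; override ES_URL if yours differs.
ES_URL="${ES_URL:-localhost:9200}"

# Query body: documents whose balance is between 20000 and 30000 (inclusive)
QUERY='
{
  "query": {
    "constant_score": {
      "filter": {
        "range": { "balance": { "gte": 20000, "lte": 30000 } }
      },
      "boost": 1.2
    }
  }
}
'

# Hypothetical helper: send the query to the bank index
run_query() {
  curl -s -X GET "$ES_URL/bank/_search?pretty" \
    -H 'Content-Type: application/json' -d "$QUERY"
}

# Call run_query once the node is up and the bank index is loaded
```

Because the condition sits in a filter context, it only selects documents and does not compute relevance, so all hits should come back with the same `_score`.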