Type | Data types |
---|---|
String | string |
Whole number | byte, short, integer, long |
Floating point | float, double |
Boolean | boolean |
Date | date |
curl -H 'Content-Type: application/json' -XPUT localhost:9200/website/blog/123?pretty -d '{"title": "My first blog entry","text": "I am starting to get the hang of this...","date": "2014/01/01"}'
{
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
curl -H 'Content-Type: application/json' -XPOST localhost:9200/website/blog?pretty -d '{"title": "My second blog entry","text": "I am using auto generatedId","date": "2014/01/03"}'
{
"_index" : "website",
"_type" : "blog",
"_id" : "QSP_XmMBCiEWwbhxyNxi",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
The index request is accepted only if no document with the same _index, _type, and _id already exists:
curl -H 'Content-Type: application/json' -XPUT 'localhost:9200/website/blog/123?op_type=create&pretty' -d '{"title": "My first blog entry","text": "I am starting to get the hang of this...","date": "2014/01/02"}'
{
"error": {
"type": "version_conflict_engine_exception",
"reason": "[blog][123]: version conflict, document already exists (current version [2])",
"index_uuid": "5WSzbmd_Qs-irXSQpAoWbQ",
"shard": "0",
"index": "website"
},
"status": 409
}
curl -H 'Content-Type: application/json' -XPUT 'localhost:9200/website/blog/123/_create?pretty' -d '{"title": "My first blog entry","text": "I am starting to get the hang of this...","date": "2014/01/02"}'
The result is the same as above.
GET /_all/tweet/_search?q=tweet:elasticsearch
GET /_search?q=%2Bname%3Ajohn+%2Btweet%3Amary
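Note: a "+" prefix means the condition must be satisfied, and a "-" prefix means the condition must not be satisfied. The query string above is URL-encoded; decoded, it reads +name:john +tweet:mary.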
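As a small illustration of the URL encoding mentioned in the note, the sketch below uses Python's standard urllib.parse module to encode and decode the query text. The function name encode_query_string is made up for this example.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_query_string(query: str) -> str:
    """Percent-encode a query so it can be used as the q= parameter."""
    # quote_plus turns "+" into %2B, ":" into %3A, and spaces into "+".
    return quote_plus(query)


raw = "+name:john +tweet:mary"
encoded = encode_query_string(raw)
print(f"GET /_search?q={encoded}")
# Decoding recovers the original query text.
print(unquote_plus(encoded))
```

Encoding matters here because an unencoded "+" in a URL query string is read as a space.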
GET /_search?q=mary
curl -I http://localhost:9200/website/blog/123
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 192
curl http://localhost:9200/website/blog/123?pretty
{
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 2,
"found" : true,
"_source" : {
"title" : "My first blog entry",
"text" : "I am starting to get the hang of this...",
"date" : "2014/01/02"
}
}
curl -H 'Content-Type:application/json' -XGET 'http://localhost:9200/website/blog/_search?pretty' -d '{"query":{"match":{"_id":"123"}}}'
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_score" : 1.0,
"_source" : {
"title" : "My first blog entry",
"text" : "I am starting to get the hang of this...",
"date" : "2014/01/02"
}
}
]
}
}
Retrieve only the _source:
curl -H 'Content-Type:application/json' -XGET 'http://localhost:9200/website/blog/123/_source?pretty'
{
"title" : "My first blog entry",
"text" : "I am starting to get the hang of this...",
"date" : "2014/01/02"
}
Retrieve only part of a document:
curl -H 'Content-Type:application/json' -XGET 'http://localhost:9200/website/blog/123/_source?_source=title,text&pretty=true'
{
"text" : "I am starting to get the hang of this...",
"title" : "My first blog entry"
}
curl -XDELETE http://localhost:9200/website/blog/123?pretty
{
"_index": "website",
"_type": "blog",
"_id": "123",
"_version": 3,
"result": "deleted",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 2,
"_primary_term": 3
}
curl -H 'Content-Type: application/json' -XPUT localhost:9200/website/blog/123?pretty -d '{"title": "My first blog entry","text": "I am starting to get the hang of this...","date": "2014/01/02"}'
{
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
Create a document:
curl -H 'Content-Type: application/json' -XPUT 'localhost:9200/website/blog/1/_create?pretty' -d '{"title": "My first blog entry", "text": "Just trying this out..."}'
curl 'localhost:9200/website/blog/1?pretty'
{
"_index" : "website",
"_type" : "blog",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"title" : "My first blog entry",
"text" : "Just trying this out..."
}
}
curl -H 'Content-Type: application/json' -XPUT 'localhost:9200/website/blog/1?version=1&pretty' -d '{"title": "My first blog entry", "text": "Starting to get the hang of this..."}'
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 1,
"_primary_term": 3
}
Updating again with version=1 returns an error, because the current version is now 2:
{
"error": {
"type": "version_conflict_engine_exception",
"reason": "[blog][1]: version conflict, current version [2] is different than the one provided [1]",
"index_uuid": "5WSzbmd_Qs-irXSQpAoWbQ",
"shard": "3",
"index": "website"
},
"status": 409
}
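version_type=external
With external versioning, Elasticsearch no longer checks that _version matches the version given in the request; it checks that the current _version is less than the given version. If the request succeeds, the external version number is stored as the document's new _version. The version number must be an integer greater than zero and less than about 9.2e+18, that is, a positive Java long.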
curl -H 'Content-Type: application/json' -XPUT 'localhost:9200/website/blog/2?version=5&version_type=external&pretty' -d '{"title": "My first blog entry", "text": "Starting to get the hang of this..."}'
{
"_index": "website",
"_type": "blog",
"_id": "2",
"_version": 5,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 1,
"_primary_term": 3
}
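The two version checks described above can be modelled with a toy in-memory store. This is only a sketch of the rules as stated in the text, not Elasticsearch's implementation; ToyVersionedStore and VersionConflict are names invented for the example.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 version_conflict_engine_exception."""


class ToyVersionedStore:
    """In-memory toy model of versioned writes (an illustration only)."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, version=None, version_type="internal"):
        current = self.docs.get(doc_id, (0, None))[0]
        if version_type == "external":
            # External: the stored version must be lower than the supplied one.
            if version is None or current >= version:
                raise VersionConflict(f"current [{current}] >= provided [{version}]")
            new_version = version
        else:
            # Internal: a supplied version must equal the stored one exactly.
            if version is not None and version != current:
                raise VersionConflict(f"current [{current}] != provided [{version}]")
            new_version = current + 1
        self.docs[doc_id] = (new_version, source)
        return new_version


store = ToyVersionedStore()
print(store.index("1", {"text": "Just trying this out..."}))
print(store.index("1", {"text": "Starting to get the hang of this..."}, version=1))
try:
    store.index("1", {"text": "stale write"}, version=1)
except VersionConflict as err:
    print("conflict:", err)
print(store.index("2", {"text": "external"}, version=5, version_type="external"))
```

In the internal case the caller proves it has seen the latest version; in the external case the caller supplies a version from another system, and only newer values are accepted.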
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/website/blog/1/_update?pretty' -d '{ "doc":{"tags":["testing"],"views":0}}'
{
"_index": "website",
"_type": "blog",
"_id": "1",
"_version": 3,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 2,
"_primary_term": 3
}
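A script can use the update API to change the contents of the _source field, which is exposed inside the script as ctx._source. For example, we can use a script to increment the blog post's view count: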
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/website/blog/1/_update?pretty' -d '{"script":"ctx._source.views+=1"}'
Check the result after the update:
curl localhost:9200/website/blog/1?pretty
{
"_index" : "website",
"_type" : "blog",
"_id" : "1",
"_version" : 4,
"found" : true,
"_source" : {
"title" : "My first blog entry",
"text" : "Starting to get the hang of this...",
"views" : 1,
"tags" : [
"testing"
]
}
}
We can also use a script to append a new tag to the tags array, passing the tag as a script parameter:
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/website/blog/1/_update?pretty' -d '{"script":{"inline":"ctx._source.tags.add(params.new_tag)","params":{"new_tag":"search"}}}'
Check the result after the update:
curl 'localhost:9200/website/blog/1?pretty'
{
"_index" : "website",
"_type" : "blog",
"_id" : "1",
"_version" : 5,
"found" : true,
"_source" : {
"title" : "My first blog entry",
"text" : "Starting to get the hang of this...",
"views" : 1,
"tags" : [
"testing",
"search"
]
}
}
By setting ctx.op to delete, we can delete a document based on its content:
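Note: in tests on versions 5.6.4 and 6.2.4 this operation failed when delete and none were wrapped in single quotes, most likely because those quotes terminate the shell's single-quoted -d argument. The string literals are therefore written with escaped double quotes here.
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/website/blog/1/_update?pretty' -d '{"script":{"inline":"ctx.op = ctx._source.views == params.count ? \"delete\" : \"none\"","params":{"count":1}}}'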
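When the document to be updated may not exist yet, we can use the upsert parameter to define a document that is created if it does not already exist: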
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/website/pageviews/1/_update?pretty' -d ' {"script":"ctx._source.views+=1","upsert":{"views":1}}'
Check the result after the update:
curl 'localhost:9200/website/pageviews/1?pretty'
{
"_index" : "website",
"_type" : "pageviews",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"views" : 1
}
}
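The upsert behaviour shown above (the upsert document is stored as-is on the first call, and the script runs only on later calls) can be sketched with a plain dictionary. This is a toy model under that assumption; update_with_upsert and increment_views are hypothetical names.

```python
def update_with_upsert(store, doc_id, script, upsert):
    """Apply script to an existing document, or create it from the upsert body."""
    if doc_id not in store:
        # First call: the upsert document is stored unchanged; the script is skipped.
        store[doc_id] = dict(upsert)
    else:
        script(store[doc_id])
    return store[doc_id]


def increment_views(source):
    """Plays the role of the script ctx._source.views += 1."""
    source["views"] += 1


pageviews = {}
print(update_with_upsert(pageviews, "1", increment_views, {"views": 1}))
print(update_with_upsert(pageviews, "1", increment_views, {"views": 1}))
```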
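For many partial updates made by multiple users, it does not matter that the document has already been changed. For example, if two processes both increment the page view counter, the order of the increments does not matter; if a conflict occurs, all we need to do is retry the update. This can be done automatically by setting the retry_on_conflict parameter to the number of times the update should be retried before it fails. The default value is 0; the example below uses retry_on_conflict=5: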
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/website/pageviews/1/_update?retry_on_conflict=5&pretty' -d ' {"script":"ctx._source.views+=1","upsert":{"views":0}}'
Check the result after the update:
curl localhost:9200/website/pageviews/1?pretty
{
"_index" : "website",
"_type" : "pageviews",
"_id" : "1",
"_version" : 5,
"found" : true,
"_source" : {
"views" : 2
}
}
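The retry behaviour can be pictured as a simple loop around the update attempt. The sketch below is a client-side analogy rather than how Elasticsearch implements retry_on_conflict internally; VersionConflict, update_with_retry, and make_flaky_update are names invented for the example.

```python
class VersionConflict(Exception):
    """Stands in for an HTTP 409 version conflict."""


def update_with_retry(attempt_update, retry_on_conflict=0):
    """Call attempt_update, retrying up to retry_on_conflict times on conflict."""
    for attempt in range(retry_on_conflict + 1):
        try:
            return attempt_update()
        except VersionConflict:
            if attempt == retry_on_conflict:
                raise  # retries exhausted: surface the conflict to the caller


def make_flaky_update(conflicts):
    """Hypothetical helper: an update that conflicts `conflicts` times, then succeeds."""
    state = {"calls": 0}

    def attempt():
        state["calls"] += 1
        if state["calls"] <= conflicts:
            raise VersionConflict("document changed between read and write")
        return state["calls"]  # number of attempts it took

    return attempt


print(update_with_retry(make_flaky_update(conflicts=2), retry_on_conflict=5))
```

With the default of 0 retries, the first conflict is reported immediately, which matches the default described above.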
The update API also accepts a version parameter, which lets you use optimistic concurrency control to specify the version of the document you intend to update.
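The mget request takes a docs array, where each element specifies the _index, _type, and _id metadata of a document to retrieve. If you only want one or a few specific fields, you can also specify a _source parameter: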
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/_mget?pretty' -d '{"docs":[{"_index":"website","_type":"blog","_id":2},{"_index":"website","_type":"pageviews","_id":1,"_source":"views"}]}'
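The response body also contains a docs array, with one response per document, in the same order as in the request. Each of these responses is the same as the response body of an individual get request: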
{
"docs" : [
{
"_index" : "website",
"_type" : "blog",
"_id" : "2",
"_version" : 5,
"found" : true,
"_source" : {
"title" : "My first blog entry",
"text" : "Starting to get the hang of this..."
}
},
{
"_index" : "website",
"_type" : "pageviews",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"views" : 0
}
}
]
}
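If the documents you want to retrieve are all in the same _index (or even the same _type), you can specify a default /_index or /_index/_type in the URL. Individual entries can still override these defaults: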
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/website/blog/_mget?pretty' -d '{"docs":[{"_id":2},{"_type":"pageviews","_id":1}]}'
{
"docs" : [
{
"_index" : "website",
"_type" : "blog",
"_id" : "2",
"_version" : 5,
"found" : true,
"_source" : {
"title" : "My first blog entry",
"text" : "Starting to get the hang of this..."
}
},
{
"_index" : "website",
"_type" : "pageviews",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"views" : 0
}
}
]
}
If all the documents have the same _index and _type, you can pass a simple ids array instead of the full docs array:
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/website/blog/_mget?pretty' -d '{"ids":["2","1"]}'
{
"docs" : [
{
"_index" : "website",
"_type" : "blog",
"_id" : "2",
"_version" : 5,
"found" : true,
"_source" : {
"title" : "My first blog entry",
"text" : "Starting to get the hang of this..."
}
},
{
"_index" : "website",
"_type" : "blog",
"_id" : "1",
"_version" : 2,
"found" : true,
"_source" : {
"title" : "My first blog entry",
"text" : "Starting to get the hang of this..."
}
}
]
}
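The bulk API lets us perform multiple create, index, update, or delete operations in a single request. This is particularly useful for indexing a data stream such as log events, which can be queued up and indexed in order in batches of hundreds or thousands.
The bulk request body has the following, slightly unusual, format:
{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n
...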
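Note: the action must be one of the following:
Action | Explanation |
---|---|
create | Create a document only if it does not already exist. See "Creating Documents". |
index | Create a new document or replace an existing one. See "Indexing Documents" and "Updating Documents". |
update | Partially update a document. See "Partial Updates". |
delete | Delete a document. See "Deleting Documents". |
Note: the delete action does not take a request body; the create, index, and update actions require one.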
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"Myfirstblogpost"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"Mysecondblogpost"}
{"update":{"_index":"website","_type":"blog","_id":"123","_retry_on_conflict":"5"}}
{"doc":{"title":"Myupdatedblogpost"}}
Note: when typing this in a shell, press Enter at the end of each line instead of typing a literal \n. The last line must also end with a newline.
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/_bulk?pretty' -d '
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"Myfirstblogpost"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"Mysecondblogpost"}
{"update":{"_index":"website","_type":"blog","_id":"123","_retry_on_conflict":"5"}}
{"doc":{"title":"Myupdatedblogpost"}}
'
{
"took" : 291,
"errors" : false,
"items" : [
{
"delete" : {
"found" : false,
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 1,
"result" : "not_found",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"status" : 404
}
},
{
"create" : {
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 2,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"created" : true,
"status" : 201
}
},
{
"index" : {
"_index" : "website",
"_type" : "blog",
"_id" : "AWNsti9ewhD4-ecniVQP",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"created" : true,
"status" : 201
}
},
{
"update" : {
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 3,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"status" : 200
}
}
]
}
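Because the bulk body is newline-delimited JSON, it is easy to get wrong by hand. The following sketch assembles the same body with Python's json module; build_bulk_body and the (action, metadata, body) tuple layout are assumptions made for this example, not part of any Elasticsearch client.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, body) tuples into a bulk request body."""
    lines = []
    for action, metadata, body in operations:
        lines.append(json.dumps({action: metadata}))
        if action != "delete":
            # create, index, and update need a request body line; delete does not.
            if body is None:
                raise ValueError(f"{action} requires a request body")
            lines.append(json.dumps(body))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


blog = {"_index": "website", "_type": "blog"}
operations = [
    ("delete", {**blog, "_id": "123"}, None),
    ("create", {**blog, "_id": "123"}, {"title": "Myfirstblogpost"}),
    ("index", blog, {"title": "Mysecondblogpost"}),
    ("update", {**blog, "_id": "123"}, {"doc": {"title": "Myupdatedblogpost"}}),
]
print(build_bulk_body(operations), end="")
```

json.dumps never emits raw newlines inside a document by default, so each action and each body stays on a single line.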
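Elasticsearch has a feature called aggregations, which lets you generate sophisticated analytics over your data. It is similar to GROUP BY in SQL, but much more powerful. For example, let's find the most popular interests shared by all employees: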
GET /megacorp/employee/_search
{
"aggs": {
"all_interests": {
"terms": {"field": "interests"}
}
}
}
If we want to know the most popular interests of the people whose last name is "Smith", we just add the appropriate query:
GET /megacorp/employee/_search
{
"query": {
"match": {"last_name": "smith"}
},
"aggs": {
"all_interests": {
"terms": {"field": "interests"}
}
}
}
The all_interests aggregation now covers only the documents that match the query:
...
"all_interests": {
"buckets": [
{
"key": "music",
"doc_count": 2
},
{
"key": "sports",
"doc_count": 1
}
]
}
}
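size: the number of results to return; defaults to 10
from: the number of initial results to skip; defaults to 0
If you want to show 5 results per page, then pages 1 to 3 can be requested as follows: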
GET /_search?size=5
GET /_search?size=5&from=5
GET /_search?size=5&from=10
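The relationship between a page number and the from value in the three requests above is simple arithmetic. A minimal sketch, where page_params is a name made up for this example:

```python
def page_params(page, size=10):
    """Return the size and from values for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    # Page 1 skips nothing; each later page skips the pages before it.
    return {"size": size, "from": (page - 1) * size}


for page in (1, 2, 3):
    params = page_params(page, size=5)
    print(f"GET /_search?size={params['size']}&from={params['from']}")
```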
Analyze text with a specified analyzer:
GET /_analyze?analyzer=standard&text=Text to analyze
GET /index/_mapping/type
{
"website" : {
"mappings" : {
"blog" : {
"properties" : {
"date" : {
"type" : "date",
"format" : "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis"
},
"text" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
{
"tag": {
"type": "string",
"index": "not_analyzed",
"analyzer": "english"
}
}
Elasticsearch supports the following simple field types:
Type | Data types |
---|---|
String | string |
Whole number | byte, short, integer, long |
Floating point | float, double |
Boolean | boolean |
Date | date |
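The index parameter controls how a string is indexed. It takes one of the following three values:
Value | Explanation |
---|---|
analyzed | First analyze the string, then index it. In other words, index this field as full text. |
not_analyzed | Index this field so that it is searchable, but index the value exactly as specified. Do not analyze it. |
no | Do not index this field at all. This field will not be searchable. |
For string fields of the analyzed type, use the analyzer parameter to specify which analyzer to apply at both search time and index time. By default, Elasticsearch uses the standard analyzer, but you can change this by specifying one of the built-in analyzers, such as whitespace, simple, or english.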
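To give a feel for how the choice of analyzer changes the indexed terms, here is a rough approximation of two of the built-in analyzers in plain Python. These functions only mimic the general behaviour (whitespace splits on whitespace; simple splits on non-letters and lowercases); they are not the real Lucene analyzers, and the function names are invented.

```python
import re


def whitespace_analyze(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def simple_analyze(text):
    """Lowercase the text, then keep only runs of letters as terms."""
    return re.findall(r"[^\W\d_]+", text.lower())


sample = "Text to-analyze, 2X faster"
print(whitespace_analyze(sample))
print(simple_analyze(sample))
```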
You can add fields to an existing mapping, but you cannot modify fields that already exist.
curl -H 'Content-Type: application/json' -XPUT 'localhost:9200/gb?pretty' -d '{"mappings":{"tweet":{"properties":{"tweet":{"type":"string","analyzer":"english"},"date":{"type":"date"},"name":{"type":"string"},"user_id":{"type":"long"}}}}}'
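The following two commands are equivalent:
curl -H 'Content-Type: application/json' 'localhost:9200/_search?pretty' -d '{"query":{"match_all":{}}}'
curl -H 'Content-Type: application/json' 'localhost:9200/_search?pretty' -d '{}'
Find documents whose title field contains "my":
curl -H 'Content-Type: application/json' 'localhost:9200/_search?pretty' -d '{"query":{"match":{"title":"my"}}}'
The multi_match query lets you run a match query against multiple fields at once: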
{
"multi_match": {
"query": "full text search",
"fields": ["title", "body"] }
}
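Query clauses are like building blocks: simple clauses can be combined into a complex query.
For example, the bool clause lets you combine other valid clauses under must, must_not, or should: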
{
"bool": {
"must": {"match": {"tweet": "elasticsearch"} },
"must_not": {"match": {"name": "mary"} },
"should": {"match": {"tweet": "fulltext"} }
}
}
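The term filter is mainly used for exact-value matching, whether the values are numbers, dates, Booleans, or not_analyzed strings (text that has not been analyzed):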
{"term":{"age":26}}
{"term":{"date":"2014-09-01"}}
{"term":{"public":true}}
{"term":{"tag":"full_text"}}
The terms filter is similar to term, but lets you specify multiple values to match. If the field contains any of the specified values, the document matches:
{
"terms": {
"tag": ["search", "full_text", "nosql"]
}
}
The range filter lets us find data that falls within a specified range:
{
"range": {
"age": {
"gte": 20,
"lt": 30
}
}
}
The range operators are:
Name | Explanation |
---|---|
gt | Greater than |
gte | Greater than or equal to |
lt | Less than |
lte | Less than or equal to |
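The four operators map directly onto ordinary comparisons, and a value matches only when every bound in the clause holds. A small sketch of that logic, with in_range as a hypothetical helper name:

```python
import operator

# Each range operator corresponds to one comparison function.
RANGE_OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """Return True if value satisfies every bound, e.g. {"gte": 20, "lt": 30}."""
    return all(RANGE_OPERATORS[name](value, limit) for name, limit in bounds.items())


age_filter = {"gte": 20, "lt": 30}
print([age for age in (19, 20, 26, 30) if in_range(age, age_filter)])
```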
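The exists and missing filters find documents in which the specified field has a value or has no value, respectively. They are similar to the IS NOT NULL (exists) and IS NULL (missing) conditions in SQL: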
{
"exists": {"field": "title"}
}
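The bool filter combines multiple filter clauses using Boolean logic. It accepts the following operators:
Operator | Explanation |
---|---|
must | All of these clauses must match; equivalent to AND. |
must_not | None of these clauses may match; equivalent to NOT. |
should | At least one of these clauses must match; equivalent to OR. |
Each of these parameters accepts either a single filter clause or an array of filter clauses: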
{
"bool": {
"must": {
"term": {"folder": "inbox"}
},
"must_not": {
"term": {"tag": "spam"}
},
"should": [
{"term": {"starred": true } },
{"term": {"unread": true } }
]
}
}
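The search API accepts only a query clause at the top level, so we use filtered to hold both the "query" and "filter" clauses, and wrap it in an outer query context. Note that the filtered query was removed in Elasticsearch 5.0; on 5.x and later, the equivalent is a bool query with a filter clause.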
{
"query":{
"filtered": {
"query": {"match": {"email": "businessopportunity"}},
"filter": {"term": {"folder": "inbox"} }
}
}
}
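Starting with 5.0, Elasticsearch provides a handy API for logging use cases called rollover. When the index that an alias points to grows too large, rollover points the alias at the next index.
Create an initial index from which to start rolling over: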
curl -XPUT 'http://localhost:9200/logstash-2016.11.25-1' -d '{
"aliases":{
"logstash":{}
}
}'
Then we can issue a rollover request:
curl -XPOST 'http://localhost:9200/logstash/_rollover' -d '{
"conditions":{
"max_age":"1d",
"max_docs":10000000,
"max_size":"500gb"
}
}'
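The conditions above mean: when the index is more than 1 day old, contains more than ten million documents, or exceeds 500 GB in size, the next index is created and the alias is pointed at it.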
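The conditions are alternatives: reaching any one of them is enough to trigger the rollover. The sketch below models only that decision; the stats keys (age_days, docs, size_gb) are hypothetical simplifications of max_age, max_docs, and max_size, and treating "reached" as >= is an assumption of this example.

```python
def should_roll_over(stats, conditions):
    """Return True if the index stats reach any one of the configured limits."""
    # Assumption of this sketch: a limit counts as reached when stat >= limit.
    return any(stats[name] >= limit for name, limit in conditions.items())


conditions = {"age_days": 1, "docs": 10_000_000, "size_gb": 500}

young_small = {"age_days": 0.5, "docs": 1_000, "size_gb": 1}
old_small = {"age_days": 2, "docs": 1_000, "size_gb": 1}
print(should_roll_over(young_small, conditions))
print(should_roll_over(old_small, conditions))
```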
curl -XGET 'localhost:9200/_cat/indices?h=i&s=i:desc'
Show only the index name column: h=i
Sort by index name in descending order: s=i:desc
Delete an index:
curl -XDELETE 'localhost:9200/index_name'
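To sort results by relevance, we need a relevance value. In Elasticsearch query results, the relevance score is given as a floating-point number in the _score field, so by default the result set is sorted by _score in descending order.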
curl -XGET 'localhost:9200/_search' -d '{"query":{"filtered":{"filter":{"term":{"user_id":1}}}},"sort":{"date":{"order":"desc"}}}'
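When sorting on another field, the _score and max_score fields are both null. Calculating _score is relatively expensive and it is usually used only for sorting; when we are not sorting by relevance, there is no need to compute it. If you want the score to be calculated anyway, set track_scores to true.
Note: field values are sorted in ascending order by default, whereas _score is sorted in descending order by default.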
curl -XGET 'localhost:9200/_search' -d '{"query":{"filtered":{"filter":{"term":{"user_id":1}}}},"sort":[{"date":{"order":"desc"}},{"location":{"order":"desc"}}]}'
GET /_search?sort=date:desc&sort=_score&q=search
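For a field with multiple values, you can sort on a single value derived from them by using the min, max, avg, or sum mode. For example, you can sort on the earliest date in the dates field: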
{"sort": {
"dates": {
"order": "asc",
"mode": "min"
}
} }
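The sort modes reduce a multi-valued field to one number per document before ordering. The following sketch imitates that idea on plain Python dictionaries with numeric values; sort_docs, sort_value, and the visits field are invented for the example, and this is not how Elasticsearch sorts internally.

```python
from statistics import mean

# Each mode collapses a list of values into a single sort key.
SORT_MODES = {"min": min, "max": max, "avg": mean, "sum": sum}


def sort_value(values, mode):
    """Reduce a multi-valued field to one value using the given mode."""
    return SORT_MODES[mode](values)


def sort_docs(docs, field, mode, order="asc"):
    """Sort documents by a multi-valued numeric field."""
    return sorted(
        docs,
        key=lambda doc: sort_value(doc[field], mode),
        reverse=(order == "desc"),
    )


docs = [
    {"id": "a", "visits": [3, 10]},
    {"id": "b", "visits": [5, 6]},
]
for mode in ("min", "max", "avg", "sum"):
    print(mode, [doc["id"] for doc in sort_docs(docs, "visits", mode)])
```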
Change the mapping to a multi-field mapping; the new tweet.raw sub-field is indexed as not_analyzed.
{"tweet": {
"type": "string",
"analyzer": "english",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}}
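Now, after reindexing the data, we can use the tweet field for full-text search and the tweet.raw field for sorting.
Warning: sorting on an analyzed field can consume a large amount of memory. See the introduction to field types for details.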