在我之前的文章文章:
-
Elasticsearch:有用的 Elasticsearch 查詢示例
-
開始使用 Elasticsearch (2)
我列舉了很多關于 Elasticsearch 查詢的例子,抱著多多益善的想法,在今天的文章中,我給大家帶來更多的例子給大家練習,希望大家對 Elasticsearch 有更多的認識,
Elasticsearch 提供了一組強大的選項來查詢各種用例的檔案,因此了解將哪個查詢應用于特定案例很有用, 以下是一個動手教程,可幫助你利用 Elasticsearch 提供的最重要的查詢,
在本指南中,你將學習許多 帶有詳細解釋的流行查詢示例, 此處涵蓋的每個查詢將分為 2 種型別:
- 結構化查詢:用于檢索結構化資料的查詢,例如日期、數字、密碼等,
- 全文查詢:用于查詢純文本的查詢,
請注意:在本文章中, 我將使用最新的 Elastic Stack 7.16.3 發布來進行展示,有于文章比較長,所以分為兩個部分:
- Elasticsearch:Elasticsearch 查詢示例 - 動手練習(一)
- Elasticsearch:Elasticsearch 查詢示例 - 動手練習(二)
設定演示索引
讓我們首先使用一些示例資料創建一個新索引,以便你可以按照每個搜索示例進行操作,創建一個名為 “employees” 的索引:
PUT employees
為包含在攝入檔案中的欄位 (比如,date_of_birth) 定義映射(模式):
PUT employees/_mapping
{
"properties": {
"date_of_birth": {
"type": "date",
"format": "dd/MM/yyyy"
}
}
}
上面顯示,我們檔案的日期格式是 dd/MM/yyyy,
現在讓我們將一些檔案攝入到我們新創建的索引中,如下面的示例所示,使用 Elasticsearch 的 _bulk API:
POST _bulk
{"index":{"_index":"employees","_id":"1"}}
{"id":1,"name":"Huntlee Dargavel","email":"hdargavel0@japanpost.jp","gender":"male","ip_address":"58.11.89.193","date_of_birth":"11/09/1990","company":"Talane","position":"Research Associate","experience":7,"country":"China","phrase":"Multi-channelled coherent leverage","salary":180025}
{"index":{"_index":"employees","_id":"2"}}
{"id":2,"name":"Othilia Cathel","email":"ocathel1@senate.gov","gender":"female","ip_address":"3.164.153.228","date_of_birth":"22/07/1987","company":"Edgepulse","position":"Structural Engineer","experience":11,"country":"China","phrase":"Grass-roots heuristic help-desk","salary":193530}
{"index":{"_index":"employees","_id":"3"}}
{"id":3,"name":"Winston Waren","email":"wwaren2@4shared.com","gender":"male","ip_address":"202.37.210.94","date_of_birth":"10/11/1985","company":"Yozio","position":"Human Resources Manager","experience":12,"country":"China","phrase":"Versatile object-oriented emulation","salary":50616}
{"index":{"_index":"employees","_id":"4"}}
{"id":4,"name":"Alan Thomas","email":"athomas2@example.com","gender":"male","ip_address":"200.47.210.95","date_of_birth":"11/12/1985","company":"Yamaha","position":"Resources Manager","experience":12,"country":"China","phrase":"Emulation of roots heuristic coherent systems","salary":300000}
現在我們已經有了一個包含檔案的索引和一個指定的映射,我們已經準備好開始使用示例搜索了,為方便大家閱讀,我把其中的一個檔案的欄位列出來:
GET employees/_doc/1?filter_path=_source
{
"_source" : {
"id" : 1,
"name" : "Huntlee Dargavel",
"email" : "hdargavel0@japanpost.jp",
"gender" : "male",
"ip_address" : "58.11.89.193",
"date_of_birth" : "11/09/1990",
"company" : "Talane",
"position" : "Research Associate",
"experience" : 7,
"country" : "China",
"phrase" : "Multi-channelled coherent leverage",
"salary" : 180025
}
}
從上面的回傳資料中,我們可以看到包含在每個檔案中的欄位,整個索引我們只包含4個檔案,但是它足以讓我們了解各個搜索,較少的資料集可以讓我們看得更加清楚,
Match query
“match” 查詢 是 Elasticsearch 中最基本、最常用的查詢之一,起到全文查詢的作用, 我們可以使用這個查詢來搜索文本、數字或布林值,
讓我們在之前提取的檔案中搜索名為 “phrase” 的欄位中包含的單詞 “heuristic”,
POST employees/_search
{
"query": {
"match": {
"phrase": {
"query": "heuristic"
}
}
}
}
在我們索引中的 4 個檔案中,只有 2 個檔案中的 “phrase” 欄位中的包含 “heuristic”一詞:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.6785375,
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.6785375,
"_source" : {
"id" : 2,
"name" : "Othilia Cathel",
"email" : "ocathel1@senate.gov",
"gender" : "female",
"ip_address" : "3.164.153.228",
"date_of_birth" : "22/07/1987",
"company" : "Edgepulse",
"position" : "Structural Engineer",
"experience" : 11,
"country" : "China",
"phrase" : "Grass-roots heuristic help-desk",
"salary" : 193530
}
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.62577873,
"_source" : {
"id" : 4,
"name" : "Alan Thomas",
"email" : "athomas2@example.com",
"gender" : "male",
"ip_address" : "200.47.210.95",
"date_of_birth" : "11/12/1985",
"company" : "Yamaha",
"position" : "Resources Manager",
"experience" : 12,
"country" : "China",
"phrase" : "Emulation of roots heuristic coherent systems",
"salary" : 300000
}
}
]
}
}
如果我們要搜索多個單詞會發生什么? 使用我們剛剛執行的相同查詢,讓我們搜索 “heuristic roots help”:
POST employees/_search
{
"query": {
"match": {
"phrase": {
"query": "heuristic roots help"
}
}
}
}
這將回傳與以前相同的檔案,因為默認情況下,Elasticsearch 使用 OR 運算子處理搜索查詢中的每個單詞, 在我們的例子中,查詢將匹配任何包含 “heuristic” 或 “roots” 或 “help” 的檔案,
應用于多詞搜索的 OR 運算子是 match 的默認行為,但是我們可以使用與 “match” 查詢一起傳遞的 “operator” 引數來更改,我們可以使用 “OR” 或 “AND” 值指定 operator 引數,
讓我們看看當我們在之前執行的同一查詢中提供運算子引數 “AND” 時會發生什么,
POST employees/_search
{
"query": {
"match": {
"phrase": {
"query": "heuristic roots help",
"operator": "AND"
}
}
}
}
現在結果將只回傳一個檔案(檔案 id=2),因為這是在 “phrase” 欄位中同時包含所有三個搜索關鍵字的唯一檔案,
minimum_should_match
更進一步,我們可以為檔案必須包含的最小匹配詞設定一個閾值, 例如,如果我們將此引數設定為 1,則查詢將檢查至少有 1 個匹配詞的任何檔案,
現在,如果我們將 “minium_should_match” 引數設定為 3,那么所有三個單詞都必須出現在檔案中才能被歸類為匹配項,
在我們的例子中,以下查詢將僅回傳 1 個檔案(id=2),因為它是唯一符合我們條件的檔案:
POST employees/_search
{
"query": {
"match": {
"phrase": {
"query" : "heuristic roots help",
"minimum_should_match": 3
}
}
}
}
Mulit-Match Query
到目前為止,我們一直在處理單個欄位上的匹配項——也就是說,我們在名為 “phrase” 的單個欄位中搜索關鍵字,但是,如果我們需要在檔案的多個欄位中搜索關鍵字怎么辦? 這就是多匹配(multi-match)查詢發揮作用的地方,
讓我們嘗試在檔案中包含的 “position” 和 “phrase” 欄位中搜索關鍵字 “research help” 的示例,
POST employees/_search
{
"query": {
"multi_match": {
"query": "research help",
"fields": [
"position",
"phrase"
]
}
}
}
這將導致以下回應:
{
"took" : 24,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.2613049,
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.2613049,
"_source" : {
"id" : 1,
"name" : "Huntlee Dargavel",
"email" : "hdargavel0@japanpost.jp",
"gender" : "male",
"ip_address" : "58.11.89.193",
"date_of_birth" : "11/09/1990",
"company" : "Talane",
"position" : "Research Associate",
"experience" : 7,
"country" : "China",
"phrase" : "Multi-channelled coherent leverage",
"salary" : 180025
}
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.1785964,
"_source" : {
"id" : 2,
"name" : "Othilia Cathel",
"email" : "ocathel1@senate.gov",
"gender" : "female",
"ip_address" : "3.164.153.228",
"date_of_birth" : "22/07/1987",
"company" : "Edgepulse",
"position" : "Structural Engineer",
"experience" : 11,
"country" : "China",
"phrase" : "Grass-roots heuristic help-desk",
"salary" : 193530
}
}
]
}
}
從上面的結果中,我們可以看出來任何在 position 或 phrase 欄位包含 research 或者 help 的檔案都將被搜索到,
Match Phrase
Match_phrase 是另一種常用的查詢,正如其名稱所示,它匹配欄位中的短語,
如果我們需要在員工索引的 “phrase” 欄位中搜索短語 “roots heuristic coherent”,我們可以使用 “match_phrase” 查詢:
GET employees/_search
{
"query": {
"match_phrase": {
"phrase": {
"query": "roots heuristic coherent"
}
}
}
}
這將回傳具有確切短語 “roots heuristic coherent” 的檔案,包括單詞的順序, 在我們的例子中,我們只有一個符合上述條件的結果,如下面的回應所示:
{
"took" : 23,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.877336,
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.877336,
"_source" : {
"id" : 4,
"name" : "Alan Thomas",
"email" : "athomas2@example.com",
"gender" : "male",
"ip_address" : "200.47.210.95",
"date_of_birth" : "11/12/1985",
"company" : "Yamaha",
"position" : "Resources Manager",
"experience" : 12,
"country" : "China",
"phrase" : "Emulation of roots heuristic coherent systems",
"salary" : 300000
}
}
]
}
}
slop 引數
我們可以在 match_phrase 查詢中使用的一個有用功能是 “slop” 引數,它允許我們創建更靈活的搜索,
假設我們使用 match_phrase 查詢搜索 “roots coherent”, 我們不會收到從員工索引回傳的任何檔案, 這是因為要匹配 match_phrase,這些術語需要按照準確的順序排列,
現在,讓我們使用 slop 引數,看看會發生什么:
GET employees/_search
{
"query": {
"match_phrase": {
"phrase": {
"query": "roots coherent",
"slop": 1
}
}
}
}
當 slop=1 時,查詢表明可以移動一個單詞進行匹配,因此我們將收到以下回應, 在下面的回應中,你可以看到 “roots coherent” 與 “roots heuristic coherent” 檔案相匹配, 這是因為 slop 引數允許跳過 1 個術語,
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.7873249,
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.7873249,
"_source" : {
"id" : 4,
"name" : "Alan Thomas",
"email" : "athomas2@example.com",
"gender" : "male",
"ip_address" : "200.47.210.95",
"date_of_birth" : "11/12/1985",
"company" : "Yamaha",
"position" : "Resources Manager",
"experience" : 12,
"country" : "China",
"phrase" : "Emulation of roots heuristic coherent systems",
"salary" : 300000
}
}
]
}
}
Match Phrase Prefix
match_phrase_prefix 查詢類似于 match_phrase 查詢,但這里將搜索關鍵字的最后一個詞視為前綴,用于匹配以該前綴詞開頭的任何詞,
首先,讓我們在索引中插入一個檔案,以更好地理解 match_phrase_prefix 查詢:
PUT employees/_doc/5
{
"id": 4,
"name": "Jennifer Lawrence",
"email": "jlaw@example.com",
"gender": "female",
"ip_address": "100.37.110.59",
"date_of_birth": "17/05/1995",
"company": "Monsnto",
"position": "Resources Manager",
"experience": 10,
"country": "Germany",
"phrase": "Emulation of roots heuristic complete systems",
"salary": 300000
}
現在讓我們應用 match_phrase_prefix:
GET employees/_search
{
"_source": [
"phrase"
],
"query": {
"match_phrase_prefix": {
"phrase": {
"query": "roots heuristic co"
}
}
}
}
在下面的結果中,我們可以看到具有 coherent 和 complete 的檔案與查詢匹配, 我們還可以在 “match_phrase” 查詢中使用 slop 引數,
{
"took" : 61,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 3.0871696,
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "4",
"_score" : 3.0871696,
"_source" : {
"phrase" : "Emulation of roots heuristic coherent systems"
}
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "5",
"_score" : 3.0871696,
"_source" : {
"phrase" : "Emulation of roots heuristic complete systems"
}
}
]
}
}
注意:“match_phrase_query” 嘗試匹配最后提供的關鍵字(在我們的示例中為 co)的 50 個擴展(默認情況下),也就是說搜索到包含有 co 的50個結果, 這可以通過指定 “max_expansions” 引數來增加或減少,
GET employees/_search
{
"_source": [
"phrase"
],
"query": {
"match_phrase_prefix": {
"phrase": {
"query": "roots heuristic co",
"max_expansions": 1
}
}
}
}
比如,上面的查詢將只回傳一個檔案:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.805721,
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.805721,
"_source" : {
"phrase" : "Emulation of roots heuristic coherent systems"
}
}
]
}
}
由于這個前綴屬性和 match_phrase_prefix 查詢的易于設定的屬性,它通常用于自動完成功能,
現在讓我們洗掉剛剛添加的 id=5 的檔案:
DELETE employees/_doc/5
Term 級查詢
術語級查詢用于查詢結構化資料,通常是精確值,
Term Query/Terms Query
這是最簡單的術語級別查詢, 此查詢針對檔案中的欄位搜索搜索關鍵字(keyword)的完全匹配,
例如,如果我們對 “gender” 欄位使用術語查詢來搜索 “Male” 這個詞,它會完全按照這個詞進行搜索,即使有大小寫也是如此,
這可以通過以下兩個查詢來證明:
GET employees/_search
{
"query": {
"term": {
"gender": {
"value": "female"
}
}
}
}
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.87546873,
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.87546873,
"_source" : {
"id" : 2,
"name" : "Othilia Cathel",
"email" : "ocathel1@senate.gov",
"gender" : "female",
"ip_address" : "3.164.153.228",
"date_of_birth" : "22/07/1987",
"company" : "Edgepulse",
"position" : "Structural Engineer",
"experience" : 11,
"country" : "China",
"phrase" : "Grass-roots heuristic help-desk",
"salary" : 193530
}
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.87546873,
"_source" : {
"id" : 4,
"name" : "Jennifer Lawrence",
"email" : "jlaw@example.com",
"gender" : "female",
"ip_address" : "100.37.110.59",
"date_of_birth" : "17/05/1995",
"company" : "Monsnto",
"position" : "Resources Manager",
"experience" : 10,
"country" : "Germany",
"phrase" : "Emulation of roots heuristic complete systems",
"salary" : 300000
}
}
]
}
}
上面的搜索回傳有兩個結果,如果我們做如下的查詢:
GET employees/_search
{
"query": {
"term": {
"gender": {
"value": "Female"
}
}
}
}
在上面,我們把 female 修改為 Female,那么我們將搜索不到任何的檔案,在上述情況下,兩個查詢之間的唯一區別是搜索關鍵字的大小寫不同, 案例 1 全部為小寫,這是匹配的,因為這個欄位的值就是按照小寫保存的, 但是對于案例 2,搜索沒有得到任何結果,因為沒有針對帶有大寫 “F” 的 “gender” 欄位的此類 token,
我們還可以使用 terms query 傳遞多個要在同一欄位上搜索的術語, 讓我們在性別欄位中搜索 “female” 和 “male”, 為此,我們可以使用以下 terms query:
POST employees/_search
{
"query": {
"terms": {
"gender": [
"female",
"male"
]
}
}
}
上面的查詢將回傳所有的4個檔案,
Exists 查詢
有時會發生欄位沒有索引值,或者檔案中不存在該欄位, 在這種情況下,它有助于識別此類檔案并分析影響,
例如,讓我們將下面的檔案索引到 “employee” 索引
PUT employees/_doc/5
{
"id": 5,
"name": "Michael Bordon",
"email": "mbordon@example.com",
"gender": "male",
"ip_address": "10.47.210.65",
"date_of_birth": "12/12/1995",
"position": "Resources Manager",
"experience": 12,
"country": null,
"phrase": "Emulation of roots heuristic coherent systems",
"salary": 300000
}
此檔案沒有名為 “company” 的欄位,并且 “country” 欄位的值為 null,
現在,如果我們想查找欄位為 “company” 的檔案,我們可以使用如下的 exist 查詢:
GET employees/_search
{
"query": {
"exists": {
"field": "company"
}
}
}
上面的查詢將列出所有具有 “company” 欄位的檔案,上面查詢的結果將回傳4個檔案,并且它們都含有 company 欄位,而 id=5 的檔案不被搜索到,
也許更有用的解決方案是列出所有沒有 “company” 欄位的檔案, 這也可以通過使用如下的存在查詢來實作:
GET employees/_search
{
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "company"
}
}
]
}
}
}
bool 查詢將在以下部分中詳細說明,上面的查詢將只回傳 id=5 的檔案:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.0,
"_source" : {
"id" : 5,
"name" : "Michael Bordon",
"email" : "mbordon@example.com",
"gender" : "male",
"ip_address" : "10.47.210.65",
"date_of_birth" : "12/12/1995",
"position" : "Resources Manager",
"experience" : 12,
"country" : null,
"phrase" : "Emulation of roots heuristic coherent systems",
"salary" : 300000
}
}
]
}
}
讓我們從索引中洗掉現在插入的檔案,為了方便和統一,通過鍵入以下請求:
DELETE employees/_doc/5
Range queries
Elasticsearch 世界中另一個最常用的查詢是范圍查詢, 范圍查詢允許我們獲取包含指定范圍內的術語的檔案, 范圍查詢是術語級別的查詢(表示用于查詢結構化資料),可用于數值欄位、日期欄位等,
數值欄位的 range 查詢
例如,在我們創建的資料集中,如果我們需要過濾掉 experience 水平在 5 到 10 年之間的人,我們可以對其應用以下范圍查詢:
POST employees/_search
{
"query": {
"range": {
"experience": {
"gte": 5,
"lte": 10
}
}
}
}
什么是 gte、gt、lt 和 lt?
- gte 大于等于,gte: 5 表示大于等于5,其中包括5,greater than or equal to
- gt 大于,gt: 5 ,表示大于5,不包括5,greater than
- lte 小于或等于,lte: 5 ,表示小于等于5,其中包括5,less than or equal to
- lt 小于,less than
- gt: 5 ,表示小于5,不包括5, greater than
上面查詢的結果為:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"id" : 1,
"name" : "Huntlee Dargavel",
"email" : "hdargavel0@japanpost.jp",
"gender" : "male",
"ip_address" : "58.11.89.193",
"date_of_birth" : "11/09/1990",
"company" : "Talane",
"position" : "Research Associate",
"experience" : 7,
"country" : "China",
"phrase" : "Multi-channelled coherent leverage",
"salary" : 180025
}
}
]
}
}
也就是說 experience 在 5 和 10 之間的只有一個檔案,
日期欄位的 range query
同樣,范圍查詢也可以應用于日期欄位, 如果我們需要找出 1986 年之后出生的人,我們可以發出如下所示的查詢:
GET employees/_search
{
"query": {
"range" : {
"date_of_birth" : {
"gte" : "01/01/1986"
}
}
}
}
這將為我們獲取僅在 1986 年之后具有 date_of_birth 欄位的檔案,
Ids queries
ids 查詢是一個相對較少使用的查詢,但它是最有用的查詢之一,因此有資格在此串列中, 有時我們需要根據檔案的 ID 來檢索檔案, 這可以使用單個 get 請求來實作,如下所示:
GET indexname/_doc/documentId
如果一個 ID 只能獲取一個檔案,這可能是一個很好的解決方案,但是如果我們有更多檔案怎么辦?
這就是 ids 查詢非常方便的地方, 使用 Ids 查詢,我們可以在單個請求中完成此操作,
在下面的示例中,我們通過單個請求從 employees 索引中獲取 id 為 1 和 4 的檔案,
POST employees/_search
{
"query": {
"ids" : {
"values" : ["1", "4"]
}
}
}
上面查詢將回傳:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"id" : 1,
"name" : "Huntlee Dargavel",
"email" : "hdargavel0@japanpost.jp",
"gender" : "male",
"ip_address" : "58.11.89.193",
"date_of_birth" : "11/09/1990",
"company" : "Talane",
"position" : "Research Associate",
"experience" : 7,
"country" : "China",
"phrase" : "Multi-channelled coherent leverage",
"salary" : 180025
}
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"id" : 4,
"name" : "Alan Thomas",
"email" : "athomas2@example.com",
"gender" : "male",
"ip_address" : "200.47.210.95",
"date_of_birth" : "11/12/1985",
"company" : "Yamaha",
"position" : "Resources Manager",
"experience" : 12,
"country" : "China",
"phrase" : "Emulation of roots heuristic coherent systems",
"salary" : 300000
}
}
]
}
}
Prefix Queries
前綴查詢(Prefix query)用于獲取包含給定搜索字串作為指定欄位前綴的檔案,
假設我們需要在 “name” 欄位中獲取所有包含 “al” 作為前綴的檔案,那么我們可以使用前綴查詢如下:
GET employees/_search
{
"query": {
"prefix": {
"name": "al"
}
}
}
這將導致以下回應:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"id" : 4,
"name" : "Alan Thomas",
"email" : "athomas2@example.com",
"gender" : "male",
"ip_address" : "200.47.210.95",
"date_of_birth" : "11/12/1985",
"company" : "Yamaha",
"position" : "Resources Manager",
"experience" : 12,
"country" : "China",
"phrase" : "Emulation of roots heuristic coherent systems",
"salary" : 300000
}
}
]
}
}
由于前綴查詢是一個術語查詢,它將按原樣傳遞搜索字串, 那就是搜索 “al” 和 “Al” 是不同的, 如果在上面的示例中,我們搜索 “Al”,我們將得到 0 個結果,因為在 “name” 欄位的倒排索引中沒有以 “Al” 開頭的 token, 但是,如果我們查詢“name.keyword”欄位,使用 “Al” 我們將得到上述結果,在這種情況下,查詢 “al” 將導致零命中,
Wildcard quieries
這個也叫做通配符查詢(wildcard query),將獲取具有與給定通配符模式匹配的術語的檔案,例如,讓我們在欄位 “country” 上使用通配符查詢來搜索 “c*a”,如下所示:
GET employees/_search
{
"query": {
"wildcard": {
"country": {
"value": "c*a"
}
}
}
}
上面的查詢將獲取所有以 “c” 開頭并以 “a” 結尾的 “country” 名稱的檔案(例如:China、Canada、Cambodia 等),
這里 * 運算子可以匹配零個或多個字符,
Regexp
這個是正則查詢,這類似于我們上面看到的 “通配符” 查詢,但將接受正則運算式作為輸入并獲取匹配的檔案,
GET employees/_search
{
"query": {
"regexp": {
"position": "res[a-z]*"
}
}
}
上面的查詢將得到匹配正則運算式 res[a-z]* 的單詞的檔案,
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"id" : 1,
"name" : "Huntlee Dargavel",
"email" : "hdargavel0@japanpost.jp",
"gender" : "male",
"ip_address" : "58.11.89.193",
"date_of_birth" : "11/09/1990",
"company" : "Talane",
"position" : "Research Associate",
"experience" : 7,
"country" : "China",
"phrase" : "Multi-channelled coherent leverage",
"salary" : 180025
}
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"id" : 3,
"name" : "Winston Waren",
"email" : "wwaren2@4shared.com",
"gender" : "male",
"ip_address" : "202.37.210.94",
"date_of_birth" : "10/11/1985",
"company" : "Yozio",
"position" : "Human Resources Manager",
"experience" : 12,
"country" : "China",
"phrase" : "Versatile object-oriented emulation",
"salary" : 50616
}
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"id" : 4,
"name" : "Alan Thomas",
"email" : "athomas2@example.com",
"gender" : "male",
"ip_address" : "200.47.210.95",
"date_of_birth" : "11/12/1985",
"company" : "Yamaha",
"position" : "Resources Manager",
"experience" : 12,
"country" : "China",
"phrase" : "Emulation of roots heuristic coherent systems",
"salary" : 300000
}
}
]
}
}
Fuzzy
模糊查詢可用于回傳包含與搜索詞相似的詞的檔案, 這在處理拼寫錯誤時尤其有用,即使我們使用模糊查詢搜索 “Chnia” 而不是“China”,我們也可以獲得結果,
讓我們看一個例子:
GET employees/_search
{
"query": {
"fuzzy": {
"country": {
"value": "Chnia",
"fuzziness": "2"
}
}
}
}
這里的模糊度是匹配允許的最大編輯距離, 我們在 “match_phrase” 查詢中看到的 “max_expansions” 等引數也可以使用, 更多相關檔案可以在這里找到
模糊查詢也可以與 “match” 查詢型別一起出現, 以下示例顯示了在 multi_match 查詢中使用的模糊性:
POST employees/_search
{
"query": {
"multi_match": {
"query": "heursitic reserch",
"fields": [
"phrase",
"position"
],
"fuzziness": 2
}
},
"size": 10
}
盡管查詢中存在拼寫錯誤,上述查詢仍將回傳匹配 “heuristic” 或 “research” 的檔案,
Boosting
在查詢時,首先獲得更受歡迎的結果通常會有所幫助, 執行此操作的最簡單方法在 Elasticsearch 中稱為 boosting, 當我們查詢多個欄位時,這會派上用場, 例如,考慮以下查詢:
POST employees/_search
{
"query": {
"multi_match" : {
"query" : "versatile Engineer",
"fields": ["position^3", "phrase"]
}
}
}
這將回傳與 “position” 欄位匹配的檔案位于頂部的回應,而不是與“phrase”欄位匹配的檔案,

Sorting - 排序
默認排序
當搜索請求中沒有指定排序引數時,Elasticsearch 根據 “_score” 欄位的降序回傳檔案, 這個“_score”是根據使用 Elasticsearch 的默認評分方法的查詢匹配程度來計算的, 在我們上面討論的所有示例中,你可以在結果中看到相同的行為,
只有當我們使用 “filter” 背景關系時,才不會計算評分,以便更快地回傳結果,

如何根據欄位來進行排名
Elasticsearch 為我們提供了基于欄位進行排序的選項, 比如說,讓我們需要根據員工的經驗降序對員工進行排序, 我們可以使用啟用了排序選項的以下查詢來實作:
GET employees/_search
{
"_source": [
"name",
"experience",
"salary"
],
"sort": [
{
"experience": {
"order": "desc"
}
}
]
}
上述查詢的結果如下:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"name" : "Winston Waren",
"experience" : 12,
"salary" : 50616
},
"sort" : [
12
]
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "4",
"_score" : null,
"_source" : {
"name" : "Alan Thomas",
"experience" : 12,
"salary" : 300000
},
"sort" : [
12
]
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "Othilia Cathel",
"experience" : 11,
"salary" : 193530
},
"sort" : [
11
]
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "Huntlee Dargavel",
"experience" : 7,
"salary" : 180025
},
"sort" : [
7
]
}
]
}
}
從上面的回應中可以看出,結果是根據員工體驗的降序排列的,此外,還有兩名員工,他們的經驗水平與 12 級相同,
如何根據多欄位來進行排名
在上面的示例中,我們看到有兩個員工的經驗等級相同,均為 12,但我們需要根據薪水的降序再次排序, 我們也可以提供多個欄位進行排序,如下面的查詢所示:
GET employees/_search?filter_path=**.hits
{
"_source": [
"name",
"experience",
"salary"
],
"sort": [
{
"experience": {
"order": "desc"
}
},
{
"salary": {
"order": "desc"
}
}
]
}
現在我們得到以下結果:
{
"hits" : {
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "4",
"_score" : null,
"_source" : {
"name" : "Alan Thomas",
"experience" : 12,
"salary" : 300000
},
"sort" : [
12,
300000
]
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"name" : "Winston Waren",
"experience" : 12,
"salary" : 50616
},
"sort" : [
12,
50616
]
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "Othilia Cathel",
"experience" : 11,
"salary" : 193530
},
"sort" : [
11,
193530
]
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "Huntlee Dargavel",
"experience" : 7,
"salary" : 180025
},
"sort" : [
7,
180025
]
}
]
}
}
在上面的結果中,你可以看到,在具有相同經驗級別的員工中,薪水最高的人在訂單中被提前了(Alan 和 Winston 的經驗級別相同,但與之前的搜索結果不同,這里 Alan 的排名被提升因為他的薪水更高),
注意:如果我們改變排序陣列中排序引數的順序,即先保留 “salary” 引數,然后保留 “experience” 引數,那么搜索結果也會發生變化, 結果將首先根據薪水引數進行排序,然后將考慮經驗引數,而不影響基于薪水的排序,
讓我們將上述查詢的排序順序顛倒一下,即先保留 “salary”,然后是 “experience”,如下所示:
GET employees/_search?filter_path=**.hits
{
"_source": [
"name",
"experience",
"salary"
],
"sort": [
{
"salary": {
"order": "desc"
}
},
{
"experience": {
"order": "desc"
}
}
]
}
結果如下:
{
"hits" : {
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "4",
"_score" : null,
"_source" : {
"name" : "Alan Thomas",
"experience" : 12,
"salary" : 300000
},
"sort" : [
300000,
12
]
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "2",
"_score" : null,
"_source" : {
"name" : "Othilia Cathel",
"experience" : 11,
"salary" : 193530
},
"sort" : [
193530,
11
]
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "1",
"_score" : null,
"_source" : {
"name" : "Huntlee Dargavel",
"experience" : 7,
"salary" : 180025
},
"sort" : [
180025,
7
]
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "3",
"_score" : null,
"_source" : {
"name" : "Winston Waren",
"experience" : 12,
"salary" : 50616
},
"sort" : [
50616,
12
]
}
]
}
}
你可以看到經驗值 12 的候選人低于經驗值 7 的候選人,因為后者的薪水高于前者,
繼續閱讀 “Elasticsearch:Elasticsearch 查詢示例 - 動手練習(二)”
轉載請註明出處,本文鏈接:https://www.uj5u.com/qita/423243.html
標籤:其他
下一篇:ES報錯Native controller process has stopped - no new native processes can be started
