1、簡介

Elasticsearch是一個開源的分布式、RESTful 風格的搜索和資料分析引擎，在大資料檢索的效率下，比mysql等傳統資料庫效率高幾十倍，淘寶，京東，首頁分類檢索底層都是用至少Elasticsearch級別資料搜索框架，

2、基本概念

2.1 索引（Index）就像mysql的資料庫

Elastic 會索引所有欄位，經過處理后寫入一個反向索引（Inverted Index），查找資料的時候，直接查找該索引，

所以，Elastic 資料管理的頂層單位就叫做 Index（索引），它是單個資料庫的同義詞，每個 Index （即資料庫）的名字必須是小寫，

索引做動詞，相當于mysql的insert
索引做名詞，相當于mysql的database

2.2 Type（型別）

在 Index（索引）中，可以定義一個或多個型別，

類似于 MySQL 的 Table，每一種類型的資料存放在一起，

但是：在Elasticsearch6.0之后，Type 型別被移除，

ElasticSearch7-去掉type概念：

關系型資料庫中兩個資料表示是獨立的，即使他們里面有相同名稱的列也不影響使用，但ES中不是這樣的，elasticsearch是基于Lucene開發的搜索引擎，而ES中不同type下名稱相同的filed最終在Lucene中的處理方式是一樣的，

兩個不同type下的兩個user_name，在ES同一個索引下其實被認為是同一個filed，你必須在兩個不同的type中定義相同的filed映射，否則，不同type中的相同欄位名稱就會在處理中出現沖突的情況，導致Lucene處理效率下降，
去掉type就是為了提高ES處理資料的效率，
Elasticsearch 7.x URL中的type引數為可選，比如，索引一個檔案不再要求提供檔案型別，
Elasticsearch 8.x 不再支持URL中的type引數，

解決辦法：

將索引從多型別遷移到單型別，每種型別檔案一個獨立索引
將已存在的索引下的型別資料，全部遷移到指定位置即可，詳見資料遷移

2.3 Document（檔案）

保存在某個 Index（索引）下，某種 Type（型別）的一個資料，Document（檔案）是JSON格式的，Document 就像是 MySQL 中某個 Table 里面每一行的資料，欄位就是Document里的屬性，

2.4 欄位（Fields）

每個Document都類似一個JSON結構，它包含了許多欄位，每個欄位都有其對應的值，多個欄位組成了一個 Document，可以類比關系型資料庫資料表中的欄位，
??在 Elasticsearch 中，檔案（Document）歸屬于一種型別（Type），而這些型別存在于索引（Index）中，下圖展示了Elasticsearch與傳統關系型資料庫的類比：

3、Elasticsearch-基本使用

3.1 安裝并運行Elasticsearch

網上找教程吧，還是挺麻煩的，

啟動成功后，Elasticsearch運行在本地的9200埠，在瀏覽器中輸入網址“http://localhost:9200/”，如果看到以下資訊就說明你的電腦已成功安裝Elasticsearch：

{
  "name" : "YTK8L4q",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "hB2CZPlvSJavhJxx85fUqQ",
  "version" : {
    "number" : "6.5.4",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "d2ef93d",
    "build_date" : "2018-12-17T21:17:40.758843Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

3.2 _cat 查看節點

（1） /_cat/nodes：查看所有節點

介面： GET http://192.168.163.131:9200/_cat/nodes

用PostMan 來模擬，訪問該介面

（2）/_cat/health：查看ES健康狀況

介面：GET http://192.168.163.131:9200/_cat/health

（3）/_cat/master：查看主節點資訊

介面：GET http://192.168.163.131:9200/_cat/master

（4） /_cat/indicies：查看所有索引

等價于 mysql 資料庫的 show databases;

介面：GET http://192.168.163.131:9200/_cat/indices

3.3 索引一個檔案（新增）

即保存一條資料，保存在哪個索引的哪個型別下，指定用哪個唯一標識，

（1）PUT 請求

介面：PUT http://192.168.163.131:9200/customer（索引）/external（type）/1（id）

（2）POST請求

介面：POST http://192.168.163.131:9200/customer/external/

案例就沒帶id，但是自動生成了id

（3）小結

PUT和POST：

POST新增，如果不指定id，會自動生成id，指定id就會修改這個資料，并新增版本號；
PUT可以新增也可以修改，PUT必須指定id；由于PUT需要指定id，我們一般用來做修改操作，不指定id會報錯，

3.4 查看檔案

/index/type/id

介面：GET http://192.168.163.131:9200/customer（索引）/external（type）/1（id）

解釋：

{
    "_index": "customer",  # 在哪個索引(庫)
    "_type": "external",   # 在哪個型別(表)
    "_id": "1",						 # 檔案id(記錄)
    "_version": 5,				 # 版本號
    "_seq_no": 4,					 # 并發控制欄位，每次更新都會+1，用來做樂觀鎖
    "_primary_term": 1,		 # 同上，主分片重新分配，如重啟，就會變化
    "found": true,
    "_source": {					 # 資料
        "name": "zhangsan"
    }
}

# 樂觀鎖更新時攜帶 ?_seq_no=0&_primary_term=1  當攜帶資料與實際值不匹配時更新失敗

3.5 更新檔案 /index/type/id/_update

介面：POST http://192.168.163.131:9200/customer/external/1/_update

幾種更新檔案的區別

在上面索引檔案即保存檔案的時候介紹，還有兩種更新檔案的方式：

當PUT請求帶id，且有該id資料存在時，會更新檔案；
當POST請求帶id，與PUT相同，該id資料已經存在時，會更新檔案；

這兩種請求類似，即帶id，且資料存在，就會執行更新操作，

類比：

請求體的報文格式不同，_update方式要修改的資料要包裹在 doc 鍵下
_update方式不會重復更新，資料已存在不會更新，版本號不會改變，另兩種方式會重復更新（覆寫原來資料），版本號會改變

這幾種方式在更新時都可以增加屬性，PUT請求帶id更新和POST請求帶id更新，會直接覆寫原來的資料，不會在原來的屬性里面新增屬性

3.6 洗掉檔案&索引

（1）洗掉檔案

介面：DELETE http://192.168.163.131:9200/customer/external/1

（2）洗掉索引

介面：DELETE http://192.168.163.131:9200/customer

4、 bulk-批量操作資料

語法格式：

{action:{metadata}}\n   // 例如index保存記錄，update更新
{request body  }\n  //檔案的內容了

{action:{metadata}}\n
{request body  }\n

4.1 指定索引和型別的批量操作

介面：POST /customer/external/_bulk

引數：

{"index":{"_id":"1"}}
{"name":"John Doe"}
{"index":{"_id":"2"}}
{"name":"John Doe"}

在Kibana中使用dev-tools測驗批量：

4.2 對所有索引執行批量操作

介面：POST /_bulk

引數：

{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"my first blog post"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"my second blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"my updated blog post"}}

這里的批量操作，當發生某一條執行發生失敗時，其他的資料仍然能夠接著執行，也就是說彼此之間是獨立的，
bulk api以此按順序執行所有的action（動作），如果一個單個的動作因任何原因失敗，它將繼續處理它后面剩余的動作，

當bulk api回傳時，它將提供每個動作的狀態（與發送的順序相同），所以您可以檢查是否一個指定的動作是否失敗了，

5、檢索案例

本節參考官方檔案檢索示例

（1）5.1 匯入樣本測驗資料

準備一份顧客銀行賬戶資訊的虛構的JSON檔案樣本，每個檔案都有下列的 schema（模式），

{
    "account_number": 1,
    "balance": 39225,
    "firstname": "Amber",
    "lastname": "Duke",
    "age": 32,
    "gender": "M",
    "address": "880 Holmes Lane",
    "employer": "Pyrami",
    "email": "amberduke@pyrami.com",
    "city": "Brogan",
    "state": "IL"
}

指令： POST并且沒帶id，就是新增如圖資料

POST bank/account/_bulk

（2）請求方式

下面的請求都是在Kibana dev-tools 操作

ES支持兩種基本方式檢索；

通過REST request uri 發送搜索引數（uri +檢索引數）；
通過REST request body 來發送它們（uri+請求體）；

GET bank/_search?q=*&sort=account_number:asc

# q=* 查詢所有

# sort=account_number:asc 按照account_number進行升序排列

_search：搜索bank索引中的所有檔案

GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "account_number": "asc"
    }
  ]
}
# query 查詢條件
# sort 排序條件

（3）請求后收到的回應資料

{
  "took" : 7, #took – how long it took Elasticsearch to run the query, in milliseconds
  "timed_out" : false,# whether or not the search request timed out
  "_shards" : { #how many shards were searched and a breakdown of how many shards succeeded, failed, or were skipped.
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : { #how many matching documents were found
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null, #the score of the most relevant document found
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "0",
        "_score" : null, #the document’s relevance score (not applicable when using match_all)
        "_source" : {
          "account_number" : 0,
          "balance" : 16623,
          "firstname" : "Bradshaw",
          "lastname" : "Mckenzie",
          "age" : 29,
          "gender" : "F",
          "address" : "244 Columbus Place",
          "employer" : "Euron",
          "email" : "bradshawmckenzie@euron.com",
          "city" : "Hobucken",
          "state" : "CO"
        },
        "sort" : [ #the document’s sort position (when not sorting by relevance score)
          0
        ]
      },
      ...
    ]
  }
}

回應結果說明：

Elasticsearch 默認會分頁回傳10條資料，不會一下回傳所有資料，

6、Query DSL

本小節參考官方檔案：Query DSL

Elasticsearch提供了一個可以執行查詢的Json風格的DSL，這個被稱為Query DSL，該查詢語言非常全面，

6.1 基本語法格式

查詢陳述句典型結構：

QUERY_NAME:{
   ARGUMENT:VALUE,
   ARGUMENT:VALUE,...
}

如果針對于某個欄位，那么它的結構如下：

{
  QUERY_NAME:{
     FIELD_NAME:{
       ARGUMENT:VALUE,
       ARGUMENT:VALUE,...
      }   
   }
}

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 5,
  "sort": [
    {
      "account_number": {
        "order": "desc"
      },
      "balance": {
      	"order": "asc"
      }
    }
  ]
}

# match_all 查詢型別【代表查詢所有的所有】，es中可以在query中組合非常多的查詢型別完成復雜查詢；
# from+size 限定，完成分頁功能；從第幾條資料開始，每頁有多少資料
# sort 排序，多欄位排序，會在前序欄位相等時后續欄位內部排序，否則以前序為準；

6.2 基礎示例

（1）請求

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 5,
  "sort": [
    {
      "account_number": {
        "order": "desc"
      }
    }
  ],
  "_source": ["balance","firstname"]
}

# _source 指定回傳結果中包含的欄位名

（2）回傳結果

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "999",
        "_score" : null,
        "_source" : {
          "firstname" : "Dorothy",
          "balance" : 6087
        },
        "sort" : [
          999
        ]
      },
    	...
    ]
 	}
}

6.3 match-匹配查詢

（1）精確查詢-基本資料型別（非文本）

GET bank/_search
{
  "query": {
    "match": {
      "account_number": 20
    }
  }
}
# 查找匹配 account_number 為 20 的資料 非文本推薦使用 term

（2）模糊查詢-文本字串

GET bank/_search
{
  "query": {
    "match": {
      "address": "mill lane"
    }
  }
}
# 查找匹配 address 包含 mill 或 lane 的資料

match即全文檢索，對檢索欄位進行分詞匹配，會按照回應的評分 _score 排序，原理是倒排索引，

（3）精確匹配-文本字串（注意和上面的模糊查詢對比）

GET bank/_search
{
  "query": {
    "match": {
      "address.keyword": "288 Mill Street"
    }
  }
}
# 查找 address 為 288 Mill Street 的資料，
# 這里的查找是精確查找，只有完全匹配時才會查找出存在的記錄，
# 如果想模糊查詢應該使用match_phrase 短語匹配

（4）match_phrase-短語匹配

將需要匹配的值當成一整個單詞（不分詞）進行檢索

GET bank/_search
{
  "query": {
    "match_phrase": {
      "address": "mill lane"
    }
  }
}
# 這里會檢索 address 匹配包含短語 mill lane 的資料

（5）multi_math-多欄位匹配

GET bank/_search
{
  "query": {
    "multi_match": {
      "query": "mill",
      "fields": [
        "city",
        "address"
      ]
    }
  }
}
# 檢索 city 或 address 匹配包含 mill 的資料，會對查詢條件分詞

6.4 bool - 復合查詢

復合陳述句可以合并，任何其他查詢陳述句，包括復合陳述句，這也就意味著，復合陳述句之間

可以互相嵌套，可以表達非常復雜的邏輯，

must：必須達到must所列舉的所有條件
must_not，必須不匹配must_not所列舉的所有條件，

should，應該滿足should所列舉的條件，

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "gender": "M"
          }
        },
        {
          "match": {
            "address": "mill"
          }
        }
      ]
    }
  }
}
# 查詢 gender 為 M 且 address 包含 mill 的資料

說明：

在boolean查詢中，must, should 和must_not 元素都被稱為查詢子句，檔案是否符合每個“must”或“should”子句中的標準，決定了檔案的“相關性得分”，得分越高，檔案越符合您的搜索條件，默認情況下，Elasticsearch 回傳根據這些相關性得分排序的檔案，（must得分會高于should）

“must_not”子句中的條件被視為“過濾器”， 它影響檔案是否包含在結果中，但不影響檔案的評分方式，還可以顯式地指定任意過濾器來包含或排除基于結構化資料的檔案，

6.5 filter-結果過濾

并不是所有的查詢都需要產生分數，特別是哪些僅用于filtering過濾的檔案，為了不計算分數，elasticsearch會自動檢查場景并且優化查詢的執行，

filter 對結果進行過濾，且不計算相關性得分，

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "address": "mill"
          }
        }
      ],
      "filter": {
        "range": {
          "balance": {
            "gte": "10000",
            "lte": "20000"
          }
        }
      }
    }
  }
}
# 這里先是查詢所有匹配 address 包含 mill 的檔案，
# 然后再根據 10000<=balance<=20000 進行過濾查詢結果

6.6 term-精確檢索（一般用于非文本型別）

在上文（3）精確匹配-文本字串中有介紹對于非文本欄位的精確查詢，Elasticsearch 官方對于這種非文本欄位，使用 term來精確檢索是一個推薦的選擇，

https://www.elastic.co/guide/en/elasticsearch/reference/7.11/query-dsl-term-query.html

GET bank/_search
{
  "query": {
    "term": {
      "age": "28"
    }
  }
}
# 查找 age 為 28 的資料

注意：避免使用 term 查詢文本欄位，文本欄位使用（3）精確匹配-文本字串 來查詢

6.7 Aggregation-執行聚合

https://www.elastic.co/guide/en/elasticsearch/reference/7.11/search-aggregations.html

聚合語法：

GET /my-index-000001/_search
{
  "aggs":{
    "aggs_name":{ # 這次聚合的名字，方便展示在結果集中
        "AGG_TYPE":{ # 聚合的型別(avg,term,terms)
        }	
     }
	}
}

（1）搜索address中包含mill的所有人的年齡分布以及平均余額

GET bank/_search
{
  "query": {
    "match": {
      "address": "Mill"
    }
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 10
      }
    },
    "ageAvg": {
      "avg": {
        "field": "age"
      }
    },
    "balanceAvg": {
      "avg": {
        "field": "balance"
      }
    }
  },
  "size": 0
}
# "ageAgg": {   				  --- 聚合名為 ageAgg
#   "terms": {				    --- 聚合型別為 term
#     "field": "age",     --- 聚合欄位為 age
#     "size": 10			    --- 取聚合后前十個資料
#   }
# },
# ------------------------
# "ageAvg": {   				  --- 聚合名為 ageAvg
#   "avg": {				      --- 聚合型別為 avg 求平均值
#     "field": "age"	    --- 聚合欄位為 age
#   }
# },
# ------------------------
# "balanceAvg": {				  --- 聚合名為 balanceAvg
#   "avg": {				      --- 聚合型別為 avg 求平均值
#     "field": "balance"  --- 聚合欄位為 balance
#   }
# }
# ------------------------
# "size": 0               --- 不顯示命中結果，只看聚合資訊

回傳結果：

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 38,
          "doc_count" : 2
        },
        {
          "key" : 28,
          "doc_count" : 1
        },
        {
          "key" : 32,
          "doc_count" : 1
        }
      ]
    },
    "ageAvg" : {
      "value" : 34.0
    },
    "balanceAvg" : {
      "value" : 25208.0
    }
  }
}

（2）按照年齡聚合，并且求這些年齡段的這些人的平均薪資

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 100
      },
      "aggs": {
        "ageAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}

回傳結果：

{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 31,
          "doc_count" : 61,
          "ageAvg" : {
            "value" : 28312.918032786885
          }
        },
        {
          "key" : 39,
          "doc_count" : 60,
          "ageAvg" : {
            "value" : 25269.583333333332
          }
        },
        {
          "key" : 26,
          "doc_count" : 59,
          "ageAvg" : {
            "value" : 23194.813559322032
          }
        },
        {
          "key" : 32,
          "doc_count" : 52,
          "ageAvg" : {
            "value" : 23951.346153846152
          }
        },
        {
          "key" : 35,
          "doc_count" : 52,
          "ageAvg" : {
            "value" : 22136.69230769231
          }
        },
        {
          "key" : 36,
          "doc_count" : 52,
          "ageAvg" : {
            "value" : 22174.71153846154
          }
        },
        {
          "key" : 22,
          "doc_count" : 51,
          "ageAvg" : {
            "value" : 24731.07843137255
          }
        },
        {
          "key" : 28,
          "doc_count" : 51,
          "ageAvg" : {
            "value" : 28273.882352941175
          }
        },
        {
          "key" : 33,
          "doc_count" : 50,
          "ageAvg" : {
            "value" : 25093.94
          }
        },
        {
          "key" : 34,
          "doc_count" : 49,
          "ageAvg" : {
            "value" : 26809.95918367347
          }
        },
        {
          "key" : 30,
          "doc_count" : 47,
          "ageAvg" : {
            "value" : 22841.106382978724
          }
        },
        {
          "key" : 21,
          "doc_count" : 46,
          "ageAvg" : {
            "value" : 26981.434782608696
          }
        },
        {
          "key" : 40,
          "doc_count" : 45,
          "ageAvg" : {
            "value" : 27183.17777777778
          }
        },
        {
          "key" : 20,
          "doc_count" : 44,
          "ageAvg" : {
            "value" : 27741.227272727272
          }
        },
        {
          "key" : 23,
          "doc_count" : 42,
          "ageAvg" : {
            "value" : 27314.214285714286
          }
        },
        {
          "key" : 24,
          "doc_count" : 42,
          "ageAvg" : {
            "value" : 28519.04761904762
          }
        },
        {
          "key" : 25,
          "doc_count" : 42,
          "ageAvg" : {
            "value" : 27445.214285714286
          }
        },
        {
          "key" : 37,
          "doc_count" : 42,
          "ageAvg" : {
            "value" : 27022.261904761905
          }
        },
        {
          "key" : 27,
          "doc_count" : 39,
          "ageAvg" : {
            "value" : 21471.871794871793
          }
        },
        {
          "key" : 38,
          "doc_count" : 39,
          "ageAvg" : {
            "value" : 26187.17948717949
          }
        },
        {
          "key" : 29,
          "doc_count" : 35,
          "ageAvg" : {
            "value" : 29483.14285714286
          }
        }
      ]
    }
  }
}

（3）查出所有年齡分布，并且這些年齡段中M的平均薪資和F的平均薪資以及這個年齡段的總體平均薪資

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age",
        "size": 100
      },
      "aggs": {
        "genderAgg": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "balanceAvg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        },
        "ageBalanceAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}
# "field": "gender.keyword" gender是txt沒法聚合 必須加.keyword精確替代

回傳結果：

{
  "took" : 17,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 31,
          "doc_count" : 61,
          "genderAgg" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "M",
                "doc_count" : 35,
                "balanceAvg" : {
                  "value" : 29565.628571428573
                }
              },
              {
                "key" : "F",
                "doc_count" : 26,
                "balanceAvg" : {
                  "value" : 26626.576923076922
                }
              }
            ]
          },
          "ageBalanceAvg" : {
            "value" : 28312.918032786885
          }
        },
        {
          "key" : 39,
          "doc_count" : 60,
          "genderAgg" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "F",
                "doc_count" : 38,
                "balanceAvg" : {
                  "value" : 26348.684210526317
                }
              },
              {
                "key" : "M",
                "doc_count" : 22,
                "balanceAvg" : {
                  "value" : 23405.68181818182
                }
              }
            ]
          },
          "ageBalanceAvg" : {
            "value" : 25269.583333333332
          }
        },
        ...
      ]
    }
  }
}

7、Elasticsearch-Mapping（映射）

官方檔案：https://www.elastic.co/guide/en/elasticsearch/reference/7.11/mapping.html

7.1 簡介

Maping是用來定義一個檔案（document），以及它所包含的屬性（欄位field）是如何存盤和索引的，

比如：使用maping來定義：

哪些字串屬性應該被看做全文本屬性（full text fields）；
哪些屬性包含數字，日期或地理位置；

檔案中的所有屬性是否都能被索引（all 配置）；
日期的格式；

自定義映射規則來執行動態添加屬性；

查看mapping資訊

GET bank/_mapping

{
  "bank" : {
    "mappings" : {
      "properties" : {
        "account_number" : {
          "type" : "long"
        },
        "address" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "balance" : {
          "type" : "long"
        },
        "city" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "email" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "employer" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "firstname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "lastname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "state" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

7.2 屬性型別

比如上文的keyword, 太多了，不用記憶，用的時候自行查官網手冊

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/mapping-types.html#_core_datatypes

7.3、映射操作

7.3.1 創建索引映射

創建索引并指定屬性的映射規則（相當于新建表并指定欄位和欄位型別）

PUT /my_index
{
  "mappings": {
    "properties": {
      "age": {
        "type": "integer"
      },
      "email": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      }
    }
  }
}

回傳結果：

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_index"
}

7.3.2 給已有映射增加欄位

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/explicit-mapping.html#add-field-mapping

PUT /my_index/_mapping
{
  "properties": {
    "employee-id": {
      "type": "keyword",
      "index": false
    }
  }
}

# 這里的 "index": false，表明新增的欄位不能被檢索，默認是true
# https://www.elastic.co/guide/en/elasticsearch/reference/7.x/mapping-index.html

回傳結果：

{
  "acknowledged" : true
}

7.3.3 查看映射

GET /my_index/_mapping
# 查看某一個欄位的映射
GET /my_index/_mapping/field/employee-id

回傳結果：

{
  "my_index" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "integer"
        },
        "email" : {
          "type" : "keyword"
        },
        "employee-id" : {
          "type" : "keyword",
          "index" : false
        },
        "name" : {
          "type" : "text"
        }
      }
    }
  }
}
# index false 表示不能被索引找到

7.3.4 更新映射資料遷移

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/explicit-mapping.html#update-mapping

對于已經存在的欄位映射，我們不能更新，更新必須創建新的索引，進行資料遷移，

7.3.5 資料遷移

（1）無type資料遷移（Elasticsearch7 以后）

POST reindex [固定寫法]
{
  "source":{
      "index":"twitter"
   },
  "dest":{
      "index":"new_twitters"
   }
}

（2）有type資料遷移

POST reindex [固定寫法]
{
  "source":{
      "index":"twitter",
      "twitter":"twitter"
   },
  "dest":{
      "index":"new_twitters"
   }
}

（3）資料遷移實體

對于我們的測驗資料,是包含 type 的索引 bank，

現在我們創建新的索引 newbank 并修改一些欄位的型別來演示當需要更新映射時的資料遷移操作，

① 查看索引 bank 當前欄位映射型別

GET /bank/_mapping
# 結果
{
  "bank" : {
    "mappings" : {
      "properties" : {
        "account_number" : {
          "type" : "long"
        },
        "address" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "balance" : {
          "type" : "long"
        },
        "city" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "email" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "employer" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "firstname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "lastname" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "state" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

② 創建新索引 newbank 并修改欄位型別

PUT /newbank
{
  "mappings": {
    "properties": {
      "account_number": {
        "type": "long"
      },
      "address": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      },
      "balance": {
        "type": "long"
      },
      "city": {
        "type": "keyword"
      },
      "email": {
        "type": "keyword"
      },
      "employer": {
        "type": "keyword"
      },
      "firstname": {
        "type": "text"
      },
      "gender": {
        "type": "keyword"
      },
      "lastname": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "state": {
        "type": "keyword"
      }
    }
  }
}

③ 資料遷移

POST _reindex
{
  "source": {
    "index": "bank",
    "type": "account"
  },
  "dest": {
    "index": "newbank"
  }
}

回傳結果：

#! Deprecation: [types removal] Specifying types in reindex requests is deprecated.
{
  "took" : 269,
  "timed_out" : false,
  "total" : 1000,
  "updated" : 0,
  "created" : 1000,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

④ 查看遷移后的資料

GET /newbank/_search
# 結果： 遷移后 type 統一為 _doc 移除 type
{
  "took" : 367,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "newbank",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 1,
          "balance" : 39225,
          "firstname" : "Amber",
          "lastname" : "Duke",
          "age" : 32,
          "gender" : "M",
          "address" : "880 Holmes Lane",
          "employer" : "Pyrami",
          "email" : "amberduke@pyrami.com",
          "city" : "Brogan",
          "state" : "IL"
        }
      },
      ...

8、Elasticsearch-分詞

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/analysis.html

8.1 簡介

一個tokenizer（分詞器）接收一個字符流，將之分割為獨立的tokens（詞元，通常是獨立的單詞），然后輸出tokens流，

例如：whitespace tokenizer遇到空白字符時分割文本，它會將文本“Quick brown fox!”分割為[Quick,brown,fox!]，

該tokenizer（分詞器）還負責記錄各個terms(詞條)的順序或position位置（用于phrase短語和word proximity詞近鄰查詢），以及term（詞條）所代表的原始word（單詞）的start（起始）和end（結束）的character offsets（字串偏移量）（用于高亮顯示搜索的內容），

elasticsearch提供了很多內置的分詞器，可以用來構建custom analyzers（自定義分詞器），

POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

默認的分詞器一般都是針對于英文，對于中文我們需要安裝額外的分詞器來進行分詞，

8.2 IK分詞器

IK分詞器是一個非常好用的中文分詞器，下載，安裝，測驗程序就自行百度了，本文不再贅述

8.3 自定義擴展分詞庫（結合Nginx）

我們在 nginx 中自定義分詞檔案，通過配置 es 的 ik 組態檔來遠程呼叫 nginx 中的分詞檔案來實作自定義擴展詞庫，

注：默認 nginx 請求的是資料目錄的 html 靜態目錄

（1）step1：在nginx分詞檔案中（fenci.txt）自定義新的分詞內容

nginx 默認請求地址為 ip:port/fenci.txt；本機為：127.0.0.1:80/fenci.txt

如果想要增加新的詞語，只需要在該檔案追加新的行并保存新的詞語即可，

echo "加拿大電鰻" > /mydata/nginx/html/fenci.txt

（2）step2 給 es 配置自定義詞庫

打開并編輯 ik 插件組態檔

vim /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml

修改為以下內容

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer 擴展配置</comment>
        <!--用戶可以在這里配置自己的擴展字典 -->
        <entry key="ext_dict"></entry>
         <!--用戶可以在這里配置自己的擴展停止詞字典-->
        <entry key="ext_stopwords"></entry>
        <!--用戶可以在這里配置遠程擴展字典 -->
        <!-- <entry key="remote_ext_dict">words_location</entry> -->
        <entry key="remote_ext_dict">http://192.168.163.131/fenci.txt</entry>
        <!--用戶可以在這里配置遠程擴展停止詞字典-->
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

（3）step3：重啟 elasticsearch 容器

docker restart elasticsearch

（4）step4：測驗自定義詞庫

GET my_index/_analyze
{
   "analyzer": "ik_max_word", 
   "text":"加拿大電鰻"
}

回傳結果：

{
  "tokens": [
    {
      "token" : "加拿大電鰻",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 0
    }
  ]
}

9、Elasticsearch—Java API（重點）

和Mysql一樣，Elasticsearch同樣提供了API來讓java程式員來快捷的操作Elasticsearch

9.1 Elasticsearch-Rest-Client 客戶端

官方RestClient，封裝了ES操作，API層次分明，上手簡單；

官方檔案：https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html

9.2 使用示例

9.2.1：添加依賴

 <dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.4.2</version>
</dependency>

特別注意有坑：

elasticsearch-rest-high-level-client 有個子依賴 elasticsearch，這個子依賴在springboot專案中被springboot鎖定了版本（具體鎖定的版本跟springboot版本有關），springboot比較強勢，該子依賴版本elasticsearch會跟著springboot走，而不是跟著elasticsearch-rest-high-level-client走，

比如在springboot 2.1.8中，elasticsearch 版本被鎖定為6.8.6 ，現在我們想讓elasticsearch版本和elasticsearch-rest-high-level-client 的7.4.2版本匹配，就需要做以下步驟：

（1）繼承的springboot （springboot是父pom 即parent）大多數情況是這種，

只需要添加合適的<properties>元素，瀏覽spring-boot-dependencies POM可以獲取一個全面的屬性串列，例如，想要選擇一個不同的elasticsearch版本，你可以添加以下內容：

<properties>
    <elasticsearch.version>7.4.2</elasticsearch.version>
</properties>

（2）傳遞來的springboot

如果你使用<scope>import</scope>，將spring-boot-dependencies添加到自己的dependencyManagement片段，那你必須自己重新定義artifact而不是覆寫屬性，

需要在 <dependencyManagement> 下重新定義 artifact

<dependency>
  <groupId>org.elasticsearch.client</groupId>
  <artifactId>elasticsearch-rest-high-level-client</artifactId>
  <version>${elasticsearch.version}</version>
</dependency>
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch</artifactId>
  <version>${elasticsearch.version}</version>
</dependency>
<dependency>
  <groupId>org.elasticsearch.client</groupId>
  <artifactId>elasticsearch-rest-client</artifactId>
  <version>${elasticsearch.version}</version>
</dependency>

9.2.2 撰寫配置類

@Configuration
public class ElasticSearchConfig {
    /**
     * 1、匯入依賴
     * 2、撰寫配置，給容器中注入RestHighLevelClient
     *
     * 參考官方API完成操作：
     * https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-low-usage-requests.html
     */

    //通用設定項，設定的時候解開注釋
    public static final RequestOptions COMMON_OPTIONS;

    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        // builder.addHeader("Authorization", "Bearer " + TOKEN);
        // builder.setHttpAsyncResponseConsumerFactory(
        //         new HttpAsyncResponseConsumerFactory
        //                 .HeapBufferedResponseConsumerFactory(30 * 1024 * 1024 * 1024));
        COMMON_OPTIONS = builder.build();
    }

    @Bean
    public RestHighLevelClient esRestClient() {
        //直接去官方檔案可以看到此配置
        return new RestHighLevelClient(
                RestClient.builder(  //指定es服務端的ip地址，埠號，協議名
                        new HttpHost("192.168.163.131", 9200, "http")));
    }
}

9.2.3 注入RestHighLevelClient

@RunWith(SpringRunner.class)
@SpringBootTest
class SearchApplicationTests {

    @Autowired
    RestHighLevelClient client;

//....
}

9.2.4 功能演示1：存盤資料（更新）

假裝我們已經封裝好了 java 物件SkuEsModel，該物件封裝了客戶需要大量多次查詢的資訊，現在將該物件轉成 ES可識別的Json，并傳入Es中，

SkuEsModel 如下：（假裝已經封裝了若干SkuEsModel物件）

@Data
public class SkuEsModel {

    private Long skuId;  //pms_sku_info資料表

    private Long spuId;  //pms_sku_info資料表

    private String skuTitle;  //pms_sku_info資料表

    private BigDecimal skuPrice;  //pms_sku_info資料表

    private String skuImg;  //pms_sku_info資料表

    private Long saleCount;  //pms_sku_info資料表

    /**
     * 是否有庫存
     */
    private Boolean hasStock; //wms_ware_sku

    /**
     * 熱度
     */
    private Long hotScore;

    private Long brandId; //pms_brand資料表  //pms_sku_info資料表

    private Long catalogId;

    private String brandName; //pms_brand資料表

    private String brandImg;

    private String catalogName;

    private List<Attrs> attrs;

    //pms_attr 資料表
    @Data
    public static class Attrs {

        private Long attrId;

        private String attrName;

        private String attrValue;
    }
}

轉換SkuEsModel 并存盤進Es的代碼

@Slf4j
@Service("productSaveService")
public class ProductSaveServiceImpl implements ProductSaveService {

    @Autowired
    private RestHighLevelClient restHighLevelClient; //用這個esclient來操作

    @Override
    public boolean productStatusUp(List<SkuEsModel> skuEsModels) throws IOException {
        //1.在es中建立索引，建立號映射關系（doc/json/product-mapping.json）

        //2. 在ES中保存這些資料，  資料已經組裝好了，只需要通過es客戶端操作這些資料保存進es（linux中）就好

        //bulkRequest是后面bulk方法的形參1，先構造出來并且設定好引數
        BulkRequest bulkRequest = new BulkRequest();
        for (SkuEsModel model : skuEsModels) { //for(型別 型別對應的形參 : 要遍歷的集合)
            //構造保存請求
            IndexRequest indexRequest = new IndexRequest(EsConstant.PRODUCT_INDEX); //EsConstant中設定好的索引傳入
            indexRequest.id(model.getSkuId().toString());//索引中，添加id  es只要字串，因此要轉成字串
            String s = JSON.toJSONString(model);//用json工具，把已經從資料庫中查好的skuEsModel物件轉換成json
            indexRequest.source(s, XContentType.JSON);
            bulkRequest.add(indexRequest);
        }
        //bulk方法，批量保存資料 形參bulk(BulkRequest bulkRequest, 配置類中的配置)
        //保存完成后，還會將bulk回傳，封裝了很多關于操作結果的方法，比如錯誤處理方法
        BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, GuliMallElasticSearchConfig.COMMON_OPTIONS);

        //如果有錯誤，一般都沒得錯誤
        boolean b = bulk.hasFailures();

        List<String> collect = Arrays.stream(bulk.getItems()).map(item -> {
            return item.getId();
        }).collect(Collectors.toList());

        log.info("商品上架完成：{},回傳資料:{}", collect,bulk.toString());

        return b;
    }
}

總結：非常爽吧，前面學那么多，實際你只用配參索引和 id，其他全部都是自動化，就把javaBean轉成es的json格式，并存盤進去了，

轉載請註明出處，本文鏈接：https://www.uj5u.com/qita/290816.html

標籤：其他

上一篇：Hadoop學習筆記（四）HDFS單機版Windows安裝

下一篇：Zookeeper 單機部署

Elastisearch 簡介 使用 Query DSL 映射 分詞 Elasticsearch-Rest-Client

1、簡介

2、基本概念

2.1 索引（Index） 就像mysql的資料庫

2.2 Type（型別）

2.3 Document（檔案）

2.4 欄位（Fields）

3、Elasticsearch-基本使用

3.1 安裝并運行Elasticsearch

3.2 _cat 查看節點

3.3 索引一個檔案（新增）

3.4 查看檔案

3.5 更新檔案 /index/type/id/_update

幾種更新檔案的區別

3.6 洗掉檔案&索引

4、 bulk-批量操作資料

4.1 指定索引和型別的批量操作

4.2 對所有索引執行批量操作

5、檢索案例

6、Query DSL

6.1 基本語法格式

6.2 基礎示例

6.3 match-匹配查詢

6.4 bool - 復合查詢

6.5 filter-結果過濾

6.6 term-精確檢索（一般用于非文本型別）

6.7 Aggregation-執行聚合

（2）按照年齡聚合，并且求這些年齡段的這些人的平均薪資

7、Elasticsearch-Mapping（映射）

7.1 簡介

7.2 屬性型別

7.3、 映射操作

7.3.1 創建索引映射

7.3.2 給已有映射增加欄位

7.3.3 查看映射

7.3.4 更新映射 資料遷移

7.3.5 資料遷移

8、Elasticsearch-分詞

8.1 簡介

8.2 IK分詞器

8.3 自定義擴展分詞庫（結合Nginx）

9、Elasticsearch—Java API（重點）

9.1 Elasticsearch-Rest-Client 客戶端

9.2 使用示例

9.2.1： 添加依賴

9.2.2 撰寫配置類

9.2.3 注入RestHighLevelClient

9.2.4 功能演示1：存盤資料（更新）

Elastisearch 簡介使用 Query DSL 映射分詞 Elasticsearch-Rest-Client

2.1 索引（Index）就像mysql的資料庫

7.3、映射操作

7.3.4 更新映射資料遷移

9.2.1：添加依賴