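Elasticsearch Systematic Study Notes 9: Aggregations
- Concepts
- Categories
- Metric aggregations
- Data preparation
- max: maximum value
- min: minimum value
- value_count: number of values
- cardinality: distinct count (number of distinct values)
- avg: average
- sum: total
- stats: basic statistics
- extended_stats: extended statistics
- percentiles: percentile statistics
- Bucket aggregations
- terms: grouping aggregation
- filter: filter aggregation
- filters: multi-filter aggregation
- missing: missing-value aggregation
- Combined usage example 1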
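Concepts
- Buckets
- A set of documents that satisfy a given condition (similar to GROUP BY in SQL)
- Metrics
- Statistics computed over the documents in a bucket (similar to the SQL aggregate functions COUNT(), SUM(), MAX(), and so on)
Categories
Elasticsearch provides four main families of aggregations:
- Metric aggregations
- Bucket aggregations
- Pipeline aggregations
- Matrix aggregations
Metric aggregations
Compute statistics over a set of documents, for example the maximum, minimum, count, average, or sum.
They are similar to the SQL aggregate functions max, min, count, avg, and sum.
Data preparation
The requests below use Elasticsearch 6.x syntax, in which the mapping and the bulk endpoint include the _doc type.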
PUT /books
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"price": {
"type": "float"
},
"type": {
"type": "text",
"fielddata": true,
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
POST /books/_doc/_bulk
{"index":{"_id":1}}
{"name":"C語言編程","price":23.5,"type":"c"}
{"index":{"_id":2}}
{"name":"資料結構與演算法","price":34.5,"type":"ideas"}
{"index":{"_id":3}}
{"name":"計算機組成原理","price":34.5,"type":"Computer"}
{"index":{"_id":4}}
{"name":"計算機網路","price":32.5,"type":"Computer"}
{"index":{"_id":5}}
{"name":"計算機作業系統","price":44.5,"type":"Computer"}
{"index":{"_id":6}}
{"name":"Java 編程","price":13.5,"type":"java"}
{"index":{"_id":7}}
{"name":"資料庫原理","price":36.0,"type":"Database"}
{"index":{"_id":8}}
{"name":"ElasticSearch搜索引擎","price":34.8,"type":"search_engine"}
{"index":{"_id":9}}
{"name":"Lucene 原理","price":29.8,"type":"search_engine"}
{"index":{"_id":10}}
{"name":"JVM 技術","price":34.8,"type":"java"}
{"index":{"_id":11}}
{"name":"設計模式","price":27.8,"type":"ideas"}
max: maximum value
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"max": {
"field": "price"
}
}
}
}
{
"took": 20,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"value": 44.5
}
}
}
min: minimum value
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"min": {
"field": "price"
}
}
}
}
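value_count: number of values
Counts the values of the field found in the matching documents; duplicate values are not collapsed.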
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"value_count": {
"field": "price"
}
}
}
}
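cardinality: distinct count (number of distinct values)
Similar to SELECT COUNT(DISTINCT price) FROM books in SQL. The result is an approximation (Elasticsearch uses the HyperLogLog++ algorithm), although it is effectively exact for a small data set such as this one.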
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"cardinality": {
"field": "price"
}
}
}
}
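The difference between value_count and cardinality can be checked locally on the prices from the bulk request. This is only a plain-Python sketch of what the two numbers mean, not how Elasticsearch computes them: it uses an exact set, whereas Elasticsearch estimates the distinct count.

```python
# Prices from the bulk request above, in document order.
prices = [23.5, 34.5, 34.5, 32.5, 44.5, 13.5, 36.0, 34.8, 29.8, 34.8, 27.8]

# value_count: every value counts, duplicates included.
value_count = len(prices)

# cardinality: duplicates collapse, like COUNT(DISTINCT price).
# An exact set is used here; Elasticsearch returns an estimate instead.
cardinality = len(set(prices))

print(value_count, cardinality)  # 11 9
```

The two results differ by two because 34.5 and 34.8 each appear twice in the sample data.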
avg: average
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"avg": {
"field": "price"
}
}
}
}
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"value": 31.472726995294746
}
}
}
sum: total
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"sum": {
"field": "price"
}
}
}
}
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"value": 346.1999969482422
}
}
}
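One small surprise: the sum computed by hand is 346.2, but Elasticsearch returns 346.1999969482422. The cause is the mapping rather than Java as such: price is mapped as float (32-bit single precision), so a value such as 34.8 is stored as the nearest float, 34.79999923706055, and is then widened to a double when it is aggregated. Mapping the field as double or scaled_float would reduce this error.
stats: basic statistics
Returns the count, minimum, maximum, average, and sum in a single request.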
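The discrepancy in the sum can be reproduced without Elasticsearch. The sketch below assumes the explanation is 32-bit storage of the price field (the mapping declares it as float); the helper name to_float32 is made up for this illustration.

```python
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Prices from the bulk request above.
prices = [23.5, 34.5, 34.5, 32.5, 44.5, 13.5, 36.0, 34.8, 29.8, 34.8, 27.8]

# Simulate what a float-mapped field holds, then sum in double precision.
stored = [to_float32(p) for p in prices]
total = sum(stored)
average = total / len(stored)

print(to_float32(34.8))  # slightly below 34.8
print(total)             # slightly below 346.2
print(average)
```

Prices such as 23.5 or 34.5 are exactly representable in binary, so only 34.8, 29.8, and 27.8 contribute to the error.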
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"stats": {
"field": "price"
}
}
}
}
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"count": 11,
"min": 13.5,
"max": 44.5,
"avg": 31.472726995294746,
"sum": 346.1999969482422
}
}
}
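extended_stats: extended statistics
Includes everything returned by stats, plus the sum of squares, the variance, the standard deviation, and the interval of the mean plus or minus two standard deviations (std_deviation_bounds).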
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"extended_stats": {
"field": "price"
}
}
}
}
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"count": 11,
"min": 13.5,
"max": 44.5,
"avg": 31.472726995294746,
"sum": 346.1999969482422,
"sum_of_squares": 11530.459805908205,
"variance": 57.691074198573254,
"std_deviation": 7.595464054195323,
"std_deviation_bounds": {
"upper": 46.66365510368539,
"lower": 16.2817988869041
}
}
}
}
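The extra fields can be derived from the basic ones. The sketch below is a local re-computation under two assumptions: the prices are held as 32-bit floats (as the mapping suggests), and the variance is the population variance, which is what the numbers in the response are consistent with. The helper to_float32 is hypothetical.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

prices = [23.5, 34.5, 34.5, 32.5, 44.5, 13.5, 36.0, 34.8, 29.8, 34.8, 27.8]
values = [to_float32(p) for p in prices]

count = len(values)
total = sum(values)
avg = total / count
sum_of_squares = sum(v * v for v in values)

# Population variance: mean of the squares minus the square of the mean.
variance = sum_of_squares / count - avg * avg
std_deviation = math.sqrt(variance)

# The bounds are the mean plus or minus two standard deviations.
bounds = {
    "upper": avg + 2 * std_deviation,
    "lower": avg - 2 * std_deviation,
}

print(sum_of_squares, variance, std_deviation, bounds)
```

The printed values should agree with the response above to within floating-point rounding.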
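percentiles: percentile statistics
Percentile is a statistical term: if a data set is sorted in ascending order and the cumulative percentage is computed for each value, the value at a given percentage is called that percentile. For example, the 50th percentile is the median. When no percents are specified, the 1st, 5th, 25th, 50th, 75th, 95th, and 99th percentiles are returned.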
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"percentiles": {
"field": "price"
}
}
}
}
{
"took": 24,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"values": {
"1.0": 13.500000000000002,
"5.0": 14,
"25.0": 28.299999237060547,
"50.0": 34.5,
"75.0": 34.79999923706055,
"95.0": 44.074999999999996,
"99.0": 44.5
}
}
}
}
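To make the definition concrete, here is a minimal percentile function using linear interpolation between the closest ranks. This is one common textbook definition and only a sketch: Elasticsearch computes percentiles approximately (TDigest by default), so apart from simple cases such as the median, its values can differ from the ones this function returns.

```python
def percentile(sorted_values, p):
    """Return the p-th percentile (0..100) by linear interpolation."""
    rank = (len(sorted_values) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction

# Prices from the bulk request above, sorted in ascending order.
prices = sorted([23.5, 34.5, 34.5, 32.5, 44.5, 13.5, 36.0, 34.8, 29.8, 34.8, 27.8])

median = percentile(prices, 50)
print(median)  # 34.5, the middle of the 11 sorted prices

for p in (25, 75, 95):
    print(p, percentile(prices, p))
```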
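Bucket aggregations
When an aggregation runs, the values of each document are evaluated against the criteria of every bucket. A document that matches is placed in that bucket, and further aggregations can then be run on the bucket's contents.
terms: grouping aggregation
Similar to SELECT type, COUNT(*) FROM books GROUP BY type in SQL. Because type is a text field with fielddata enabled, the bucket keys are the analyzed (lowercased) terms; aggregating on type.keyword instead would keep the original values, such as Computer.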
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"terms": {
"field": "type"
}
}
}
}
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "computer",
"doc_count": 3
},
{
"key": "ideas",
"doc_count": 2
},
{
"key": "java",
"doc_count": 2
},
{
"key": "search_engine",
"doc_count": 2
},
{
"key": "c",
"doc_count": 1
},
{
"key": "database",
"doc_count": 1
}
]
}
}
}
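The bucket list in the response can be mimicked in a few lines. This is a rough local sketch, not Elasticsearch's implementation: lowercasing stands in for the analysis of the text field, and sorting by descending count and then by key happens to reproduce the order shown above.

```python
from collections import Counter

# The "type" value of each document, in document order.
types = [
    "c", "ideas", "Computer", "Computer", "Computer", "java",
    "Database", "search_engine", "search_engine", "java", "ideas",
]

# One bucket per distinct (lowercased) term, holding a document count.
counts = Counter(t.lower() for t in types)

# Largest buckets first; ties are broken alphabetically by key.
buckets = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

for key, doc_count in buckets:
    print(key, doc_count)
```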
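Now for the interesting part: metric aggregations can be nested inside bucket aggregations, which makes aggregation analysis far more powerful. The following request computes the total and the average price for each type.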
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"terms": {
"field": "type"
},
"aggs": {
"sum_price": {
"sum": {
"field": "price"
}
},
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "computer",
"doc_count": 3,
"avg_price": {
"value": 37.166666666666664
},
"sum_price": {
"value": 111.5
}
},
{
"key": "ideas",
"doc_count": 2,
"avg_price": {
"value": 31.149999618530273
},
"sum_price": {
"value": 62.29999923706055
}
},
{
"key": "java",
"doc_count": 2,
"avg_price": {
"value": 24.149999618530273
},
"sum_price": {
"value": 48.29999923706055
}
},
{
"key": "search_engine",
"doc_count": 2,
"avg_price": {
"value": 32.29999923706055
},
"sum_price": {
"value": 64.5999984741211
}
},
{
"key": "c",
"doc_count": 1,
"avg_price": {
"value": 23.5
},
"sum_price": {
"value": 23.5
}
},
{
"key": "database",
"doc_count": 1,
"avg_price": {
"value": 36
},
"sum_price": {
"value": 36
}
}
]
}
}
}
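Nesting a metric inside a bucket amounts to "group first, then compute per group". The sketch below does this locally in plain Python. It works in double precision, so it does not reproduce the 32-bit rounding visible in some of the sums above (for example 62.29999923706055 for ideas).

```python
from collections import defaultdict

# (type, price) of each document from the bulk request above.
books = [
    ("c", 23.5), ("ideas", 34.5), ("Computer", 34.5), ("Computer", 32.5),
    ("Computer", 44.5), ("java", 13.5), ("Database", 36.0),
    ("search_engine", 34.8), ("search_engine", 29.8), ("java", 34.8),
    ("ideas", 27.8),
]

# Bucket step: group the prices by lowercased type.
buckets = defaultdict(list)
for book_type, price in books:
    buckets[book_type.lower()].append(price)

# Metric step: compute sum and average inside each bucket.
result = {
    key: {
        "doc_count": len(prices),
        "sum_price": sum(prices),
        "avg_price": sum(prices) / len(prices),
    }
    for key, prices in buckets.items()
}

print(result["computer"])
```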
filter: filter aggregation
Places the documents that match a single query into one bucket, on which metrics can then be computed.
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"filter": {
"match": {
"name": "java"
}
}
}
}
}
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"doc_count": 1
}
}
}
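filters: multi-filter aggregation
Defines several filters at once; each filter gets its own bucket containing the documents that match it.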
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"filters": {
"filters": [
{
"match": {
"name": "java"
}
},
{
"match": {
"name": "c"
}
}
]
}
}
}
}
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"buckets": [
{
"doc_count": 1
},
{
"doc_count": 1
}
]
}
}
}
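Conceptually, each filter is a predicate and each bucket counts the documents that pass it. The sketch below imitates the two match filters locally. The tokens function is a crude, made-up stand-in for text analysis (it only extracts lowercase ASCII letter and digit runs), and the bucket labels are hypothetical, since the request above uses anonymous filters.

```python
import re

# The "name" value of each document from the bulk request above.
names = [
    "C語言編程", "資料結構與演算法", "計算機組成原理", "計算機網路",
    "計算機作業系統", "Java 編程", "資料庫原理", "ElasticSearch搜索引擎",
    "Lucene 原理", "JVM 技術", "設計模式",
]

def tokens(text):
    """Crude analyzer stand-in: lowercase ASCII letter/digit runs only."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# One predicate per filter; the labels are illustrative only.
filters = {
    "java": lambda name: "java" in tokens(name),
    "c": lambda name: "c" in tokens(name),
}

# A document is counted in every bucket whose filter it matches.
buckets = {
    label: sum(1 for name in names if matches(name))
    for label, matches in filters.items()
}

print(buckets)  # {'java': 1, 'c': 1}
```

With anonymous filters the buckets come back in request order, as in the response above. The filters aggregation also accepts named filters, in which case each bucket is keyed by its name.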
missing: missing-value aggregation
Places the documents that have no value for the given field into one bucket, similar to SELECT COUNT(*) FROM books WHERE fieldA IS NULL in SQL.
GET books/_search
{
"size": 0,
"aggs": {
"my_result": {
"missing": {
"field": "price"
}
}
}
}
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"my_result": {
"doc_count": 0
}
}
}
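Combined usage example 1
Several aggregations can be requested side by side in one search; each result is returned under its own name.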
GET books/_search
{
"size": 0,
"aggs": {
"missing_result": {
"missing": {
"field": "price"
}
},
"sum_result": {
"sum": {
"field": "price"
}
}
}
}
