Elasticsearch Systematic Study Notes 9: Aggregations

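- Concepts
- Classification
- Metric aggregations
    - Data preparation
    - max: maximum value
    - min: minimum value
    - value_count: count of values
    - cardinality: cardinality (number of distinct values)
    - avg: average
    - sum: total
    - stats: basic statistics
    - extended_stats: extended statistics
    - percentiles: percentile statistics
- Bucket aggregations
    - terms: grouping aggregation
    - filter: filter aggregation
    - filters: multi-filter aggregation
    - missing: missing-value aggregation
    - Combined usage example 1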

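Concepts

Buckets
- A set of documents that meet a given criterion (similar to GROUP BY in SQL).
Metrics
- Statistics computed over the documents in a bucket (similar to SQL aggregate functions such as COUNT(), SUM(), and MAX()).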

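Metric aggregations

Metric aggregations compute statistics over a set of data, for example the maximum, minimum, count, average, and sum.
They are similar to SQL aggregate functions such as max, min, count, avg, and sum.

Data preparation

The requests in these notes use Elasticsearch 6.x syntax: the mapping is nested under a `_doc` mapping type, which 7.x and later no longer expect by default.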

PUT /books
{
    "mappings": {
        "_doc": {
            "properties": {
                "name": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "price": {
                    "type": "float"
                },
                "type": {
                    "type": "text",
                    "fielddata": true,
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }
            }
        }
    }
}

POST /books/_doc/_bulk
{"index":{"_id":1}}
{"name":"C語言編程","price":23.5,"type":"c"}
{"index":{"_id":2}}
{"name":"資料結構與演算法","price":34.5,"type":"ideas"}
{"index":{"_id":3}}
{"name":"計算機組成原理","price":34.5,"type":"Computer"}
{"index":{"_id":4}}
{"name":"計算機網路","price":32.5,"type":"Computer"}
{"index":{"_id":5}}
{"name":"計算機作業系統","price":44.5,"type":"Computer"}
{"index":{"_id":6}}
{"name":"Java 編程","price":13.5,"type":"java"}
{"index":{"_id":7}}
{"name":"資料庫原理","price":36.0,"type":"Database"}
{"index":{"_id":8}}
{"name":"ElasticSearch搜索引擎","price":34.8,"type":"search_engine"}
{"index":{"_id":9}}
{"name":"Lucene 原理","price":29.8,"type":"search_engine"}
{"index":{"_id":10}}
{"name":"JVM 技術","price":34.8,"type":"java"}
{"index":{"_id":11}}
{"name":"設計模式","price":27.8,"type":"ideas"}

max: maximum value

GET books/_search
{
  "size": 0, 
  "aggs": {
    "my_result": {
      "max": {
        "field": "price"
      }
    }
  }
}

{
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "my_result": {
      "value": 44.5
    }
  }
}

min: minimum value

GET books/_search
{
  "size": 0, 
  "aggs": {
    "my_result": {
      "min": {
        "field": "price"
      }
    }
  }
}

value_count: count of values

GET books/_search
{
  "size": 0, 
  "aggs": {
    "my_result": {
      "value_count": {
        "field": "price"
      }
    }
  }
}

cardinality: cardinality (number of distinct values)

Similar to select count(distinct price) from books in SQL.

GET books/_search
{
  "size": 0, 
  "aggs": {
    "my_result": {
      "cardinality": {
        "field": "price"
      }
    }
  }
}
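
To make the difference between value_count and cardinality concrete, here is a minimal local sketch in Python. It does not call Elasticsearch; it only applies the two SQL-like definitions to the prices from the bulk request above. Note that on large data sets Elasticsearch's cardinality is an approximation (it is based on HyperLogLog++), whereas this sketch counts exactly.

```python
# Prices from the sample bulk request, in _id order.
prices = [23.5, 34.5, 34.5, 32.5, 44.5, 13.5, 36.0, 34.8, 29.8, 34.8, 27.8]

# value_count behaves like count(price): every value is counted.
value_count = len(prices)

# cardinality behaves like count(distinct price): duplicates collapse.
cardinality = len(set(prices))

print(value_count, cardinality)  # prints: 11 9
```

Two prices (34.5 and 34.8) each appear twice, which is why the distinct count is two lower than the value count.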

avg: average

GET books/_search
{
  "size": 0, 
  "aggs": {
    "my_result": {
      "avg": {
        "field": "price"
      }
    }
  }
}

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "my_result": {
      "value": 31.472726995294746
    }
  }
}

sum: total

GET books/_search
{
  "size": 0, 
  "aggs": {
    "my_result": {
      "sum": {
        "field": "price"
      }
    }
  }
}

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "my_result": {
      "value": 346.1999969482422
    }
  }
}

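A small discrepancy shows up here: adding the prices by hand gives 346.2, but the response reports 346.1999969482422. The cause is binary floating-point representation rather than anything specific to Java: price is mapped as float (32-bit), and values such as 34.8, 29.8, and 27.8 cannot be stored exactly, so the stored values are slightly smaller than the decimal originals.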
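
The discrepancy can be reproduced without Elasticsearch. The sketch below assumes only that a `float` field stores each value as a 32-bit float and that the sum is accumulated in 64-bit doubles; the helper name `to_float32` is made up for this illustration.

```python
import struct


def to_float32(x: float) -> float:
    """Round x to the nearest 32-bit float, then widen it back to 64 bits."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Prices from the sample bulk request, in _id order.
prices = [23.5, 34.5, 34.5, 32.5, 44.5, 13.5, 36.0, 34.8, 29.8, 34.8, 27.8]

stored = [to_float32(p) for p in prices]  # what a float field can actually hold
total = sum(stored)

print(to_float32(34.8))  # slightly below 34.8
print(total)             # slightly below 346.2
```

Values like 23.5 or 36.0 survive the round trip unchanged; only the four prices ending in .8 lose a little, and those four small errors add up to the gap seen in the response. Mapping the field as double or scaled_float would be one way to avoid the visible error.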

stats: basic statistics

Returns the count, minimum, maximum, average, and sum in a single request.

GET books/_search
{
  "size": 0, 
  "aggs": {
    "my_result": {
      "stats": {
        "field": "price"
      }
    }
  }
}

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "my_result": {
      "count": 11,
      "min": 13.5,
      "max": 44.5,
      "avg": 31.472726995294746,
      "sum": 346.1999969482422
    }
  }
}

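extended_stats: extended statistics

Includes everything from the basic statistics and additionally reports the sum of squares, the variance, the standard deviation, and the interval of the mean plus or minus two standard deviations.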

GET books/_search
{
  "size": 0, 
  "aggs": {
    "my_result": {
      "extended_stats": {
        "field": "price"
      }
    }
  }
}

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "my_result": {
      "count": 11,
      "min": 13.5,
      "max": 44.5,
      "avg": 31.472726995294746,
      "sum": 346.1999969482422,
      "sum_of_squares": 11530.459805908205,
      "variance": 57.691074198573254,
      "std_deviation": 7.595464054195323,
      "std_deviation_bounds": {
        "upper": 46.66365510368539,
        "lower": 16.2817988869041
      }
    }
  }
}
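
The extra fields in this response can be checked by hand. The following Python sketch assumes the prices are stored as 32-bit floats (as the mapping declares) and that the variance is the population variance, which is what reproduces the figures above; `to_float32` is a helper invented for this illustration.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round x to the nearest 32-bit float, then widen it back to 64 bits."""
    return struct.unpack("f", struct.pack("f", x))[0]


prices = [23.5, 34.5, 34.5, 32.5, 44.5, 13.5, 36.0, 34.8, 29.8, 34.8, 27.8]
values = [to_float32(p) for p in prices]

count = len(values)
total = sum(values)
avg = total / count
sum_of_squares = sum(v * v for v in values)

# Population variance: mean of the squares minus the square of the mean.
variance = sum_of_squares / count - avg * avg
std_deviation = math.sqrt(variance)

# std_deviation_bounds: the mean plus or minus two standard deviations.
upper = avg + 2 * std_deviation
lower = avg - 2 * std_deviation

print(sum_of_squares, variance, std_deviation, upper, lower)
```

The results agree with the response to well within rounding, which confirms that std_deviation_bounds is simply avg ± 2 × std_deviation.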

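percentiles: percentile statistics

Percentile is a statistical term: if a data set is sorted in ascending order and the cumulative percentage is computed for each value, the value found at a given percentage is called that percentile. When no percents parameter is given, as in the request below, the default percentiles 1, 5, 25, 50, 75, 95, and 99 are returned.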

GET books/_search
{
  "size": 0, 
  "aggs": {
    "my_result": {
      "percentiles": {
        "field": "price"
      }
    }
  }
}

{
  "took": 24,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "my_result": {
      "values": {
        "1.0": 13.500000000000002,
        "5.0": 14,
        "25.0": 28.299999237060547,
        "50.0": 34.5,
        "75.0": 34.79999923706055,
        "95.0": 44.074999999999996,
        "99.0": 44.5
      }
    }
  }
}
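
To see where values such as 14 or 44.075 come from, the sketch below interpolates linearly between neighbouring sorted values. The rule it uses (position = percent / 100 × n − 0.5, clamped to the valid range) is an assumption inferred from the sample output above, not Elasticsearch's documented algorithm: Elasticsearch computes percentiles with the approximate TDigest algorithm by default, so results on larger data sets may differ. The names `to_float32` and `percentile` are made up for this illustration.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round x to the nearest 32-bit float, then widen it back to 64 bits."""
    return struct.unpack("f", struct.pack("f", x))[0]


def percentile(sorted_values, percent):
    """Linear interpolation at position percent/100 * n - 0.5 (assumed rule)."""
    n = len(sorted_values)
    position = percent / 100 * n - 0.5
    position = min(max(position, 0.0), n - 1.0)  # clamp to the first/last value
    low = math.floor(position)
    high = min(low + 1, n - 1)
    fraction = position - low
    return sorted_values[low] + fraction * (sorted_values[high] - sorted_values[low])


prices = [23.5, 34.5, 34.5, 32.5, 44.5, 13.5, 36.0, 34.8, 29.8, 34.8, 27.8]
values = sorted(to_float32(p) for p in prices)

for percent in (1, 5, 25, 50, 75, 95, 99):
    print(percent, percentile(values, percent))
```

For example, the 5th percentile falls 5% of the way from the smallest price (13.5) to the next one (23.5), which gives 14.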

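Bucket aggregations

When an aggregation runs, the values in each document are evaluated to decide which bucket's criteria the document meets. On a match, the document is placed in that bucket and the aggregation continues on it.

terms: grouping aggregation

Similar to select count(*) from books group by type. The bucket keys in the response are lowercase because type is an analyzed text field with fielddata enabled; aggregating on type.keyword instead would keep the original values such as "Computer".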

GET books/_search
{
  "size": 0, 
  "aggs": {
    "my_result": {
      "terms": {
        "field": "type"
      }
    }
  }
}

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "my_result": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "computer",
          "doc_count": 3
        },
        {
          "key": "ideas",
          "doc_count": 2
        },
        {
          "key": "java",
          "doc_count": 2
        },
        {
          "key": "search_engine",
          "doc_count": 2
        },
        {
          "key": "c",
          "doc_count": 1
        },
        {
          "key": "database",
          "doc_count": 1
        }
      ]
    }
  }
}

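Here is where it gets interesting: bucket aggregations and metric aggregations can be combined, which makes aggregation analysis considerably more powerful.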

GET books/_search
{
  "size": 0,
  "aggs": {
    "my_result": {
      "terms": {
        "field": "type"
      },
      "aggs": {
        "sum_price": {
          "sum": {
            "field": "price"
          }
        },
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}


{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "my_result": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "computer",
          "doc_count": 3,
          "avg_price": {
            "value": 37.166666666666664
          },
          "sum_price": {
            "value": 111.5
          }
        },
        {
          "key": "ideas",
          "doc_count": 2,
          "avg_price": {
            "value": 31.149999618530273
          },
          "sum_price": {
            "value": 62.29999923706055
          }
        },
        {
          "key": "java",
          "doc_count": 2,
          "avg_price": {
            "value": 24.149999618530273
          },
          "sum_price": {
            "value": 48.29999923706055
          }
        },
        {
          "key": "search_engine",
          "doc_count": 2,
          "avg_price": {
            "value": 32.29999923706055
          },
          "sum_price": {
            "value": 64.5999984741211
          }
        },
        {
          "key": "c",
          "doc_count": 1,
          "avg_price": {
            "value": 23.5
          },
          "sum_price": {
            "value": 23.5
          }
        },
        {
          "key": "database",
          "doc_count": 1,
          "avg_price": {
            "value": 36
          },
          "sum_price": {
            "value": 36
          }
        }
      ]
    }
  }
}
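
The per-bucket figures above can be reproduced locally, which shows how a bucket aggregation and its nested metrics fit together. This Python sketch makes three assumptions: `.lower()` stands in for the analyzer that lowercases the type values, prices are stored as 32-bit floats, and buckets are ordered by document count (descending) with ties broken by key. The `to_float32` helper is invented for this illustration.

```python
import struct
from collections import defaultdict


def to_float32(x: float) -> float:
    """Round x to the nearest 32-bit float, then widen it back to 64 bits."""
    return struct.unpack("f", struct.pack("f", x))[0]


# (type, price) pairs from the sample bulk request, in _id order.
books = [
    ("c", 23.5), ("ideas", 34.5), ("Computer", 34.5), ("Computer", 32.5),
    ("Computer", 44.5), ("java", 13.5), ("Database", 36.0),
    ("search_engine", 34.8), ("search_engine", 29.8), ("java", 34.8),
    ("ideas", 27.8),
]

# Bucket step: group prices by the (lowercased) type value.
buckets = defaultdict(list)
for book_type, price in books:
    buckets[book_type.lower()].append(to_float32(price))

# Metric step: compute doc_count, sum_price and avg_price inside each bucket.
results = [
    (key, len(vals), sum(vals), sum(vals) / len(vals))
    for key, vals in sorted(buckets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
]

for key, doc_count, sum_price, avg_price in results:
    print(key, doc_count, sum_price, avg_price)
```

The bucket step decides which documents belong together; the metric step then runs once per bucket, which is exactly the nesting expressed by the inner "aggs" object in the request.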

filter: filter aggregation

Puts the documents that match a condition into a single bucket, over which metrics can then be computed.

GET books/_search
{
  "size": 0,
  "aggs": {
    "my_result": {
      "filter": {
        "match": {
          "name": "java"
        }
      }
    }
  }
}

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "my_result": {
      "doc_count": 1
    }
  }
}

filters: multi-filter aggregation

Puts the documents that match each of several filters into separate buckets, one bucket per filter.

GET books/_search
{
  "size": 0,
  "aggs": {
    "my_result": {
      "filters": {
        "filters": [
          {
            "match": {
              "name": "java"
            }
          },
          {
            "match": {
              "name": "c"
            }
          }
        ]
      }
    }
  }
}


{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "my_result": {
      "buckets": [
        {
          "doc_count": 1
        },
        {
          "doc_count": 1
        }
      ]
    }
  }
}

missing: missing-value aggregation

Puts the documents in the index that lack a given field into one bucket, similar to select count(*) from books where fieldA is null.

GET books/_search
{
  "size": 0,
  "aggs": {
    "my_result": {
      "missing": {
        "field": "price"
      }
    }
  }
}

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "my_result": {
      "doc_count": 0
    }
  }
}

Combined usage example 1

GET books/_search
{
  "size": 0,
  "aggs": {
    "missing_result": {
      "missing": {
        "field": "price"
      }
    },
    "sum_result": {
      "sum": {
        "field": "price"
      }
    }
  }
}

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/423762.html
