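Preface
In a large distributed cluster, logs are scattered across many servers. Troubleshooting then means logging in to each server and running grep | tail, which is very inconvenient, so you need a unified log management platform that collects and aggregates the logs, such as Alibaba Cloud's SLS (a paid product)…
In practice, however, development usually involves four environments: dev (development), test (testing), pre (pre-release verification), and prod (production). If your production environment runs entirely on cloud services and cost is no concern, you can simply use the cloud vendor's logging component. But if you want to save money, is there a logging component that is free and still works well? Yes: ELK.
ELK is currently the most popular log search stack; the name is short for the combination Elasticsearch + Logstash + Kibana. So what on earth is FileBeat? Read on.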
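Environment used in this article:
Windows 10 + (Elasticsearch + Logstash + Kibana + FileBeat) 7.8.0
Why Windows? Setting up ELK involves writing and testing quite a few configuration files and debugging them over and over, and those configuration files work the same way on Linux and Windows. To make this article easier to write well, I simply used my local machine. The article is hands-on and skips the flashy but useless parts. OK, before we start building, here is a brief introduction to what these four components actually do.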
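1. Elasticsearch
Elasticsearch is (1) a distributed full-text search engine built on Lucene, (2) accessed through a REST interface, and (3) written in Java and open source. Three things follow from this: (1) it scales out easily because it was designed to be distributed; (2) because access goes through REST, any client language can talk to it; (3) it is open source and free, so a team with enough engineering strength can customize it. ES is a natural fit for searching and storing large volumes of data.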
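Because access goes through REST, a search is just an HTTP request with a JSON body. The sketch below is a minimal illustration using only Python's standard library; it builds (but does not send) a request for the _search endpoint. The index pattern logstash-demo-*, the field name level, and the helper name build_search_request are assumptions chosen for illustration, and the base URL assumes a local node on port 9200.

```python
import json
import urllib.request


def build_search_request(base_url, index, field, text):
    """Build a POST request for the Elasticsearch _search endpoint (hypothetical helper)."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed values: a local node and an index pattern named logstash-demo-*.
request = build_search_request("http://localhost:9200", "logstash-demo-*", "level", "DEBUG")
print(request.get_method(), request.full_url)

# To actually run the search against a live node:
#   with urllib.request.urlopen(request) as resp:
#       print(json.load(resp)["hits"]["total"])
```

Any language with an HTTP client can issue the same request, which is the point of the REST interface.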
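2. Logstash
Logstash is an open-source, server-side data processing pipeline that can ingest data from multiple sources at the same time, transform it, and then send it to your favorite "stash".
Put simply, this component is a data processing pipeline dedicated to collecting logs, processing and formatting them, and then dispatching them to wherever you specify (MySQL, MQ, NoSQL). It runs on Java and is notably memory-hungry.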
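3. Kibana
An open-source web platform for analytics and visualization, used mainly together with ES.
4. FileBeat
A lightweight log shipper. It comes from the same author as Logstash: because Logstash is heavy and eats memory, the author released this new component. A single FileBeat process can collect all of the specified log files on a server with a much smaller footprint than Logstash, while Logstash remains stronger than FileBeat at data processing and distribution.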
Below is a simple diagram of how these four components fit together.

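Before setting anything up, download the four components from https://www.elastic.co/cn/downloads/
1. Start Elasticsearch
If you are on Linux, please head over here.
Edit the ES configuration file (config\elasticsearch.yml).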

Here is my configuration:
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Node name; choose your own
node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
# Where the data is stored
path.data: E:\elk\elasticsearch-7.8.0-windows-x86_64\elasticsearch-7.8.0\data
#
# Path to log files:
# Where the logs are stored
path.logs: E:\elk\elasticsearch-7.8.0-windows-x86_64\elasticsearch-7.8.0\logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
# Host IP to bind to; 0.0.0.0 means no restriction
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
# Port to bind to; the default is 9200
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
# Initial master nodes for the cluster; this must include node.name, otherwise startup fails with an error
cluster.initial_master_nodes: ["node-1"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

# Settings required by es-head
http.cors.enabled: true
http.cors.allow-origin: "*"
node.master: true
node.data: true
Once the changes are saved, double-click bin\elasticsearch.bat to start it.

Open http://localhost:9200 in a browser; if you get a response like the following, startup succeeded.
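The same check can be scripted. The sketch below assumes the node answers on http://localhost:9200, matching the http.port setting in the configuration above, and reads the version number from the JSON that the root endpoint returns. The names es_version and opener are hypothetical; the opener parameter exists only so the function can be exercised without a running node.

```python
import json
import urllib.request


def es_version(base_url="http://localhost:9200", opener=urllib.request.urlopen):
    """Return the version number reported by the Elasticsearch root endpoint."""
    with opener(base_url, timeout=5) as resp:
        info = json.loads(resp.read().decode("utf-8"))
    # The root endpoint reports the node version under version.number.
    return info["version"]["number"]


# Usage against a running node (requires Elasticsearch to be up):
#   print(es_version())
```

If the call raises a connection error, the node is not listening yet or is bound to a different address or port.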

2. Start Kibana
Edit the Kibana configuration file (config\kibana.yml):
# Kibana is served by a back end server. This setting specifies the port to use.
# Access port; the default is 5601
server.port: 5601

# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "0.0.0.0"

# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""

# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false

# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576

# The Kibana server's name. This is used for display purposes.
#server.name: "your-hostname"

# The URLs of the Elasticsearch instances to use for all your queries.
elasticsearch.hosts: ["http://localhost:9200"]
Double-click bin\kibana.bat to start it.
Page shown after a successful start:

Once it is running, you can check in Kibana which indices currently exist in ES.

3. Start Logstash
Write a logstash_test.conf configuration file to get a feel for Logstash:
input {
  stdin {}
}
output {
  stdout {}
}
Start command:
.\bin\logstash.bat -f .\config\logstash_test.conf
Type test logstash directly into the console, and the console prints back what we typed.

Now change the output format to rubydebug or json:
stdout{ codec => rubydebug }
stdout{ codec => json}
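You can try both yourself and compare the output.
Connecting Logstash to FileBeat
Change the configuration file as follows: the input section now uses the beats plugin and listens on port 5044, which is the network port FileBeat connects to.
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.

input {
  beats {
    port => 5044
  }
}

output {
  stdout { codec => rubydebug }
}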
4. Start FileBeat
Create a new FileBeat configuration file, filebeat_test.yml:
# ============================== Filebeat inputs ===============================

filebeat.inputs:
- type: log
  # Enable log reading
  enabled: true
  # Log paths
  paths:
    - D:\data\logs\demo\*.log
  # Extra fields
  fields:
    app: demo
    review: 1
  # Multiline matching: a new event starts at a timestamp of the form yyyy-MM-dd HH:mm:ss.SSS
  multiline.pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
  multiline.negate: true
  # Lines that do not start with a date are appended after the line that does
  multiline.match: after
  # tail_files: true means log content that already exists is not collected
  tail_files: true
  # Extra tags
  tags: ["demo"]

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

# ES index template
setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false

# ------------------------------ Logstash Output -------------------------------

# Send output to Logstash
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
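To see what the multiline settings do, here is a small Python simulation (not FileBeat's own code). With multiline.negate: true and multiline.match: after, any line that does not start with a timestamp is appended to the preceding line that does, so a Java stack trace stays attached to its log line. The function name group_multiline and the sample lines are made up for illustration, and the pattern escapes the dot before the milliseconds.

```python
import re

# Timestamp at the start of a line: yyyy-MM-dd HH:mm:ss.SSS
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}")


def group_multiline(lines, pattern=TIMESTAMP):
    """Group raw lines into events the way negate: true / match: after does."""
    events = []
    for line in lines:
        if pattern.search(line) or not events:
            # A line that matches the pattern starts a new event.
            events.append(line)
        else:
            # A non-matching line is appended to the previous event.
            events[-1] += "\n" + line
    return events


# Invented sample: one error with a stack trace, followed by a normal line.
SAMPLE_LINES = [
    "2020-08-16 17:29:13.138 ERROR something failed",
    "java.lang.RuntimeException: boom",
    "    at com.cd.demo.Demo.run(Demo.java:10)",
    "2020-08-16 17:29:14.000 INFO all good",
]

for event in group_multiline(SAMPLE_LINES):
    print(repr(event))
```

Without these settings, each line of the stack trace would reach Logstash as a separate event and fail to match the log pattern.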
Start FileBeat:
.\filebeat.exe -e -c .\filebeat_test.yml
Hit the demo project so that it prints some log lines for FileBeat to read.

Logstash console output, printed as JSON:

Let's pick one event and pretty-print it to see what a log line collected by FileBeat looks like once Logstash prints it:
{
  "@version": "1",
  "message": "2020-08-16 17:29:13.138 6a83f82acbf8000 [http-nio-8080-exec-9] DEBUG com.cd.demo.controllr.DemoController[28] - ======================debug",
  "fields": {
    "app": "demo",
    "review": 1
  },
  "tags": [
    "demo",
    "beats_input_codec_plain_applied"
  ],
  "@timestamp": "2020-08-16T09:29:15.075Z",
  "ecs": {
    "version": "1.5.0"
  },
  "input": {
    "type": "log"
  },
  "host": {
    "name": "WIN-IJE5R5BU096",
    "architecture": "x86_64",
    "id": "0ea80d06-30e3-4b1f-9e15-cf0006381169",
    "ip": [
      "fe80::445f:b46c:2007:b14b",
      "192.168.1.83",
      "fe80::3c10:2e3f:fc46:9d28",
      "169.254.157.40",
      "fe80::9842:fa00:1199:8207",
      "169.254.130.7",
      "fe80::ac1b:b018:58f6:f20e",
      "169.254.242.14",
      "172.24.36.1",
      "fe80::2d69:dab9:f82c:33ce",
      "169.254.51.206"
    ],
    "hostname": "WIN-IJE5R5BU096",
    "mac": [
      "f8:b4:6a:20:7f:f4",
      "c0:b5:d7:28:44:85",
      "c2:b5:d7:28:44:85",
      "e2:b5:d7:28:44:85",
      "00:ff:3f:88:6e:59"
    ],
    "os": {
      "name": "Windows 10 Home China",
      "version": "10.0",
      "family": "windows",
      "build": "18363.1016",
      "platform": "windows",
      "kernel": "10.0.18362.1016 (WinBuild.160101.0800)"
    }
  },
  "log": {
    "file": {
      "path": "D:\\data\\logs\\demo\\demo-info.log"
    },
    "offset": 2020
  },
  "agent": {
    "name": "WIN-IJE5R5BU096",
    "version": "7.8.0",
    "ephemeral_id": "f073c7ec-d88b-4ddb-9870-bd6128d5497a",
    "type": "filebeat",
    "id": "e0b6d369-cc12-4132-bf81-5c2f85ce2b2a",
    "hostname": "WIN-IJE5R5BU096"
  }
}
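The above is what FileBeat collects, printed to the console by Logstash without any filtering. As you can see, FileBeat gathers fairly comprehensive data (software, hardware, network). We do not need all of it; we only want to store the fields we care about, which means the data has to be reshaped with Logstash's filter plugins: https://www.elastic.co/guide/en/logstash/current/filter-plugins.html
Find the grok plugin. Its job is to turn unstructured event data into fields, and once the data is in fields it is easy to store and aggregate.
What we really want to extract is the message field, which holds our actual business log line. We write grok patterns to pull fields out of message; to test them you can use the Grok Debugger built into Kibana or http://grokdebug.herokuapp.com/?# (grokdebug is not great and is often unreachable).
The figure below shows message being extracted into individual fields.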
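As a rough illustration of what such a pattern does, the sketch below uses a plain Python regular expression with named groups in place of grok. It is only an approximation: the character classes stand in for grok's TIMESTAMP_ISO8601, BASE16NUM, LOGLEVEL and PROG patterns and are assumptions, while the field names and the sample line are the ones used in this article.

```python
import re

# Approximation of the grok pattern; each named group becomes one field.
LOG_LINE = re.compile(
    r"(?P<logdate>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<traceId>[0-9A-Fa-f]+)\s+"
    r"\[(?P<thread>.*?)\]\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<className>[^\s\[\]]+)\[(?P<classLine>\d+)\]\s+-+\s+"
    r"(?P<msg>.*)",
    flags=re.DOTALL,  # let msg span several lines, like (?m) does in grok
)

SAMPLE = (
    "2020-08-16 17:29:13.138 6a83f82acbf8000 [http-nio-8080-exec-9] DEBUG "
    "com.cd.demo.controllr.DemoController[28] - ======================debug"
)


def parse_line(message):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = LOG_LINE.match(message)
    return match.groupdict() if match else None


fields = parse_line(SAMPLE)
for name, value in fields.items():
    print(f"{name}: {value}")
```

A line that does not fit the format yields None here; in Logstash the equivalent outcome is an event tagged _grokparsefailure.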

Complete Logstash configuration:
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.

input {
  beats {
    port => 5044
  }
}

filter {
  # Extract the message field, which holds the business log line, and use patterns to split it into individual fields.
  # Why are there two "message" patterns? If your log format is not uniform you need several patterns to match it, but try to avoid that situation:
  # with multiple patterns and a large log volume, Logstash's processing throughput drops.
  grok {
    match => [
      "message", "(?m)%{TIMESTAMP_ISO8601:logdate}\s+%{BASE16NUM:traceId}+\s\[%{DATA:thread}\]\s+%{LOGLEVEL:level}\s+%{PROG:className}\[%{INT:classLine}\]\s+-+\s+%{GREEDYDATA:msg}",
      "message", "(?m)%{TIMESTAMP_ISO8601:logdate}\s\s+\[%{DATA:thread}\]\s+%{LOGLEVEL:level}\s\s++%{PROG:className}\[%{INT:classLine}\]\s+-+\s+%{GREEDYDATA:msg}"
    ]
  }

  # Replace Logstash's @timestamp with the time from the business log so the two do not drift apart
  date {
    match => ["logdate", "yyyy-MM-dd HH:mm:ss.SSS"]
    target => "@timestamp"
    remove_field => ["logdate"]
  }

  # Drop fields we do not need. Be sure to keep @timestamp, otherwise indices cannot be created per day; message is also kept so we can inspect the raw line.
  # Also copy the custom fields.app value into a new field so we know which application a log came from (the tags field could be used for this as well).
  mutate {
    remove_field => ["@version", "@metadata", "input", "agent", "ecs", "fields"]
    add_field => { "appName" => "%{[fields][app]}" }
  }
}

# After filtering, keep printing the resulting fields to the console
output {
  stdout { codec => json }
}
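The effect of the mutate block can be pictured with a short Python sketch. It assumes, as the sample output in this article suggests, that add_field is evaluated before remove_field, so appName can still read fields.app before fields is dropped. The helper prune_event and the trimmed sample event are illustrative only, not part of Logstash.

```python
REMOVED_FIELDS = ("@version", "@metadata", "input", "agent", "ecs", "fields")


def prune_event(event):
    """Mimic the mutate block: add appName, then drop the unwanted fields."""
    pruned = dict(event)  # shallow copy so the original event is untouched
    # add_field => { "appName" => "%{[fields][app]}" }
    pruned["appName"] = event["fields"]["app"]
    # remove_field => [...]; fields that are absent are simply skipped
    for name in REMOVED_FIELDS:
        pruned.pop(name, None)
    return pruned


# Trimmed-down version of the event shown earlier in the article.
event = {
    "@version": "1",
    "@timestamp": "2020-08-16T09:29:15.075Z",
    "message": "2020-08-16 17:29:13.138 6a83f82acbf8000 [http-nio-8080-exec-9] DEBUG ...",
    "fields": {"app": "demo", "review": 1},
    "input": {"type": "log"},
    "ecs": {"version": "1.5.0"},
    "tags": ["demo", "beats_input_codec_plain_applied"],
}

print(sorted(prune_event(event)))
```

The fields that survive (here @timestamp, message, tags and the new appName) are what end up being stored.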
Restart Logstash and watch the console output:
{
  "tags": [
    "demo",
    "beats_input_codec_plain_applied"
  ],
  "log": {
    "file": {
      "path": "D:\\data\\logs\\demo\\demo-info.log"
    },
    "offset": 2020
  },
  "host": {
    "mac": [
      "f8:b4:6a:20:7f:f4",
      "c0:b5:d7:28:44:85",
      "c2:b5:d7:28:44:85",
      "e2:b5:d7:28:44:85",
      "00:ff:3f:88:6e:59"
    ],
    "os": {
      "build": "18363.1016",
      "platform": "windows",
      "name": "Windows 10 Home China",
      "kernel": "10.0.18362.1016 (WinBuild.160101.0800)",
      "family": "windows",
      "version": "10.0"
    },
    "id": "0ea80d06-30e3-4b1f-9e15-cf0006381169",
    "name": "WIN-IJE5R5BU096",
    "ip": [
      "fe80::445f:b46c:2007:b14b",
      "192.168.1.83",
      "fe80::3c10:2e3f:fc46:9d28",
      "169.254.157.40",
      "fe80::9842:fa00:1199:8207",
      "169.254.130.7",
      "fe80::ac1b:b018:58f6:f20e",
      "169.254.242.14",
      "172.24.36.1",
      "fe80::2d69:dab9:f82c:33ce",
      "169.254.51.206"
    ],
    "hostname": "WIN-IJE5R5BU096",
    "architecture": "x86_64"
  },
  "level": "DEBUG",
  "traceId": "6a91634313f7000",
  "className": "com.cd.demo.controllr.DemoController",
  "msg": "======================debug",
  "appName": "demo",
  "classLine": "28",
  "message": "2020-08-17 09:07:13.761 6a91634313f7000 [http-nio-8080-exec-1] DEBUG com.cd.demo.controllr.DemoController[28] - ======================debug",
  "thread": "http-nio-8080-exec-1",
  "@timestamp": "2020-08-17T01:07:13.761Z"
}
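This event is now essentially what we want: it includes the log path, application name, business log line, host information, and the other core details.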
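One detail worth noticing in that output: the log line says 09:07:13.761 while @timestamp reads 01:07:13.761Z. The date filter interprets logdate in a local time zone and stores @timestamp in UTC. The zone appears to be UTC+8 here, which is an inference from the sample, since the configuration sets no timezone. A minimal Python sketch of that conversion, with to_utc_timestamp as a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone


def to_utc_timestamp(logdate, utc_offset_hours=8):
    """Convert a 'yyyy-MM-dd HH:mm:ss.SSS' local time into an ISO 8601 UTC string."""
    local = datetime.strptime(logdate, "%Y-%m-%d %H:%M:%S.%f")
    # Attach the assumed local offset (UTC+8 by default), then convert to UTC.
    local = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    utc = local.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


print(to_utc_timestamp("2020-08-17 09:07:13.761"))
```

Kibana converts @timestamp back to the browser's time zone for display, so the stored UTC value is usually not a problem in practice.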
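Next, we send the log results to ES. Documentation: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-index
In recent Logstash versions, log indices are created by default in the {now/d}-000001 format, for example logstash-2020.02.10-000001. If you want to define the index name yourself, set ilm_enabled => false.
Complete Logstash configuration with output to ES, with detailed comments: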
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.

input {
  beats {
    port => 5044
  }
}

filter {
  # Extract the message field, which holds the business log line, and use patterns to split it into individual fields.
  # Why are there two "message" patterns? If your log format is not uniform you need several patterns to match it, but try to avoid that situation:
  # with multiple patterns and a large log volume, Logstash's processing throughput drops.
  grok {
    match => [
      "message", "(?m)%{TIMESTAMP_ISO8601:logdate}\s+%{BASE16NUM:traceId}+\s\[%{DATA:thread}\]\s+%{LOGLEVEL:level}\s+%{PROG:className}\[%{INT:classLine}\]\s+-+\s+%{GREEDYDATA:msg}",
      "message", "(?m)%{TIMESTAMP_ISO8601:logdate}\s\s+\[%{DATA:thread}\]\s+%{LOGLEVEL:level}\s\s++%{PROG:className}\[%{INT:classLine}\]\s+-+\s+%{GREEDYDATA:msg}"
    ]
  }

  # Replace Logstash's @timestamp with the time from the business log so the two do not drift apart
  date {
    match => ["logdate", "yyyy-MM-dd HH:mm:ss.SSS"]
    target => "@timestamp"
    remove_field => ["logdate"]
  }

  # Drop fields we do not need. Be sure to keep @timestamp, otherwise indices cannot be created per day; message is also kept so we can inspect the raw line.
  # Also copy the custom fields.app value into a new field so we know which application a log came from (the tags field could be used for this as well).
  mutate {
    remove_field => ["@version", "@metadata", "input", "agent", "ecs", "fields"]
    add_field => { "appName" => "%{[fields][app]}" }
  }
}

# After filtering, print the resulting fields to the console and also send them to ES
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    # Create one log index per day
    index => "logstash-demo-%{+yyyy.MM.dd}"
    # Turn off Logstash's ilm_enabled, otherwise indices are created in the {now/d}-000001 format
    #ilm_enabled => false
  }
}
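The %{+yyyy.MM.dd} part of the index name is filled in from each event's @timestamp, which is in UTC. Assuming the index setting is actually in effect, the sketch below mimics that in Python to show which daily index an event would land in. The helper daily_index is hypothetical, and the second timestamp is an invented example of a log written half an hour after local midnight in a UTC+8 zone.

```python
from datetime import datetime, timezone


def daily_index(timestamp_iso, prefix="logstash-demo-"):
    """Return the daily index name for an ISO 8601 UTC @timestamp string."""
    ts = datetime.strptime(timestamp_iso, "%Y-%m-%dT%H:%M:%S.%fZ")
    ts = ts.replace(tzinfo=timezone.utc)
    return prefix + ts.strftime("%Y.%m.%d")


# Event from the article: logged at 09:07 local time, 01:07 UTC.
print(daily_index("2020-08-17T01:07:13.761Z"))
# Invented event: 00:30 local time on Aug 17 in UTC+8 is still Aug 16 in UTC.
print(daily_index("2020-08-16T16:30:00.000Z"))
```

So with a UTC+8 clock, logs written between local midnight and 08:00 are stored in the previous day's index, which matters if you ever delete indices by date.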

Displaying the log data in Kibana: Kibana reads the data indices from ES.

Kibana -> Discover shows the data.
Full message:

Fields parsed out of the message field:

Building log analysis charts
Pie chart, by application:

Bar chart, by log level:

Time dimension, distribution of the 30 most recent log entries:

Creating a dashboard



Note: if you want to rename a chart, click Save and save it again under the new name.

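That is the complete ELK + FileBeat configuration. If the log volume grows too large, you can tune Logstash, for example by running a Logstash cluster or by not using regular expressions to match logs. Although the ELK stack in this article was built on Windows, the configuration can also be used on Linux.
Starting in the background on Linux: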
nohup filebeat -e -c filebeat_pre.yml &
nohup ./bin/logstash -f config/logstash_pre.conf &
nohup ./bin/kibana --allow-root &
References:
Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
https://www.elastic.co/guide/en/kibana/current/index.html
https://www.elastic.co/guide/en/logstash/current/index.html
https://www.elastic.co/guide/en/beats/libbeat/current/index.html
Kibana documentation (Chinese):
https://www.elastic.co/guide/cn/kibana/current/index.html
