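Comprehensive Kubernetes Cluster Monitoring with Prometheus + Grafana
Table of Contents
- Comprehensive Kubernetes Cluster Monitoring with Prometheus + Grafana
- 1. k8s monitoring metrics
- 2. Preparing the k8s base environment
- 2.1. Environment
- 2.2. Deploying NFS as storage for Prometheus
- 2.3. Getting the Prometheus YAML files
- 2.4. Creating the prometheus namespace
- 3. Deploying Prometheus in k8s
- 3.1. Preparing the Prometheus YAML
- 3.2. Creating the RBAC resources
- 3.3. Creating the ConfigMap resource
- 3.4. Creating the StatefulSet resource
- 3.4.1. Adapting the StatefulSet to use a static PV
- 3.4.2. Creating the StatefulSet
- 3.5. Creating the Service resource
- 3.5.1. Changing the Service to NodePort
- 3.5.2. Creating the Service
- 3.6. Viewing all the Prometheus resources created
- 3.7. Accessing Prometheus
- 3.8. The Prometheus configuration file explained
- 3.8.1. Entering the Prometheus container
- 3.8.2. Configuration file walkthrough
- 3.9. The k8s metrics page
- 4. Deploying Grafana in k8s
- 4.1. Writing the Grafana PV/PVC resources
- 4.2. Writing the Grafana StatefulSet resource
- 4.3. Writing the Grafana Service resource
- 4.4. Creating Grafana in k8s
- 4.5. Checking resource status
- 4.6. Logging in to Grafana
- 4.7. Importing the k8s pod resource monitoring dashboard
- 4.8. Fixing dashboard expressions that fail to show all pods
- 4.8.1. Problem description
- 4.8.2. Solution
- 5. Monitoring k8s nodes
- 5.1. Writing a one-step node_exporter install script
- 5.2. Running the node_exporter script on the k8s nodes
- 5.3. Adding the node configuration to the Prometheus ConfigMap
- 5.4. Importing the k8s node host monitoring dashboard
- 6. Monitoring resource state in k8s with kube-state-metrics
- 6.1. Creating the RBAC resources
- 6.2. Creating the Deployment resource
- 6.3. Creating the Service resource
- 6.4. Resources ready
- 6.5. Checking in Prometheus that metrics are being collected
- 6.6. Importing the k8s resource state dashboard
- 7. Deploying Alertmanager in k8s for alerting
- 7.1. Creating the Alertmanager PV/PVC resources
- 7.2. Creating the Alertmanager ConfigMap with WeChat alert configuration
- 7.3. Creating the Alertmanager Deployment
- 7.4. Creating the Alertmanager Service
- 7.5. Viewing all Alertmanager resources
- 7.6. Accessing Alertmanager
- 8. Configuring Alertmanager for k8s alerting
- 8.1. Preparing two alert rule files on NFS
- 8.2. Writing the PV/PVC YAML for the alert rules
- 8.3. Modifying the Prometheus StatefulSet to include the rules
- 8.4. Updating the Prometheus StatefulSet
- 8.5. Modifying the Prometheus ConfigMap to set the Alertmanager address
- 8.5. Checking that the alert rules appear in the UI
- 8.6. Simulating a node host outage and checking the WeChat alert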
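1. k8s monitoring metrics
Monitoring Kubernetes itself
- Node resource utilization
- Number of nodes
- Number of pods
- Resource object status
Pod monitoring
- Number of pods
- Container resource utilization
- Applications
Implementation approach
- Pod performance
  - Use cAdvisor to monitor container CPU and memory utilization
- Node performance
  - Use node-exporter, mainly to monitor node CPU and memory utilization
- K8s resource objects
  - Use kube-state-metrics, mainly to monitor pods, deployments, and services
Reference documentation for k8s service discovery: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config
This article builds comprehensive k8s monitoring, uses Grafana to display the state of k8s resource objects, and uses Alertmanager for alerting.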
2. Preparing the k8s base environment
2.1. Environment
| IP | Role |
|---|---|
| 192.168.16.106 | k8s-master |
| 192.168.16.104 | k8s-node1 |
| 192.168.16.107 | k8s-node2 |
| 192.168.16.105 | nfs |
2.2. Deploying NFS as storage for Prometheus
[root@nfs ~]# mkdir /data/prometheus
[root@nfs ~]# yum -y install nfs-utils
[root@nfs ~]# vim /etc/exports
/data/prometheus 192.168.16.0/24(rw,sync,no_root_squash)
[root@nfs ~]# systemctl restart nfs
[root@nfs ~]# showmount -e
Export list for nfs:
/data/prometheus 192.168.16.0/24
[root@nfs ~]# chmod -R 777 /data
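The PVs later in this article point at several subdirectories of /data/prometheus (prometheus_data, grafana, alertmanager, rules), but prometheus_data and alertmanager are never created explicitly, and a pod cannot mount an NFS path that does not exist. A minimal sketch, assuming it runs as root on the NFS server, creates them all up front. The directory list comes from the `nfs.path` fields of the PVs below, and the function name `prepare_nfs_dirs` is hypothetical.

```shell
#!/usr/bin/env bash
# Create every NFS subdirectory referenced by the PVs in this article.
# Assumption: the base directory is the exported /data/prometheus.
prepare_nfs_dirs() {
  local base="${1:-/data/prometheus}"
  local sub
  for sub in prometheus_data grafana alertmanager rules; do
    mkdir -p "${base}/${sub}" || return 1
  done
  # Same permissive mode the article uses; tighten it for production.
  chmod -R 777 "${base}"
}
```

Usage: `prepare_nfs_dirs /data/prometheus`.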
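2.3. Getting the Prometheus YAML files
The Prometheus manifests are no longer in the master branch of the kubernetes repository on GitHub. They can be found in the release-1.16 branch:
https://github.com/kubernetes/kubernetes/tree/release-1.16/cluster/addons/prometheus
You can also clone the whole repository:
https://github.com/kubernetes/kubernetes.git
I have modified all of the YAML files; you can use mine as a reference.
My YAML files: https://pan.baidu.com/s/1LN8AzLFo2JIvYX0nmgq0EQ
Extraction code: u4t0
1. Clone the repository (the release-1.16 branch, because master no longer contains these files)
[root@k8s-master ~/k8s]# git clone -b release-1.16 --depth 1 https://github.com/kubernetes/kubernetes.git
2. Copy the Prometheus YAML files to another directory
[root@k8s-master ~/k8s]# cp -rp kubernetes/cluster/addons/prometheus/ .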
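My YAML files:

The official YAML files:

File descriptions
The files fall into four parts: the Prometheus deployment, the Alertmanager deployment, the kube-state-metrics deployment, and the node-exporter deployment.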
| File | Purpose |
|---|---|
| alertmanager-configmap.yaml | Alertmanager configuration (ConfigMap) |
| alertmanager-deployment.yaml | Creates the Alertmanager pod |
| alertmanager-pvc.yaml | Storage volume mounted by Alertmanager |
| alertmanager-service.yaml | Exposes the Alertmanager port |
| grafana_pv_pvc.yaml | Storage volume mounted by Grafana |
| grafana_statefulset.yaml | Deploys the Grafana pod as a StatefulSet |
| grafana_svc.yaml | Exposes the Grafana port |
| install_node_exportes.sh | Script that installs node_exporter on the nodes in batch |
| k8s_time.yaml | Syncs k8s pod time with the host |
| kube-state-metrics-deployment.yaml | The k8s resource-state metrics collector |
| kube-state-metrics-rbac.yaml | RBAC authorization for the k8s collector |
| kube-state-metrics-service.yaml | Exposes the k8s collector |
| node-exporter-ds.yml | Deploys node_exporter |
| node-exporter-service.yaml | Exposes node_exporter |
| prometheus-configmap.yaml | Prometheus configuration files |
| prometheus-pv-pvc.yaml | Storage mounted by Prometheus |
| prometheus-rbac.yaml | Authorizes Prometheus to access the API |
| prometheus-rules-pvc.yaml | Storage volume for the Prometheus alert rules |
| prometheus-rules.yaml | The Prometheus rules packaged as a ConfigMap |
| prometheus-service.yaml | Exposes Prometheus for access |
| prometheus-statefulset-static-pv.yaml | Deploys Prometheus |
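2.4. Creating the prometheus namespace
Create a namespace named prometheus, and change the namespace in every YAML file except the kube-state-metrics ones to prometheus.
1. Create the namespace
[root@k8s-master ~/k8s/prometheus]# kubectl create namespace prometheus
namespace/prometheus created
2. Modify the files
Open each file in vim and run the following command: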
:%s/namespace: kube-system/namespace: prometheus/g
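The vim substitution above must be repeated in every file. A non-interactive sketch does the same with `sed -i` and skips the kube-state-metrics-* files, which stay in kube-system. It assumes GNU sed, as on the CentOS hosts used here, and that all manifests are in one directory. The function name `switch_namespace` is hypothetical.

```shell
#!/usr/bin/env bash
# Replace "namespace: <from>" with "namespace: <to>" in all YAML files of a
# directory, leaving the kube-state-metrics manifests untouched.
switch_namespace() {
  local dir="${1:-.}" from="${2:-kube-system}" to="${3:-prometheus}"
  local f
  for f in "$dir"/*.yaml "$dir"/*.yml; do
    [ -e "$f" ] || continue                # the glob matched nothing
    case "$(basename "$f")" in
      kube-state-metrics-*) continue ;;    # these stay in kube-system
    esac
    sed -i "s/namespace: ${from}/namespace: ${to}/g" "$f"
  done
}
```

Usage: `switch_namespace ~/k8s/prometheus`.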
3. Deploying Prometheus in k8s
3.1. Preparing the Prometheus YAML
The following YAML files are used:
[root@k8s-master ~/k8s/prometheus]# ls prometheus-*
prometheus-configmap.yaml prometheus-rbac.yaml prometheus-service.yaml prometheus-statefulset.yaml
Creation order:
First create prometheus-rbac.yaml
then prometheus-configmap.yaml
then prometheus-statefulset.yaml (replaced in 3.4 by prometheus-statefulset-static-pv.yaml)
and finally prometheus-service.yaml
3.2. Creating the RBAC resources
[root@k8s-master ~/k8s/prometheus]# kubectl create -f prometheus-rbac.yaml
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
3.3. Creating the ConfigMap resource
[root@k8s-master ~/k8s/prometheus]# kubectl create -f prometheus-configmap.yaml
configmap/prometheus-config created
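3.4. Creating the StatefulSet resource
The StatefulSet on GitHub uses a StorageClass to provision PVs dynamically. Because no StorageClass is used here, the StatefulSet is adapted to use a static PV for storage.
3.4.1. Adapting the StatefulSet to use a static PV
Approach: add PV and PVC definitions to the YAML, delete the original StorageClass settings, and reference the PVC in the volumes section (around line 120 of the original file).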
[root@k8s-master ~/k8s/prometheus]# vim prometheus-statefulset-static-pv.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: prometheus
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 16Gi
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-prometheus
spec:
  capacity:
    storage: 16Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /data/prometheus/prometheus_data
    server: 192.168.16.105
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: prometheus   # must match the namespace of the PVC above
  labels:
    k8s-app: prometheus
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v2.2.1
spec:
  serviceName: "prometheus"
  replicas: 1
  podManagementPolicy: "Parallel"
  updateStrategy:
    type: "RollingUpdate"
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: prometheus
      initContainers:
        - name: "init-chown-data"
          image: "busybox:latest"
          imagePullPolicy: "IfNotPresent"
          command: ["chown", "-R", "65534:65534", "/data"]
          volumeMounts:
            - name: prometheus-data
              mountPath: /data
              subPath: ""
      containers:
        - name: prometheus-server-configmap-reload
          image: "jimmidyson/configmap-reload:v0.1"
          imagePullPolicy: "IfNotPresent"
          args:
            - --volume-dir=/etc/config
            - --webhook-url=http://localhost:9090/-/reload
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
              readOnly: true
          resources:
            limits:
              cpu: 10m
              memory: 10Mi
            requests:
              cpu: 10m
              memory: 10Mi
        - name: prometheus-server
          image: "prom/prometheus:v2.23.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --config.file=/etc/config/prometheus.yml
            - --storage.tsdb.path=/data
            - --web.console.libraries=/etc/prometheus/console_libraries
            - --web.console.templates=/etc/prometheus/consoles
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
          # based on 10 running nodes with 30 pods each
          resources:
            limits:
              cpu: 200m
              memory: 1000Mi
            requests:
              cpu: 200m
              memory: 1000Mi
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: prometheus-data
              mountPath: /data
              subPath: ""
      terminationGracePeriodSeconds: 300
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: prometheus-data
          persistentVolumeClaim:
            claimName: prometheus-data
3.4.2. Creating the StatefulSet
[root@k8s-master ~/k8s/prometheus]# kubectl create -f prometheus-statefulset-static-pv.yaml
persistentvolumeclaim/prometheus-data created
persistentvolume/pv-prometheus created
statefulset.apps/prometheus created
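With static PVs, a PVC binds to any available PV whose capacity and access mode fit, so confirm the claim is Bound before troubleshooting the pod. The sketch below polls the PVC phase using kubectl's jsonpath output. The function name `wait_pvc_bound` is hypothetical.

```shell
#!/usr/bin/env bash
# Poll a PVC until its phase is "Bound", or give up after a timeout.
# Usage: wait_pvc_bound <name> <namespace> [timeout_seconds]
wait_pvc_bound() {
  local name="$1" ns="$2" timeout="${3:-60}" phase waited=0
  while [ "$waited" -lt "$timeout" ]; do
    phase=$(kubectl get pvc "$name" -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null)
    if [ "$phase" = "Bound" ]; then
      echo "pvc/${name} is Bound"
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "pvc/${name} not Bound after ${timeout}s (phase: ${phase:-unknown})" >&2
  return 1
}
```

Usage: `wait_pvc_bound prometheus-data prometheus`.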
3.5. Creating the Service resource
3.5.1. Changing the Service to NodePort
[root@k8s-master ~/k8s/prometheus]# vim prometheus-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: prometheus
  namespace: prometheus
  labels:
    kubernetes.io/name: "Prometheus"
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  type: NodePort
  ports:
    - name: http
      port: 9090
      protocol: TCP
      targetPort: 9090
  selector:
    k8s-app: prometheus
3.5.2. Creating the Service
[root@k8s-master ~/k8s/prometheus]# kubectl create -f prometheus-service.yaml
service/prometheus created
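3.6. Viewing all the Prometheus resources created
[root@k8s-master ~/k8s/prometheus]# kubectl get pod,svc,pv,pvc -n prometheus -o wide

Note the NodePort assigned to the Service, 30387 here. Prometheus is accessed through this port.
3.7. Accessing Prometheus
Prometheus can be reached at the IP of any node plus port 30387: http://192.168.16.106:30387/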
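The NodePort (30387 here) is chosen at random from the default 30000-32767 range because prometheus-service.yaml does not set `spec.ports[].nodePort`, so the port will be different on your cluster. A small sketch, assuming kubectl is configured on the master, reads back the assigned port and prints the URL. The function name `nodeport_url` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the URL of the first port of a NodePort service.
# Usage: nodeport_url <service> <namespace> <node_ip>
nodeport_url() {
  local svc="$1" ns="$2" node_ip="$3" port
  port=$(kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.ports[0].nodePort}') || return 1
  [ -n "$port" ] || { echo "service ${svc} has no nodePort" >&2; return 1; }
  echo "http://${node_ip}:${port}/"
}
```

Usage: `nodeport_url prometheus prometheus 192.168.16.106`.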

Viewing the monitored targets
Many targets are already listed. They all come from the scrape configuration in the ConfigMap.

3.8. The Prometheus configuration file explained
3.8.1. Entering the Prometheus container
Syntax: kubectl exec -it <pod name> -c <container name> -n <namespace> -- <command>
[root@k8s-master ~/k8s/prometheus]# kubectl exec -it prometheus-0 -c prometheus-server -n prometheus -- sh
/prometheus $
The configuration file is at /etc/config/prometheus.yml
TSDB data is stored in /data
3.8.2. Configuration file walkthrough
/prometheus $ more /etc/config/prometheus.yml
scrape_configs:
- job_name: prometheus
  static_configs:
  - targets:
    - localhost:9090
This static configuration adds Prometheus itself as a target. Here, localhost is the Prometheus container.
Automatic discovery of kubernetes-apiservers
Discovers the API server endpoints (the https port of the kubernetes Service in the default namespace) and scrapes their metrics.
- job_name: kubernetes-apiservers
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: default;kubernetes;https
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_service_name
    - __meta_kubernetes_endpoint_port_name
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

Automatic discovery of k8s nodes
Automatically discovers every node in the cluster and scrapes the cAdvisor container metrics that each node's kubelet exposes under /metrics/cadvisor.
- job_name: kubernetes-nodes-cadvisor
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __metrics_path__
    replacement: /metrics/cadvisor
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

Discovering endpoints
Discovers the pods behind Service endpoints (run kubectl get ep to list them). Only Services annotated with prometheus.io/scrape: "true" are kept.
- job_name: kubernetes-service-endpoints
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scrape
  - action: replace
    regex: (https?)
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scheme
    target_label: __scheme__
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_service_annotation_prometheus_io_port
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: kubernetes_namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_service_name
    target_label: kubernetes_name
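The address rewrite is the least obvious rule in this job. Prometheus joins the source labels with `;`, so the regex sees `<ip>:<port>;<value of prometheus.io/port>`, and relabel regexes must match the whole string. A sketch that reproduces the rule with `sed -E` shows that the annotated port replaces the discovered one and that the address is left unchanged when the annotation is absent. `rewrite_address` is a hypothetical name. Because sed has no `(?:...)` group, the optional port is a capturing group here, so Prometheus's `$2` corresponds to `\3`.

```shell
#!/usr/bin/env bash
# Emulate: regex ([^:]+)(?::\d+)?;(\d+) -> replacement $1:$2 on __address__.
# If the regex does not match, the replace action leaves __address__ as is.
rewrite_address() {
  local address="$1" annotation_port="$2" rewritten
  rewritten=$(printf '%s;%s\n' "$address" "$annotation_port" \
    | sed -nE 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/p')
  if [ -n "$rewritten" ]; then
    echo "$rewritten"
  else
    echo "$address"
  fi
}
```

For a Service to be picked up by this job, annotate it. For example: `kubectl annotate service <name> -n <namespace> prometheus.io/scrape="true" prometheus.io/port="9100"`.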

Discovering services (probed through a blackbox exporter at the address blackbox; only Services annotated with prometheus.io/probe: "true" are kept)
- job_name: kubernetes-services
  kubernetes_sd_configs:
  - role: service
  metrics_path: /probe
  params:
    module:
    - http_2xx
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_probe
  - source_labels:
    - __address__
    target_label: __param_target
  - replacement: blackbox
    target_label: __address__
  - source_labels:
    - __param_target
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: kubernetes_namespace
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: kubernetes_name
Discovering pods (only pods annotated with prometheus.io/scrape: "true" are kept)
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scrape
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: kubernetes_namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: kubernetes_pod_name
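3.9. The k8s metrics page
It is fine if a target's metrics page cannot be opened from the browser, as shown below. These endpoints appear to be reachable only from inside the cluster.

What matters is that container metrics can be found by searching in Prometheus.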

4. Deploying Grafana in k8s
4.1. Writing the Grafana PV/PVC resources
1. Write the resources
[root@k8s-master ~/k8s/prometheus3]# vim grafana_pv_pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-ui-data
  namespace: grafana-ui
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-grafana-ui-data
  namespace: grafana-ui
spec:
  capacity:
    storage: 3Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /data/prometheus/grafana
    server: 192.168.16.105
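2. Create the matching mount point on NFS. Grafana runs as UID 472 (see the securityContext below), so give that user ownership of the directory.
[root@nfs ~]# mkdir /data/prometheus/grafana
[root@nfs ~]# chown -R 472:472 /data/prometheus/grafana
4.2. Writing the Grafana StatefulSet resource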
[root@k8s-master ~/k8s/prometheus3]# vim grafana_statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: grafana-ui
  namespace: grafana-ui
spec:
  serviceName: "grafana-ui"
  replicas: 1
  selector:
    matchLabels:
      app: grafana-ui
  template:
    metadata:
      labels:
        app: grafana-ui
    spec:
      containers:
        - name: grafana-ui
          image: grafana/grafana:6.6.2
          imagePullPolicy: "IfNotPresent"
          ports:
            - containerPort: 3000
              protocol: TCP
          resources:
            limits:
              cpu: 100m
              memory: 256Mi
            requests:
              cpu: 100m
              memory: 256Mi
          volumeMounts:
            - name: grafana-ui-data
              mountPath: /var/lib/grafana
              subPath: ""
      securityContext:
        fsGroup: 472
        runAsUser: 472
      volumes:
        - name: grafana-ui-data
          persistentVolumeClaim:
            claimName: grafana-ui-data
4.3. Writing the Grafana Service resource
[root@k8s-master ~/k8s/prometheus3]# vim grafana_svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana-ui
  namespace: grafana-ui
spec:
  type: NodePort
  ports:
    - name: http
      port: 3000
      protocol: TCP
      targetPort: 3000
  selector:
    app: grafana-ui
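The grafana-ui namespace used by these manifests, like the alertmanager namespace in section 7, is not created anywhere earlier in the article. Without it, kubectl create fails with a NotFound error for the namespace. A small idempotent helper, with the hypothetical name `ensure_namespace`, creates a namespace only if it is missing.

```shell
#!/usr/bin/env bash
# Create a namespace only when it does not exist yet, so it is safe to rerun.
ensure_namespace() {
  local ns="$1"
  if kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "namespace/${ns} already exists"
  else
    kubectl create namespace "$ns"
  fi
}
```

Usage: run `ensure_namespace grafana-ui` before the commands in 4.4, and `ensure_namespace alertmanager` before 7.1.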
4.4. Creating Grafana in k8s
[root@k8s-master ~/k8s/prometheus3]# kubectl create -f grafana_pv_pvc.yaml
persistentvolumeclaim "grafana-ui-data" created
persistentvolume "pv-grafana-ui-data" created
[root@k8s-master ~/k8s/prometheus3]# kubectl create -f grafana_statefulset.yaml
statefulset.apps/grafana-ui created
[root@k8s-master ~/k8s/prometheus3]# kubectl create -f grafana_svc.yaml
service/grafana-ui created
4.5. Checking resource status
[root@k8s-master ~/k8s/prometheus3]# kubectl get pv,pvc,pod,statefulset,svc -n grafana-ui
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pv-grafana-ui-data 3Gi RWO Retain Terminating grafana-ui/grafana-ui-data 151m
persistentvolume/pv-prometheus 16Gi RWO Retain Bound kube-system/prometheus-data 47m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/grafana-ui-data Terminating pv-grafana-ui-data 3Gi RWO 151m
NAME READY STATUS RESTARTS AGE
pod/grafana-ui-0 1/1 Running 0 16m
NAME READY AGE
statefulset.apps/grafana-ui 1/1 16m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/grafana-ui NodePort 10.96.170.142 <none> 3000:32040/TCP 85m

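4.6. Logging in to Grafana
Open port 32040 on the IP of any cluster node. The default Grafana account is admin/admin, and Grafana asks you to change the password at the first login.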
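Grafana needs a Prometheus data source before the dashboards can show data. As an alternative to adding it in the UI, the sketch below uses Grafana's HTTP API (`POST /api/datasources`). It assumes the default admin/admin credentials, Grafana on NodePort 32040, and that Grafana reaches Prometheus through the in-cluster DNS name of the Service from 3.5 (service prometheus, namespace prometheus). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Register Prometheus as the default Grafana data source via the HTTP API.
GRAFANA_USER="${GRAFANA_USER:-admin}"
GRAFANA_PASSWORD="${GRAFANA_PASSWORD:-admin}"

# Build the JSON body; "proxy" access means the Grafana server does the queries.
datasource_json() {
  local prom_url="${1:-http://prometheus.prometheus.svc:9090}"
  printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$prom_url"
}

# Usage: add_prometheus_datasource <grafana_url> [prometheus_url]
add_prometheus_datasource() {
  local grafana_url="$1" prom_url="$2"
  curl -sS -u "${GRAFANA_USER}:${GRAFANA_PASSWORD}" \
    -H 'Content-Type: application/json' \
    -X POST "${grafana_url}/api/datasources" \
    -d "$(datasource_json "$prom_url")"
}
```

Usage: `add_prometheus_datasource http://192.168.16.106:32040`.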

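4.7. Importing the k8s pod resource monitoring dashboard
Recommended dashboards (grafana.com dashboard IDs):
- Cluster resource monitoring: 3119
- Resource state monitoring: 6417
- Node monitoring: 9276

Import succeeded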

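4.8. Fixing dashboard expressions that fail to show all pods
4.8.1. Problem description
In this dashboard, every pod- and docker-related panel is broken and shows only a single pod.

4.8.2. Solution
Edit their expressions and change pod_name to pod. Kubernetes 1.16 removed the pod_name and container_name labels from cAdvisor metrics in favor of pod and container, so queries that still filter or group by pod_name no longer match.

After the change, all pods show up immediately. Apply the same change to every pod- and docker-related panel.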
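Editing every panel by hand is tedious. Grafana dashboards can be exported and re-imported as JSON, so the same rename can be applied to an exported file. This sketch also renames container_name to container, which was removed at the same time. It assumes the dashboard has already been exported to a local file. The plain substring replacement also changes any other text that contains these strings, so review the result before importing it. The function name `fix_dashboard_labels` is hypothetical.

```shell
#!/usr/bin/env bash
# Rewrite cAdvisor label names in an exported dashboard JSON:
# pod_name -> pod, container_name -> container. Writes a new file.
fix_dashboard_labels() {
  local in="$1" out="$2"
  sed -e 's/pod_name/pod/g' -e 's/container_name/container/g' "$in" > "$out"
}
```

Usage: `fix_dashboard_labels dashboard-3119.json dashboard-3119-fixed.json`, then import the new file through Grafana's Import page.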

Final result

5. Monitoring k8s nodes
Node monitoring does not need to run inside k8s. Install node_exporter directly on each node machine.
5.1. Writing a one-step node_exporter install script
[root@k8s-master ~/k8s/prometheus3]# vim install_node_exportes.sh
#!/bin/bash
# Batch-install node_exporter as a systemd service
soft_dir=/root/soft
pkg=node_exporter-1.0.1.linux-amd64
# Stop if something is already listening on port 9100
if netstat -lnpt | grep -q ':9100 '; then
    use=$(netstat -lnpt | grep ':9100 ' | awk -F/ '{print $NF}')
    echo "Port 9100 is already in use by $use"
    exit 1
fi
mkdir -p "$soft_dir"
cd "$soft_dir" || exit 1
# Download the release tarball from the internal web server on the master
wget http://192.168.16.106:888/prometheus/${pkg}.tar.gz || exit 1
tar xf ${pkg}.tar.gz
# Remove any previous copy so mv does not nest the new directory inside it
rm -rf /usr/local/node_exporter
mv ${pkg} /usr/local/node_exporter
cat <<EOF >/usr/lib/systemd/system/node_exporter.service
[Unit]
Description=https://prometheus.io
[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter --collector.systemd --collector.systemd.unit-whitelist=(docker|kubelet|node_exporter).service
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable node_exporter
systemctl restart node_exporter
# Give the service a moment to bind the port, then confirm it is listening
sleep 2
if netstat -lnpt | grep -q ':9100 '; then
    echo "node_exporter install finished..."
fi
5.2. Running the node_exporter script on the k8s nodes
Download the script from here and run it:

[root@k8s-node1 ~]# wget http://192.168.16.106:888/install_node_exportes.sh
[root@k8s-node1 ~]# sh install_node_exportes.sh
[root@k8s-node1 ~]# netstat -lnpt | grep 9100
tcp6 0 0 :::9100 :::* LISTEN 14906/node_exporter
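Running the script by hand works for two nodes, but a loop over SSH scales better. The sketch below assumes passwordless root SSH from the master to the nodes in 2.1 and the script in the current directory. It then checks that each exporter serves node_* metrics on port 9100. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Run the local install script on each host by piping it into a remote bash.
install_on_nodes() {
  local script="$1"; shift
  local host
  for host in "$@"; do
    echo "==> ${host}"
    ssh "root@${host}" 'bash -s' < "$script" || echo "install failed on ${host}" >&2
  done
}

# Report whether each host's node_exporter returns node_* metrics.
check_node_exporter() {
  local host
  for host in "$@"; do
    if curl -fsS "http://${host}:9100/metrics" | grep -q '^node_'; then
      echo "${host}: ok"
    else
      echo "${host}: no metrics"
    fi
  done
}
```

Usage: `install_on_nodes install_node_exportes.sh 192.168.16.104 192.168.16.107`, then `check_node_exporter 192.168.16.104 192.168.16.107`.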
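5.3. Adding the node configuration to the Prometheus ConfigMap
[root@k8s-master ~/k8s/prometheus3]# vim prometheus-configmap.yaml
Add the following job under scrape_configs, aligned with the existing jobs:
- job_name: k8s-node
  static_configs:
  - targets:
    - 192.168.16.104:9100
    - 192.168.16.107:9100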

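Apply the update
The configmap-reload sidecar watches the mounted ConfigMap and calls Prometheus's /-/reload endpoint when the ConfigMap changes, so the pod does not need a restart. The new job appears in the Prometheus UI once the kubelet has synced the updated ConfigMap into the pod, which typically takes up to a minute or two.
[root@k8s-master ~/k8s/prometheus3]# kubectl apply -f prometheus-configmap.yaml
configmap/prometheus-config configured
Node monitoring added successfully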
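The new job can also be checked from the command line through the Prometheus HTTP API. The StatefulSet starts Prometheus with `--web.enable-lifecycle`, so `POST /-/reload` forces a reload without waiting for the sidecar, and `GET /api/v1/query` evaluates a PromQL expression. This sketch assumes the NodePort URL from 3.7. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: Prometheus is reachable on the NodePort from section 3.7.
PROM="${PROM:-http://192.168.16.106:30387}"

# Ask Prometheus to reload its configuration right away.
reload_prometheus() {
  curl -fsS -X POST "${PROM}/-/reload"
}

# Evaluate an instant PromQL query and print the raw JSON response.
query_prometheus() {
  curl -fsSG "${PROM}/api/v1/query" --data-urlencode "query=$1"
}
```

Usage: `query_prometheus 'up{job="k8s-node"}'` should return one series per node, with value "1" for each target that is up.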

5.4. Importing the k8s node host monitoring dashboard
Node monitoring: 9276

Fill in the details

View the graphs

6. Monitoring resource state in k8s with kube-state-metrics
6.1. Creating the RBAC resources
[root@k8s-master ~/k8s/prometheus3]# kubectl create -f kube-state-metrics-rbac.yaml
6.2. Creating the Deployment resource
The Deployment file also contains a ConfigMap resource.
Change the image addresses to lizhenliang/kube-state-metrics:v1.8.0 and lizhenliang/addon-resizer:1.8.6.
[root@k8s-master ~/k8s/prometheus3]# kubectl create -f kube-state-metrics-deployment.yaml

6.3. Creating the Service resource
[root@k8s-master ~/k8s/prometheus3]# kubectl create -f kube-state-metrics-service.yaml
6.4. Resources ready
[root@k8s-master ~/k8s/prometheus3]# kubectl get all -n kube-system | grep kube-state

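6.5. Checking in Prometheus that metrics are being collected
Once kube-state-metrics is installed, its metrics can be queried in Prometheus right away. Their names all start with kube_.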

6.6. Importing the k8s resource state dashboard
Resource state monitoring: 6417

View the graphs. Again, many panels show no data.

7. Deploying Alertmanager in k8s for alerting
7.1. Creating the Alertmanager PV/PVC resources
1. Write the YAML
[root@k8s-master ~/k8s/prometheus3]# vim alertmanager-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alertmanager-data
  namespace: alertmanager
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "2Gi"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-alertmanager-data
  namespace: alertmanager
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /data/prometheus/alertmanager
    server: 192.168.16.105
2. Create the resources
[root@k8s-master ~/k8s/prometheus3]# kubectl create -f alertmanager-pvc.yaml
persistentvolumeclaim/alertmanager-data created
persistentvolume/pv-alertmanager-data created
7.2. Creating the Alertmanager ConfigMap with WeChat alert configuration
Add WeChat alerting.
1. Add the WeChat alert configuration
[root@k8s-master ~/k8s/prometheus3]# vim alertmanager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: alertmanager
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  alertmanager.yml: |
    global:
      resolve_timeout: 5m
    receivers:
    - name: 'wechat'
      wechat_configs:
      - corp_id: 'ww48f74fc8ed3a07ba'
        to_party: '1'
        agent_id: '1000003'
        api_secret: 'j3ocaGJJM7KejlqzBIJ38b6D6t9QhqlIAh7k4fA1cT0'
        send_resolved: true
    route:
      group_interval: 1m
      group_wait: 10s
      receiver: wechat
      repeat_interval: 1m
2. Create the resource
[root@k8s-master ~/k8s/prometheus3]# kubectl create -f alertmanager-configmap.yaml
configmap/alertmanager-config created
7.3. Creating the Alertmanager Deployment
[root@k8s-master ~/k8s/prometheus3]# kubectl create -f alertmanager-deployment.yaml
deployment.apps/alertmanager created
7.4. Creating the Alertmanager Service
[root@k8s-master ~/k8s/prometheus3]# kubectl create -f alertmanager-service.yaml
service/alertmanager created
7.5. Viewing all Alertmanager resources
[root@k8s-master ~/k8s/prometheus3]# kubectl get all,pv,pvc,cm -n alertmanager

7.6. Accessing Alertmanager
Use the IP of any node plus port 31831.

The configuration now includes WeChat alerting.
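Before wiring up Prometheus, you can test the WeChat receiver by pushing a hand-made alert straight into Alertmanager. This sketch assumes the NodePort from 7.6 and an Alertmanager version with the v2 API (0.16 or later). For older images, set AM_API=v1, which accepts the same JSON list of alerts. The label values and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: Alertmanager is reachable on the NodePort from section 7.6.
AM="${AM:-http://192.168.16.106:31831}"
AM_API="${AM_API:-v2}"

# Build a one-element list of alerts with arbitrary test labels.
test_alert_json() {
  local name="${1:-TestAlert}"
  printf '[{"labels":{"alertname":"%s","severity":"warning"},"annotations":{"summary":"manual test alert"}}]\n' "$name"
}

# Post the test alert; Alertmanager routes it to the wechat receiver.
send_test_alert() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
    -d "$(test_alert_json "$1")" "${AM}/api/${AM_API}/alerts"
}
```

Usage: `send_test_alert`. Because the route uses `group_wait: 10s`, the WeChat message should arrive shortly afterwards.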

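8. Configuring Alertmanager for k8s alerting
8.1. Preparing two alert rule files on NFS
Instead of a ConfigMap, the alert rule files are mounted into the container through a PV and PVC.
1. Create the PV storage path on NFS
[root@nfs ~]# mkdir /data/prometheus/rules
[root@nfs ~]# chmod -R 777 /data
[root@nfs ~]# cd /data/prometheus/rules
2. Prepare the alert rule file for host downtime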
[root@nfs rules]# vim hostdown.yml
groups:
- name: general.rules
  rules:
  - alert: HostDown
    expr: up == 0
    for: 1m
    labels:
      severity: error
    annotations:
      summary: "Host {{ $labels.instance }} is down"
      description: "{{ $labels.instance }} (job {{ $labels.job }}) has been down for more than 1 minute!"
3. Prepare the basic host monitoring alert rule file
[root@nfs rules]# vim node.yml
groups:
- name: node.rules
  rules:
  - alert: NodeFilesystemUsage
    expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs",mountpoint="/"} / node_filesystem_size_bytes{fstype=~"ext4|xfs",mountpoint="/"} * 100) > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Host {{ $labels.instance }}: {{ $labels.mountpoint }} disk usage is high"
      description: "{{ $labels.instance }}: {{ $labels.mountpoint }} disk usage is above 80% (current value: {{ $value }})"
  - alert: NodeMemoryUsage
    expr: 100 - ((node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100) > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Host {{ $labels.instance }} memory usage is high"
      description: "{{ $labels.instance }} memory usage is above 80% (current value: {{ $value }})"
  - alert: NodeCpuUsage
    expr: 100 - (avg(irate(node_cpu_seconds_total{mode='idle'}[5m])) by (instance) * 100) > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Host {{ $labels.instance }} CPU usage is high"
      description: "{{ $labels.instance }} CPU usage is above 80% (current value: {{ $value }})"
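The disk and memory rules turn raw byte counts into a usage percentage. The worked example below uses made-up numbers and awk for the floating-point arithmetic. With 15 units free out of 100, disk usage is 85%, which is above the 80 threshold, so the rule fires once the `for: 1m` window has passed. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# disk:   100 - free / size * 100
disk_usage_pct() {
  awk -v free="$1" -v size="$2" 'BEGIN { printf "%.1f\n", 100 - free / size * 100 }'
}

# memory: 100 - (free + cached + buffers) / total * 100
mem_usage_pct() {
  awk -v free="$1" -v cached="$2" -v buffers="$3" -v total="$4" \
    'BEGIN { printf "%.1f\n", 100 - (free + cached + buffers) / total * 100 }'
}
```

Before Prometheus loads the files, `promtool check rules hostdown.yml node.yml` validates their syntax. promtool ships in the prom/prometheus image.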
8.2. Writing the PV/PVC YAML for the alert rules
1. Write the resource file
[root@k8s-master ~/k8s/prometheus3]# vim prometheus-rules-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-rules
  namespace: prometheus   # must be the namespace of the Prometheus StatefulSet
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "2Gi"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-prometheus-rules
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /data/prometheus/rules
    server: 192.168.16.105
2. Create the resources
[root@k8s-master ~/k8s/prometheus3]# kubectl create -f prometheus-rules-pvc.yaml
persistentvolumeclaim/prometheus-rules created
persistentvolume/pv-prometheus-rules created
8.3. Modifying the Prometheus StatefulSet to include the rules
Add a mount for the rules PVC to the Prometheus StatefulSet:
[root@k8s-master ~/k8s/prometheus3]# vim prometheus-statefulset-static-pv.yaml
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: prometheus-rules
              mountPath: /etc/config/rules
            - name: prometheus-data
              mountPath: /data
              subPath: ""
      terminationGracePeriodSeconds: 300
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: prometheus-rules
          persistentVolumeClaim:
            claimName: prometheus-rules
        - name: prometheus-data
          persistentVolumeClaim:
            claimName: prometheus-data

8.4. Updating the Prometheus StatefulSet
[root@k8s-master ~/k8s/prometheus3]# kubectl apply -f prometheus-statefulset-static-pv.yaml
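8.5. Modifying the Prometheus ConfigMap to set the Alertmanager address
1. Add the Alertmanager address to the configuration. This assumes prometheus.yml already has a rule_files entry that matches the rule files mounted under /etc/config/rules (for example /etc/config/rules/*.yml). If it does not, add one; otherwise the rules are not loaded.
[root@k8s-master ~/k8s/prometheus3]# vim prometheus-configmap.yaml
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.16.106:31831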
2. Apply the update
[root@k8s-master ~/k8s/prometheus3]# kubectl apply -f prometheus-configmap.yaml
configmap/prometheus-config configured
8.5. Checking that the alert rules appear in the UI
The alert rules have been added successfully.

8.6. Simulating a node host outage and checking the WeChat alert
Trigger an alert
Stop node_exporter on any node:
[root@k8s-node1 ~]# systemctl stop node_exporter
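Instead of watching the Alerts page, you can poll the alert state through the Prometheus HTTP API (`GET /api/v1/alerts`). Because of `for: 1m`, the alert is pending first and only then firing, so expect a delay of a minute or more. This sketch assumes the NodePort URL from 3.7. It does a rough text match on the JSON rather than parsing it, so it can be fooled when several alerts are active. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: Prometheus is reachable on the NodePort from section 3.7.
PROM="${PROM:-http://192.168.16.106:30387}"

# Usage: wait_for_firing <alertname> [max_checks]; checks every 10 seconds.
wait_for_firing() {
  local alert="$1" tries="${2:-30}" body i=0
  while [ "$i" -lt "$tries" ]; do
    body=$(curl -fsS "${PROM}/api/v1/alerts") || body=""
    # Rough check: the alert name and a firing state both appear in the JSON.
    if printf '%s' "$body" | grep -q "\"alertname\":\"${alert}\"" &&
       printf '%s' "$body" | grep -q '"state":"firing"'; then
      echo "${alert} is firing"
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "${alert} not firing after ${tries} checks" >&2
  return 1
}
```

Usage: pass the alert name of the host-down rule from 8.1, for example `wait_for_firing <alertname>`.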
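The alert has fired

The alert message has been sent

Checking the alert content
The alert was received successfully. This concludes the k8s monitoring series.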

Please credit the source when reposting. Article link: https://www.uj5u.com/qita/245808.html
