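1 Prometheus Overview
1.1 Introduction to Prometheus
Prometheus is an open-source monitoring system originally developed at SoundCloud. It was the second project to graduate from the CNCF, after Kubernetes, and is widely used for containers and microservices. Its main features are:
- a multi-dimensional data model in which each time series is identified by a metric name and a set of key/value pairs;
- PromQL, a flexible query language;
- no reliance on distributed storage: each server node is autonomous;
- metrics are collected by pulling them over HTTP;
- pushing time series is supported via an intermediary gateway;
- multiple modes of graphing and dashboard support, for example Grafana.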
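The Prometheus ecosystem consists of the following components:
- Prometheus Server: scrapes and stores time-series data and serves queries over it;
- client SDKs: libraries for instrumenting code so that Prometheus can scrape it;
- Push Gateway: a gateway component that accepts pushed metrics;
- third-party exporters: agents that expose metrics from external systems in a form Prometheus can scrape;
- Alertmanager: handles alerts;
- various other support tools.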
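In a Kubernetes environment, the Prometheus server:
- obtains the resources and services to monitor from the Kubernetes master;
- pulls metrics from the various exporters and saves them in its time-series database (TSDB);
- provides an HTTP API that other systems can query;
- answers queries written in the PromQL language;
- pushes alerts to Alertmanager, among other tasks.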
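The HTTP API mentioned above can be queried from any language. The Python sketch below (standard library only) builds an instant-query URL for the documented `/api/v1/query` endpoint and parses a reply in the documented JSON shape. The base URL is an assumption borrowed from the NodePort address tested in section 3.7, and `SAMPLE_BODY` is an illustrative response, not captured output.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Return an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> dict:
    """Map each series' sorted label pairs to its float value (instant vectors only)."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    result = {}
    for series in data["result"]:
        _timestamp, value = series["value"]  # [unix_seconds, "value as a string"]
        result[tuple(sorted(series["metric"].items()))] = float(value)
    return result


# Illustrative response body in the documented shape (hypothetical values).
SAMPLE_BODY = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {"__name__": "up", "job": "kubernetes-apiservers"},
                      "value": [1575000000.0, "1"]}]}}
"""

if __name__ == "__main__":
    # Assumed address: the NodePort service created in section 3.6.
    print(build_query_url("http://172.24.8.71:30909/", 'up{job="kubernetes-apiservers"}'))
    print(parse_vector(SAMPLE_BODY))
```

To run a live query, pass the URL to `urllib.request.urlopen` and hand the decoded body to `parse_vector`.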
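1.2 Prometheus Component Architecture
Prometheus pulls metrics either directly from instrumented jobs or, for short-lived jobs, from the intermediary Pushgateway. It stores all scraped samples locally and runs rules over this data to produce aggregated time series or alerts. Grafana or other tools can then be used to visualize the data.
The workflow is roughly as follows:
- The Prometheus server periodically scrapes metrics from the configured jobs or exporters, or scrapes metrics that clients pushed to the Pushgateway.
- The Prometheus server stores the collected samples locally and aggregates them.
- It evaluates the defined alerting rules (alert.rules), recording new time series or pushing alerts to Alertmanager.
- Alertmanager processes the alerts it receives according to its configuration file and sends notifications through channels such as email.
- Grafana and other visualization tools query the monitoring data and display it graphically.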
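The rule-evaluation step can be pictured as a small state machine: while an alerting rule's condition holds, the alert is pending, and only after the condition has held for the rule's `for` duration does it fire and get sent to Alertmanager. The Python sketch below models one alert under simplifying assumptions (a single series, a plain threshold instead of a PromQL expression, no resolved notifications); the rule name `HighMemoryUsage` and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    name: str
    threshold: float    # stands in for a PromQL condition such as "usage > 0.9"
    for_seconds: float  # how long the condition must keep holding before firing


@dataclass
class AlertState:
    state: str = "inactive"  # inactive -> pending -> firing
    active_since: Optional[float] = None


def evaluate(rule: AlertRule, current: AlertState, value: float, now: float) -> AlertState:
    """Run one evaluation cycle (every evaluation_interval) for a single series."""
    if value <= rule.threshold:
        return AlertState()  # condition false: the alert is (or becomes) inactive
    since = current.active_since if current.active_since is not None else now
    if now - since >= rule.for_seconds:
        return AlertState("firing", since)  # only firing alerts go to Alertmanager
    return AlertState("pending", since)


if __name__ == "__main__":
    rule = AlertRule("HighMemoryUsage", threshold=0.9, for_seconds=30)
    state = AlertState()
    # One evaluation every 10s, like evaluation_interval in the ConfigMap of section 3.4.
    for t, usage in [(0, 0.95), (10, 0.95), (20, 0.95), (30, 0.95), (40, 0.5)]:
        state = evaluate(rule, state, usage, t)
        print(t, state.state)
```

With a `for` of 30s the alert stays pending for three cycles, fires at t=30, and returns to inactive once the value drops.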
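1.3 Prometheus Monitoring Scope
As a monitoring system, Prometheus mainly covers the following layers:
- Infrastructure: the resources of every host server (both Kubernetes nodes and non-Kubernetes nodes), such as CPU, memory, network throughput and bandwidth usage, disk I/O, and disk usage.
- Middleware: middleware deployed independently outside the Kubernetes cluster, such as MySQL, Redis, RabbitMQ, Elasticsearch, and Nginx.
- Kubernetes cluster: key metrics of the Kubernetes cluster itself.
- Applications on Kubernetes: applications deployed on the Kubernetes cluster.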
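2 Prometheus Concepts
2.1 Data Model
Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions. Besides stored time series, Prometheus may generate temporary derived time series as the result of queries. The data model consists of:
- Metric names and labels: every time series is uniquely identified by its metric name and optional key/value pairs called labels.
- Samples: each sample consists of a float64 value and a millisecond-precision timestamp.
- Notation: given a metric name and a set of labels, a time series is written as <metric name>{<label name>=<label value>, ...}, for example api_http_requests_total{method="POST", handler="/messages"}.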
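To make the notation concrete, here is a hedged Python sketch that parses a series written as `<metric name>{<label name>="<label value>", ...}` and builds an order-independent identity for it, treating the metric name as the reserved `__name__` label. It is a simplification: escape sequences inside label values are kept verbatim rather than decoded.

```python
import re

METRIC_RE = re.compile(r"([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?")
LABEL_RE = re.compile(r'\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"((?:[^"\\]|\\.)*)"\s*(?:,|$)')


def parse_series(text: str) -> tuple:
    """Split 'name{k="v", ...}' into (name, {k: v}); escapes are not decoded."""
    m = METRIC_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a valid series: {text!r}")
    name, body = m.group(1), m.group(2) or ""
    labels, pos = {}, 0
    while pos < len(body):
        lm = LABEL_RE.match(body, pos)
        if lm is None:
            raise ValueError(f"bad label list at offset {pos}: {body!r}")
        labels[lm.group(1)] = lm.group(2)
        pos = lm.end()
    return name, labels


def series_id(text: str) -> tuple:
    """Identity of a series: all labels, including __name__, in sorted order."""
    name, labels = parse_series(text)
    return tuple(sorted({**labels, "__name__": name}.items()))


if __name__ == "__main__":
    print(parse_series('api_http_requests_total{method="POST", handler="/messages"}'))
    # Label order does not change which series is meant.
    print(series_id('a{x="1",y="2"}') == series_id('a{ y="2", x="1" }'))
```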
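2.2 Metric Types
The Prometheus client libraries offer four core metric types: Counter, Gauge, Histogram, and Summary.
- Counter: a cumulative metric whose value can only increase, or be reset to zero on restart.
- Gauge: a single numerical value that can arbitrarily go up and down.
- Histogram: samples observations (usually request durations or response sizes) and counts them in configurable buckets; it also provides a sum of all observed values. A histogram with base metric name <basename> exposes multiple time series during a scrape:
  - cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper inclusive bound>"}
  - the total sum of all observed values, exposed as <basename>_sum
  - the count of events that have been observed, exposed as <basename>_count (identical to <basename>_bucket{le="+Inf"})
- Summary: similar to a histogram, a summary samples observations (usually request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window. A summary with base metric name <basename> exposes multiple time series during a scrape:
  - streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as <basename>{quantile="<φ>"}
  - the total sum of all observed values, exposed as <basename>_sum
  - the count of events that have been observed, exposed as <basename>_count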
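The cumulative bucket layout is easy to misread, so here is a small Python model of how a histogram is exposed. It only mirrors the exposition format described above; it is not the official `prometheus_client` library, and the metric name `http_request_duration_seconds` and the bucket bounds are assumptions for illustration.

```python
import bisect
import math


class Histogram:
    """Toy histogram that renders <basename>_bucket/_sum/_count like an exporter."""

    def __init__(self, basename: str, upper_bounds: list):
        self.basename = basename
        # Buckets are upper-inclusive (le = "less than or equal"); +Inf is always last.
        self.bounds = sorted(upper_bounds) + [math.inf]
        self.per_bucket = [0] * len(self.bounds)  # non-cumulative counts
        self.total = 0.0

    def observe(self, value: float) -> None:
        # Index of the first bound >= value, i.e. the smallest bucket containing it.
        self.per_bucket[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value

    def expose(self) -> list:
        lines, cumulative = [], 0
        for bound, count in zip(self.bounds, self.per_bucket):
            cumulative += count  # each bucket also counts everything below it
            le = "+Inf" if math.isinf(bound) else str(bound)
            lines.append(f'{self.basename}_bucket{{le="{le}"}} {cumulative}')
        lines.append(f"{self.basename}_sum {self.total}")
        lines.append(f"{self.basename}_count {cumulative}")  # equals the +Inf bucket
        return lines


if __name__ == "__main__":
    h = Histogram("http_request_duration_seconds", [0.25, 0.5, 1.0])
    for v in (0.125, 0.375, 0.375, 0.75, 2.0):
        h.observe(v)
    print("\n".join(h.expose()))
```

Because bounds are inclusive, an observation of exactly 0.5 would be counted in the `le="0.5"` bucket, and `_count` always matches the `+Inf` bucket.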
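2.3 Jobs and Instances
In Prometheus, an endpoint that can be scraped is called an instance, and it usually corresponds to a single process. A collection of instances with the same purpose (for example, processes replicated for scalability or reliability) is called a job.
2.4 Labels and Time Series
When Prometheus scrapes a target, it automatically attaches the following labels to the scraped time series to identify the target:
- job: the configured job name that the target belongs to;
- instance: the <host>:<port> part of the scraped target's URL.
For each instance scrape, Prometheus also stores a sample in the following time series:
- up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy (reachable), or 0 if the scrape failed;
- scrape_duration_seconds{job="<job-name>", instance="<instance-id>"}: duration of the scrape;
- scrape_samples_post_metric_relabeling{job="<job-name>", instance="<instance-id>"}: number of samples remaining after metric relabeling was applied;
- scrape_samples_scraped{job="<job-name>", instance="<instance-id>"}: number of samples the target exposed.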
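A rough Python model of what a single scrape adds: the target's `job` and `instance` labels are attached to every scraped sample (an exposed label that clashes is kept as `exported_<name>`, which is the behavior with the default `honor_labels: false`), and the four synthetic series listed above are recorded. The `fetch` callable stands in for the HTTP request and `metric_relabel` for `metric_relabel_configs`; both, like the sample targets, are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]  # (metric name, labels, value)


def attach_target_labels(labels: Dict[str, str], target: Dict[str, str]) -> Dict[str, str]:
    out = dict(labels)
    for key, value in target.items():
        if key in out:
            # With honor_labels: false, a conflicting exposed label is renamed.
            out["exported_" + key] = out.pop(key)
        out[key] = value
    return out


def scrape(job: str, instance: str,
           fetch: Callable[[], List[Sample]],
           metric_relabel: Callable[[List[Sample]], List[Sample]] = lambda s: s,
           clock: Callable[[], float] = time.monotonic) -> List[Sample]:
    target = {"job": job, "instance": instance}
    start = clock()
    try:
        scraped, up = fetch(), 1.0
    except Exception:  # unreachable target, bad response, ...
        scraped, up = [], 0.0
    labeled = [(n, attach_target_labels(l, target), v) for n, l, v in scraped]
    kept = metric_relabel(labeled)
    duration = clock() - start
    synthetic = [
        ("up", up),
        ("scrape_duration_seconds", duration),
        ("scrape_samples_scraped", float(len(scraped))),
        ("scrape_samples_post_metric_relabeling", float(len(kept))),
    ]
    return kept + [(name, dict(target), value) for name, value in synthetic]
```

Injecting `clock` keeps the sketch deterministic in tests; in real use the default monotonic clock measures the scrape.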
3 Deploying Prometheus
3.1 Create a Namespace
```
[root@k8smaster01 study]# vi monitor-namespace.yaml
```
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
```
```
[root@k8smaster01 study]# kubectl create -f monitor-namespace.yaml
```
3.2 Get the Deployment Files
```
[root@k8smaster01 study]# git clone https://github.com/prometheus/prometheus
```
3.3 Create RBAC
```
[root@k8smaster01 ~]# cd prometheus/documentation/examples/
[root@k8smaster01 examples]# vi rbac-setup.yml
```
```yaml
apiVersion: rbac.authorization.k8s.io/v1    # the v1beta1 version of this API was removed in Kubernetes 1.22
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  - networking.k8s.io    # newer clusters serve Ingress from this group
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring    # changed namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring    # changed namespace
```
```
[root@k8smaster01 examples]# kubectl create -f rbac-setup.yml
```
3.4 Create the Prometheus ConfigMap
Strip blank lines and comment lines from the example configuration, then wrap the result in a ConfigMap:
```
[root@k8smaster01 examples]# cat prometheus-kubernetes.yml | grep -v ^$ | grep -v "#" >> prometheus-config.yaml
[root@k8smaster01 examples]# vi prometheus-config.yaml
```
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: monitoring    # changed namespace
data:
  prometheus.yml: |-
    global:
      scrape_interval: 10s
      evaluation_interval: 10s

    scrape_configs:
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

    - job_name: 'kubernetes-nodes'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

    - job_name: 'kubernetes-cadvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

    - job_name: 'kubernetes-services'
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115    # placeholder: point this at your blackbox exporter
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name

    - job_name: 'kubernetes-ingresses'
      kubernetes_sd_configs:
      - role: ingress
      relabel_configs:
      - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
        regex: (.+);(.+);(.+)
        replacement: ${1}://${2}${3}
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115    # placeholder: point this at your blackbox exporter
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_ingress_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_ingress_name]
        target_label: kubernetes_name

    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
```
```
[root@k8smaster01 examples]# kubectl create -f prometheus-config.yaml
```
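The relabel_configs above are easiest to follow by applying them by hand. Below is a simplified Python re-implementation of the `keep`, `drop`, `replace`, and `labelmap` actions, following Prometheus's documented defaults (separator `;`, regex `(.*)`, replacement `$1`, action `replace`) and its fully anchored regex matching. It is a sketch, not Prometheus's own code: other actions such as `hashmod` or `labeldrop` are omitted, and the pod labels in the example are hypothetical.

```python
import re
from typing import Dict, List, Optional

Labels = Dict[str, str]


def _expand(template: str, match: "re.Match") -> str:
    # Prometheus writes capture groups as $1 or ${1}.
    return re.sub(r"\$\{?(\d+)\}?", lambda g: match.group(int(g.group(1))) or "", template)


def relabel(labels: Labels, rule: dict) -> Optional[Labels]:
    """Apply one relabel rule; return None when the target is dropped."""
    action = rule.get("action", "replace")
    regex = re.compile(rule.get("regex", "(.*)"))
    sep = rule.get("separator", ";")
    value = sep.join(labels.get(name, "") for name in rule.get("source_labels", []))
    if action == "keep":
        return labels if regex.fullmatch(value) else None
    if action == "drop":
        return None if regex.fullmatch(value) else labels
    if action == "replace":
        m = regex.fullmatch(value)
        if m is None:
            return labels
        out = dict(labels)
        new_value = _expand(rule.get("replacement", "$1"), m)
        if new_value:
            out[rule["target_label"]] = new_value
        else:
            out.pop(rule["target_label"], None)  # an empty result removes the label
        return out
    if action == "labelmap":
        out = dict(labels)
        for name, val in labels.items():
            m = regex.fullmatch(name)
            if m:
                out[_expand(rule.get("replacement", "$1"), m)] = val
        return out
    raise ValueError(f"action not covered by this sketch: {action}")


def apply_rules(labels: Labels, rules: List[dict]) -> Optional[Labels]:
    for rule in rules:
        labels = relabel(labels, rule)
        if labels is None:
            return None
    return labels


# The 'kubernetes-pods' job from the ConfigMap, transcribed as dicts.
POD_RULES = [
    {"source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"], "action": "keep", "regex": "true"},
    {"source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_path"], "action": "replace",
     "target_label": "__metrics_path__", "regex": "(.+)"},
    {"source_labels": ["__address__", "__meta_kubernetes_pod_annotation_prometheus_io_port"], "action": "replace",
     "regex": r"([^:]+)(?::\d+)?;(\d+)", "replacement": "$1:$2", "target_label": "__address__"},
    {"action": "labelmap", "regex": "__meta_kubernetes_pod_label_(.+)"},
    {"source_labels": ["__meta_kubernetes_namespace"], "action": "replace", "target_label": "kubernetes_namespace"},
    {"source_labels": ["__meta_kubernetes_pod_name"], "action": "replace", "target_label": "kubernetes_pod_name"},
]
```

For a pod annotated with `prometheus.io/scrape: "true"` and `prometheus.io/port: "9102"`, the address rule rewrites `10.244.1.7:8080` to `10.244.1.7:9102`; a pod without the scrape annotation is dropped by the first `keep` rule. In real Prometheus, labels starting with `__` are removed once relabeling finishes.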
3.5 Create the Prometheus Deployment
```
[root@k8smaster01 examples]# vi prometheus-deployment.yml
```
```yaml
apiVersion: apps/v1    # the older apps/v1beta2 API was removed in Kubernetes 1.16
kind: Deployment
metadata:
  labels:
    name: prometheus-deployment
  name: prometheus-server
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
      - name: prometheus-server
        image: prom/prometheus:v2.14.0
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus/"
        - "--storage.tsdb.retention.time=72h"    # --storage.tsdb.retention is deprecated in favor of this flag
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - name: prometheus-config-volume
          mountPath: /etc/prometheus/
        - name: prometheus-storage-volume
          mountPath: /prometheus/
      serviceAccountName: prometheus
      imagePullSecrets:    # only needed when pulling through a private registry
      - name: regsecret
      volumes:
      - name: prometheus-config-volume
        configMap:
          defaultMode: 420
          name: prometheus-server-conf
      - name: prometheus-storage-volume
        emptyDir: {}
```
```
[root@k8smaster01 examples]# kubectl create -f prometheus-deployment.yml
```
Tip: to persist Prometheus data, create the corresponding StorageClass and PVC in advance. For the StorageClass, see "044. Cluster Storage: StorageClass"; the PVC can follow this example:
```
[root@k8smaster01 examples]# vi prometheus-pvc.yaml
```
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc
  namespace: monitoring
  annotations:
    volume.beta.kubernetes.io/storage-class: ghstorageclass
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
```
```
[root@k8smaster01 examples]# kubectl create -f prometheus-pvc.yaml
```
Then change the storage part of prometheus-deployment.yml to:
```yaml
……
      - name: prometheus-storage-volume
        persistentVolumeClaim:
          claimName: prometheus-pvc
……
```
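How large should the PVC be? The Prometheus storage documentation gives a rule of thumb: needed disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample on average. The Python sketch below applies it to the 72h retention and 10s scrape interval used in this article. The figure of 50,000 active series is hypothetical; substitute the value of `prometheus_tsdb_head_series` from your own server.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration(text: str) -> int:
    """Parse a single-unit duration such as '72h' or '15d' into seconds (simplified)."""
    m = re.fullmatch(r"(\d+)([smhdw])", text)
    if m is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(m.group(1)) * _UNIT_SECONDS[m.group(2)]


def estimate_disk_bytes(retention: str, active_series: int,
                        scrape_interval: str, bytes_per_sample: float = 2.0) -> float:
    """Rule-of-thumb size of the TSDB blocks; the write-ahead log needs extra headroom."""
    samples_per_second = active_series / parse_duration(scrape_interval)
    return parse_duration(retention) * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    need = estimate_disk_bytes("72h", active_series=50_000, scrape_interval="10s")
    print(f"~{need / 2**30:.2f} GiB")  # compare with the 5Gi requested by prometheus-pvc
```

Under these assumptions the estimate is about 2.4 GiB, which fits in the 5Gi claim with some margin.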
3.6 Create the Prometheus Service
```
[root@k8smaster01 examples]# vi prometheus-service.yaml
```
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus-service
  name: prometheus-service
  namespace: monitoring
spec:
  type: NodePort
  selector:
    app: prometheus-server
  ports:
  - port: 9090
    targetPort: 9090
    nodePort: 30909
```
```
[root@k8smaster01 examples]# kubectl create -f prometheus-service.yaml
[root@k8smaster01 examples]# kubectl get all -n monitoring
NAME                                    READY   STATUS    RESTARTS   AGE
pod/prometheus-server-fd5479489-q584s   1/1     Running   0          92s

NAME                         TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/prometheus-service   NodePort   10.107.69.147   <none>        9090:30909/TCP   29s

NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-server   1/1     1            1           92s

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-server-fd5479489   1         1         1       92s
```
3.7 Test Prometheus
Open it directly in a browser: http://172.24.8.71:30909/
Check that all endpoints in the Kubernetes cluster have been connected to Prometheus automatically through service discovery.
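The same check can be scripted against the HTTP API: `/api/v1/targets` lists every active target with its labels and a `health` field (`up`, `down`, or `unknown`). The Python sketch below groups targets by job and lists the unhealthy ones. The default address is the NodePort from section 3.6, and `SAMPLE_BODY` is a hypothetical response trimmed to the fields the sketch reads.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple
from urllib.request import urlopen


def fetch_targets(base_url: str = "http://172.24.8.71:30909") -> str:
    """Download the raw JSON from the targets endpoint (needs network access)."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return resp.read().decode("utf-8")


def health_by_job(body: str) -> Dict[str, Counter]:
    summary: Dict[str, Counter] = {}
    for target in json.loads(body)["data"]["activeTargets"]:
        job = target["labels"].get("job", "<no job>")
        summary.setdefault(job, Counter())[target["health"]] += 1
    return summary


def unhealthy(body: str) -> List[Tuple[str, str]]:
    return [(t["labels"].get("instance", "?"), t.get("lastError", ""))
            for t in json.loads(body)["data"]["activeTargets"] if t["health"] != "up"]


SAMPLE_BODY = json.dumps({"status": "success", "data": {"activeTargets": [
    {"labels": {"job": "kubernetes-nodes", "instance": "k8smaster01"}, "health": "up", "lastError": ""},
    {"labels": {"job": "kubernetes-nodes", "instance": "k8snode01"}, "health": "up", "lastError": ""},
    {"labels": {"job": "kubernetes-pods", "instance": "10.244.1.7:9102"}, "health": "down",
     "lastError": "context deadline exceeded"},
]}})

if __name__ == "__main__":
    print(health_by_job(SAMPLE_BODY))
    print(unhealthy(SAMPLE_BODY))
```

Replacing `SAMPLE_BODY` with `fetch_targets()` runs the same summary against the live server.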
View memory usage through the graphical interface.
For more Prometheus configuration options, see the official documentation: https://prometheus.io/docs/prometheus/latest/configuration/configuration/
4 Deploying Grafana
4.1 Get the Deployment Files
```
[root@uhost ~]# git clone https://github.com/liukuan73/kubernetes-addons
```
4.2 Deploy Grafana
```
[root@uhost ~]# cd /root/kubernetes-addons/monitor/prometheus+grafana
[root@k8smaster01 prometheus+grafana]# vi grafana.yaml
```
```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
spec:
  type: NodePort
  ports:
  - port: 3000
    targetPort: 3000
    nodePort: 30007
  selector:
    app: grafana
---
apiVersion: apps/v1    # Deployments under extensions/v1beta1 were removed in Kubernetes 1.16
kind: Deployment
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  revisionHistoryLimit: 2
  selector:    # required by apps/v1
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:5.0.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
        volumeMounts:
        - mountPath: /var
          name: grafana-storage
        env:
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        - name: GF_SERVER_ROOT_URL
          # value: /api/v1/proxy/namespaces/default/services/grafana/
          value: /
        readinessProbe:
          httpGet:
            path: /login
            port: 3000
      volumes:
      - name: grafana-storage
        emptyDir: {}
      nodeSelector:
        node-role.kubernetes.io/master: "true"
      # tolerations:
      # - key: "node-role.kubernetes.io/master"
      #   effect: "NoSchedule"
```
```
# --overwrite: kubeadm may already have set this label with an empty value
[root@k8smaster01 prometheus+grafana]# kubectl label nodes k8smaster01 node-role.kubernetes.io/master=true --overwrite
[root@k8smaster01 prometheus+grafana]# kubectl label nodes k8smaster02 node-role.kubernetes.io/master=true --overwrite
[root@k8smaster01 prometheus+grafana]# kubectl label nodes k8smaster03 node-role.kubernetes.io/master=true --overwrite
[root@k8smaster01 prometheus+grafana]# kubectl taint node --all node-role.kubernetes.io/master-    # allow workloads on the masters
[root@k8smaster01 prometheus+grafana]# kubectl create -f grafana.yaml
[root@k8smaster01 examples]# kubectl get all -n monitoring
```
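Why label the masters and remove the taint? A pod is only scheduled onto a node whose labels satisfy its `nodeSelector` and whose `NoSchedule`/`NoExecute` taints it tolerates. The Python sketch below is a simplified model of just these two checks (the real scheduler evaluates many more); the label key mirrors grafana.yaml, and the node states in the usage are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"


@dataclass
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" (the default) or "Exists"
    value: str = ""
    effect: str = ""  # empty matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key in ("", taint.key)  # an empty key with Exists tolerates everything
    return tol.key == taint.key and tol.value == taint.value


def fits(node_labels: Dict[str, str], node_taints: List[Taint],
         node_selector: Dict[str, str], tolerations: List[Toleration]) -> bool:
    # nodeSelector: every key must be present on the node with exactly this value.
    if any(node_labels.get(k) != v for k, v in node_selector.items()):
        return False
    # Hard taints must each be tolerated; PreferNoSchedule is only a soft preference.
    return all(any(tolerates(t, taint) for t in tolerations)
               for taint in node_taints if taint.effect in ("NoSchedule", "NoExecute"))


MASTER_KEY = "node-role.kubernetes.io/master"
GRAFANA_SELECTOR = {MASTER_KEY: "true"}
```

Usage: a master labeled `true` but still tainted is rejected; removing the taint, or keeping it and adding the commented-out toleration from grafana.yaml, lets the pod in. A worker without the label never matches the `nodeSelector`.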
4.3 Verify
Open http://172.24.8.71:30007 in a browser and log in with the default username and password admin/admin.
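4.4 Configure a Data Source
Go to Configuration → Data Sources.
Add a new data source.
Add a Prometheus data source as follows. This environment is the highly available Kubernetes cluster deployed in "Appendix 012: Deploying Highly Available Kubernetes with Kubeadm", which provides the VIP 172.24.8.100; you can also use the Prometheus address tested in step 3.7.
Save and test whether the connection succeeds.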
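If you prefer to script this step, Grafana's HTTP API accepts a `POST /api/datasources` request. The Python sketch below only builds that request. It assumes the Prometheus address from step 3.7 and sends no credentials by default, because the grafana.yaml above disables basic auth (`GF_AUTH_BASIC_ENABLED=false`) and gives anonymous users the Admin role; pass `user`/`password` if you re-enable basic auth. The payload is a minimal example, not the full set of data source options.

```python
import base64
import json
import urllib.request
from typing import Optional


def datasource_request(grafana_url: str, prometheus_url: str, name: str = "Prometheus",
                       user: Optional[str] = None,
                       password: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /api/datasources request for a Prometheus data source."""
    payload = {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend, not the browser, talks to Prometheus
        "isDefault": True,
    }
    headers = {"Content-Type": "application/json"}
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        headers["Authorization"] = f"Basic {token}"
    return urllib.request.Request(f"{grafana_url.rstrip('/')}/api/datasources",
                                  data=json.dumps(payload).encode(),
                                  headers=headers, method="POST")


if __name__ == "__main__":
    req = datasource_request("http://172.24.8.71:30007", "http://172.24.8.71:30909")
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status, resp.read().decode())
```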
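4.5 Configure Grafana
Configure a dashboard. This walkthrough imports dashboard template 162, which displays Kubernetes cluster monitoring information.
Select the Prometheus data source added in 4.4 for the dashboard.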
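4.6 Add Users
You can add regular users and assign them the appropriate roles.
Copy the invitation link: http://172.24.8.71:30007/invite/hlhkzz5O3dJj94OlHcKiqN8bPrZt40
Open the link, set a password for the new user, and log in.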
4.7 Other Settings
Setting the time zone is recommended. For more Grafana configuration options, see: https://grafana.com/docs/grafana/latest/installation/configuration/
4.8 View the Monitoring Data
Log in to http://172.24.8.71:30007/ to view the Kubernetes monitoring dashboards.
References for this approach:
https://www.kubernetes.org.cn/4184.html
https://www.kubernetes.org.cn/3418.html
https://www.jianshu.com/p/c2e549480c50
When reposting, please credit the source. Article link: https://www.uj5u.com/caozuo/120968.html
