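Author: Zhang Yanying (Lao Z), operations architect at the Shandong branch of a telecom systems integration company and a cloud-native enthusiast, currently focused on cloud-native operations.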
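1. Introduction
This article grew out of a question from @Jam, a member of KubeSphere open-source community group 8, who reported that his Etcd monitoring showed no data and asked me to take a look. I had never enabled Etcd monitoring myself, but since he trusted me enough to ask, I had to look into it, and this article is the result.
It turns out that the cluster status monitoring built into KubeSphere includes an Etcd monitoring page, but in KubeSphere 3.2.1, after Etcd monitoring is enabled with the default configuration, that page indeed shows no data. This article records the troubleshooting journey that resolved the problem.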
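Topics covered
- Level: beginner
- Prometheus-Operator
- Enabling Etcd monitoring in KubeSphere
Demo server configuration
| Hostname | IP | CPU (cores) | Memory (GB) | System disk (GB) | Data disk (GB) | Purpose |
|---|---|---|---|---|---|---|
| zdeops-master | 192.168.9.9 | 2 | 4 | 40 | 200 | Ansible operations control node |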
| ks-k8s-master-0 | 192.168.9.91 | 8 | 32 | 40 | 200 | KubeSphere/k8s-master/k8s-worker |
| ks-k8s-master-1 | 192.168.9.92 | 8 | 32 | 40 | 200 | KubeSphere/k8s-master/k8s-worker |
| ks-k8s-master-2 | 192.168.9.93 | 8 | 32 | 40 | 200 | KubeSphere/k8s-master/k8s-worker |
| glusterfs-node-0 | 192.168.9.95 | 4 | 8 | 40 | 200 | GlusterFS/ElasticSearch |
| glusterfs-node-1 | 192.168.9.96 | 4 | 8 | 40 | 200 | GlusterFS/ElasticSearch |
| glusterfs-node-2 | 192.168.9.97 | 4 | 8 | 40 | 200 | GlusterFS/ElasticSearch |
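2. Enabling Etcd Monitoring in the KubeSphere CRD
- Edit the YAML configuration of ks-installer under CRDs.
In the YAML file, search for etcd and change monitoring from false to true.

    etcd:
      endpointIps: '192.168.9.91,192.168.9.92,192.168.9.93'
      monitoring: true
      port: 2379
      tlsEnable: true

- After finishing all changes, click OK in the lower-right corner to save the configuration.
- Run the following command with kubectl to check the installation progress.

    kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f

The output is not shown here.
- Verify the installation result.
Log in to the console and go to Platform > Cluster Management > Monitoring & Alerting > Cluster Status, then check whether the etcd monitoring tab exists. If it does, monitoring has been enabled.
- Although monitoring is now enabled in the configuration, no monitoring data appears yet, and the prometheus-k8s Pod logs errors (the full log line is quoted in Section 3).
- The next section explains the cause and the fix.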
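3. Troubleshooting Record
- Searching the official forum for the keyword etcd turned up the following post, which looked relevant, so I opened it.
"etcd uses a self-signed certificate, Prometheus reports it as signed by an unknown authority #2.11"
The post does not spell out the full resolution process, which left me confused, but it does list the key configuration steps.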
- Following key point 1 from the post, create a secret from the external etcd certificates.
This command creates a secret from the etcd certificate files:

    # kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs --from-file=etcd-client-ca.crt=/etc/ssl/etcd/ssl/ca.pem --from-file=etcd-client.crt=/etc/ssl/etcd/ssl/admin-i-ezjb7gsk.pem --from-file=etcd-client.key=/etc/ssl/etcd/ssl/admin-i-ezjb7gsk-key.pem

No rush, though: first check whether the secret already exists, and create it with this command only if it does not.

    [root@ks-k8s-master-0 ~]# kubectl get secrets -n kubesphere-monitoring-system
    NAME                                         TYPE                                  DATA   AGE
    additional-scrape-configs                    Opaque                                1      9d
    alertmanager-main                            Opaque                                1      9d
    alertmanager-main-generated                  Opaque                                1      9d
    alertmanager-main-tls-assets                 Opaque                                0      9d
    alertmanager-main-token-7b9xc                kubernetes.io/service-account-token   3      9d
    default-token-tnxh7                          kubernetes.io/service-account-token   3      9d
    kube-etcd-client-certs                       Opaque                                3      9d
    kube-state-metrics-token-czbrg               kubernetes.io/service-account-token   3      9d
    node-exporter-token-qrhl7                    kubernetes.io/service-account-token   3      9d
    notification-manager-sa-token-lc6z4          kubernetes.io/service-account-token   3      9d
    notification-manager-webhook-server-cert     kubernetes.io/tls                     2      9d
    prometheus-k8s                               Opaque                                1      9d
    prometheus-k8s-tls-assets                    Opaque                                0      9d
    prometheus-k8s-token-7fk45                   kubernetes.io/service-account-token   3      9d
    prometheus-operator-token-wlmcf              kubernetes.io/service-account-token   3      9d
    sh.helm.release.v1.notification-manager.v1   helm.sh/release.v1                    1      9d

To my surprise, kube-etcd-client-certs is already there.
Looking at its contents, all three expected keys are present (the base64 values are truncated here):

    [root@ks-k8s-master-0 ~]# kubectl get secrets -n kubesphere-monitoring-system kube-etcd-client-certs -o yaml
    apiVersion: v1
    data:
      etcd-client-ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t...
      etcd-client.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t...
      etcd-client.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQ...
    kind: Secret
    metadata:
      creationTimestamp: "2022-04-09T14:34:37Z"
      name: kube-etcd-client-certs
      namespace: kubesphere-monitoring-system
      resourceVersion: "856"
      uid: c74b122b-438d-4e40-8e1a-1b9445d4b3d5
    type: Opaque

So the secret looks fine for now; at the very least the resource exists. Let's keep investigating and come back here if nothing else turns up.
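While I was writing this article, Jam reported that his environment did not have this secret. In that case, create it with the command above, and be sure to check the actual paths of the Etcd certificate and key files first.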
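Because the certificate file names differ between environments (the forum command uses admin-i-ezjb7gsk.pem, while this cluster has admin-ks-k8s-master-0.pem), it can help to verify the paths before creating the secret. The following is only a minimal sketch, assuming the /etc/ssl/etcd/ssl layout and the admin-<node>.pem naming seen in this article; the function name `etcd_secret_cmd` and its arguments are hypothetical. It prints the command instead of running it, so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Print the `kubectl create secret` command for the etcd client certificates,
# after checking that every referenced file exists.
# Hypothetical helper: the name and arguments are illustrative only.
etcd_secret_cmd() {
  local ssl_dir="$1"   # e.g. /etc/ssl/etcd/ssl
  local node="$2"      # e.g. ks-k8s-master-0 (matches admin-<node>.pem)
  local ca="${ssl_dir}/ca.pem"
  local cert="${ssl_dir}/admin-${node}.pem"
  local key="${ssl_dir}/admin-${node}-key.pem"
  local f
  for f in "$ca" "$cert" "$key"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done
  printf '%s\n' "kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs --from-file=etcd-client-ca.crt=${ca} --from-file=etcd-client.crt=${cert} --from-file=etcd-client.key=${key}"
}
```

For example, `etcd_secret_cmd /etc/ssl/etcd/ssl ks-k8s-master-0` prints the full command if all three files exist, and reports the first missing file otherwise.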
- Following key point 2 from the post, **create an endpoint from the IPs of the external etcd nodes**.
First, let's see what the prometheus-endpointsEtcd.yaml file contains.
prometheus-endpointsEtcd.yaml

    apiVersion: v1
    kind: Endpoints
    metadata:
      labels:
        k8s-app: etcd
      name: etcd
      namespace: kube-system
    subsets:
    - addresses:
      - ip: 127.0.0.1
      ports:
      - name: metrics
        port: 2379
        protocol: TCP

Next, check whether our Kubernetes cluster has such an Endpoints resource.

    [root@ks-k8s-master-0 ~]# kubectl get endpoints -n kubesphere-monitoring-system
    NAME                                      ENDPOINTS                                                                AGE
    alertmanager-main                         10.233.116.11:9093,10.233.117.10:9093,10.233.87.9:9093                   9d
    alertmanager-operated                     10.233.116.11:9094,10.233.117.10:9094,10.233.87.9:9094 + 6 more...       9d
    kube-state-metrics                        10.233.87.8:8443,10.233.87.8:9443                                        9d
    node-exporter                             192.168.9.91:9100,192.168.9.92:9100,192.168.9.93:9100                    9d
    notification-manager-controller-metrics   10.233.116.8:8443                                                        9d
    notification-manager-svc                  10.233.116.13:19093,10.233.116.14:19093                                  9d
    notification-manager-webhook              10.233.116.8:9443                                                        9d
    prometheus-k8s                            10.233.117.43:9090,10.233.87.160:9090                                    9d
    prometheus-operated                       10.233.117.43:9090,10.233.87.160:9090                                    9d
    prometheus-operator                       10.233.116.7:8443                                                        9d
    thanos-ruler-operated                     10.233.117.18:10902,10.233.87.17:10902,10.233.117.18:10901 + 1 more...   8d

There is no Etcd-related Endpoints object here. Does it need to be created?
Just as I was about to create it from the manifest, I noticed my own mistake: out of habit, and misled by the previous command, I had queried the wrong namespace. The namespace in the manifest is kube-system.
Querying again in kube-system finds the resource we are looking for.

    [root@ks-k8s-master-0 ~]# kubectl get endpoints -n kube-system
    NAME                          ENDPOINTS                                                           AGE
    coredns                       10.233.117.2:53,10.233.117.3:53,10.233.117.2:53 + 3 more...         9d
    etcd                          192.168.9.91:2379,192.168.9.92:2379,192.168.9.93:2379               3d20h
    kube-controller-manager-svc   192.168.9.91:10257,192.168.9.92:10257,192.168.9.93:10257            9d
    kube-scheduler-svc            192.168.9.91:10259,192.168.9.92:10259,192.168.9.93:10259            9d
    kubelet                       192.168.9.91:10250,192.168.9.92:10250,192.168.9.93:10250 + 6 more...   9d
    openebs.io-local              <none>                                                              9d

Now look at its contents.

    [root@ks-k8s-master-0 ~]# kubectl get endpoints etcd -n kube-system -o yaml
    apiVersion: v1
    kind: Endpoints
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"v1","kind":"Endpoints","metadata":{"annotations":{},"labels":{"k8s-app":"etcd"},"name":"etcd","namespace":"kube-system"},"subsets":[{"addresses":[{"ip":"192.168.9.91"},{"ip":"192.168.9.92"},{"ip":"192.168.9.93"}],"ports":[{"name":"metrics","port":2379,"protocol":"TCP"}]}]}
      creationTimestamp: "2022-04-15T08:24:18Z"
      labels:
        k8s-app: etcd
      name: etcd
      namespace: kube-system
      resourceVersion: "1559305"
      uid: c6d0ee2c-a228-4ea8-8ef1-73b387030950
    subsets:
    - addresses:
      - ip: 192.168.9.91
      - ip: 192.168.9.92
      - ip: 192.168.9.93
      ports:
      - name: metrics
        port: 2379
        protocol: TCP

The configuration also looks correct, so let's keep going.
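If the Endpoints object is missing in your environment, it has to list the real etcd member IPs rather than the 127.0.0.1 placeholder from the reference manifest. Below is a rough sketch of one way to render it; the helper name `render_etcd_endpoints` is my own invention, and the manifest fields simply mirror the reference file shown earlier.

```shell
#!/usr/bin/env bash
# Render an Endpoints manifest for external etcd members.
# Hypothetical helper: pass the member IPs as arguments.
render_etcd_endpoints() {
  cat <<'EOF'
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: etcd
  name: etcd
  namespace: kube-system
subsets:
- addresses:
EOF
  local ip
  for ip in "$@"; do
    # One address entry per etcd member.
    printf '  - ip: %s\n' "$ip"
  done
  cat <<'EOF'
  ports:
  - name: metrics
    port: 2379
    protocol: TCP
EOF
}
```

The output can be reviewed and then piped to the cluster, for example `render_etcd_endpoints 192.168.9.91 192.168.9.92 192.168.9.93 | kubectl apply -f -`.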
- Following key point 3 from the post, create an etcd Service that uses the endpoint above.
First, let's see what the prometheus-serviceEtcd.yaml file contains.
prometheus-serviceEtcd.yaml

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        k8s-app: etcd
      name: etcd
      namespace: kube-system
    spec:
      clusterIP: None
      ports:
      - name: metrics
        port: 2379
        targetPort: 2379
      selector: null

Next, check whether our Kubernetes cluster has such a Service resource.

    [root@ks-k8s-master-0 ~]# kubectl get service -n kube-system
    NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                        AGE
    coredns                       ClusterIP   10.233.0.3   <none>        53/UDP,53/TCP,9153/TCP         9d
    etcd                          ClusterIP   None         <none>        2379/TCP                       3d21h
    kube-controller-manager-svc   ClusterIP   None         <none>        10257/TCP                      9d
    kube-scheduler-svc            ClusterIP   None         <none>        10259/TCP                      9d
    kubelet                       ClusterIP   None         <none>        10250/TCP,10255/TCP,4194/TCP   9d

Now look at the resource in detail.

    [root@ks-k8s-master-0 ~]# kubectl get service etcd -n kube-system -o yaml
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"k8s-app":"etcd"},"name":"etcd","namespace":"kube-system"},"spec":{"clusterIP":"None","ports":[{"name":"metrics","port":2379,"targetPort":2379}],"selector":null}}
      creationTimestamp: "2022-04-15T08:24:18Z"
      labels:
        k8s-app: etcd
      name: etcd
      namespace: kube-system
      resourceVersion: "1559307"
      uid: cfd92ee5-dbd1-4ee4-a4c4-d683ca7a41ea
    spec:
      clusterIP: None
      clusterIPs:
      - None
      ipFamilies:
      - IPv4
      - IPv6
      ipFamilyPolicy: RequireDualStack
      ports:
      - name: metrics
        port: 2379
        protocol: TCP
        targetPort: 2379
      sessionAffinity: None
      type: ClusterIP
    status:
      loadBalancer: {}

The configuration looks correct, so let's keep going.
- Following key point 4 from the post, **create a ServiceMonitor that scrapes the Etcd data**.
First, let's see what the prometheus-serviceMonitorEtcd.yaml file contains.
prometheus-serviceMonitorEtcd.yaml

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: etcd
      name: etcd
      namespace: kubesphere-monitoring-system
    spec:
      endpoints:
      - interval: 1m
        port: metrics
        scheme: https
        tlsConfig:
          caFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt
          certFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt
          keyFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key
          serverName: etcd.kube-system.svc.cluster.local
      jobLabel: k8s-app
      namespaceSelector:
        matchNames:
        - kube-system
      selector:
        matchLabels:
          k8s-app: etcd

Next, check whether our Kubernetes cluster has such a ServiceMonitor resource.

    [root@ks-k8s-master-0 ~]# kubectl get servicemonitor -n kubesphere-monitoring-system
    NAME                      AGE
    alertmanager              9d
    coredns                   9d
    devops-jenkins            8d
    etcd                      3d21h
    kube-apiserver            9d
    kube-controller-manager   9d
    kube-scheduler            9d
    kube-state-metrics        9d
    kubelet                   9d
    node-exporter             9d
    prometheus                9d
    prometheus-operator       9d
    s2i-operator              8d

Now look at the resource in detail.

    [root@ks-k8s-master-0 ~]# kubectl get servicemonitor etcd -n kubesphere-monitoring-system -o yaml
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"monitoring.coreos.com/v1","kind":"ServiceMonitor","metadata":{"annotations":{},"labels":{"app.kubernetes.io/vendor":"kubesphere","k8s-app":"etcd"},"name":"etcd","namespace":"kubesphere-monitoring-system"},"spec":{"endpoints":[{"interval":"1m","port":"metrics","scheme":"https","tlsConfig":{"caFile":"/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt","certFile":"/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt","keyFile":"/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key"}}],"jobLabel":"k8s-app","namespaceSelector":{"matchNames":["kube-system"]},"selector":{"matchLabels":{"k8s-app":"etcd"}}}}
      creationTimestamp: "2022-04-15T08:24:18Z"
      generation: 1
      labels:
        app.kubernetes.io/vendor: kubesphere
        k8s-app: etcd
      name: etcd
      namespace: kubesphere-monitoring-system
      resourceVersion: "1559308"
      uid: 386f16c0-74cd-4dbf-aa35-cc227062c881
    spec:
      endpoints:
      - interval: 1m
        port: metrics
        scheme: https
        tlsConfig:
          caFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt
          certFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt
          keyFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key
      jobLabel: k8s-app
      namespaceSelector:
        matchNames:
        - kube-system
      selector:
        matchLabels:
          k8s-app: etcd

The configuration also looks correct, so let's keep going.
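One detail worth noting in the tlsConfig above: every path sits under /etc/prometheus/secrets/<name>/, which is where Prometheus Operator mounts the secrets listed in the Prometheus resource. A small sketch, assuming nothing beyond that path convention, that lists which secret names a ServiceMonitor manifest expects to find in the Prometheus Pod (the function name `referenced_secrets` is hypothetical):

```shell
#!/usr/bin/env bash
# Read a ServiceMonitor manifest on stdin and print the secret names
# implied by its /etc/prometheus/secrets/<name>/... file paths.
# Hypothetical helper for illustration.
referenced_secrets() {
  grep -oE '/etc/prometheus/secrets/[^/[:space:]]+/' \
    | sed -E 's#^/etc/prometheus/secrets/([^/]+)/$#\1#' \
    | sort -u
}
```

For example, `kubectl get servicemonitor etcd -n kubesphere-monitoring-system -o yaml | referenced_secrets` should print kube-etcd-client-certs for the manifest shown above.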
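- At this point I had checked everything I could think of, and all the expected configuration was present. So why was there still a problem? The reference post offered no further detail.
- Then I realized I had forgotten one thing: I had not yet looked at the Pod logs, so I went to check.
In Cluster Management > Application Workloads > Workloads > StatefulSets, select the kubesphere-monitoring-system project and find prometheus-k8s.

Click prometheus-k8s to open its details page, then click prometheus-k8s-0 under Pods.

Click the Container Logs button to open the container log page.

You will see a large number of error logs.

The detailed error log: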

    level=error ts=2022-04-19T06:49:08.169Z caller=manager.go:188 component="scrape manager" msg="error creating new scrape pool" err="error creating HTTP client: unable to load specified CA cert /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt: open /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt: no such file or directory" scrape_pool=kubesphere-monitoring-system/etcd/0

This reveals the cause of the problem: the file /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt cannot be found.
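When the log is as noisy as it was here, it can be quicker to pull out just the missing paths. The following is a minimal sketch that relies only on the "open <path>: no such file or directory" wording of the error shown above; the function name `missing_files_from_log` is made up for this example.

```shell
#!/usr/bin/env bash
# Read Prometheus log lines on stdin and print each distinct file path
# that could not be opened ("open <path>: no such file or directory").
# Hypothetical helper for illustration.
missing_files_from_log() {
  sed -nE 's/.*open ([^ :]+): no such file or directory.*/\1/p' | sort -u
}
```

It could be fed from the Pod logs, for example `kubectl logs -n kubesphere-monitoring-system prometheus-k8s-0 -c prometheus | missing_files_from_log`. The container name `prometheus` is an assumption on my part; adjust it to whatever your Pod actually uses.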
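Open a terminal in the Pod and verify inside the container.

The result shows that the entire directory does not exist.

Next, check whether the Pod configuration contains any secret settings.

That settled it. I believed I had found the root cause and the fix: the Pod does not mount the kube-etcd-client-certs secret, so mounting it should solve the problem, right?
In the console, find the prometheus-k8s StatefulSet and click More > Edit Settings.

Under Volumes, choose to mount a ConfigMap or Secret.

Select Secret, mount kube-etcd-client-certs read-only at /etc/prometheus/secrets/kube-etcd-client-certs, and confirm.

After clicking OK, the Pod starts to be recreated. I assumed that was it and I only had to wait for the result, but...

Once the Pod had been recreated, I expected everything to be under control. Instead, the configuration I had changed had reverted to its original state, and the Pod still did not mount the secret we wanted.
After repeating this three times, I gave up and finally realized that my approach was wrong: this StatefulSet is managed by prometheus-operator, so changes made directly to it do not persist.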
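- I had never worked with prometheus-operator before and did not know its internals. What now? Back to searching on Baidu.
- Baidu search,
keywords: prometheus operator etcd.

I glanced at the first result; it was not much help, so I won't show it here. Interested readers can look for themselves.
Two minutes later I opened the second result. Its line of reasoning was clear, and scrolling down quickly, I found the method I needed in its third point.
- I don't know the details, but our goal is to mount the secret, and since the article mentions how, let's give it a try.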

    [root@ks-k8s-master-0 ~]# kubectl edit prometheuses -n kubesphere-monitoring-system
    # Please edit the object below. Lines beginning with a '#' will be ignored,
    # and an empty file will abort the edit. If an error occurs while saving this file will be
    # reopened with the relevant failures.
    #
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
    ....

The file looks like the above. Searching it for secret returns the error E486: Pattern not found: secret,
which means the default configuration has no secrets setting. We add it ourselves, at around line 78 of the file:

      secrets:
      - kube-etcd-client-certs

The end result looks like this (I added line numbers to make it easier to read):

    71   securityContext:
    72     fsGroup: 0
    73     runAsNonRoot: false
    74     runAsUser: 0
    75   serviceAccountName: prometheus-k8s
    76   serviceMonitorNamespaceSelector: {}
    77   serviceMonitorSelector: {}
    78   secrets:
    79   - kube-etcd-client-certs
    80   storage:
    81     volumeClaimTemplate:
    82       spec:
    83         resources:
    84           requests:
    85             storage: 20Gi
    86   tolerations:
    87   - effect: NoSchedule
    88     key: dedicated
    89     operator: Equal
    90     value: monitoring
    91   version: v2.26.0

Save and exit.
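The same change can in principle be made without an interactive editor by sending a JSON merge patch to the Prometheus resource. This is only a sketch: the helper name `prometheus_secrets_patch` is hypothetical, and it is not the method the referenced article describes. Keep in mind that a merge patch replaces the whole spec.secrets list, so every secret that should stay mounted has to be passed in.

```shell
#!/usr/bin/env bash
# Build a JSON merge patch that sets spec.secrets on a Prometheus resource.
# Note: a merge patch replaces the whole list, so pass every secret to keep.
# Hypothetical helper for illustration; secret names must not contain quotes.
prometheus_secrets_patch() {
  local items="" name
  for name in "$@"; do
    # Append each name as a quoted JSON string, comma-separated.
    items="${items:+$items,}\"$name\""
  done
  printf '{"spec":{"secrets":[%s]}}\n' "$items"
}
```

A possible invocation is `kubectl patch prometheus k8s -n kubesphere-monitoring-system --type merge -p "$(prometheus_secrets_patch kube-etcd-client-certs)"`. The resource name `k8s` is my assumption, inferred from the prometheus-k8s StatefulSet name; confirm it with `kubectl get prometheuses -n kubesphere-monitoring-system` first.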
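Checking the StatefulSet configuration again, there is now an additional Secret entry.

The Pod configuration also shows the additional Secret.

The Pod logs no longer show any errors either.

It feels like the problem is solved, so let's see whether the monitoring page now shows graphs (I was rather looking forward to this).
- Time to reveal the final answer.
First, a panoramic view.

Then a few close-ups of individual areas (added later; I did not capture them at first).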







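- At this point the problem is provisionally solved, although many details remain that call for a deeper understanding of the underlying mechanisms later on.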
4. Key Technical Points of Monitoring Etcd with Prometheus-Operator
Key technical points
- How Etcd is installed
The Etcd installed by KubeSphere runs as a binary. Verify this as follows.

    ## Check the process to confirm the binary installation
    [root@ks-k8s-master-0 ~]# ps -ef | grep etcd
    root 1158 56409 0 15:43 pts/0 00:00:00 grep --color=auto etcd
    root 15301 1 6 Apr09 ? 15:35:08 /usr/local/bin/etcd
    root 17247 17219 13 Apr09 ? 1-06:55:24 kube-apiserver --advertise-address=192.168.9.91 --allow-privileged=true --audit-log-maxage=30 --audit-log-maxbackup=10 --audit-log-maxsize=100 --authorization-mode=Node,RBAC --bind-address=0.0.0.0 --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/ssl/etcd/ssl/ca.pem --etcd-certfile=/etc/ssl/etcd/ssl/node-ks-k8s-master-0.pem --etcd-keyfile=/etc/ssl/etcd/ssl/node-ks-k8s-master-0-key.pem --etcd-servers=https://192.168.9.91:2379,https://192.168.9.92:2379,https://192.168.9.93:2379 --feature-gates=CSIStorageCapacity=true,RotateKubeletServerCertificate=true,TTLAfterFinished=true,ExpandCSIVolumes=true --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.233.0.0/18 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key

    ## List the SSL certificate and key files
    [root@ks-k8s-master-0 ~]# ll /etc/ssl/etcd/ssl/
    total 80
    -rw------- 1 root root 1675 Apr 9 22:32 admin-ks-k8s-master-0-key.pem
    -rw-r--r-- 1 root root 1440 Apr 9 22:32 admin-ks-k8s-master-0.pem
    -rw------- 1 root root 1679 Apr 9 22:32 admin-ks-k8s-master-1-key.pem
    -rw-r--r-- 1 root root 1440 Apr 9 22:32 admin-ks-k8s-master-1.pem
    -rw------- 1 root root 1679 Apr 9 22:32 admin-ks-k8s-master-2-key.pem
    -rw-r--r-- 1 root root 1440 Apr 9 22:32 admin-ks-k8s-master-2.pem
    -rw------- 1 root root 1675 Apr 9 22:32 ca-key.pem
    -rw-r--r-- 1 root root 1086 Apr 9 22:32 ca.pem
    -rw------- 1 root root 1679 Apr 9 22:32 member-ks-k8s-master-0-key.pem
    -rw-r--r-- 1 root root 1440 Apr 9 22:32 member-ks-k8s-master-0.pem
    -rw------- 1 root root 1675 Apr 9 22:32 member-ks-k8s-master-1-key.pem
    -rw-r--r-- 1 root root 1440 Apr 9 22:32 member-ks-k8s-master-1.pem
    -rw------- 1 root root 1675 Apr 9 22:32 member-ks-k8s-master-2-key.pem
    -rw-r--r-- 1 root root 1440 Apr 9 22:32 member-ks-k8s-master-2.pem
    -rw------- 1 root root 1675 Apr 9 22:32 node-ks-k8s-master-0-key.pem
    -rw-r--r-- 1 root root 1440 Apr 9 22:32 node-ks-k8s-master-0.pem
    -rw------- 1 root root 1679 Apr 9 22:32 node-ks-k8s-master-1-key.pem
    -rw-r--r-- 1 root root 1440 Apr 9 22:32 node-ks-k8s-master-1.pem
    -rw------- 1 root root 1679 Apr 9 22:32 node-ks-k8s-master-2-key.pem
    -rw-r--r-- 1 root root 1440 Apr 9 22:32 node-ks-k8s-master-2.pem

- Configuration needed for Prometheus-Operator to monitor Etcd
- Create a secret from the external Etcd certificates
- Create an endpoint from the IPs of the external Etcd nodes
- Create an etcd Service that uses the Endpoint
- Create a ServiceMonitor that scrapes the Etcd data
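The four items above can be checked in one pass. Here is a rough sketch, assuming the resource names and namespaces used throughout this article; the function name `check_etcd_monitoring` and the `KUBECTL` override variable are hypothetical conveniences. It only checks that the objects exist, not that the secret is actually mounted into the Prometheus Pod.

```shell
#!/usr/bin/env bash
# Check that the four objects etcd monitoring relies on exist.
# KUBECTL can be overridden (for example with a stub) for dry runs.
KUBECTL="${KUBECTL:-kubectl}"

check_etcd_monitoring() {
  local rc=0 spec kind name ns
  for spec in \
    "secret kube-etcd-client-certs kubesphere-monitoring-system" \
    "endpoints etcd kube-system" \
    "service etcd kube-system" \
    "servicemonitor etcd kubesphere-monitoring-system"; do
    # Split "kind name namespace" into the three fields.
    set -- $spec
    kind=$1 name=$2 ns=$3
    if "$KUBECTL" get "$kind" "$name" -n "$ns" >/dev/null 2>&1; then
      echo "ok      $kind/$name ($ns)"
    else
      echo "MISSING $kind/$name ($ns)"
      rc=1
    fi
  done
  return "$rc"
}
```

Running `check_etcd_monitoring` on a master node prints one line per object and returns a non-zero status if any of them is missing.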
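Areas for further study (placeholder, to be completed)
- How Prometheus-Operator works and its technical details
- How KubeSphere configures Prometheus-Operator
5. Summary
Starting from a real operations requirement, this article showed the correct way to enable Etcd monitoring and walked through the troubleshooting process in detail. Readers who need to enable Etcd monitoring in KubeSphere 3.2.1 can follow it for their own configuration.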
References
- "etcd uses a self-signed certificate, Prometheus reports it as signed by an unknown authority #2.11" (KubeSphere forum post)
- https://www.cnblogs.com/lvcisco/p/12575608.html?ivk_sa=1024320u
Get the documents
- Github https://github.com/devops/z-notes
- Gitee https://gitee.com/zdevops/z-notes
Get the code
- Github https://github.com/devops/ansible-zdevops
- Gitee https://gitee.com/zdevops/ansible-zdevops
Bilibili
- 老Z 手記 (Lao Z's Notes)
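This article was published via OpenWrite, a multi-platform blog publishing tool.
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/463509.html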
