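A Roundup of Common etcd Failures and How to Troubleshoot Them
Table of Contents
- A Roundup of Common etcd Failures and How to Troubleshoot Them
- 1. A rebuilt etcd node cannot rejoin the cluster
- 2. etcd fails to set up the initial cluster
- 3. etcd reports "URL address does not have the form"
- 4. A new etcd node reports an error when joining the cluster
- 5. A new node's IP is missing from the certificate, so it cannot join the cluster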
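1. A rebuilt etcd node cannot rejoin the cluster
Symptom: one etcd node was removed from the cluster for some reason and now needs to rejoin it.
The error output is as follows: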
8月 27 16:40:17 binary-k8s-node1 etcd[30462]: {"level":"fatal","ts":"2021-08-27T16:40:17.603+0800","caller":"etcdmain/etcd.go:271","msg":"discovery failed","error":"open /data/etcd/ssl/server.pem: no such file or directory","stacktrace":"go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/etcdmain/etcd.go:271\ngo.etcd.io/etcd/etcdmain.Main\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/etcdmain/main.go:46\nmain.main\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"}

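This happens when the node has already been a member of an etcd cluster: a further attempt to join a cluster then fails. To fix it, remove the node from the original cluster, or set the node's ETCD_INITIAL_CLUSTER_STATE parameter to "existing". Note that the log line above itself reports a missing file, /data/etcd/ssl/server.pem, so also confirm that the certificate files exist at the configured paths on the rebuilt node.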
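Because the log line above reports that /data/etcd/ssl/server.pem cannot be opened, one quick check on a rebuilt node is whether every TLS file named in the etcd configuration actually exists. The following is a minimal Python sketch, assuming an environment-style configuration file made of KEY="value" lines (as the ETCD_INITIAL_CLUSTER_STATE parameter suggests). The function name missing_tls_files is hypothetical, and the `exists` argument is only there so the check can be exercised without touching the real filesystem.

```python
import os

# etcd environment variables that point at TLS files.
TLS_KEYS = (
    "ETCD_CERT_FILE",
    "ETCD_KEY_FILE",
    "ETCD_TRUSTED_CA_FILE",
    "ETCD_PEER_CERT_FILE",
    "ETCD_PEER_KEY_FILE",
    "ETCD_PEER_TRUSTED_CA_FILE",
)


def missing_tls_files(conf_text, exists=os.path.exists):
    """Return (key, path) pairs whose configured TLS file does not exist.

    Assumes conf_text is in KEY="value" form, one setting per line.
    """
    missing = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if sep and key in TLS_KEYS and value and not exists(value):
            missing.append((key, value))
    return missing
```

Passing the contents of the node's etcd configuration file to missing_tls_files returns an empty list when every referenced file is present; any pair it does return names a setting and a path to restore before restarting etcd.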
2. etcd fails to set up the initial cluster
The error output is as follows:
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"info","ts":"2021-09-10T11:01:06.960+0800","caller":"embed/etcd.go:117","msg":"configuring peer listeners","listen-peer-urls":["http://192.168.20.11:2380"]}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"info","ts":"2021-09-10T11:01:06.960+0800","caller":"embed/etcd.go:465","msg":"starting with peer TLS","tls-info":"cert = /data/etcd/ssl/server.pem, key = /data/etcd/ssl/server-key.pem, trusted-ca = /data/etcd/ssl/ca.pem, client-cert-auth = false, crl-file = ","cipher-suites":[]}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"warn","ts":"2021-09-10T11:01:06.960+0800","caller":"embed/etcd.go:502","msg":"scheme is HTTP while key and cert files are present; ignoring key and cert files","peer-url":"http://192.168.20.11:2380"}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"info","ts":"2021-09-10T11:01:06.960+0800","caller":"embed/etcd.go:127","msg":"configuring client listeners","listen-client-urls":["http://127.0.0.1:2379","http://192.168.20.11:2379"]}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"warn","ts":"2021-09-10T11:01:06.961+0800","caller":"embed/etcd.go:614","msg":"scheme is HTTP while key and cert files are present; ignoring key and cert files","client-url":"http://127.0.0.1:2379"}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"warn","ts":"2021-09-10T11:01:06.961+0800","caller":"embed/etcd.go:614","msg":"scheme is HTTP while key and cert files are present; ignoring key and cert files","client-url":"http://192.168.20.11:2379"}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"info","ts":"2021-09-10T11:01:06.961+0800","caller":"embed/etcd.go:360","msg":"closing etcd server","name":"etcd-4","data-dir":"/data/etcd/data","advertise-peer-urls":["http://192.168.20.11:2380"],"advertise-client-urls":["http://127.0.0.1:2379","http://192.168.20.11:2379"]}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"info","ts":"2021-09-10T11:01:06.961+0800","caller":"embed/etcd.go:364","msg":"closed etcd server","name":"etcd-4","data-dir":"/data/etcd/data","advertise-peer-urls":["http://192.168.20.11:2380"],"advertise-client-urls":["http://127.0.0.1:2379","http://192.168.20.11:2379"]}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"warn","ts":"2021-09-10T11:01:06.961+0800","caller":"etcdmain/etcd.go:176","msg":"failed to start etcd","error":"error setting up initial cluster: URL address does not have the form \"host:port\": http://ip:192.168.20.11:2380"}
9月 10 11:01:06 binary-k8s-master2 etcd[5213]: {"level":"fatal","ts":"2021-09-10T11:01:06.961+0800","caller":"etcdmain/etcd.go:271","msg":"discovery failed","error":"error setting up initial cluster: URL address does not have the form \"host:port\": http://ip:192.168.20.11:2380","stacktrace":"go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/etcdmain/etcd.go:271\ngo.etcd.io/etcd/etcdmain.Main\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/etcdmain/main.go:46\nmain.main\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"}
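The key phrase "error setting up initial cluster" in the log points to a mistake in the configuration file, specifically in the initial cluster setting. Checking the syntax of that setting carefully reveals the problem.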
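A check like this can also be done mechanically before starting etcd. The following is a minimal Python sketch, assuming the initial cluster setting is a comma-separated list of name=url entries and that only http and https peer URLs are in use; the function name check_initial_cluster and the member name etcd-1 in the usage note are hypothetical.

```python
from urllib.parse import urlsplit


def check_initial_cluster(value):
    """Return (name, url, problem) tuples for malformed initial-cluster entries."""
    problems = []
    for entry in value.split(","):
        name, sep, url = entry.strip().partition("=")
        if not (sep and name and url):
            problems.append((name, url, "expected name=url"))
            continue
        try:
            parts = urlsplit(url)
        except ValueError as exc:  # e.g. an unbalanced IPv6 bracket
            problems.append((name, url, str(exc)))
            continue
        if parts.scheme not in ("http", "https"):
            problems.append((name, url, "expected http or https scheme"))
            continue
        # Split on the last colon so the port is separated from the host.
        host, colon, port = parts.netloc.rpartition(":")
        bracketed = host.startswith("[") and host.endswith("]")  # IPv6 literal
        if not (colon and host and port.isdigit()):
            problems.append((name, url, "missing host or numeric port"))
        elif ":" in host and not bracketed:
            problems.append((name, url, "extra text before host:port"))
    return problems
```

For example, check_initial_cluster("etcd-1=https://192.168.20.10:2380,etcd-4=https://ip:192.168.20.11:2380") flags only the etcd-4 entry, which is the malformed URL quoted in the log.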

3. etcd reports "URL address does not have the form"
The error output is as follows:
9月 10 11:06:04 binary-k8s-master2 etcd[10971]: {"level":"warn","ts":"2021-09-10T11:06:04.981+0800","caller":"etcdmain/etcd.go:176","msg":"failed to start etcd","error":"error setting up initial cluster: URL address does not have the form \"host:port\": https://ip:192.168.20.11:2380"}
9月 10 11:06:04 binary-k8s-master2 etcd[10971]: {"level":"fatal","ts":"2021-09-10T11:06:04.981+0800","caller":"etcdmain/etcd.go:271","msg":"discovery failed","error":"error setting up initial cluster: URL address does not have the form \"host:port\": https://ip:192.168.20.11:2380","stacktrace":"go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/etcdmain/etcd.go:271\ngo.etcd.io/etcd/etcdmain.Main\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/etcdmain/main.go:46\nmain.main\n\t/tmp/etcd-release-3.4.9/etcd/release/etcd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"}
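Reading the log closely, the message says the URL is not in "host:port" form. The URL it quotes, https://ip:192.168.20.11:2380, shows that the configuration file is again at fault: the literal word "ip" follows https://.

Fix: remove the word "ip" after https:// in the configuration file.

After this change, the service starts successfully.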

4. A new etcd node reports an error when joining the cluster
9月 10 13:12:42 binary-k8s-master1 etcd[8832]: {"level":"warn","ts":"2021-09-10T13:12:42.386+0800","caller":"rafthttp/stream.go:682","msg":"request sent was ignored by remote peer due to cluster ID mismatch","remote-peer-id":"aae107adddd0d3d8","remote-peer-cluster-id":"2d72d2986bd93bc7","local-member-id":"51ae3f86f3783687","local-member-cluster-id":"20b119eb5f91aa4b","error":"cluster ID mismatch"}
9月 10 13:12:42 binary-k8s-master1 etcd[8832]: {"level":"warn","ts":"2021-09-10T13:12:42.386+0800","caller":"rafthttp/stream.go:682","msg":"request sent was ignored by remote peer due to cluster ID mismatch","remote-peer-id":"aae107adddd0d3d8","remote-peer-cluster-id":"2d72d2986bd93bc7","local-member-id":"51ae3f86f3783687","local-member-cluster-id":"20b119eb5f91aa4b","error":"cluster ID mismatch"}
9月 10 13:12:42 binary-k8s-master1 etcd[8832]: request sent was ignored (cluster ID mismatch: remote[aae107adddd0d3d8]=2d72d2986bd93bc7, local=20b119eb5f91aa4b)

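This error occurs because the new node previously ran as a standalone, single-node etcd, and its old data directory was not cleared when it was added to the cluster. Deleting the data directory on the new node resolves it: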
rm -rf /data/etcd/data/*
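Before deleting anything, it can help to confirm that the node really is carrying state from a different cluster. The following is a minimal Python sketch that extracts the two cluster IDs from a warning line, assuming the JSON log format shown above; the function name parse_cluster_id_mismatch is hypothetical.

```python
import json


def parse_cluster_id_mismatch(line):
    """Return the cluster IDs from an etcd JSON log line.

    Returns None when the line is not a "cluster ID mismatch" warning.
    """
    start = line.find("{")  # skip the journald prefix before the JSON body
    if start == -1:
        return None
    try:
        record = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or record.get("error") != "cluster ID mismatch":
        return None
    return {
        "local": record.get("local-member-cluster-id"),
        "remote": record.get("remote-peer-cluster-id"),
        "remote_peer": record.get("remote-peer-id"),
    }
```

If the "local" and "remote" values differ, as they do in the log lines above, the two members were initialized as separate clusters.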
5. A new node's IP is missing from the certificate, so it cannot join the cluster
The error output is as follows:
9月 14 18:45:40 binary-k8s-master1 etcd[14881]: {"level":"warn","ts":"2021-09-14T18:45:40.932+0800","caller":"rafthttp/probing_status.go:70","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"c8a24e337417915f","rtt":"0s","error":"x509: certificate is valid for 192.168.20.10, 192.168.20.11, 192.168.20.12, 192.168.20.13, not 192.168.20.8"}

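The error occurs because the new node's IP is not listed in the etcd certificate.
Fix: add the node's IP to the certificate configuration file, regenerate the certificate, copy it to every node, and restart all etcd nodes.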
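The x509 error already lists every address the current certificate covers, so the host list for the regenerated certificate can be derived from it. The following is a minimal Python sketch, assuming the error text has the form shown in the log above; the function name hosts_for_new_cert is hypothetical, and how the list is written into the certificate configuration file depends on the tool used to issue the certificate.

```python
import re

# Matches the message shown in the log, e.g.
# "x509: certificate is valid for A, B, not C"
_SAN_ERROR = re.compile(
    r"x509: certificate is valid for (?P<valid>.+?), not (?P<missing>[\w.:-]+)"
)


def hosts_for_new_cert(message):
    """Return the currently covered hosts plus the rejected one, or None."""
    match = _SAN_ERROR.search(message)
    if match is None:
        return None
    valid = [host.strip() for host in match.group("valid").split(",")]
    return valid + [match.group("missing")]
```

Called on the error string from the log above, it returns the four existing addresses followed by 192.168.20.8, which is the set the new certificate needs to cover.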
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/300993.html
