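The DNS add-ons built into a Kubernetes cluster (kube-dns/CoreDNS) can hit performance bottlenecks under high concurrency. This article describes how to reduce the DNS query failure rate and improve performance through configuration tuning and local caching.
Configuration tuning
dnsPolicy
The default dnsPolicy in k8s is ClusterFirst. Because of ndots and the search domains, resolving an external domain name incurs extra queries: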
/ # cat /etc/resolv.conf
nameserver 10.254.0.2
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
/ #
/ #
/ # host -v mi.com
Trying "mi.com.default.svc.cluster.local"
Trying "mi.com.svc.cluster.local"
Trying "mi.com.cluster.local"
Trying "mi.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38967
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;mi.com. IN A
;; ANSWER SECTION:
mi.com. 30 IN A 58.83.160.156
If the Pod does not need to access Services, set dnsPolicy to Default so that it uses the host's DNS directly.
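As a minimal sketch of that setting, the snippet below writes a Pod manifest with `dnsPolicy: Default`. The Pod name, container name, image, and file name are hypothetical placeholders and not taken from the original setup.

```bash
#!/bin/bash
# Write a hypothetical Pod manifest that inherits the node's DNS configuration.
cat > pod-dns-default.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: dns-default-demo      # hypothetical name
spec:
  dnsPolicy: Default          # use the node's resolver instead of the cluster DNS
  containers:
  - name: app                 # hypothetical container
    image: busybox            # placeholder image
    command: ["sleep", "3600"]
EOF
# On a real cluster, apply it with: kubectl apply -f pod-dns-default.yaml
```

A Pod created this way cannot resolve cluster Service names, so it only suits workloads that talk exclusively to external endpoints.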
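ndots
If Services must be reachable, reduce ndots (default 5) as far as possible. When a name contains fewer dots than ndots, the resolver first tries each search domain in turn (for example mi.com.default.svc.cluster.local) and only queries the original name if those fail, for a total of 8 DNS queries (4 IPv4, 4 IPv6).
With ndots set to 1, only two queries are made (1 IPv4, 1 IPv6):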
/ # host -v mi.com
Trying "mi.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23894
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;mi.com. IN A
;; ANSWER SECTION:
mi.com. 30 IN A 58.83.160.156
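One way to set ndots for a single Pod is the `dnsConfig` field. The sketch below writes such a manifest; the Pod name, container name, image, and file name are hypothetical placeholders.

```bash
#!/bin/bash
# Write a hypothetical Pod manifest that keeps the default ClusterFirst policy
# but overrides the ndots resolver option.
cat > pod-ndots.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: dns-ndots-demo        # hypothetical name
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "1"              # merged into the options generated by the DNS policy
  containers:
  - name: app                 # hypothetical container
    image: busybox            # placeholder image
    command: ["sleep", "3600"]
EOF
# On a real cluster, apply it with: kubectl apply -f pod-ndots.yaml
```

After the Pod starts, `cat /etc/resolv.conf` inside it should show `options ndots:1` alongside the unchanged nameserver and search lines.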
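However, with ndots set to 1, a Service name whose number of dots is greater than or equal to ndots is queried as-is and cannot be resolved, so each workload has to choose a suitable ndots value for itself: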
/ # host -v prometheus.kube-system
Trying "prometheus.kube-system"
Host prometheus.kube-system not found: 3(NXDOMAIN)
Received 115 bytes from 10.254.0.2#53 in 8 ms
Received 115 bytes from 10.254.0.2#53 in 8 ms
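The behaviour seen in the `host` traces above can be summarized in a small shell sketch. It models only the simplified rule described in this section (fewer dots than ndots: search domains first, then the original name; otherwise the original name only). The function name `candidates` is hypothetical, and real resolver libraries differ in how they fall back.

```bash
#!/bin/bash
# candidates NAME NDOTS SEARCH_DOMAIN...
# Print, one per line, the names that would be tried for NAME.
candidates() {
  local name=$1 ndots=$2
  shift 2
  # Count the dots in the name by deleting every non-dot character.
  local only_dots=${name//[^.]/}
  local dots=${#only_dots}
  if [ "$dots" -lt "$ndots" ]; then
    local domain
    for domain in "$@"; do
      echo "$name.$domain"
    done
  fi
  # The original name is always tried last (or alone).
  echo "$name"
}

# ndots:5, external name: three search-domain attempts, then mi.com
candidates mi.com 5 default.svc.cluster.local svc.cluster.local cluster.local
# ndots:1, short Service name: only prometheus.kube-system is tried
candidates prometheus.kube-system 1 default.svc.cluster.local svc.cluster.local cluster.local
```

Each printed name corresponds to one A and one AAAA query, which is where the 8 versus 2 queries come from.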
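CoreDNS tuning
Use a reasonable number of replicas; Alibaba Cloud recommends a coredns:node ratio of 1:8. Enable the autopath plugin to reduce the number of queries. See "DNS performance optimization".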
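The 1:8 ratio can be turned into a replica count with a little shell arithmetic. This is a sketch only: the function name `coredns_replicas` is hypothetical, the floor of 2 replicas is an added assumption for availability, and the Deployment name `coredns` in the comment is assumed and may differ per cluster.

```bash
#!/bin/bash
# coredns_replicas NODE_COUNT
# Return ceil(NODE_COUNT / 8), but never fewer than 2 (the floor is an assumption).
coredns_replicas() {
  local nodes=$1
  local replicas=$(( (nodes + 7) / 8 ))
  if [ "$replicas" -lt 2 ]; then
    replicas=2
  fi
  echo "$replicas"
}

coredns_replicas 40   # prints 5

# On a real cluster (Deployment name "coredns" is an assumption):
#   nodes=$(kubectl get nodes --no-headers | wc -l)
#   kubectl -n kube-system scale deployment coredns --replicas="$(coredns_replicas "$nodes")"
```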
DNS caching
NodeLocalDNS
NodeLocal DNSCache improves cluster DNS performance by running a DNS caching agent on cluster nodes as a DaemonSet.
With this new architecture, Pods can reach the DNS caching agent running on the same node, which avoids iptables DNAT rules and connection tracking.
The architecture is as follows:
[Figure: NodeLocal DNSCache architecture (https://d33wubrfki0l68.cloudfront.net/bf8e5eaac697bac89c5b36a0edb8855c860bfb45/6944f/images/docs/nodelocaldns.svg)]
For the NodeLocalDNS design proposal, see nodelocal-dns-cache.
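Verification
For the official installation method see nodelocaldns; the variables have to be replaced by hand.
The following script installs it in one step (remember to set the kube-dns Service ClusterIP):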
#!/bin/bash
wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml
# image registry (change this to pull from a mirror)
docker_registry=k8s.gcr.io/dns/k8s-dns-node-cache
# kube-dns svc ClusterIP
kubedns_svc=10.254.0.2
# nodelocaldns ip
nodelocaldns_ip=169.254.20.10
# kube-proxy mode, iptables or ipvs
kubeproxy_mode=iptables
result=result.yaml
if [ "${kubeproxy_mode}" = "ipvs" ]; then
  # ipvs: listen only on the local IP and forward to the kube-dns ClusterIP
  sed -e "s|k8s.gcr.io/dns/k8s-dns-node-cache|$docker_registry|g" \
    -e "s/__PILLAR__CLUSTER__DNS__/$kubedns_svc/g" \
    -e "s/__PILLAR__LOCAL__DNS__/$nodelocaldns_ip/g" \
    -e 's/[ |,]__PILLAR__DNS__SERVER__//g' \
    -e "s/__PILLAR__DNS__DOMAIN__/cluster.local/g" nodelocaldns.yaml >"$result"
else
  # iptables: also listen on the kube-dns ClusterIP
  sed -e "s|k8s.gcr.io/dns/k8s-dns-node-cache|$docker_registry|g" \
    -e "s/__PILLAR__DNS__SERVER__/$kubedns_svc/g" \
    -e "s/__PILLAR__LOCAL__DNS__/$nodelocaldns_ip/g" \
    -e "s/__PILLAR__DNS__DOMAIN__/cluster.local/g" nodelocaldns.yaml >"$result"
fi
kubectl apply -f "$result"
Once it is created, one pod runs on every node. Check the pods (on some nodes ingress-nginx occupies port 8080, which makes nodelocaldns fail to start):
# kubectl get po -n kube-system -l k8s-app=node-local-dns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-local-dns-2fvxb 0/1 CrashLoopBackOff 4 103s 10.38.200.195 node04 <none> <none>
node-local-dns-4zmcd 1/1 Running 0 54d 10.38.201.55 node06 <none> <none>
node-local-dns-55tzg 1/1 Running 0 60d 10.38.200.186 node02 <none> <none>
node-local-dns-cctg7 1/1 Running 0 54d 10.38.200.242 node07 <none> <none>
node-local-dns-khgmm 1/1 Running 0 54d 10.38.201.36 node08 <none> <none>
node-local-dns-mbr64 1/1 Running 0 60d 10.38.200.187 node05 <none> <none>
node-local-dns-t67vw 1/1 Running 0 60d 10.38.200.188 node03 <none> <none>
node-local-dns-tmm92 1/1 Running 14 54d 10.38.200.57 node09 <none> <none>
The default configuration is as follows:
cluster.local:53 {
errors
cache {
success 9984 30 # successful responses are cached for 30s by default
denial 9984 5 # negative responses are cached for 5s
}
reload
loop
bind 169.254.20.10 10.254.0.2 # local listen IPs
forward . 10.254.132.95 { # forward to kube-dns-upstream
force_tcp
}
prometheus :9253 # metrics endpoint
health 169.254.20.10:8080 # health check port
}
in-addr.arpa:53 {
errors
cache 30
reload
loop
bind 169.254.20.10 10.254.0.2
forward . 10.254.132.95 {
force_tcp
}
prometheus :9253
}
ip6.arpa:53 {
errors
cache 30
reload
loop
bind 169.254.20.10 10.254.0.2
forward . 10.254.132.95 {
force_tcp
}
prometheus :9253
}
.:53 {
errors
cache 30
reload
loop
bind 169.254.20.10 10.254.0.2
forward . /etc/resolv.conf
prometheus :9253
}
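Check the localdns network interface on a node. It listens locally on both 169.254.20.10 and 10.254.0.2 and intercepts requests sent to kube-dns (10.254.0.2 by default). Cache hits are answered directly; misses are forwarded to kube-dns through the Service kube-dns-upstream, which is created by localdns and bound to the kube-dns pods.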
# ip addr show nodelocaldns
182232: nodelocaldns: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN
link/ether 4e:62:1c:fd:56:12 brd ff:ff:ff:ff:ff:ff
inet 169.254.20.10/32 brd 169.254.20.10 scope global nodelocaldns
valid_lft forever preferred_lft forever
inet 10.254.0.2/32 brd 10.254.0.2 scope global nodelocaldns
valid_lft forever preferred_lft forever
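iptables rules: NOTRACK makes these packets skip connection tracking, so the rules in the other tables are not applied to them: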
iptables-save | egrep "10.254.0.2|169.254.20.10"
-A PREROUTING -d 10.254.0.2/32 -p udp -m udp --dport 53 -j NOTRACK
-A PREROUTING -d 10.254.0.2/32 -p tcp -m tcp --dport 53 -j NOTRACK
-A PREROUTING -d 169.254.20.10/32 -p udp -m udp --dport 53 -j NOTRACK
-A PREROUTING -d 169.254.20.10/32 -p tcp -m tcp --dport 53 -j NOTRACK
-A OUTPUT -d 10.254.0.2/32 -p udp -m udp --dport 53 -j NOTRACK
-A OUTPUT -d 10.254.0.2/32 -p tcp -m tcp --dport 53 -j NOTRACK
-A INPUT -d 10.254.0.2/32 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -d 10.254.0.2/32 -p tcp -m tcp --dport 53 -j ACCEPT
-A OUTPUT -s 10.254.0.2/32 -p udp -m udp --sport 53 -j ACCEPT
-A OUTPUT -s 10.254.0.2/32 -p tcp -m tcp --sport 53 -j ACCEPT
...
-A KUBE-SERVICES -d 10.254.0.2/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -d 10.254.0.2/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-SVC-JD5MR3NA4I4DYORP
-A KUBE-SERVICES -d 10.254.0.2/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
Resolve a domain name through localdns from inside a pod:
# kubectl exec -it dns-perf-client-64cfb49f9-9c5hg sh
/ # nslookup kubernetes 169.254.20.10
Server: 169.254.20.10
Address: 169.254.20.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.254.0.1
Load testing
The load test is run with dnsperf.
The list of test domain names is as follows:
# cat records.txt
mi.com A
github.com A
www.microsoft.com A
www.aliyun.com A
kubernetes.io A
nginx A
nginx.default A
kubernetes A
kubernetes.default.svc.cluster.local A
kube-dns.kube-system.svc.cluster.local A
Test command:
dnsperf -l 120 -s 10.254.0.2 -d records.txt
The results are as follows:
| target | client number | qps | avg-latency(ms) | stddev(ms) | lost |
|---|---|---|---|---|---|
| kubedns(1 pod) | 1 | 53910 | 1.83 | 6.07 | 0% |
| kubedns(2 pod) | 2 | 110000 | 1.83 | 1.94 | 9% |
| kubedns(4 pod) | 4 | 120000 | 3.2 | 0.8 | 24% |
| nodelocaldns | 1 | 71494 | 1.39 | 1.66 | 0% |
| nodelocaldns | 2 | 142000 | 1.37 | 1.55 | 0% |
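Compared with kubedns, localdns improves query performance by about 33% (71494 vs 53910 qps with a single client), and its latency is lower. Because localdns is distributed, its aggregate qps has a clear advantage over kubedns. The current test is fairly simple and most requests hit the cache; complete test results still need further verification.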
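The percentage can be re-derived from the qps column of the table. The helper below is a small sketch; the function name `improvement` is hypothetical.

```bash
#!/bin/bash
# improvement BASE NEW
# Print the percentage gain of NEW over BASE with one decimal place.
improvement() {
  awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f%%\n", (n - b) / b * 100 }'
}

improvement 53910 71494     # 1 client:  kubedns(1 pod) vs nodelocaldns, prints 32.6%
improvement 110000 142000   # 2 clients: kubedns(2 pod) vs nodelocaldns, prints 29.1%
```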
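Pros and cons
Pros:
- Greatly reduces DNS query latency
- Increases DNS qps
- Bypasses iptables and conntrack
- Queries DNS over TCP by default, avoiding the 5-second DNS delay
Cons:
- Single point of failure (OOM/Evicted/Config Error/Upgrade); the community approach is to run a probing DaemonSet that watches the localdns state and removes the iptables rules if localdns becomes abnormal
- Uses hostNetwork and occupies several ports (8080, 9253, etc.)
- In ipvs mode, the kubelet's default DNS configuration has to be changed (NOTRACK has no effect with ipvs unless the Service has 0 backend endpoints)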
Notes
- Older DNS versions have a memory leak when handling TCP requests
- The installation configuration differs between iptables and ipvs
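HA
- A community proposal splits the writing of iptables rules out of nodelocaldns into a separate DaemonSet, which watches the localdns address to decide whether to write or delete the iptables rules (has no effect in ipvs mode)
- Configure multiple nameservers in /etc/resolv.conf (not recommended: different base libraries behave differently, for example glibc 2.16+ sends requests to multiple nameservers when querying DNS, which actually causes a surge in requests)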
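Gradual rollout
- Configure Pod-level DNS through dnsConfig (the startup parameter localip must be configured)
- Roll out the Node-level DNS policy gradually by setting nodeSelector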
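A Pod-level rollout through `dnsConfig` could look like the sketch below, which points one Pod at the nodelocaldns address used in this article. The Pod name, container name, image, and file name are hypothetical, and the search list assumes the Pod runs in the `default` namespace of a `cluster.local` cluster.

```bash
#!/bin/bash
# Write a hypothetical Pod manifest that resolves through nodelocaldns only.
cat > pod-nodelocaldns.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: dns-nodelocal-demo    # hypothetical name
spec:
  dnsPolicy: None             # ignore the cluster default, use dnsConfig only
  dnsConfig:
    nameservers:
    - 169.254.20.10           # nodelocaldns address from this article
    searches:                 # assumes namespace "default" and domain cluster.local
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    options:
    - name: ndots
      value: "5"
  containers:
  - name: app                 # hypothetical container
    image: busybox            # placeholder image
    command: ["sleep", "3600"]
EOF
# On a real cluster, apply it with: kubectl apply -f pod-nodelocaldns.yaml
```

Because only the selected Pods use the local cache, a problem with nodelocaldns affects just those Pods while the rest keep resolving through kube-dns.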
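Local DNS caching
Besides nodelocaldns, users can also enable DNS caching inside the container or by adding a sidecar.
- Add an nscd process to the image to cache DNS, as follows:
  FROM ubuntu
  RUN apt-get update && apt-get install -y nscd && rm -rf /var/lib/apt/lists/*
  CMD service nscd start; bash -c "sleep 3600"
  This approach requires users to modify the image, or to add an extra script that configures nscd.
- Alternatively, configure a DNS caching sidecar (such as coredns or dnsmasq) to improve performance. This approach is flexible, but it requires changing the pod configuration and wastes more resources than nodelocaldns.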
References
[1] https://kubernetes.io/zh/docs/tasks/administer-cluster/nodelocaldns/
[2] https://help.aliyun.com/document_detail/172339.html
[3] https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/0030-nodelocal-dns-cache.md
[4] https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/1024-nodelocal-cache-dns/README.md
[5] https://lework.github.io/2020/11/09/node-local-dns/
