跳至主要内容

[21] 為什麼 EKS 使用 NLB 作為 Kubernetes service 會遇到 connection timeout(一)

Kubernetes 中 Service 1 Object 常見類型有 ClusterIPNodePortLoadBalancer。根據不同的 Cloud Provider 則會有對應 Cloud Provider 廠商的元件,如在 EKS 環境中的 cloud-provider-aws 2 提供協助使用者 provision LoadBalancer 資源。在 EKS 環境,cloud-provider-aws 會負責建立 Classic Load Balancer(CLB) 3Network Load Balancer 4 資源作為 load balancers5

而在某些場境下,我們會傾向將 client-side application 與 server-side application 皆部署於同一個 Kubernetes Cluster 上。

  • server-side 服務:Nginx Server 或是自建 Redis cluster。
  • 透過對內的 Kubernetes Service 建立 NLB 作為統一對內入口。
  • 於同一個 EKS cluster 內,client 端透過此 NLB endpoint 連接對應 server-side 服務。

建置環境

  1. 建立以下 nginx-nlb-service-demo.yaml YAML。使用 Nginx 作為 server-side application。
$ cat ./nginx-nlb-service-demo.yaml
apiVersion: v1
kind: Namespace
metadata:
name: "nginx-demo"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: "nginx-demo-deployment"
namespace: "nginx-demo"
spec:
selector:
matchLabels:
app: "nginx-demo"
replicas: 3
template:
metadata:
labels:
app: "nginx-demo"
role: "backend"
spec:
containers:
- image: nginx:latest
imagePullPolicy: Always
name: "nginx-demo"
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: "service-nginx-demo"
namespace: "nginx-demo"
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: nlb
service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
type: LoadBalancer
ports:
- port: 80
targetPort: 80
protocol: TCP
selector:
app: "nginx-demo"

  1. 建立以下 netshoot.yaml 文件作為 client。
$ cat ./netshoot.yaml
apiVersion: v1
kind: Pod
metadata:
name: netshoot
spec:
containers:
- name: netshoot
image: nicolaka/netshoot:latest
command:
- sleep
- infinity
imagePullPolicy: IfNotPresent
restartPolicy: Alway
  1. 部署 Nginx Deployment。
$ kubectl apply -f ./nginx-nlb-service-demo.yaml
namespace/nginx-demo created
deployment.apps/nginx-demo-deployment created
service/service-nginx-demo created
  1. 部署 netshoot Pod。
$ kubectl apply -f ./netshoot.yaml
pod/netshoot created
  1. 檢視 node、pod、service 資源
$ kubectl get node
NAME STATUS ROLES AGE VERSION
ip-192-168-18-190.eu-west-1.compute.internal Ready <none> 21h v1.22.12-eks-ba74326
ip-192-168-55-245.eu-west-1.compute.internal Ready <none> 22h v1.22.12-eks-ba74326
ip-192-168-61-79.eu-west-1.compute.internal Ready <none> 133m v1.22.12-eks-ba74326
ip-192-168-71-85.eu-west-1.compute.internal Ready <none> 133m v1.22.12-eks-ba74326
ip-192-168-76-241.eu-west-1.compute.internal Ready <none> 133m v1.22.12-eks-ba7432
$ kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
netshoot 1/1 Running 0 36s 192.168.41.36 ip-192-168-55-245.eu-west-1.compute.internal <none> <none>

$ kubectl -n nginx-demo get svc,pod -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/service-nginx-demo LoadBalancer 10.100.135.233 a0ac38093315243b1a67d275a4379ca6-c09bd862a6e56fc5.elb.eu-west-1.amazonaws.com 80:30161/TCP 36s app=nginx-demo

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/nginx-demo-deployment-7489d44bbd-9z5dg 1/1 Running 0 36s 192.168.21.160 ip-192-168-18-190.eu-west-1.compute.internal <none> <none>
pod/nginx-demo-deployment-7489d44bbd-jcshp 1/1 Running 0 36s 192.168.44.248 ip-192-168-61-79.eu-west-1.compute.internal <none> <none>
pod/nginx-demo-deployment-7489d44bbd-m22pd 1/1 Running 0 36s 192.168.73.88 ip-192-168-76-241.eu-west-1.compute.internal <none> <none>
  1. 透過 netshoot Pod 使用 curl 命令訪問 NLB endpoint。
$ kubectl exec -it netshoot -- curl a0ac38093315243b1a67d275a4379ca6-c09bd862a6e56fc5.elb.eu-west-1.amazonaws.com
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

$ kubectl exec -it netshoot -- curl a0ac38093315243b1a67d275a4379ca6-c09bd862a6e56fc5.elb.eu-west-1.amazonaws.com
## 偶發性 timeout

總結

今天我們先架設完環境,明天將透過 tcpdump 擷取封包來查看為什麼 NLB 會有偶發性的 connection timeout 問題。

參考文件

Footnotes

  1. Service | Kubernetes - https://kubernetes.io/docs/concepts/services-networking/service/

  2. cloud-provider-aws - https://github.com/kubernetes/cloud-provider-aws

  3. What is a Classic Load Balancer? - https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/introduction.html

  4. What is a Network Load Balancer? - https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html

  5. Network load balancing on Amazon EKS - https://docs.aws.amazon.com/eks/latest/userguide/network-load-balancing.html