跳至主要内容

[13] 為什麼透過 CloudFormation template 或 ekstcl 啟用的 self-managed node 可以自動加入 EKS cluster

在先前 Day3 Day4 時,我們討論到 Amazon EKS AMI Build Specification script 定義了自動化設置 kubelet 設定檔,使 node 可以順利地加入節點。

故本文章將探討 CloudFormation template 及 eksctl 命令是如何根據此 bootstrap.sh 啟用 worker node。

CloudFormation template

若我們使用 CloudFormation 方式建立 self-managed nodes,我們可以使用以下 curl command 下載官方提供的 CloudFormation template。

curl -o amazon-eks-nodegroup.yaml https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/amazon-eks-nodegroup.yaml

以下列舉部分 CloudFormation template 資訊:

NodeLaunchTemplate:
Type: "AWS::EC2::LaunchTemplate"
Properties:
LaunchTemplateData:
BlockDeviceMappings:
- DeviceName: /dev/xvda
Ebs:
DeleteOnTermination: true
VolumeSize: !Ref NodeVolumeSize
VolumeType: gp2
IamInstanceProfile:
Arn: !GetAtt NodeInstanceProfile.Arn
ImageId: !If
- HasNodeImageId
- !Ref NodeImageId
- !Ref NodeImageIdSSMParam
InstanceType: !Ref NodeInstanceType
KeyName: !Ref KeyName
SecurityGroupIds:
- !Ref NodeSecurityGroup
UserData: !Base64
"Fn::Sub": |
#!/bin/bash
set -o xtrace
/etc/eks/bootstrap.sh ${ClusterName} ${BootstrapArguments}
/opt/aws/bin/cfn-signal --exit-code $? \
--stack ${AWS::StackName} \
--resource NodeGroup \
--region ${AWS::Region}
MetadataOptions:
HttpPutResponseHopLimit : 2
HttpEndpoint: enabled
HttpTokens: !If
- IMDSv1Disabled
- required
- optional

NodeGroup:
Type: "AWS::AutoScaling::AutoScalingGroup"
Properties:
DesiredCapacity: !Ref NodeAutoScalingGroupDesiredCapacity
LaunchTemplate:
LaunchTemplateId: !Ref NodeLaunchTemplate
Version: !GetAtt NodeLaunchTemplate.LatestVersionNumber
MaxSize: !Ref NodeAutoScalingGroupMaxSize
MinSize: !Ref NodeAutoScalingGroupMinSize
Tags:
- Key: Name
PropagateAtLaunch: true
Value: !Sub ${ClusterName}-${NodeGroupName}-Node
- Key: !Sub kubernetes.io/cluster/${ClusterName}
PropagateAtLaunch: true
Value: owned
VPCZoneIdentifier: !Ref Subnets
UpdatePolicy:
AutoScalingRollingUpdate:
MaxBatchSize: 1
MinInstancesInService: !Ref NodeAutoScalingGroupDesiredCapacity
PauseTime: PT5M

其中我們可以觀察到, CloudFormation Logical ID [5] NodeGroup 使用 AWS:: AutoScaling:: AutoScalingGroup [6] 而非 AWS:: EKS:: Nodegroup [7]。值得注意的是,AWS:: EKS:: Nodegroup 啟用的是 managed node group。

接續,NodeGroup 關聯了 NodeLaunchTemplate 其定義使用了 EC2 userdata [8],並使用了 cfn-signal [9]--exit-code 來確定此 bootstrap.sh 是否成功。

eksctl

透過 eksctl 建立 self-managed node group 資源後,可以查看到會建立對應的 CloudFormation Stack。故我們也能使用以下 AWS CLI aws cloudformation describe-stack-resources 命令,與 CloudFormation 所建立的方式類似,也是透過定義 Auto Scaling Group 及定義 Launch Template 來建立。

$ aws cloudformation describe-stack-resources \
--stack-name eksctl-ironman-2022-nodegroup-demo-self-ng
...
...
...
{
"StackName": "eksctl-ironman-2022-nodegroup-demo-self-ng",
"StackId": "arn:aws:cloudformation:eu-west-1:111111111111:stack/eksctl-ironman-2022-nodegroup-demo-self-ng/e2083840-3cda-11ed-882d-026b9e583afb",
"LogicalResourceId": "NodeGroup",
"PhysicalResourceId": "eksctl-ironman-2022-nodegroup-demo-self-ng-NodeGroup-1Q19K6MP0HTFJ",
"ResourceType": "AWS::AutoScaling::AutoScalingGroup",
"Timestamp": "2022-09-25T14:07:16.122000+00:00",
"ResourceStatus": "CREATE_COMPLETE",
"DriftInformation": {
"StackResourceDriftStatus": "NOT_CHECKED"
}
},
{
"StackName": "eksctl-ironman-2022-nodegroup-demo-self-ng",
"StackId": "arn:aws:cloudformation:eu-west-1:111111111111:stack/eksctl-ironman-2022-nodegroup-demo-self-ng/e2083840-3cda-11ed-882d-026b9e583afb",
"LogicalResourceId": "NodeGroupLaunchTemplate",
"PhysicalResourceId": "lt-011b90bb7c3e105d3",
"ResourceType": "AWS::EC2::LaunchTemplate",
"Timestamp": "2022-09-25T14:06:37.330000+00:00",
"ResourceStatus": "CREATE_COMPLETE",
"DriftInformation": {
"StackResourceDriftStatus": "NOT_CHECKED"
}
},
...
...

接著,我們可以透過 aws cloudformation get-template 查看原先的 userdata 定義,此輸出看起來像是經 base64 encode。

$ aws cloudformation get-template --stack-name eksctl-ironman-2022-nodegroup-demo-self-ng --query TemplateBody.Resources.NodeGroupLaunchTemplate.Properties.LaunchTemplateData.UserData
"H4sIAAAAAAAA/7xYbXPiOrL+nl+hy0ydnLl3DLaBnCFV3FpebMKLTSxbMnh2KiUsBb/KHlsEwmz++5YhZMJMztSera39QmLpedqtVquflt/5Sbahkp/x+3B9kRM/JmtWXgO+SZKLYsP9lF5fSEACjQdSNJJw1TgQGqVfhLkoGywufZE0VlkmSlGQvE4StV4Gf4kSsCRnRcXaFqFgd/dh...
... SKIP CONTENT ...
...
pIBWeClafi1HhDti7fcOjMzJ9+cDgyz4Sme9kQaX72mlfQn78lvCXF19LzF4anGngHKLsnm+S4HyRRQbQpBQg58EnJPgKeiUqcaTWyWW242PwndOV7e/W2tPwzAAD//+PPRyJOEgAA"

因此我們可以將此輸出內容透過 base64 decode,透過 file 查看到此 decode 格式為 gzip 的壓縮格式。

$ aws cloudformation get-template --stack-name eksctl-ironman-2022-nodegroup-demo-self-ng --query TemplateBody.Resources.NodeGroupLaunchTemplate.Properties.LaunchTemplateData.UserData --output text | base64 -d > ec2-userdata

$ file ec2-userdata
ec2-userdata: gzip compressed data

而透過 gunzip 解壓縮此文件後,確認此文件為 cloud-init 格式,並且一樣定義了 bootstrap.sh

$ mv ec2-userdata ec2-userdata.gz
$ gunzip ./ec2-userdata.gz

$ cat ./ec2-userdata
#cloud-config
packages: null
runcmd:
- - /var/lib/cloud/scripts/eksctl/bootstrap.al2.sh
- - /var/lib/cloud/scripts/eksctl/bootstrap.helper.sh
write_files:
- content: '{}'
owner: root:root
path: /etc/eksctl/kubelet-extra.json
permissions: "0644"
- content: |-
NODE_TAINTS=
CLUSTER_DNS=10.100.0.10
CONTAINER_RUNTIME=dockerd
CLUSTER_NAME=ironman-2022
API_SERVER_URL=https://XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.gr7.eu-west-1.eks.amazonaws.com
B64_CLUSTER_CA=LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMvakNDQWVhZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFR...
... ...
V1lZRExMUDdLCm0xVUJQUWdzTzRQQlREUjlaLzhpbnZDV0FiT0szM2Z6OVZqU3dBbjlhQ0lXbU5FY2dVMkFUWm1FN0N4WEUrbFkKOFhjPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
NODE_LABELS=alpha.eksctl.io/cluster-name=ironman-2022,alpha.eksctl.io/nodegroup-name=demo-self-ng
owner: root:root
path: /etc/eksctl/kubelet.env
permissions: "0644"
- content: |
#!/bin/bash

set -o errexit
set -o pipefail
set -o nounset

source /var/lib/cloud/scripts/eksctl/bootstrap.helper.sh

echo "eksctl: running /etc/eks/bootstrap"
/etc/eks/bootstrap.sh "${CLUSTER_NAME}" \
--apiserver-endpoint "${API_SERVER_URL}" \
--b64-cluster-ca "${B64_CLUSTER_CA}" \
--dns-cluster-ip "${CLUSTER_DNS}" \
--kubelet-extra-args "${KUBELET_EXTRA_ARGS}" \
--container-runtime "${CONTAINER_RUNTIME}"

echo "eksctl: merging user options into kubelet-config.json"
trap 'rm -f ${TMP_KUBE_CONF}' EXIT
jq -s '.[0] * .[1]' "${KUBELET_CONFIG}" "${KUBELET_EXTRA_CONFIG}" > "${TMP_KUBE_CONF}"
mv "${TMP_KUBE_CONF}" "${KUBELET_CONFIG}"

systemctl daemon-reload
echo "eksctl: restarting kubelet-eks"
systemctl restart kubelet
echo "eksctl: done"
owner: root:root
path: /var/lib/cloud/scripts/eksctl/bootstrap.al2.sh
permissions: "0755"
- content: |
#!/bin/bash

set -o errexit
set -o pipefail
set -o nounset

source /etc/eksctl/kubelet.env # file written by bootstrapper

# Use IMDSv2 to get metadata
TOKEN="$(curl --silent -X PUT -H "X-aws-ec2-metadata-token-ttl-seconds: 600" http://169.254.169.254/latest/api/token)"
function get_metadata() {
curl --silent -H "X-aws-ec2-metadata-token: $TOKEN" "http://169.254.169.254/latest/meta-data/$1"
}

API_SERVER_URL="${API_SERVER_URL}"
B64_CLUSTER_CA="${B64_CLUSTER_CA}"
INSTANCE_ID="$(get_metadata instance-id)"
INSTANCE_LIFECYCLE="$(get_metadata instance-life-cycle)"
CLUSTER_DNS="${CLUSTER_DNS:-}"
NODE_TAINTS="${NODE_TAINTS:-}"
MAX_PODS="${MAX_PODS:-}"
NODE_LABELS="${NODE_LABELS},node-lifecycle=${INSTANCE_LIFECYCLE},alpha.eksctl.io/instance-id=${INSTANCE_ID}"

KUBELET_ARGS=("--node-labels=${NODE_LABELS}")
[[ -n "${NODE_TAINTS}" ]] && KUBELET_ARGS+=("--register-with-taints=${NODE_TAINTS}")
# --max-pods as a CLI argument is deprecated, this is a workaround until we deprecate support for maxPodsPerNode
[[ -n "${MAX_PODS}" ]] && KUBELET_ARGS+=("--max-pods=${MAX_PODS}")
KUBELET_EXTRA_ARGS="${KUBELET_ARGS[@]}"

CLUSTER_NAME="${CLUSTER_NAME}"
KUBELET_CONFIG='/etc/kubernetes/kubelet/kubelet-config.json'
KUBELET_EXTRA_CONFIG='/etc/eksctl/kubelet-extra.json'
TMP_KUBE_CONF='/tmp/kubelet-conf.json'
CONTAINER_RUNTIME="${CONTAINER_RUNTIME:-dockerd}" # default for al2 just in case, not used in ubuntu
owner: root:root
path: /var/lib/cloud/scripts/eksctl/bootstrap.helper.sh
permissions: "0755"

根據 eksctl nodebootstrap 文件,上述 bootstrapping/userdata 則是由 nodebootstrap 定義的邏輯,又依據不同作業系統,如 Ubuntu、Amazon Linux 劃分 ubuntu.goal2.go。因此 UserData 會動態地設定以下設定檔:

  • kubelet-extra.json - kubelet user configuration
  • docker-extra.json - Docker Daemon extra config
  • kubelet.env:kubelet 環境變數 vars for kubelet

總結

不論是透過官方文件所提供 CloudFormation template 或是 eksctl 命令方式所建立的 CloudFormation Stack,皆使用了 userdata 方式設定 EKS worker node AMI 設置 bootstrap.sh script 設定必要參數。而 eksctl 更近一步地在 provision 階段,透過 nodebootstrap pkg 動態建立 cloud-init 設定檔。

參考文件

  1. [03] 為什麼 EKS worker node 可以自動加入 EKS cluster(一) - https://ithelp.ithome.com.tw/articles/10293460
  2. [04] 為什麼 EKS worker node 可以自動加入 EKS cluster(二) - https://ithelp.ithome.com.tw/articles/10294101
  3. Amazon EKS AMI Build Specification - https://github.com/awslabs/amazon-eks-ami
  4. Launching self-managed Amazon Linux nodes - https://docs.aws.amazon.com/eks/latest/userguide/launch-workers.html
  5. Resources - Resource fields - https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/resources-section-structure.html#resources-section-structure-resource-fields
  6. AWS:: AutoScaling:: AutoScalingGroup - https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-as-group.html
  7. AWS:: EKS:: Nodegroup - https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-eks-nodegroup.html
  8. Run commands on your Linux instance at launch - https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html
  9. cfn-signal - https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-signal.html
  10. Userdata - https://cloudbase-init.readthedocs.io/en/latest/userdata.html
  11. https://github.com/weaveworks/eksctl/tree/main/pkg/nodebootstrap