[07] 為什麼 CDK 部署 EKS cluster 會比較慢
本日來探究一下 EKS 周邊部署 Cluster 方式,本系列文章主要以 eksctl 命令進行部署,以下為建立 cluster 時的輸出:
$ eksctl create cluster -f ./ironman-2022.yaml
2022-09-14 09:39:32 [ℹ]  eksctl version 0.111.0
2022-09-14 09:39:32 [ℹ]  using region eu-west-1
2022-09-14 09:39:32 [ℹ]  setting availability zones to [eu-west-1a eu-west-1c eu-west-1b]
2022-09-14 09:39:32 [ℹ]  subnets for eu-west-1a - public:192.168.0.0/19 private:192.168.96.0/19
2022-09-14 09:39:32 [ℹ]  subnets for eu-west-1c - public:192.168.32.0/19 private:192.168.128.0/19
2022-09-14 09:39:32 [ℹ]  subnets for eu-west-1b - public:192.168.64.0/19 private:192.168.160.0/19
2022-09-14 09:39:32 [ℹ]  nodegroup "ng1-public-ssh" will use "" [AmazonLinux2/1.22]
2022-09-14 09:39:32 [ℹ]  using EC2 key pair "ironman-2022"
2022-09-14 09:39:32 [ℹ]  using Kubernetes version 1.22
2022-09-14 09:39:32 [ℹ]  creating EKS cluster "ironman-2022" in "eu-west-1" region with managed nodes
2022-09-14 09:39:32 [ℹ]  1 nodegroup (ng1-public-ssh) was included (based on the include/exclude rules)
2022-09-14 09:39:32 [ℹ]  will create a CloudFormation stack for cluster itself and 0 nodegroup stack(s)
2022-09-14 09:39:32 [ℹ]  will create a CloudFormation stack for cluster itself and 1 managed nodegroup stack(s)
2022-09-14 09:39:32 [ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=eu-west-1 --cluster=ironman-2022'
2022-09-14 09:39:32 [ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "ironman-2022" in "eu-west-1"
2022-09-14 09:39:32 [ℹ]  configuring CloudWatch logging for cluster "ironman-2022" in "eu-west-1" (enabled types: api, audit, authenticator, controllerManager, scheduler & no types disabled)
2022-09-14 09:39:32 [ℹ]
2 sequential tasks: { create cluster control plane "ironman-2022",
    2 sequential sub-tasks: {
        4 sequential sub-tasks: {
            wait for control plane to become ready,
            associate IAM OIDC provider,
            2 sequential sub-tasks: {
                create IAM role for serviceaccount "kube-system/aws-node",
                create serviceaccount "kube-system/aws-node",
            },
            restart daemonset "kube-system/aws-node",
        },
        create managed nodegroup "ng1-public-ssh",
    }
}
2022-09-14 09:39:32 [ℹ]  building cluster stack "eksctl-ironman-2022-cluster"
2022-09-14 09:39:33 [ℹ]  deploying stack "eksctl-ironman-2022-cluster"
2022-09-14 09:40:03 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-cluster"
2022-09-14 09:40:33 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-cluster"
2022-09-14 09:41:33 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-cluster"
2022-09-14 09:42:33 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-cluster"
2022-09-14 09:43:33 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-cluster"
2022-09-14 09:44:33 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-cluster"
2022-09-14 09:45:33 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-cluster"
2022-09-14 09:46:33 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-cluster"
2022-09-14 09:47:33 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-cluster"
2022-09-14 09:48:33 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-cluster"
2022-09-14 09:49:33 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-cluster"
2022-09-14 09:50:33 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-cluster"
2022-09-14 09:51:33 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-cluster"
2022-09-14 09:53:34 [ℹ]  building iamserviceaccount stack "eksctl-ironman-2022-addon-iamserviceaccount-kube-system-aws-node"
2022-09-14 09:53:35 [ℹ]  deploying stack "eksctl-ironman-2022-addon-iamserviceaccount-kube-system-aws-node"
2022-09-14 09:53:35 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-addon-iamserviceaccount-kube-system-aws-node"
2022-09-14 09:54:05 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-addon-iamserviceaccount-kube-system-aws-node"
2022-09-14 09:54:05 [ℹ]  serviceaccount "kube-system/aws-node" already exists
2022-09-14 09:54:05 [ℹ]  updated serviceaccount "kube-system/aws-node"
2022-09-14 09:54:05 [ℹ]  daemonset "kube-system/aws-node" restarted
2022-09-14 09:54:05 [ℹ]  building managed nodegroup stack "eksctl-ironman-2022-nodegroup-ng1-public-ssh"
2022-09-14 09:54:05 [ℹ]  deploying stack "eksctl-ironman-2022-nodegroup-ng1-public-ssh"
2022-09-14 09:54:05 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-ng1-public-ssh"
2022-09-14 09:54:35 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-ng1-public-ssh"
2022-09-14 09:55:18 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-ng1-public-ssh"
2022-09-14 09:56:04 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-ng1-public-ssh"
2022-09-14 09:57:24 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-ng1-public-ssh"
2022-09-14 09:59:10 [ℹ]  waiting for CloudFormation stack "eksctl-ironman-2022-nodegroup-ng1-public-ssh"
2022-09-14 09:59:10 [ℹ]  waiting for the control plane availability...
2022-09-14 09:59:12 [✔]  saved kubeconfig as "/home/ec2-user/.kube/config"
2022-09-14 09:59:12 [ℹ]  no tasks
2022-09-14 09:59:12 [✔]  all EKS cluster resources for "ironman-2022" have been created
2022-09-14 09:59:12 [ℹ]  nodegroup "ng1-public-ssh" has 2 node(s)
2022-09-14 09:59:12 [ℹ]  node "ip-192-168-29-179.eu-west-1.compute.internal" is ready
2022-09-14 09:59:12 [ℹ]  node "ip-192-168-78-165.eu-west-1.compute.internal" is ready
2022-09-14 09:59:12 [ℹ]  waiting for at least 2 node(s) to become ready in "ng1-public-ssh"
比對確認 EKS cluster 完成建立相關資源,扣除建立 nodegroup 資源之外,大致花費時間約 20 分鐘左右。
2022-09-14 09:39:32 [ℹ]  eksctl version 0.111.0
... 建立集群、ServiceAccount role
2022-09-14 09:59:12 [✔]  all EKS cluster resources for "ironman-2022" have been created
... 開始建立 nodegroup 資源
除了 eksctl 之外,我們也可以透過 CDK1 方式來 provision EKS cluster 資源,其 CDK 也屬於 Infrastructure as Code(IaC)2 一種工具,可以更加方便使用者透過撰寫程式碼方式建構 AWS 架構。
本文將探討為何 CDK 部署 EKS cluster 時會稍微花費較久一些時間。
透過 CDK 建置 EKS cluster
- 安裝 CDK,在驗證時安裝版本為 2.37.1。
$ npm i -g aws-cdk
$ cdk version
2.37.1 (build f15dee0)
- 初始化 project。
$ cdk init sample-app --language=typescript
Applying project template sample-app for typescript
# Welcome to your CDK TypeScript project
You should explore the contents of this project. It demonstrates a CDK app with an instance of a stack (`HelloEksStack`)
which contains an Amazon SQS queue that is subscribed to an Amazon SNS topic.
The `cdk.json` file tells the CDK Toolkit how to execute your app.
## Useful commands
* `npm run build`   compile typescript to js
* `npm run watch`   watch for changes and compile
* `npm run test`    perform the jest unit tests
* `cdk deploy`      deploy this stack to your default AWS account/region
* `cdk diff`        compare deployed stack with current state
* `cdk synth`       emits the synthesized CloudFormation template
Initializing a new git repository...
Executing npm install...
npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN hello-eks@0.1.0 No repository field.
npm WARN hello-eks@0.1.0 No license field.
✅ All done!
- 安裝相應套件並 bootstrap CDK。
npm i @aws-cdk/aws-ec2
npm i @aws-cdk/aws-eks
cdk bootstrap aws://111111111111/us-east-1
- 更新以下 CDK script 並建立 2 個 worker node。
$ cat ./bin/hello-eks.ts
#!/usr/bin/env node
import * as cdk from '@aws-cdk/core';
import { HelloEksStack } from '../lib/hello-eks-stack';
const app = new cdk.App();
const envIAD = { account: '111111111111', region: 'us-east-1' };
new HelloEksStack(app, 'HelloEksStack', { env: envIAD });
$ cat ./lib/hello-eks-stack.ts
import * as cdk from '@aws-cdk/core';
import * as eks from '@aws-cdk/aws-eks';
import * as ec2 from '@aws-cdk/aws-ec2';
import { Construct } from 'constructs';
export class HelloEksStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    const cluster = new eks.Cluster(this, 'HelloEKS', {
      version: eks.KubernetesVersion.V1_21,
      defaultCapacity: 2,
      defaultCapacityInstance: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MEDIUM),
    });
  }
}
- 部署 CDK stack,最終花費 1549.4s 約 25 分鐘。
$ cdk deploy
✨  Synthesis time: 3.83s
This deployment will make potentially sensitive changes according to your current security approval level (--require-approval broadening).
Please confirm you intend to make the following modifications:
...
... SKIP ... IAM Statement/Policy change
...
(NOTE: There may be security-related changes not in this list. See https://github.com/aws/aws-cdk/issues/1299)
Do you wish to deploy these changes (y/n)? y
HelloEksStack: deploying...
[0%] start: Publishing 4288ebb3652acdf2d828b7db7ca44a7162a401ace50ebb4026e84b18a02a06ee:current
[0%] start: Publishing 98908a88aa1d19a5175824f9f289b2d4736b7e17837f4e1ca871251a7505c02d:current
[0%] start: Publishing 3b263c2ad043fd069ef446753788c36e595c82b51a70478e58258c8ef7471671:current
[0%] start: Publishing 5a0165b80474734eff290897da1c8d571862f020eeaac2561c839386159c7f03:current
[0%] start: Publishing 2cbb4c8a663eb2c903e9cef3ad44c9b47d48b0a2cd35dfc270a23bf93b2ecf1f:current
[0%] start: Publishing c6964dbf0c556ec82ce09622e99ad6f6d4e488cdaac0ef9e8492e078ec61ffed:current
[0%] start: Publishing a174006e07995336de90bec7b82ca81a33082ac5e5e34af3d09563b0e9674824:current
[0%] start: Publishing 6fc1e809b207366e5c08060b7f9cd496ec5519c1018444e21de238bc0092b2bc:current
[12%] success: Published 5a0165b80474734eff290897da1c8d571862f020eeaac2561c839386159c7f03:current
[25%] success: Published 6fc1e809b207366e5c08060b7f9cd496ec5519c1018444e21de238bc0092b2bc:current
[37%] success: Published a174006e07995336de90bec7b82ca81a33082ac5e5e34af3d09563b0e9674824:current
[50%] success: Published 3b263c2ad043fd069ef446753788c36e595c82b51a70478e58258c8ef7471671:current
[62%] success: Published 98908a88aa1d19a5175824f9f289b2d4736b7e17837f4e1ca871251a7505c02d:current
[75%] success: Published 4288ebb3652acdf2d828b7db7ca44a7162a401ace50ebb4026e84b18a02a06ee:current
^[[O[87%] success: Published 2cbb4c8a663eb2c903e9cef3ad44c9b47d48b0a2cd35dfc270a23bf93b2ecf1f:current
^[[I[100%] success: Published c6964dbf0c556ec82ce09622e99ad6f6d4e488cdaac0ef9e8492e078ec61ffed:current
HelloEksStack: creating CloudFormation changeset...
[███████████████████████████████████████▍··················] (32/47)
 ✅  HelloEksStack
✨  Deployment time: 1549.4s
Outputs:
HelloEksStack.HelloEKSConfigCommand861347FC = aws eks update-kubeconfig --name HelloEKS39C624A1-91bdaaebbc8945038db74027028f56ec --region us-east-1 --role-arn arn:aws:iam::111111111111:role/HelloEksStack-HelloEKSMastersRole53742E60-1WJJ2WRTYBI91
HelloEksStack.HelloEKSGetTokenCommandF486E67D = aws eks get-token --cluster-name HelloEKS39C624A1-91bdaaebbc8945038db74027028f56ec --region us-east-1 --role-arn arn:aws:iam::111111111111:role/HelloEksStack-HelloEKSMastersRole53742E60-1WJJ2WRTYBI91
Stack ARN:
arn:aws:cloudformation:us-east-1:111111111111:stack/HelloEksStack/57dd8730-1974-11ed-96b8-12f38830ada9
✨  Total time: 1553.23s
NOTICES[[O
19836 AWS CDK v1 has entered maintenance mode
 Overview: AWS CDK v1 has entered maintenance mode on June 1, 2022.
           Migrate to AWS CDK v2 to continue to get the latest features
           and fixes!
 Affected versions: framework: 1.*, cli: 1.*
 More information at: https://github.com/aws/aws-cdk/issues/19836
If you don’t want to see a notice anymore, use "cdk acknowledge <id>". For example, "cdk acknowledge 19836".
分析
在檢視此 CloudFormation Stack 時,可以觀察到 CDK 會替我們部署以下 Lambda
- AWS::Lambda::Function
- AWS::Lambda::LayerVersion(KubectlLayer600207B5)
並比對 CloudFormation Event,可以確認 CDK 計算整體 CloudFormation 建置完成時間,其中包含上述 Lambda Function 流程。
2022-09-22T13:16:15.805Z  AWS::CloudFormation::Stack HelloEksStack-awscdkawseksKubectlProvide...
CREATE_COMPLETE
接續,為了解此 Lambda Function 主要處理目的,我們可以在 CDK @aws-cdk/aws-eks3 了解其架構:
 +-----------------------------------------------+               +-----------------+
 |                 EKS Cluster                   |    kubectl    |                 |
 |-----------------------------------------------|<-------------+| Kubectl Handler |
 |                                               |               |                 |
 |                                               |               +-----------------+
 | +--------------------+    +-----------------+ |
 | |                    |    |                 | |
 | | Managed Node Group |    | Fargate Profile | |               +-----------------+
 | |                    |    |                 | |               |                 |
 | +--------------------+    +-----------------+ |               | Cluster Handler |
 |                                               |               |                 |
 +-----------------------------------------------+               +-----------------+
    ^                                   ^                          +
    |                                   |                          |
    | connect self managed capacity     |                          | aws-sdk
    |                                   | create/update/delete     |
    +                                   |                          v
 +--------------------+                 +              +-------------------+
 |                    |                 --------------+| eks.amazonaws.com |
 | Auto Scaling Group |                                +-------------------+
 |                    |
 +--------------------+
以下為兩個我們主要關注的 Lambda function:
- KubectlHandler- 用於在 cluster 上調用 kubectl 命令的 Lambda function。
- ClusterHandler- 用於與 EKS API 管理 cluster 生病週期的 Lambda function。
換言之,此 Lambda function 也會依據其定義等候 EKS 資源被建立後才能視為完成。故有可能在此 Lambda function 也會花費較多一些時間來設定 EKS cluster。此部分我們也可以在原始碼4上觀察到類似定義:
export async function onEvent(event: AWSLambda.CloudFormationCustomResourceEvent) {
  const provider = createResourceHandler(event);
  return provider.onEvent();
}
export async function isComplete(event: AWSLambda.CloudFormationCustomResourceEvent): Promise<IsCompleteResponse> {
  const provider = createResourceHandler(event);
  return provider.isComplete();
}
function createResourceHandler(event: AWSLambda.CloudFormationCustomResourceEvent) {
  switch (event.ResourceType) {
    case consts.CLUSTER_RESOURCE_TYPE: return new ClusterResourceHandler(defaultEksClient, event);
    case consts.FARGATE_PROFILE_RESOURCE_TYPE: return new FargateProfileResourceHandler(defaultEksClient, event);
    default:
      throw new Error(`Unsupported resource type "${event.ResourceType}`);
  }
}
總結
由上述可以了解 CDK 相較於 eksctl額外定義兩個 Lambda function 來管理 EKS cluster,會稍微花費較多一些時間來進行部署及更新。
參考文件
Footnotes
- 
Getting started with the AWS CDK - https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html ↩ 
- 
Infrastructure as code - https://en.wikipedia.org/wiki/Infrastructure_as_code ↩ 
- 
Architectural Overview - https://docs.aws.amazon.com/cdk/api/v1/docs/aws-eks-readme.html#architectural-overview ↩ 
- 
https://github.com/aws/aws-cdk/blob/main/packages/%40aws-cdk/aws-eks/lib/cluster-resource-handler/index.ts#L49-L66 ↩