
Trying out CircleCI's Container Agent

Futa Hirakoba

This:

A more scalable, container-friendly self-hosted runner: Container Agent - now in Open Preview - Build Environment - CircleCI Discuss
https://discuss.circleci.com/t/a-more-scalable-container-friendly-self-hosted-runner-container-agent-now-in-open-preview/45094

Futa Hirakoba

Summary

  • Setup is easy
  • The documentation is unfriendly, though
    • It is not the kind of documentation you can follow knowing nothing
    • If you don't know helm, you have to look up the helm commands yourself
    • The step for enabling self-hosted runners is omitted
  • When a job using a container-agent resource class is created, a pod is spun up for it
  • The runner has plenty of configuration options
  • It did not work on kind or Docker Desktop on an Apple Silicon Mac
    • Not entirely sure why
      • It supposedly runs on arm too...?
    • It did work on GKE Autopilot

Info

Steps to get it running (the same commands are condensed into a single shell session after the list)

  1. Enable self-hosted runners under Organization Settings -> Self-Hosted Runners
  2. Go back to the top page and open Self-Hosted Runners
  3. Create a resource class from Create Resource Class and save the generated token
  4. Install container-agent into the cluster
    1. helm repo add container-agent https://packagecloud.io/circleci/container-agent/helm
    2. helm repo update
    3. kubectl create namespace circleci
    4. helm install container-agent container-agent/container-agent -n circleci
  5. Generate values.yaml
    • helm show values container-agent/container-agent > values.yaml
  6. Edit values.yaml
    • Add the resource class and token from step 3 under .agent.resourceClasses
      • Excerpt from values.yaml
         resourceClasses:
           korosuke613/gke-autopilot:
             token: <the token saved in step 3>

  7. Apply values.yaml
    • helm upgrade container-agent container-agent/container-agent -n circleci -f ./values.yaml
  8. Use the resource class in .circleci/config.yml
    • Excerpt from config.yml
      docker-image:
        docker:
          - image: quay.io/buildah/stable
        resource_class: <namespace>/<resourceClass>
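For convenience, here are steps 4–7 as one shell session (the same commands as above, nothing new; edit values.yaml between the last two commands as described in step 6):

❯ helm repo add container-agent https://packagecloud.io/circleci/container-agent/helm
❯ helm repo update
❯ kubectl create namespace circleci
❯ helm install container-agent container-agent/container-agent -n circleci
❯ helm show values container-agent/container-agent > values.yaml
❯ helm upgrade container-agent container-agent/container-agent -n circleci -f ./values.yaml
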
Futa Hirakoba

Setting it up with helm

Futa Hirakoba
❯ helm repo add container-agent https://packagecloud.io/circleci/container-agent/helm
"container-agent" has been added to your repositories
❯ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "container-agent" chart repository
Update Complete. ⎈Happy Helming!⎈
❯ helm install container-agent container-agent/container-agent -n circleci
Error: INSTALLATION FAILED: create: failed to create: namespaces "circleci" not found

Looks like the circleci namespace has to be created first.
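
(Side note: since Helm 3.2, helm install can create the namespace itself with the --create-namespace flag, which would collapse the next two commands into one:)

❯ helm install container-agent container-agent/container-agent -n circleci --create-namespace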

Futa Hirakoba
❯ kubectl create namespace circleci
namespace/circleci created
❯ helm install container-agent container-agent/container-agent -n circleci
NAME: container-agent
LAST DEPLOYED: Tue Oct 11 15:52:46 2022
NAMESPACE: circleci
STATUS: deployed
REVISION: 1
TEST SUITE: None

Seems to have worked.

Futa Hirakoba

According to the docs, the values can be fetched with the following command.

❯ helm show values circleci/container-agent
Error: failed to download "circleci/container-agent"

Nope.
What I installed was container-agent/container-agent, so helm show values container-agent/container-agent should do it.

Got them.

❯ helm show values container-agent/container-agent
# Default values for container-agent.

## Overrides for generated resource names
# See templates/_helpers.tpl
# nameOverride:
# fullnameOverride:

agent:
  replicaCount: 1

  image:
    registry: ""
    repository: "circleci/container-agent"
    pullPolicy: Always
    tag: "1.0.17278-5c2bd95"

  pullSecrets: []

  matchLabels:
    app: container-agent

  # Annotations to be added to agent pods
  podAnnotations: {}

  # Security Context policies for agent pods
  podSecurityContext: {}

  # Security Context policies for agent containers
  containerSecurityContext: {}

  # Liveness and readiness probe values
  # Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes
  livenessProbe:
    httpGet:
      # should match container.healthCheckPath
      path: "/live"
      port: 7623
      scheme: HTTP
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 1
    successThreshold: 1
    failureThreshold: 5
  readinessProbe:
    httpGet:
      # should match container.healthCheckPath
      path: "/ready"
      port: 7623
      scheme: HTTP
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 1
    successThreshold: 1
    failureThreshold: 3

  # Agent pod resource configuration
  # Ref: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
  resources: {}
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

  # Node labels for agent pod assignment
  # Ref: https://kubernetes.io/docs/user-guide/node-selection/
  nodeSelector:
    kubernetes.io/arch: amd64

  # Node tolerations for agent scheduling to nodes with taints
  # Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
  tolerations: []

  # Agent affinity and anti-affinity
  # Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  affinity: {}
  # # An example of preferred pod anti-affinity, weight is in the range 1-100
  # podAntiAffinity:
  #   preferredDuringSchedulingIgnoredDuringExecution:
  #   - weight: 100
  #     podAffinityTerm:
  #       labelSelector:
  #         matchExpressions:
  #         - key: app.kubernetes.io/name
  #           operator: In
  #           values:
  #           - ingress-nginx
  #         - key: app.kubernetes.io/instance
  #           operator: In
  #           values:
  #           - ingress-nginx
  #         - key: app.kubernetes.io/component
  #           operator: In
  #           values:
  #           - controller
  #       topologyKey: kubernetes.io/hostname

  # # An example of required pod anti-affinity
  # podAntiAffinity:
  #   requiredDuringSchedulingIgnoredDuringExecution:
  #   - labelSelector:
  #       matchExpressions:
  #       - key: app.kubernetes.io/name
  #         operator: In
  #         values:
  #         - ingress-nginx
  #       - key: app.kubernetes.io/instance
  #         operator: In
  #         values:
  #         - ingress-nginx
  #       - key: app.kubernetes.io/component
  #         operator: In
  #         values:
  #         - controller
  #     topologyKey: "kubernetes.io/hostname"

  # Pod disruption budget settings
  pdb:
    create: false
    minAvailable: 1
    maxUnavailable: ""

  # CircleCI Runner API URL
  runnerAPI: "https://runner.circleci.com"

  # A (preferably) unique name assigned to this particular container-agent instance.
  # This name will appear in your runners inventory page in the CircleCI UI.
  # If left unspecified, the name will default to the name of the deployment.
  name: ""

  # Tasks are drained during the termination grace period,
  # so this should be sufficiently long relative to the maximum run time to ensure graceful shutdown
  terminationGracePeriodSeconds: 18300 # 5 hours and 5 minutes
  maxRunTime: "5h"

  # Maximum number of tasks that can be run concurrently.
  # IMPORTANT: This concurrency is independent of, and may be limited by, the Runner concurrency of your plan.
  # Configure this value at your own risk based on the resources allocated to your cluster.
  maxConcurrentTasks: 20

  # Enable garbage collection of dangling Kubernetes objects managed by container agent
  kubeGCEnabled: true
  # The age of a Kubernetes object managed by container agent before the garbage collection deletes it
  kubeGCThreshold: "5h5m"

  # Name of the user provided secret containing resource class tokens. You can mix tokens from this secret
  # and in the secret created from tokens specified in the resourceClasses section below
  #
  # The tokens should be specified as secret key-value pairs of the form
  # ResourceClass: Token
  # The resource class name needs to match the names configured below exactly to match tokens to the correct configuration
  # As Kubernetes does not allow / in secret keys, a period (.) should be substituted instead
  customSecret: ""

  # Resource class settings. The tokens specified here will be used to claim tasks & the tasks
  # will be launched with the configured configs
  resourceClasses: {}
    # circleci-runner/resourceClass:
    #   token: XXXX
    #   metadata:
    #     annotations:
    #       custom.io: my-annotation
    #   spec:
    #     containers:
    #       - resources:
    #           limits:
    #             cpu: 500m
    #         volumeMounts:
    #           - name: xyz
    #             mountPath: /path/to/mount
    #     securityContext:
    #       runAsNonRoot: true
    #     imagePullSecrets:
    #       - name: my_cred
    # circleci-runner/resourceClass2:
    #   token: XXXX
    #   spec:
    #     imagePullSecrets:
    #       - name: "other"

  ## Resource class constraint validation checker settings. The checker will periodically validate the
  ## node constraints in the resource class spec to ensure task pods can be scheduled before claiming tasks
  constraintChecker:
    # Enable constraint checking (This requires at least List Node permissions)
    enable: false

    # Number of failed checks before disabling task claim
    threshold: 3

    # Check interval
    interval: 15m

# Kubernetes service account settings
serviceAccount:
  create: true
  name: ""
  automountServiceAccountToken: true
  annotations: {}

# Kubernetes Roles Based Access Control settings
rbac:
  create: true
  role:
    name: ""
    rules: []
  roleBinding:
    name: ""
  clusterRole:
    name: ""
    rules: []
  clusterRoleBinding:
    name: ""
Futa Hirakoba

Writing the config out to a file.

❯ helm show values container-agent/container-agent > values.yaml
Futa Hirakoba

Apparently you're supposed to set a personal API token at .agent.resourceClasses.<namespace>/<project>.token.

Excerpt from values.yaml
  # Resource class settings. The tokens specified here will be used to claim tasks & the tasks
  # will be launched with the configured configs
  resourceClasses: {}
    # circleci-runner/resourceClass:
    #   token: XXXX
    #   metadata:
    #     annotations:
    #       custom.io: my-annotation
    #   spec:
    #     containers:
    #       - resources:
    #           limits:
    #             cpu: 500m
    #         volumeMounts:
    #           - name: xyz
    #             mountPath: /path/to/mount
    #     securityContext:
    #       runAsNonRoot: true
    #     imagePullSecrets:
    #       - name: my_cred
    # circleci-runner/resourceClass2:
    #   token: XXXX
    #   spec: 
    #     imagePullSecrets:
    #       - name: "other"
Futa Hirakoba

Generated an API token.

The config:

  resourceClasses:
    korosuke613/playground:
      token: <can't show you this!>
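
If you'd rather keep the token out of values.yaml, the chart's comments above also describe a customSecret option: put the token in a Kubernetes secret (replacing the / in the resource class name with a ., since / isn't allowed in secret keys) and reference the secret by name. A sketch along those lines — the secret name runner-tokens is my own choice:

❯ kubectl create secret generic runner-tokens -n circleci \
    --from-literal=korosuke613.playground=<token>

  # then in values.yaml, under agent:
  customSecret: "runner-tokens"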
Futa Hirakoba

Hold on, how am I supposed to apply the contents of values.yaml? I don't know helm well enough to tell.
This part isn't covered in CircleCI's docs. I wish it were.

Futa Hirakoba
❯ helm upgrade container-agent/container-agent -n circleci -f ./values.yaml
Error: "helm upgrade" requires 2 arguments

Usage:  helm upgrade [RELEASE] [CHART] [flags]

My mistake.

❯ helm upgrade container-agent container-agent/container-agent -n circleci -f ./values.yaml
Release "container-agent" has been upgraded. Happy Helming!
NAME: container-agent
LAST DEPLOYED: Tue Oct 11 16:08:02 2022
NAMESPACE: circleci
STATUS: deployed
REVISION: 2
TEST SUITE: None

The upgrade went through.

Futa Hirakoba

Hmm, it's not working.

Futa Hirakoba

Is the agent even running in the first place?

❯ kubectl get -n circleci pods
NAME                               READY   STATUS    RESTARTS   AGE
container-agent-55f749d6c6-b6v24   0/1     Pending   0          16m
container-agent-5df4f9f7d7-4zt4n   0/1     Pending   0          32m
Futa Hirakoba

Hmm, I wonder why.

Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  2m31s (x6 over 27m)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Futa Hirakoba

Maybe running this on kind has something to do with it?
Let me try Docker Desktop's Kubernetes.
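
(In hindsight: the default values shown earlier pin the agent to amd64 nodes via agent.nodeSelector: kubernetes.io/arch: amd64, while kind and Docker Desktop nodes on an Apple Silicon Mac are arm64, which matches the "didn't match Pod's node affinity/selector" event above. If the agent image is actually published for arm64, overriding the selector might be enough; untested here:)

  # values.yaml — relax the arch pin for arm64 nodes (untested)
  agent:
    nodeSelector:
      kubernetes.io/arch: arm64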

Futa Hirakoba

Ah, on GKE Autopilot there might just be no node available to schedule the pod on yet.

Warning  FailedScheduling  28s   gke.io/optimize-utilization-scheduler  no nodes available to schedule pods
Futa Hirakoba

It got scheduled after waiting a while. But then it fell into CrashLoopBackOff with some mysterious error...

Events:
  Type     Reason            Age                   From                                   Message
  ----     ------            ----                  ----                                   -------
  Normal   TriggeredScaleUp  3m22s                 cluster-autoscaler                     pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/korosuke613-playground/zones/asia-east1-b/instanceGroups/gk3-autopilot-cluster-1-nap-1l62u8rn-14fe2a13-grp 0->1 (max: 1000)}]
  Warning  FailedScheduling  2m59s (x2 over 4m8s)  gke.io/optimize-utilization-scheduler  no nodes available to schedule pods
  Warning  FailedScheduling  2m33s                 gke.io/optimize-utilization-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Normal   Scheduled         104s                  gke.io/optimize-utilization-scheduler  Successfully assigned circleci/container-agent-78d9f8f6b6-6wtb2 to gk3-autopilot-cluster-1-nap-1l62u8rn-14fe2a13-jzfv
  Normal   Pulled            44s                   kubelet                                Successfully pulled image "circleci/container-agent:1.0.17278-5c2bd95" in 23.650168695s
  Normal   Pulled            25s                   kubelet                                Successfully pulled image "circleci/container-agent:1.0.17278-5c2bd95" in 15.549995251s
  Normal   Pulling           8s (x3 over 68s)      kubelet                                Pulling image "circleci/container-agent:1.0.17278-5c2bd95"
  Normal   Created           6s (x3 over 44s)      kubelet                                Created container container-agent
  Normal   Pulled            6s                    kubelet                                Successfully pulled image "circleci/container-agent:1.0.17278-5c2bd95" in 2.06933746s
  Normal   Started           5s (x3 over 43s)      kubelet                                Started container container-agent
  Warning  BackOff           1s (x7 over 24s)      kubelet                                Back-off restarting failed container
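
(For a CrashLoopBackOff like this, the previous container's logs are usually the fastest clue — a generic kubectl debugging step, not something tried here:)

❯ kubectl logs -n circleci container-agent-78d9f8f6b6-6wtb2 --previous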
Futa Hirakoba

Running helm upgrade one more time did the trick.

❯ helm upgrade container-agent container-agent/container-agent -n circleci -f ./values.yaml
Release "container-agent" has been upgraded. Happy Helming!
NAME: container-agent
LAST DEPLOYED: Tue Oct 11 16:51:32 2022
NAMESPACE: circleci
STATUS: deployed
REVISION: 2
TEST SUITE: None
❯ kubectl get pods -n circleci
NAME                               READY   STATUS    RESTARTS   AGE
container-agent-79874f697f-cgw4w   1/1     Running   0          25s
Futa Hirakoba

But the situation hasn't changed.
Jobs still won't run.

Let's peek at the logs.
❯ kubectl logs -n circleci container-agent-79874f697f-cgw4w

07:54:16 0e2dd 231.379ms claim app.loop_name=claim: korosuke613.playground error=Error calling CircleCI: the response from POST /api/v2/runner/claim was 401 (Unauthorized) (1 attempts) mode=agent result=error service.name=container-agent service_name=container-agent

Authentication is failing. Maybe the API token is wrong.

Futa Hirakoba

Looks like self-hosted runners first have to be opted into...


Organization Settings

Ah, the page appeared.

Creating a resource class also produced a token.


The installation instructions page showed up too.

Futa Hirakoba

After updating the token, ran helm upgrade container-agent container-agent/container-agent -n circleci -f ./values.yaml once more.

Registered!!

Futa Hirakoba

Pushed the following config.yml.

.circleci/config.yml
version: 2.1

workflows:
  try-container-agent:
    jobs:
      - try-container-agent

jobs:
  try-container-agent:
    docker:
      - image: cimg/base:current
    resource_class: korosuke613/gke-autopilot
    steps:
      - checkout
      - run: echo "hello world"
❯ kubectl get pods -n circleci
NAME                                        READY   STATUS              RESTARTS   AGE
ccita-6345247bb9955825bf6de2ba-0-fozo6ngc   0/1     ContainerCreating   0          4s
container-agent-58fdf77455-bnvgk            1/1     Running             0          108s

A pod was born!

The job ran!

Futa Hirakoba

I re-ran the job a whole bunch, and each time it spun up a pod and executed right away.

❯ kubectl get pods -n circleci -w
NAME                                        READY   STATUS    RESTARTS   AGE
ccita-6345258c50d0733667eaa9ed-0-knzwxxdk   1/1     Running   0          14s
ccita-63452593b3ebc43b41eb6774-0-lv4eflj7   0/1     Pending   0          14s
ccita-63452599a9dccd19f905b57f-0-va3l0led   0/1     Pending   0          4s
container-agent-58fdf77455-bnvgk            1/1     Running   0          6m30s
ccita-6345258c50d0733667eaa9ed-0-knzwxxdk   1/1     Terminating   0          20s
ccita-6345258c50d0733667eaa9ed-0-knzwxxdk   1/1     Terminating   0          20s
ccita-6345258c50d0733667eaa9ed-0-knzwxxdk   0/1     Terminating   0          50s
ccita-6345258c50d0733667eaa9ed-0-knzwxxdk   0/1     Terminating   0          50s
ccita-6345258c50d0733667eaa9ed-0-knzwxxdk   0/1     Terminating   0          50s
ccita-63452593b3ebc43b41eb6774-0-lv4eflj7   0/1     Pending       0          50s
ccita-63452593b3ebc43b41eb6774-0-lv4eflj7   0/1     ContainerCreating   0          50s
ccita-63452593b3ebc43b41eb6774-0-lv4eflj7   1/1     Running             0          56s
ccita-63452593b3ebc43b41eb6774-0-lv4eflj7   1/1     Terminating         0          66s
ccita-63452593b3ebc43b41eb6774-0-lv4eflj7   0/1     Terminating         0          97s
ccita-63452593b3ebc43b41eb6774-0-lv4eflj7   0/1     Terminating         0          97s
ccita-63452593b3ebc43b41eb6774-0-lv4eflj7   0/1     Terminating         0          97s
ccita-63452599a9dccd19f905b57f-0-va3l0led   0/1     Pending             0          87s
ccita-63452599a9dccd19f905b57f-0-va3l0led   0/1     Pending             0          96s
ccita-63452599a9dccd19f905b57f-0-va3l0led   0/1     ContainerCreating   0          96s
ccita-63452599a9dccd19f905b57f-0-va3l0led   1/1     Running             0          2m26s
ccita-63452599a9dccd19f905b57f-0-va3l0led   1/1     Terminating         0          2m40s
ccita-63452599a9dccd19f905b57f-0-va3l0led   1/1     Terminating         0          2m40s
ccita-63452599a9dccd19f905b57f-0-va3l0led   0/1     Terminating         0          3m11s
ccita-63452599a9dccd19f905b57f-0-va3l0led   0/1     Terminating         0          3m11s
ccita-63452599a9dccd19f905b57f-0-va3l0led   0/1     Terminating         0          3m11s
Futa Hirakoba

The pods' status can also be seen from the CircleCI UI.

Spinning up a bunch of pods at once took a while, probably because this is GKE Autopilot.

Futa Hirakoba

Cleanup

Leaving pods running on GKE Autopilot keeps costing money, after all.

Futa Hirakoba
❯ helm uninstall container-agent -n circleci
release "container-agent" uninstalled

Deletion complete.
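
helm uninstall doesn't delete the circleci namespace we created by hand, so to remove everything (assuming nothing else lives in it):

❯ kubectl delete namespace circleci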

Futa Hirakoba

The rest looks deletable with the circleci CLI.

Listing the resource classes
❯ circleci runner resource-class list korosuke613
+---------------------------+-------------+
|      RESOURCE CLASS       | DESCRIPTION |
+---------------------------+-------------+
| korosuke613/gke-autopilot |             |
+---------------------------+-------------+
Deleting the resource class (failed)
❯ circleci runner resource-class delete korosuke613/gke-autopilot
Error: resource class korosuke613/gke-autopilot still has tokens in use

Do the tokens have to be deleted first?

Listing tokens
❯ circleci runner token list korosuke613/gke-autopilot
+--------------------------------------+----------+----------------------+
|                  ID                  | NICKNAME |      CREATED AT      |
+--------------------------------------+----------+----------------------+
| 73c05bc1-9288-4ee5-b084-8b0cb0c950f0 | default  | 2022-10-11T08:04:58Z |
+--------------------------------------+----------+----------------------+
Deleting the token (failed)
❯ circleci runner token delete korosuke613/gke-autopilot
Error: id is not a valid uuid

The token delete command takes the token's ID, it turns out.

Deleting the token
❯ circleci runner token delete 73c05bc1-9288-4ee5-b084-8b0cb0c950f0
Deleting the resource class
❯ circleci runner resource-class delete korosuke613/gke-autopilot

Deleted.
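
To double-check, listing the resource classes again should now come back empty:

❯ circleci runner resource-class list korosuke613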

This scrap was closed on 2022/10/11.