Chaos Engineering with Chaos Mesh

daiskobadaiskoba

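Installation

Installing in a local environment

Run Your First Chaos Experiment in 10 Minutes introduces kind and Minikube.
~I planned to use kind on a Mac this time~ It turned out that kind did not support arm.

Installing on a cluster with Helm

Installing on GKE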

https://chaos-mesh.org/docs/production-installation-using-helm/

% k get nodes -owide
NAME                                        STATUS   ROLES    AGE   VERSION            INTERNAL-IP   EXTERNAL-IP     OS-IMAGE                             KERNEL-VERSION   CONTAINER-RUNTIME
gke-chaos-mesh-default-pool-efa3374e-czdh   Ready    <none>   94m   v1.21.5-gke.1802   10.128.0.9    34.71.110.66    Container-Optimized OS from Google   5.4.144+         containerd://1.4.8
gke-chaos-mesh-default-pool-efa3374e-jcfg   Ready    <none>   94m   v1.21.5-gke.1802   10.128.0.8    34.132.77.142   Container-Optimized OS from Google   5.4.144+         containerd://1.4.8
gke-chaos-mesh-default-pool-efa3374e-m6s9   Ready    <none>   94m   v1.21.5-gke.1802   10.128.0.10   34.66.215.75    Container-Optimized OS from Google   5.4.144+         containerd://1.4.8
% k create ns chaos-testing
% helm install chaos-mesh chaos-mesh/chaos-mesh -n=chaos-testing --set chaosDaemon.runtime=containerd --set chaosDaemon.socketPath=/run/containerd/containerd.sock --set prometheus.create=true --set dashboard.create=true --version 2.0.5
NAME: chaos-mesh
LAST DEPLOYED: Mon Nov 29 22:30:15 2021
NAMESPACE: chaos-testing
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Make sure chaos-mesh components are running
   kubectl get pods --namespace chaos-testing -l app.kubernetes.io/instance=chaos-mesh
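
The --set flags passed to helm install above can also be kept in a values file, which is easier to review and reuse. The following is a minimal sketch of the equivalent settings. The file name values.yaml is an assumption; the keys simply mirror the flags used in the command.

```yaml
# values.yaml (hypothetical file name); mirrors the --set flags used above
chaosDaemon:
  runtime: containerd                          # the GKE nodes above report containerd://1.4.8
  socketPath: /run/containerd/containerd.sock
prometheus:
  create: true
dashboard:
  create: true
```

It would be passed with helm's -f option in place of the individual --set flags, for example: helm install chaos-mesh chaos-mesh/chaos-mesh -n=chaos-testing -f values.yaml --version 2.0.5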

Chaos Mesh is running in the chaos-testing namespace.

% kubectl get pods --namespace chaos-testing -l app.kubernetes.io/instance=chaos-mesh -w
NAME                                        READY   STATUS    RESTARTS   AGE
chaos-controller-manager-86756484c9-n7ntq   1/1     Running   0          34s
chaos-daemon-gmcng                          1/1     Running   0          35s
chaos-daemon-m4hcl                          1/1     Running   0          35s
chaos-daemon-p88sh                          1/1     Running   0          35s
chaos-dashboard-765547fbcb-wdlcr            1/1     Running   0          34s
chaos-prometheus-6b9f855bb4-vgtv5           1/1     Running   0          34s

Features

https://chaos-mesh.org/docs/basic-features/

  • Basic resource faults:
    • PodChaos: simulates Pod failures, such as Pod node restart, Pod's persistent unavailability, and certain container failures in a specific Pod.
    • NetworkChaos: simulates network failures, such as network latency, packet loss, packet disorder, and network partitions.
    • DNSChaos: simulates DNS failures, such as the parsing failure of DNS domain name and the wrong IP address returned.
    • HTTPChaos: simulates HTTP communication failures, such as HTTP communication latency.
    • StressChaos: simulates CPU race or memory race.
    • IOChaos: simulates the I/O failure of an application file, such as I/O delays, read and write failures.
    • TimeChaos: simulates the time jump exception.
    • KernelChaos: simulates kernel failures, such as an exception of the application memory allocation.
  • Platform faults:
    • AWSChaos: simulates AWS platform failures, such as the AWS node restart.
    • GCPChaos: simulates GCP platform failures, such as the GCP node restart.
  • Application faults:
    • JVMChaos: simulates JVM application failures, such as the function call delay.
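
Each fault type in the list corresponds to a custom resource kind in the chaos-mesh.org API group. As an illustration, a NetworkChaos resource that injects latency might look roughly like the sketch below. The resource name, target namespace, label, and timing values are all assumptions chosen for the example.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-delay-example   # hypothetical name
  namespace: chaos-testing
spec:
  action: delay                 # inject network latency
  mode: all                     # apply to every Pod matched by the selector
  selector:
    namespaces:
      - default                 # assumed target namespace
    labelSelectors:
      app: web                  # assumed label on the target Pods
  delay:
    latency: "100ms"
    jitter: "10ms"
    correlation: "0"
  duration: "60s"               # the fault is removed after this period
```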

Configuration via the GUI

User Permission

Follow Manage User Permissions.

Configuration is done from chaos-dashboard, which provides the GUI.
To connect from a remote machine, look up the Service port,

❯❯❯ kubectl get svc -n chaos-testing
NAME                            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                       AGE
chaos-daemon                    ClusterIP   None           <none>        31767/TCP,31766/TCP           2d23h
chaos-dashboard                 NodePort    10.72.13.237   <none>        2333:32669/TCP                2d23h
chaos-mesh-controller-manager   ClusterIP   10.72.0.138    <none>        443/TCP,10081/TCP,10080/TCP   2d23h
chaos-prometheus                ClusterIP   10.72.0.232    <none>        9090/TCP                      2d23h

and connect it to local port 8080 with port-forward.

❯❯❯ kubectl port-forward svc/chaos-dashboard 8080:2333 -nchaos-testing 
Forwarding from 127.0.0.1:8080 -> 2333
Forwarding from [::1]:8080 -> 2333

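Opening http://localhost:8080/dashboard in a browser shows the RBAC configuration screen.

From Click here to generate, permissions can be set up from a template.

  • Scope: cluster-wide or per namespace
  • Role: manager (administrator) or viewer

Example configuration

An example granting manager permissions for the whole cluster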

rbac.yaml
kind: ServiceAccount
apiVersion: v1
metadata:
  namespace: default
  name: account-cluster-manager-bxwkr

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: role-cluster-manager-bxwkr
rules:
- apiGroups: [""]
  resources: ["pods", "namespaces"]
  verbs: ["get", "watch", "list"]
- apiGroups:
  - chaos-mesh.org
  resources: [ "*" ]
  verbs: ["get", "list", "watch", "create", "delete", "patch", "update"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: bind-cluster-manager-bxwkr
subjects:
- kind: ServiceAccount
  name: account-cluster-manager-bxwkr
  namespace: default
roleRef:
  kind: ClusterRole
  name: role-cluster-manager-bxwkr
  apiGroup: rbac.authorization.k8s.io
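
For contrast, the other template options (namespace scope, viewer role) would use a Role and RoleBinding instead of their cluster-wide counterparts, with read-only verbs. The following is a sketch modeled on the manifest above rather than the dashboard's exact output. The names ending in -example are hypothetical.

```yaml
kind: ServiceAccount
apiVersion: v1
metadata:
  namespace: default
  name: account-default-viewer-example   # hypothetical name

---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: role-default-viewer-example      # hypothetical name
rules:
- apiGroups: [""]
  resources: ["pods", "namespaces"]
  verbs: ["get", "watch", "list"]
- apiGroups:
  - chaos-mesh.org
  resources: [ "*" ]
  verbs: ["get", "list", "watch"]        # read-only: no create, delete, patch, or update

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: bind-default-viewer-example      # hypothetical name
  namespace: default
subjects:
- kind: ServiceAccount
  name: account-default-viewer-example
  namespace: default
roleRef:
  kind: Role
  name: role-default-viewer-example
  apiGroup: rbac.authorization.k8s.io
```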

Registering the RBAC resources in the cluster and linking them to Chaos Mesh

% kubectl apply -f rbac.yaml
serviceaccount/account-cluster-manager-bxwkr created
clusterrole.rbac.authorization.k8s.io/role-cluster-manager-bxwkr created
clusterrolebinding.rbac.authorization.k8s.io/bind-cluster-manager-bxwkr created

%  kubectl describe secrets account-cluster-manager-bxwkr
Name:         account-cluster-manager-bxwkr-token-m2dz6
Namespace:    default
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: account-cluster-manager-bxwkr
              kubernetes.io/service-account.uid: cc9e63ea-ec00-4c0b-bb65-d1b11a08691f

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1509 bytes
namespace:  7 bytes
token:      eyJhbG...

Enter the token obtained with describe into the RBAC configuration screen.
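
The token Secret shown above was created automatically because the cluster runs Kubernetes v1.21. From Kubernetes 1.24 onward, ServiceAccount token Secrets are no longer generated automatically, so on newer clusters a Secret may have to be requested explicitly. A minimal sketch, assuming the ServiceAccount from rbac.yaml; the Secret name is hypothetical.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: account-cluster-manager-bxwkr-token   # hypothetical name
  namespace: default
  annotations:
    # tells the control plane which ServiceAccount to issue the token for
    kubernetes.io/service-account.name: account-cluster-manager-bxwkr
type: kubernetes.io/service-account-token
```

Once applied, the token field of this Secret is populated by the control plane and can be read with kubectl describe in the same way as above.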

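Running a test

Chaos Mesh calls a test an experiment.
An experiment defines the fault to inject, and it can be run on its own or combined with others in a workflow as a single scenario.

Run a test that puts CPU load on a specific Pod.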
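
The experiment here was created through the GUI, and its definition is not recorded in these notes. As a rough sketch, an equivalent StressChaos manifest might look like the following. The resource name, target namespace, label, worker count, load, and duration are all assumptions.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: StressChaos
metadata:
  name: cpu-stress-example   # hypothetical name
  namespace: chaos-testing
spec:
  mode: one                  # pick one Pod among those matched by the selector
  selector:
    namespaces:
      - default              # assumed target namespace
    labelSelectors:
      app: web               # assumed label on the target Pod
  stressors:
    cpu:
      workers: 1             # number of stress worker threads
      load: 80               # target CPU load per worker, in percent
  duration: "60s"            # the stress stops after this period
```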

The GUI shows the progress of the run,

and the load on the Pod is visible as well.
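
As noted earlier, experiments can also be chained in a workflow. Below is a hedged sketch of a Workflow resource that runs a CPU stress step followed by a pod-kill step in sequence. All names, labels, and deadlines are assumptions for illustration and are not taken from the run described here.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: Workflow
metadata:
  name: stress-then-kill        # hypothetical name
  namespace: chaos-testing
spec:
  entry: entry                  # name of the template to start from
  templates:
    - name: entry
      templateType: Serial      # run the children one after another
      deadline: 180s
      children:
        - stress-cpu
        - kill-pod
    - name: stress-cpu
      templateType: StressChaos
      deadline: 60s
      stressChaos:
        mode: one
        selector:
          namespaces:
            - default           # assumed target namespace
          labelSelectors:
            app: web            # assumed label on the target Pod
        stressors:
          cpu:
            workers: 1
            load: 80
    - name: kill-pod
      templateType: PodChaos
      deadline: 30s
      podChaos:
        action: pod-kill
        mode: one
        selector:
          namespaces:
            - default
          labelSelectors:
            app: web
```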

This scrap was closed on 2023/01/02.