💥
Suddenly unable to connect to the Kubernetes cluster (expired certificates)
Background
- Kubernetes 1.18.x
tl;dr;
- The certificates had expired
Timeline
- From around noon on 2020/07/28, every job that used kubectl to have a pod run something started failing.
- Running kubectl from my own PC returned
error: You must be logged in to the server (Unauthorized)
- I googled extensively and found nothing useful.
- Suspecting I had hit a kubectl bug, I upgraded kubeadm, kubectl, and kubelet on the control node (not really a good idea).
- Running kubectl from my machine again now returned
The connection to the server 192.168.10.190:6443 was refused - did you specify the right host or port?
- At this point I checked kubelet with
systemctl status kubelet
which reported activating, meaning it was failing to start.
- Looking at the kubelet log with
journalctl -u kubelet
showed that it was terminating abnormally with the message
part of the existing bootstrap client certificate is expired: 2020-07-27 06:24:58 +0000 UTC
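In hindsight, going straight to the kubelet log would have saved the googling. A minimal sketch of that check follows; `find_cert_errors` is a hypothetical helper name of mine, and the 500-line window is an arbitrary assumption.

```sh
#!/bin/sh
# Filter log text on stdin down to lines that report an expired certificate.
# find_cert_errors is a hypothetical helper, not part of kubelet or systemd.
find_cert_errors() {
    grep -i 'certificate' | grep -i 'expired'
}

# Only query the journal on hosts where journalctl exists (systemd).
if command -v journalctl >/dev/null 2>&1; then
    # "|| true" keeps the script going when no matching line is found
    journalctl -u kubelet --no-pager -n 500 2>/dev/null | find_cert_errors || true
fi
```

If this prints a line like the one above, the problem is most likely on the certificate side rather than in kubectl itself.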
Recovery procedure
<code class="language-sh"># Take a backup
sudo su -
mkdir ~/k8s_backup_20200728
cp -rva /etc/kubernetes ~/k8s_backup_20200728/
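Since the later steps move certificate and key files around, the backup is the one step that must not go wrong. The backup above could be wrapped roughly like this; `backup_dir` and the timestamped directory name are my own assumptions, not something kubeadm provides.

```sh
#!/bin/sh
# Copy a directory into a timestamped backup directory and print the destination.
# backup_dir is a hypothetical helper name.
backup_dir() {
    src="$1"
    dest_root="$2"
    dest="$dest_root/k8s_backup_$(date +%Y%m%d_%H%M%S)"
    mkdir -p "$dest" || return 1
    # -a preserves ownership, permissions and timestamps of the keys and certs
    cp -a "$src" "$dest/" || return 1
    echo "$dest"
}

# Example (run as root on the control node):
#   backup_dir /etc/kubernetes "$HOME"
```

Printing the destination makes it easy to reuse the path when files are moved aside later.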
<code class="language-sh"># Stop kubelet for now.
sudo systemctl stop kubelet
<code class="language-sh"># kubeadm init phase certs all
W0728 00:52:10.502471 3790 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
At this point kubelet still would not start.
# kubeadm alpha certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration
W0728 00:58:18.634587 4408 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed
This did not fix it either. Note that kubelet.conf is not among the renewed items above.
# cd /etc/kubernetes/pki
# mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} ~/k8s_backup_20200728/
# kubeadm init phase certs all --apiserver-advertise-address 192.168.10.190
W0728 01:03:46.960780 4926 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubemaster kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.10.190]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Using the existing "sa" key
# cd /etc/kubernetes
# mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} ~/k8s_backup_20200728/
# kubeadm init phase kubeconfig all
W0728 01:05:49.334781 5092 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
This got kubelet starting again. It was still spewing a large number of errors, but since it did start, I rebooted the node once at this point.
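To confirm that the regenerated kubeconfig files really carry a fresh client certificate, the embedded certificate can be decoded and inspected. This is a sketch under assumptions: `extract_client_cert` and `KUBECONFIG_FILE` are hypothetical names, the file is assumed to contain a single `client-certificate-data:` entry, and `base64 -d` assumes GNU coreutils.

```sh
#!/bin/sh
# Print the PEM client certificate embedded in a kubeconfig file.
# extract_client_cert is a hypothetical helper; it assumes a single
# "client-certificate-data:" entry in the file.
extract_client_cert() {
    awk '/client-certificate-data:/ {print $2; exit}' "$1" | base64 -d
}

KUBECONFIG_FILE="${KUBECONFIG_FILE:-/etc/kubernetes/admin.conf}"
if [ -r "$KUBECONFIG_FILE" ] && command -v openssl >/dev/null 2>&1; then
    # Show the subject and expiry date of the embedded client certificate
    extract_client_cert "$KUBECONFIG_FILE" | openssl x509 -noout -subject -enddate
fi
```

A kubeconfig that points to a certificate file with `client-certificate:` instead of embedding the data will produce no output here; in that case inspect the referenced file directly.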
# Replace ~/.kube/config
$ sudo cp /etc/kubernetes/admin.conf ~/.kube/config
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:58:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-06-26T03:39:24Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
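If Server Version is displayed, you are good!
Closing notes
- Replace ~/.kube/config on every machine that uses kubectl.
- Judging from kubectl get nodes, the connections to the existing nodes had not been lost.
- While kubelet was down, the cluster simply could not be managed; Pods that were already running kept running normally.
- Honestly, this gave me a real scare and I would rather not go through it again.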
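For the first note, replacing ~/.kube/config on each machine, something like the following keeps the old file around and tightens the permissions. `install_kubeconfig` is a hypothetical helper of mine, and it assumes admin.conf has already been copied over from the control node.

```sh
#!/bin/sh
# Install a kubeconfig for the current user, keeping the previous file.
# install_kubeconfig is a hypothetical helper name.
install_kubeconfig() {
    src="$1"
    dest="${2:-$HOME/.kube/config}"
    mkdir -p "$(dirname "$dest")" || return 1
    # Keep the old (now rejected) config in case it needs inspecting later
    if [ -f "$dest" ]; then
        cp -p "$dest" "$dest.bak" || return 1
    fi
    cp "$src" "$dest" || return 1
    # The file contains a private key, so restrict it to the owner
    chmod 600 "$dest"
}

# Example, after copying admin.conf from the control node to ./admin.conf:
#   install_kubeconfig ./admin.conf
```

Running it as the normal user avoids a root-owned ~/.kube/config, which is what a plain sudo cp would leave behind if the file did not exist yet.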
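Postscript
Checking certificates
This may not be directly related to this incident, but the expiration dates of the certificates used in the cluster can be listed in one go.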
- https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/
- kubeadm alpha certs check-expiration
(with newer kubeadm versions: kubeadm certs check-expiration)
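Where kubeadm is not at hand, or for a cron-style early warning, a rough equivalent can be built on openssl. This is only a sketch: `warn_expiring`, the `PKI_DIR` variable, and the 30-day threshold are assumptions of mine, and it looks only at `*.crt` files, not at certificates embedded in kubeconfig files.

```sh
#!/bin/sh
# List certificates under a directory that expire within a number of days.
# warn_expiring and the 30-day default are assumptions for illustration.
warn_expiring() {
    dir="$1"
    days="${2:-30}"
    seconds=$((days * 86400))
    find "$dir" -name '*.crt' 2>/dev/null | sort | while read -r crt; do
        # -checkend N exits non-zero if the certificate expires within N seconds
        # (an unreadable or invalid file is reported as well)
        if ! openssl x509 -noout -checkend "$seconds" -in "$crt" >/dev/null 2>&1; then
            end=$(openssl x509 -noout -enddate -in "$crt" 2>/dev/null | cut -d= -f2)
            echo "$crt expires within $days days (notAfter: $end)"
        fi
    done
}

# /etc/kubernetes/pki is the certificate directory shown in the kubeadm output above
warn_expiring "${PKI_DIR:-/etc/kubernetes/pki}" 30
```

No output means nothing under that directory is due within the threshold.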
Birthday
<code class="language-sh">$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubemaster Ready master 365d v1.18.6
kubeworker1 Ready <none> 352d v1.18.5
kubeworker2 Ready <none> 352d v1.18.5
kubeworker3 Ready <none> 350d v1.18.5
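Quite the birthday present. Client certificates issued by kubeadm are valid for one year, and the cluster had just turned one.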