Kubernetes EventのAggregationとSpam Filteringの調査メモ

背景

CronJobの子Jobの完了を検知して、あるアクションを行うCustomControllerを作りたいと思ったときに、CronJobController (v2)が子Jobが完了した際にSawCompletedJobというEventを吐いていることを発見した。

｢お、このEventを使えば簡単に子Jobの完了を検知出来るな、よしよし｣と思ったが、ローカルクラスター (kind) で1分間に1回走るCronJobを実行してみたところ、途中 (10回目ぐらい) からSawCompletedJobが取得できなくなることに気がついた。具体的には、以下のような感じになる。

$ kubectl alpha events --for cronjob/hello
...
8m29s (x2 over 8m29s)   Normal   SawCompletedJob    CronJob/hello   Saw completed job: hello-28023907, status: Complete
8m29s                   Normal   SuccessfulDelete   CronJob/hello   Deleted job hello-28023904
7m35s                   Normal   SuccessfulCreate   CronJob/hello   Created job hello-28023908
7m28s (x2 over 7m28s)   Normal   SawCompletedJob    CronJob/hello   Saw completed job: hello-28023908, status: Complete
7m28s                   Normal   SuccessfulDelete   CronJob/hello   Deleted job hello-28023905
6m35s                   Normal   SuccessfulCreate   CronJob/hello   Created job hello-28023909
6m28s                   Normal   SawCompletedJob    CronJob/hello   Saw completed job: hello-28023909, status: Complete
2m35s (x3 over 4m35s)   Normal   SuccessfulCreate   CronJob/hello   (combined from similar events): Created job hello-28023913

6m28sまではSawCompletedJobが通知されているが、それ以降取得できなくなっている。ちなみに子Job及び子Podは毎分きちんとEventが通知されているように見える。

$ kubectl alpha events

4m18s                 Normal    Scheduled                 Pod/hello-28023914-frb94   Successfully assigned default/hello-28023914-frb94 to kind-control-plane
4m11s                 Normal    Completed                 Job/hello-28023914         Job completed
3m18s                 Normal    Started                   Pod/hello-28023915-5fsh5   Started container hello
3m18s                 Normal    SuccessfulCreate          Job/hello-28023915         Created pod: hello-28023915-5fsh5
3m18s                 Normal    Created                   Pod/hello-28023915-5fsh5   Created container hello
3m18s                 Normal    Pulled                    Pod/hello-28023915-5fsh5   Container image "busybox:1.28" already present on machine
3m18s                 Normal    Scheduled                 Pod/hello-28023915-5fsh5   Successfully assigned default/hello-28023915-5fsh5 to kind-control-plane
3m11s                 Normal    Completed                 Job/hello-28023915         Job completed
2m18s                 Normal    Started                   Pod/hello-28023916-qbqqk   Started container hello
2m18s                 Normal    Pulled                    Pod/hello-28023916-qbqqk   Container image "busybox:1.28" already present on machine
2m18s                 Normal    Created                   Pod/hello-28023916-qbqqk   Created container hello
2m18s                 Normal    SuccessfulCreate          Job/hello-28023916         Created pod: hello-28023916-qbqqk
2m18s                 Normal    Scheduled                 Pod/hello-28023916-qbqqk   Successfully assigned default/hello-28023916-qbqqk to kind-control-plane
2m11s                 Normal    Completed                 Job/hello-28023916         Job completed
78s                   Normal    SuccessfulCreate          Job/hello-28023917         Created pod: hello-28023917-kpxvn
78s                   Normal    Created                   Pod/hello-28023917-kpxvn   Created container hello
78s                   Normal    Pulled                    Pod/hello-28023917-kpxvn   Container image "busybox:1.28" already present on machine
78s                   Normal    Started                   Pod/hello-28023917-kpxvn   Started container hello
78s                   Normal    Scheduled                 Pod/hello-28023917-kpxvn   Successfully assigned default/hello-28023917-kpxvn to kind-control-plane
71s                   Normal    Completed                 Job/hello-28023917         Job completed
18s (x8 over 7m18s)   Normal    SuccessfulCreate          CronJob/hello              (combined from similar events): Created job hello-28023918
18s                   Normal    Started                   Pod/hello-28023918-grvbz   Started container hello
18s                   Normal    Created                   Pod/hello-28023918-grvbz   Created container hello
18s                   Normal    Pulled                    Pod/hello-28023918-grvbz   Container image "busybox:1.28" already present on machine
18s                   Normal    SuccessfulCreate          Job/hello-28023918         Created pod: hello-28023918-grvbz
18s                   Normal    Scheduled                 Pod/hello-28023918-grvbz   Successfully assigned default/hello-28023918-grvbz to kind-control-plane
11s                   Normal    Completed                 Job/hello-28023918         Job completed

この挙動は公式ドキュメントにも記載が見つけられなかったため、何が起こっているのかソースコードを頼りに読み解きたい。

shuheiktgw

OpenSDSのIssueに関係するKunbernetesのIssueへのリンクが有ったのでこれを手がかりに読み解いて行けそう。

https://github.com/kubernetes/kubernetes/pull/4372

ひとまずこのドキュメントを読めばどういうロジックでEvent Compressionが発生するのかざっくりとは理解できそう。ざっと読んだ感じでは

各Eventの発行元 (e.g. kubelet) がEventのHashTableを管理し、すでに発行したEventであればPostの代わりにPutでEventのLastTimestampとCountを更新する
HashTableのキーには以下のフィールドを用いる
- event.Source.Component
- event.Source.Host
- event.InvolvedObject.Kind
- event.InvolvedObject.Namespace
- event.InvolvedObject.Name
- event.InvolvedObject.UID
- event.InvolvedObject.APIVersion
- event.Reason
- event.Message

shuheiktgw

上のIssueに対応するPRは以下

https://github.com/kubernetes/kubernetes/pull/4306

Issueに書かれていることが素直に実装されている印象。ただこれではSawCompletedJobがレポートされなくなった理由の説明にはなっていないので現在の実装を追いたい。おそらく対応する現在のコードはclient-goのこの辺りだと思われる。上のPRが作られた2015年からは全く別物と言ってよいほど複雑な仕組みになっているように見える。

https://github.com/kubernetes/client-go/tree/master/tools/record

shuheiktgw

CronJobControllerのSawCompletedJobのPublishに利用されている jm.recorder の実体はrecorderImplなので、そのEventfメソッドを読み解けば実際の処理を追えるはず。

Eventfで実際にEventを送信しているのはActionOrDrop
実体はBroadcasterのActionOrDropを呼び出し、incomingチャネルへEventを送信

これだけでは何も分からないのでBroadcaster (チャネルの受信側) の処理を読み解く。

shuheiktgw

IncommingチャネルからのEventの取り出しとresultチャネルへの転送

jm.recorderはCronJobControllerの作成時にeventBroadcasterImplのNewRecorderメソッドを用いて初期化されている
Broadcasterの実体はeventBroadcasterImplのBroadcasterでeventBroadcasterImplはNewBroadcasterファンクションを用いて初期化されている
incomingチャネルへ送信されたEventはNewLongQueueBroadcasterファンクションで起動したloopゴルーチンにより取り出され、各Watcherのresultチャネルに転送される

Watcherの起動とresultチャネルからのEventの取り出し

CronJobControllerはeventBroadcasterImplのStartRecordingToSinkメソッドを呼び出し、StartEventWatcherメソッドからWatcherを起動する
StartEventWatcherはBroadcasterのWatchメソッドを通じてWatcherの初期化と登録を行う
- 結構面白いなーと思ったのは、Wathcerの登録処理をincommingチャネルに送信し、loopゴルーチンが実行することで、Wacherの起動前に送信されたEventをWatcherから見えなくしている点 (コメントにはterrible hackと書かれているけど、結構スマートだと個人的には思ったw)

Watcherの処理

Watcherはゴルーチンの中でひたすらresultチャネルから送信されてきたEventに対してeventHandlerを呼び続ける
evenHandlerは中でeventBroadcasterImplのrecordToSinkを呼ぶ
recordToSinkはeventCorrelatorを使ってEventのアグリゲーションとスパムフィルタリングを行ったあと、recordEventを呼び出し、Eventの作成もしくは既存のEventの更新を行う (Sinkが何者かは後で調べる)

EventCorrelatorの処理

eventBroadcasterImplのStartRecordingToSinkメソッド内でNewEventCorrelatorWithOptionsファンクションを通じて初期化されている
- e.optionsは空なので全てデフォルトの設定になる
eventBroadcasterImplのrecordToSinkメソッドでEventCorrelateメソッドが呼ばれてEventのアグリゲーションとスパムフィルタリングを実行

Eventのアグリゲーション

EventCorrelatorのEventCorrelateメソッド内でEventAggregatorのEventAggregateメソッドを呼び出す
EventAggregateメソッドでは始めに e.keyFunc を用いてEventから aggregateKey と localKey を計算する
- e.keyFuncにはデフォルトであるEventAggregatorByReasonFuncが使われているはず。この場合 aggregationKey は event.Source, event.InvolvedObject, event.Type, event.Reason, event.ReportingController and event.ReportingInstance で localKey は event.Message になる
aggregateKey を用いてEventAggregatorのcacheを検索する。cacheのバリューはaggregateRecord。
もしaggregateRecordのlastTimestampがEventAggregatorのmaxIntervalInSeconds (デフォルトは600秒) よりも古ければ新しいaggregateRecordを作成する
もしaggregateRecordのlocalKeysの数をチェックし、EventAggregatorのmaxEvents (デフォルトは10)よりも少なければ、eventKey をキーとして返す。そうでなければaggregationKeyを返す
EventCorrelatorのEventCorrelateメソッドから返されてキーを元にeventLoggerのeventObserveメソッドを呼び出し、eventLoggerのcacheを探索する。cacheのバリューはeventLog
eventLoggerのキャッシュにヒットすればEventのCount等のフィールドを更新したEventを返す

Eventのスパムフィルタリング

EventCorrelatorのfilterFuncを呼び出し、スパムフィルターを適応する。filterFuncの実体はEventSourceObjectSpamFilterのFilterメソッド
Eventに対してspamKeyFuncを使って eventKey を計算する。spamKeyFuncの実体はgetSpamKeyファンクション
eventKey を使ってEventSourceObjectSpamFilterのcacheを検索する。cacheのバリューはspamRecordでspamRecordの中身はrate limitter
- spamFiletrのqpsとburstはデフォルト値なのでqpsが1/300でburstが25
- コメントによればこのtoken bucker方式のrate limitterは初めにburst分のトークンが与えられ、その後qpsのペースでトークンが補充されるらしい。つまり初めに25トークンあり、その後は5分間 (300秒) に一度トークンが補充される
tokenBucketPassiveRateLimiterのTryAcceptメソッドを呼び出し、Rate Limitを確認する。もしRate Limitに引っかかっていた場合Eventは送信されない

shuheiktgw

以上を踏まえて何が起こっていたのか推測をまとめる。端的に言うとスパムフィルターに引っかかっているのではという予想

CronJobControllerはJobごとにSuccessfulCreate、SawCompletedJob、SuccessfulDeleteという3つのEventを1分間に発行する
- 厳密にはSuccessfulDeleteはHistoryLimitに達したタイミングからしか発行されない
デフォルトのspamFiletrはEventのSourceとInvolvedObjectしか見ていないため、これら3つのイベントは同じEventとして判定される
初期の25トークンは間に3トークンずつ消費されていく。5分に一度トークンは補充されるが9分を経過した辺りでトークンが枯渇し、以後5分ごとにトークンが補充されるがそれも (Aggregated) SuccessfulCreate Eventに消費されてしまうため、SawCompletedJobとSuccessfulDeleteは以後一度も通知されなくなる

shuheiktgw

補足1 Events

SuccessfulCreate Event

apiVersion: v1
count: 1
eventTime: null
firstTimestamp: "2023-04-15T02:08:00Z"
involvedObject:
  apiVersion: batch/v1
  kind: CronJob
  name: hello
  namespace: default
  resourceVersion: "288987"
  uid: 5f3cfeca-8a83-452a-beb9-7a5f9c1eff63
kind: Event
lastTimestamp: "2023-04-15T02:08:00Z"
message: Created job hello-28025408
metadata:
  creationTimestamp: "2023-04-15T02:08:00Z"
  name: hello.1755f99b05f650ef
  namespace: default
  resourceVersion: "289021"
  uid: d0427cc3-2ac6-4616-81b2-e46283d263d8
reason: SuccessfulCreate
reportingComponent: ""
reportingInstance: ""
source:
  component: cronjob-controller
type: Normal

SawCompletedJob Event

apiVersion: v1
count: 1
eventTime: null
firstTimestamp: "2023-04-15T02:08:34Z"
involvedObject:
  apiVersion: batch/v1
  kind: CronJob
  name: hello
  namespace: default
  resourceVersion: "289020"
  uid: 5f3cfeca-8a83-452a-beb9-7a5f9c1eff63
kind: Event
lastTimestamp: "2023-04-15T02:08:34Z"
message: 'Saw completed job: hello-28025408, status: Complete'
metadata:
  creationTimestamp: "2023-04-15T02:08:34Z"
  name: hello.1755f9a3024e63f6
  namespace: default
  resourceVersion: "289080"
  uid: 9abb33e5-5e3d-489d-ae8b-d7b958ec2cc4
reason: SawCompletedJob
reportingComponent: ""
reportingInstance: ""
source:
  component: cronjob-controller
type: Normal

SuccessfulDelete Event

apiVersion: v1
count: 1
eventTime: null
firstTimestamp: "2023-04-15T02:14:34Z"
involvedObject:
  apiVersion: batch/v1
  kind: CronJob
  name: hello
  namespace: default
  resourceVersion: "289520"
  uid: 5f3cfeca-8a83-452a-beb9-7a5f9c1eff63
kind: Event
lastTimestamp: "2023-04-15T02:14:34Z"
message: Deleted job hello-28025408
metadata:
  creationTimestamp: "2023-04-15T02:14:34Z"
  name: hello.1755f9f6eace14e0
  namespace: default
  resourceVersion: "289584"
  uid: ec14de7c-46b4-4126-a269-ff57160d248a
reason: SuccessfulDelete
reportingComponent: ""
reportingInstance: ""
source:
  component: cronjob-controller
type: Normal

shuheiktgw

補足2 Sink

最終的にEventを送る際にはrecordEventファンクションからEventSinkのPatchもしくはCreateが呼ばれているが、EventSinkが何者なのか知りたい。

実体はEventSinkImpleでEventInterfaceの単なるラッパー
単純にEventのクライアントをラップしているだけなのでおそらくテストを目的としてインターフェイスを噛ませているだけ?