Kubernetes 1.36: SIG-Instrumentation Changes
Introduction
This article summarizes the SIG-Instrumentation-related and metrics-related changes found in the Kubernetes v1.36 changelog.
- Kubernetes 1.31: SIG-Instrumentation Changes
- Kubernetes 1.32: SIG-Instrumentation Changes
- Kubernetes 1.33: SIG-Instrumentation Changes
- Kubernetes 1.34: SIG-Instrumentation Changes
- Kubernetes 1.35: SIG-Instrumentation Changes
Changes by kind
Upgrade Notes
- kube-controller-manager: the metric volume_operation_total_errors was renamed to volume_operation_errors_total. Dashboards and alerts that still use the old metric name need to be updated. (#136399)
  - In the metrics diff, the old metric is marked deprecatedVersion: 1.36.0 and the new metric is added.
- etcd_bookmark_counts was renamed to etcd_bookmark_total. Existing monitoring configuration needs to be updated here as well. (#136483)
  - The old metric is deprecated, and users are expected to migrate to the new counter metric.
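To make the rename concrete, the following is a minimal Go sketch that rewrites the two deprecated metric names inside a PromQL expression. The function name MigrateExpr, the token pattern, and the sample alert expression are illustrative assumptions and are not part of Kubernetes or any upstream tool.

```go
package main

import (
	"fmt"
	"regexp"
)

// renamedMetrics maps the metric names deprecated in v1.36 to their
// replacements, as listed in the upgrade notes.
var renamedMetrics = map[string]string{
	"volume_operation_total_errors": "volume_operation_errors_total",
	"etcd_bookmark_counts":          "etcd_bookmark_total",
}

// metricToken matches identifier-like tokens so that only whole names are
// replaced, never a prefix of a longer identifier.
var metricToken = regexp.MustCompile(`[a-zA-Z_:][a-zA-Z0-9_:]*`)

// MigrateExpr rewrites deprecated metric names in a PromQL expression.
// It is a text-level rewrite and does not parse PromQL.
func MigrateExpr(expr string) string {
	return metricToken.ReplaceAllStringFunc(expr, func(tok string) string {
		if repl, ok := renamedMetrics[tok]; ok {
			return repl
		}
		return tok
	})
}

func main() {
	// Hypothetical alert expression used only for illustration.
	old := `sum(rate(volume_operation_total_errors{plugin_name="kubernetes.io/csi"}[5m])) > 0`
	fmt.Println(MigrateExpr(old))
}
```

Because the rewrite works on tokens rather than on a parsed query, a label value that happens to equal an old metric name would also be replaced. Also note that the metrics diff lists etcd_bookmark_counts as a Gauge and etcd_bookmark_total as a Counter, so queries on that metric may need review beyond the rename.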
Deprecation
- volume_operation_total_errors is deprecated in 1.36 and is replaced by volume_operation_errors_total. (#136399)
- etcd_bookmark_counts is deprecated in 1.36 and is replaced by etcd_bookmark_total. (#136483)
API Changes
- config.k8s.io/flagz and config.k8s.io/statusz were promoted to v1beta1. (#137174, #137173)
  - The structured, versioned responses that were v1alpha1 in 1.35 are beta in 1.36.
- /flagz and /statusz can now return responses in YAML. (#135309)
  - In addition to JSON, YAML is now available as a machine-readable format.
- /flagz and /statusz are now instrumented by apiserver_request_total and apiserver_request_duration_seconds. (#137021)
  - The group and version labels reflect the API version selected by content negotiation.
- Manifest-based admission control configuration (KEP-5793) was added as alpha. (#137346)
  - Related apiserver_manifest_admission_config_controller_* metrics were added:
    - apiserver_manifest_admission_config_controller_last_config_info
    - apiserver_manifest_admission_config_controller_automatic_reloads_total
    - apiserver_manifest_admission_config_controller_automatic_reload_last_timestamp_seconds
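As a rough sketch of how a client might ask for the structured /statusz or /flagz response, the Go snippet below prepares a request with a content-negotiation Accept header. The parameter layout (v, g, as) is an assumption based on the form the upstream z-pages documentation used for v1alpha1; the helper names and the https://localhost:6443 address are placeholders invented for this example. The request is only built, not sent.

```go
package main

import (
	"fmt"
	"net/http"
)

// zpageAccept builds an Accept header value for a structured z-page response.
// Assumption: the parameter layout (v, g, as) matches the upstream z-pages docs.
func zpageAccept(format, version, kind string) string {
	return fmt.Sprintf("application/%s;v=%s;g=config.k8s.io;as=%s", format, version, kind)
}

// newZPageRequest prepares (but does not send) a GET request for a z-page.
func newZPageRequest(baseURL, path, format, kind string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+path, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", zpageAccept(format, "v1beta1", kind))
	return req, nil
}

func main() {
	// Placeholder address; a real cluster also needs TLS and credentials.
	req, err := newZPageRequest("https://localhost:6443", "/statusz", "yaml", "Statusz")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("Accept:", req.Header.Get("Accept"))
}
```

Keeping the header construction in one small function makes it easy to switch between json and yaml, or to fall back to another version if the server does not offer v1beta1.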
Features
- Prometheus native histogram support can now be enabled in kube-apiserver, kube-controller-manager, and kube-scheduler. (KEP-5808: Native Histogram Support for Kubernetes Metrics, #136763, #137779, #137466)
  - When the NativeHistograms feature gate is enabled, both classic and native histograms are exposed.
  - Because suitable bucket boundaries are chosen dynamically and automatically, the resulting histograms are more accurate.
  - https://kubernetes.io/docs/reference/instrumentation/native-histograms/
- Various existing metrics were promoted from alpha to beta. (#136314, #136086, #136368, #136154, #136155, #136178, #136367, #135522)
  - apiserver_storage_events_received_total
  - apiserver_watch_list_duration_seconds
  - EndpointSlice-related metrics
  - component-base-related metrics (kubernetes_build_info, running_managed_controllers, and others)
  - scheduler-related metrics
  - HPA-related metrics
  - Job controller-related metrics
  - workqueue-related metrics
- New informer metrics were added. (KEP-4346: Informer Metrics, #135782, #137419, #137101)
  - informer_queued_items
  - informer_store_resource_version
  - informer_processing_latency_seconds
- Metrics for automatic CA reload and TLS cache GC in k8s.io/client-go/transport were added. (#132922, #136355)
  - rest_client_transport_ca_reload_total
  - rest_client_transport_cache_gc_calls_total
  - rest_client_transport_cert_rotation_gc_calls_total
- Several kubelet metrics were added as well. (#137453, #137780, #137719)
  - kubelet_terminated_containers_total tracks the number of terminated containers per exit code. (#137453)
  - kubelet_websocket_streaming_requests_total counts the exec / attach / portforward requests received by the kubelet.
  - kubelet_metrics_provider indicates whether container stats are collected from cadvisor or cri.
- For controllers, metrics that count syncs skipped because of a stale watch cache were added. (KEP-5647: Stale Controller Mitigation)
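The native histogram item above notes that bucket boundaries no longer have to be fixed in advance. As a simplified illustration, the Go sketch below contrasts a classic histogram, where an observation lands in the first configured upper bound that covers it, with the exponential bucketing scheme of Prometheus native histograms, where bounds are powers of 2^(2^-schema). This is a sketch of the bucketing arithmetic only, not the Kubernetes or client_golang implementation; the function names and the 0.3 second observation are assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// classicBucket returns the upper bound of the first fixed bucket that holds v,
// or +Inf when v exceeds every configured bound.
func classicBucket(bounds []float64, v float64) float64 {
	i := sort.SearchFloat64s(bounds, v) // first index with bounds[i] >= v
	if i == len(bounds) {
		return math.Inf(1)
	}
	return bounds[i]
}

// nativeBucket returns the upper bound of the exponential bucket holding v (> 0)
// for a given schema. Bounds are powers of base = 2^(2^-schema).
func nativeBucket(schema int, v float64) float64 {
	scale := math.Exp2(float64(schema))    // number of buckets per power of two
	idx := math.Ceil(math.Log2(v) * scale) // smallest idx with base^idx >= v
	return math.Exp2(idx / scale)
}

func main() {
	// Classic bounds copied from the first buckets of apiserver_watch_list_duration_seconds.
	bounds := []float64{0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 1}
	v := 0.3 // hypothetical observation in seconds

	fmt.Println("classic upper bound:", classicBucket(bounds, v))
	for _, schema := range []int{0, 3} {
		fmt.Printf("native upper bound (schema %d): %g\n", schema, nativeBucket(schema, v))
	}
}
```

A higher schema means more buckets per power of two, so the upper bound reported for the same observation moves closer to the observed value without anyone having to pick bucket boundaries up front.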
Documentation
- Component and endpoint information was added to the automatically generated metrics reference documentation. (#136360)
Bug or Regression
- apiserver_watch_cache_resource_version now reports the watch cache resource version truncated to its 15 least significant digits. (#137615)
  - This appears to be intended to avoid precision problems when the value is stored in a float64.
  - The same approach is used for informer_store_resource_version.
  - float64(resourceVersion % 1000000000000000)
- A bug was fixed where some metrics recorded a value close to 0 instead of the actual latency. (#135749)
  - event_handling_duration_seconds
  - preemption_goroutines_duration_seconds
  - run_podsandbox_duration_seconds
  - store_schedule_results_duration_seconds
// Illustrates the defer pitfall that makes a duration come out close to zero.
package main

import (
	"fmt"
	"time"
)

func main() {
	start := time.Now()

	// 1. ❌ Incorrect: time.Since(start) is an ARGUMENT to fmt.Println.
	// Arguments of a deferred call are evaluated immediately, when the defer
	// statement executes, so this reports a duration close to zero.
	defer fmt.Println("❌ Incorrect Value (Immediate Evaluation):", time.Since(start))

	// 2. ✅ Correct: the deferred call is the anonymous function itself, so
	// time.Since(start) inside it runs only when main returns.
	defer func() {
		fmt.Println("✅ Correct Value (Delayed Evaluation):", time.Since(start))
	}()

	time.Sleep(100 * time.Millisecond) // stand-in for the work being measured
}
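For the resource version truncation mentioned earlier in this section, the following Go sketch shows why storing a large integer in a float64 gauge can lose precision and how keeping only the 15 least significant digits avoids it. The helper name truncateRV and the use of uint64 for the resource version are assumptions made for this example; only the modulo expression itself is quoted from the change.

```go
package main

import "fmt"

// truncateRV keeps the 15 least significant decimal digits of a resource
// version, mirroring float64(resourceVersion % 1000000000000000).
func truncateRV(resourceVersion uint64) float64 {
	return float64(resourceVersion % 1_000_000_000_000_000)
}

func main() {
	// 2^53 + 1 is the smallest positive integer a float64 cannot represent exactly.
	rv := uint64(1<<53 + 1)

	direct := float64(rv)
	fmt.Println("round-trips without truncation:", uint64(direct) == rv)

	truncated := truncateRV(rv)
	fmt.Println("round-trips after truncation:  ", uint64(truncated) == rv%1_000_000_000_000_000)
}
```

Any remainder modulo 10^15 is smaller than 2^53, so the truncated value always fits in a float64 without rounding. The cost is that the gauge no longer carries the full resource version, only its trailing digits.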
Other (Cleanup or Flake)
Kubernetes Metrics Changes: v1.35.0 → v1.36.0
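The following is an automatically generated list of metric differences.
The implementation is available at the link below, and feedback on the display format or anything else is welcome. https://github.com/tsuzu/k8s-metrics-changes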
Summary
- Added: 44 metrics
- Removed: 0 metrics
- Updated: 25 metrics
- Total Changes: 69 metrics
Changed Metrics
Detailed Changes
apiserver_impersonation_attempts_duration_seconds
+- name: attempts_duration_seconds
+ subsystem: impersonation
+ namespace: apiserver
+ help: Latency of impersonation attempts in seconds split by mode and decision.
+ type: Histogram
+ stabilityLevel: ALPHA
+ labels:
+ - decision
+ - mode
+ buckets:
+ - 0.001
+ - 0.002
+ - 0.004
+ - 0.008
+ - 0.016
+ - 0.032
+ - 0.064
+ - 0.128
+ - 0.256
+ - 0.512
+ - 1.024
+ - 2.048
+ - 4.096
+ - 8.192
+ - 16.384
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
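The bucket list above, which recurs in many histograms in this diff, is a geometric series: 15 upper bounds starting at 0.001 and doubling each time. The Go sketch below regenerates it with a small helper modeled on the shape of prometheus.ExponentialBuckets from client_golang. Whether the Kubernetes source builds these buckets with that exact helper is an assumption; the local function here is written only for illustration.

```go
package main

import "fmt"

// exponentialBuckets returns count upper bounds starting at start,
// each one factor times the previous.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, 0, count)
	v := start
	for i := 0; i < count; i++ {
		buckets = append(buckets, v)
		v *= factor
	}
	return buckets
}

func main() {
	// start=0.001, factor=2, count=15 reproduces the list from 0.001 to 16.384.
	for _, b := range exponentialBuckets(0.001, 2, 15) {
		fmt.Println(b)
	}
}
```

Reading the diff this way makes it easier to see at a glance that these histograms cover roughly 1 ms to 16 s.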
apiserver_impersonation_attempts_total
+- name: attempts_total
+ subsystem: impersonation
+ namespace: apiserver
+ help: Total number of impersonation attempts split by mode and decision.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - decision
+ - mode
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
apiserver_impersonation_authorization_attempts_duration_seconds
+- name: authorization_attempts_duration_seconds
+ subsystem: impersonation
+ namespace: apiserver
+ help: Latency of authorization checks made by the impersonation handler in seconds split by mode and decision.
+ type: Histogram
+ stabilityLevel: ALPHA
+ labels:
+ - decision
+ - mode
+ buckets:
+ - 0.001
+ - 0.002
+ - 0.004
+ - 0.008
+ - 0.016
+ - 0.032
+ - 0.064
+ - 0.128
+ - 0.256
+ - 0.512
+ - 1.024
+ - 2.048
+ - 4.096
+ - 8.192
+ - 16.384
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
apiserver_impersonation_authorization_attempts_total
+- name: authorization_attempts_total
+ subsystem: impersonation
+ namespace: apiserver
+ help: Total number of authorization checks made by the impersonation handler split by mode and decision.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - decision
+ - mode
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
apiserver_manifest_admission_config_controller_automatic_reload_last_timestamp_seconds
+- name: automatic_reload_last_timestamp_seconds
+ subsystem: manifest_admission_config_controller
+ namespace: apiserver
+ help: Timestamp of the last automatic reload of admission manifest configuration split by status, plugin, and apiserver identity.
+ type: Gauge
+ stabilityLevel: ALPHA
+ labels:
+ - apiserver_id_hash
+ - plugin
+ - status
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
apiserver_manifest_admission_config_controller_automatic_reloads_total
+- name: automatic_reloads_total
+ subsystem: manifest_admission_config_controller
+ namespace: apiserver
+ help: Total number of automatic reloads of admission manifest configuration split by status, plugin, and apiserver identity.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - apiserver_id_hash
+ - plugin
+ - status
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
apiserver_manifest_admission_config_controller_last_config_info
+- name: apiserver_manifest_admission_config_controller_last_config_info
+ help: Information about the last applied admission manifest configuration with hash as label, split by plugin and apiserver identity.
+ type: Custom
+ stabilityLevel: ALPHA
+ labels:
+ - plugin
+ - apiserver_id_hash
+ - hash
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
apiserver_peer_discovery_sync_errors_total
+- name: peer_discovery_sync_errors_total
+ subsystem: apiserver
+ help: Total number of errors encountered while syncing discovery information from a peer kube-apiserver
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - type
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
apiserver_peer_proxy_errors_total
+- name: peer_proxy_errors_total
+ subsystem: apiserver
+ help: Total number of errors encountered while proxying requests to a peer kube apiserver
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - group
+ - resource
+ - type
+ - version
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
apiserver_rerouted_request_total
- name: rerouted_request_total
subsystem: apiserver
- help: Total number of requests that were proxied to a peer kube apiserver because the local apiserver was not capable of serving it
+ help: '`Total number of requests that were proxied to a peer kube-apiserver because the local apiserver was not capable of serving it, broken down by ''group'', ''version'', and ''resource'' indicating the GVR of the request. If all three are empty (""), the request is a discovery request.`'
type: Counter
stabilityLevel: ALPHA
labels:
- code
+ - group
+ - resource
+ - version
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
apiserver_storage_events_received_total
- name: storage_events_received_total
subsystem: apiserver
help: Number of etcd events received split by kind.
type: Counter
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- group
- resource
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
apiserver_watch_cache_resource_version
- name: resource_version
subsystem: watch_cache
namespace: apiserver
- help: Current resource version of watch cache broken by resource type.
+ help: Current resource version of watch cache broken by resource type. This is truncated to the 15 least significant digits.
type: Gauge
stabilityLevel: ALPHA
labels:
- group
- resource
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
apiserver_watch_filtered_events_total
+- name: watch_filtered_events_total
+ namespace: apiserver
+ help: Counter of events filtered out by shard selector during watch dispatch, broken by resource type.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - group
+ - resource
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
apiserver_watch_list_duration_seconds
- name: watch_list_duration_seconds
subsystem: apiserver
help: Response latency distribution in seconds for watch list requests broken by group, version, resource and scope.
type: Histogram
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- group
- resource
- scope
- version
buckets:
- 0.05
- 0.1
- 0.2
- 0.4
- 0.6
- 0.8
- 1
- 2
- 4
- 6
- 8
- 10
- 15
- 20
- 30
- 45
- 60
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
apiserver_watch_shards_total
+- name: watch_shards_total
+ namespace: apiserver
+ help: Number of active sharded watch connections broken by resource type.
+ type: Gauge
+ stabilityLevel: ALPHA
+ labels:
+ - group
+ - resource
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
apiserver_websocket_streaming_requests_total
+- name: websocket_streaming_requests_total
+ subsystem: apiserver
+ help: Total number of WebSocket streaming requests (exec/attach/portforward) routed by the API server, labeled by subresource and proxy_type. proxy_type is proxied_to_kubelet when the kubelet handles the request directly; otherwise translated_at_apiserver.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - proxy_type
+ - subresource
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
daemonset_controller_stale_sync_skips_total
+- name: stale_sync_skips_total
+ subsystem: daemonset_controller
+ help: Total number of DaemonSet syncs skipped due to a stale watch cache.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - group
+ - resource
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
endpoint_slice_controller_desired_endpoint_slices
- name: desired_endpoint_slices
subsystem: endpoint_slice_controller
help: Number of EndpointSlices that would exist with perfect endpoint allocation
type: Gauge
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
endpoint_slice_controller_endpoints_added_per_sync
- name: endpoints_added_per_sync
subsystem: endpoint_slice_controller
help: Number of endpoints added on each Service sync
type: Histogram
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
buckets:
- 2
- 4
- 8
- 16
- 32
- 64
- 128
- 256
- 512
- 1024
- 2048
- 4096
- 8192
- 16384
- 32768
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
endpoint_slice_controller_endpoints_desired
- name: endpoints_desired
subsystem: endpoint_slice_controller
help: Number of endpoints desired
type: Gauge
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
endpoint_slice_controller_endpoints_removed_per_sync
- name: endpoints_removed_per_sync
subsystem: endpoint_slice_controller
help: Number of endpoints removed on each Service sync
type: Histogram
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
buckets:
- 2
- 4
- 8
- 16
- 32
- 64
- 128
- 256
- 512
- 1024
- 2048
- 4096
- 8192
- 16384
- 32768
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
endpoint_slice_controller_num_endpoint_slices
- name: num_endpoint_slices
subsystem: endpoint_slice_controller
help: Number of EndpointSlices
type: Gauge
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
endpoint_slice_controller_services_count_by_traffic_distribution
- name: services_count_by_traffic_distribution
subsystem: endpoint_slice_controller
help: Number of Services using some specific trafficDistribution
type: Gauge
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- traffic_distribution
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
etcd_bookmark_counts
- name: etcd_bookmark_counts
help: Number of etcd bookmarks (progress notify events) split by kind.
type: Gauge
+ deprecatedVersion: 1.36.0
stabilityLevel: ALPHA
labels:
- group
- resource
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
etcd_bookmark_total
+- name: etcd_bookmark_total
+ help: Number of etcd bookmarks (progress notify events) split by kind.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - group
+ - resource
+ componentEndpoints:
+ - component: kube-apiserver
+ endpoint: /metrics
horizontal_pod_autoscaler_controller_metric_computation_duration_seconds
- name: metric_computation_duration_seconds
subsystem: horizontal_pod_autoscaler_controller
help: The time(seconds) that the HPA controller takes to calculate one metric. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. The label 'error' should be either 'spec', 'internal', or 'none'. The label 'metric_type' corresponds to HPA.spec.metrics[*].type
type: Histogram
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- action
- error
- metric_type
buckets:
- 0.001
- 0.002
- 0.004
- 0.008
- 0.016
- 0.032
- 0.064
- 0.128
- 0.256
- 0.512
- 1.024
- 2.048
- 4.096
- 8.192
- 16.384
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
horizontal_pod_autoscaler_controller_metric_computation_total
- name: metric_computation_total
subsystem: horizontal_pod_autoscaler_controller
help: Number of metric computations. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. Also, the label 'error' should be either 'spec', 'internal', or 'none'. The label 'metric_type' corresponds to HPA.spec.metrics[*].type
type: Counter
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- action
- error
- metric_type
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
horizontal_pod_autoscaler_controller_reconciliation_duration_seconds
- name: reconciliation_duration_seconds
subsystem: horizontal_pod_autoscaler_controller
help: The time(seconds) that the HPA controller takes to reconcile once. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. Also, the label 'error' should be either 'spec', 'internal', or 'none'. Note that if both spec and internal errors happen during a reconciliation, the first one to occur is reported in `error` label.
type: Histogram
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- action
- error
buckets:
- 0.001
- 0.002
- 0.004
- 0.008
- 0.016
- 0.032
- 0.064
- 0.128
- 0.256
- 0.512
- 1.024
- 2.048
- 4.096
- 8.192
- 16.384
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
horizontal_pod_autoscaler_controller_reconciliations_total
- name: reconciliations_total
subsystem: horizontal_pod_autoscaler_controller
help: Number of reconciliations of HPA controller. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. Also, the label 'error' should be either 'spec', 'internal', or 'none'. Note that if both spec and internal errors happen during a reconciliation, the first one to occur is reported in `error` label.
type: Counter
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- action
- error
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
informer_processing_latency_seconds
+- name: processing_latency_seconds
+ subsystem: informer
+ help: Time taken to process events after popping from the queue.
+ type: Histogram
+ stabilityLevel: ALPHA
+ labels:
+ - group
+ - name
+ - resource
+ - version
+ buckets:
+ - 0.001
+ - 0.005
+ - 0.01
+ - 0.025
+ - 0.05
+ - 0.1
+ - 0.25
+ - 0.5
+ - 1
+ - 2.5
+ - 5
+ - 10
+ componentEndpoints:
+ - component: cloud-controller-manager
+ endpoint: /metrics
+ - component: kube-apiserver
+ endpoint: /metrics
+ - component: kube-controller-manager
+ endpoint: /metrics
+ - component: kube-proxy
+ endpoint: /metrics
+ - component: kube-scheduler
+ endpoint: /metrics
+ - component: kubelet
+ endpoint: /metrics
informer_queued_items
+- name: queued_items
+ subsystem: informer
+ help: Number of items currently queued in the FIFO.
+ type: Gauge
+ stabilityLevel: ALPHA
+ labels:
+ - group
+ - name
+ - resource
+ - version
+ componentEndpoints:
+ - component: cloud-controller-manager
+ endpoint: /metrics
+ - component: kube-apiserver
+ endpoint: /metrics
+ - component: kube-controller-manager
+ endpoint: /metrics
+ - component: kube-proxy
+ endpoint: /metrics
+ - component: kube-scheduler
+ endpoint: /metrics
+ - component: kubelet
+ endpoint: /metrics
informer_store_resource_version
+- name: store_resource_version
+ subsystem: informer
+ help: The 15 least significant digits of the resource version of the store.
+ type: Gauge
+ stabilityLevel: ALPHA
+ labels:
+ - group
+ - name
+ - resource
+ - version
+ componentEndpoints:
+ - component: cloud-controller-manager
+ endpoint: /metrics
+ - component: kube-apiserver
+ endpoint: /metrics
+ - component: kube-controller-manager
+ endpoint: /metrics
+ - component: kube-proxy
+ endpoint: /metrics
+ - component: kube-scheduler
+ endpoint: /metrics
+ - component: kubelet
+ endpoint: /metrics
job_controller_pod_failures_handled_by_failure_policy_total
- name: pod_failures_handled_by_failure_policy_total
subsystem: job_controller
help: |-
`The number of failed Pods handled by failure policy with
respect to the failure policy action applied based on the matched
rule. Possible values of the action label correspond to the
possible values for the failure policy rule action, which are:
"FailJob", "Ignore" and "Count".`
type: Counter
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- action
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
job_controller_stale_sync_skips_total
+- name: stale_sync_skips_total
+ subsystem: job_controller
+ help: Total number of Job syncs skipped due to a stale watch cache.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - group
+ - resource
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
job_controller_terminated_pods_tracking_finalizer_total
- name: terminated_pods_tracking_finalizer_total
subsystem: job_controller
help: |-
`The number of terminated pods (phase=Failed|Succeeded)
that have the finalizer batch.kubernetes.io/job-tracking
The event label can be "add" or "delete".`
type: Counter
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- event
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
kubelet_memory_qos_node_memory_low_bytes
+- name: memory_qos_node_memory_low_bytes
+ subsystem: kubelet
+ help: Total cgroup v2 memory.low in bytes for Burstable pods. This memory is soft-reserved and may be reclaimed under extreme pressure.
+ type: Gauge
+ stabilityLevel: ALPHA
+ componentEndpoints:
+ - component: kubelet
+ endpoint: /metrics
kubelet_memory_qos_node_memory_min_bytes
+- name: memory_qos_node_memory_min_bytes
+ subsystem: kubelet
+ help: Total cgroup v2 memory.min in bytes for Guaranteed pods. This memory is hard-reserved and never reclaimed by the kernel.
+ type: Gauge
+ stabilityLevel: ALPHA
+ componentEndpoints:
+ - component: kubelet
+ endpoint: /metrics
kubelet_metrics_provider
+- name: metrics_provider
+ subsystem: kubelet
+ help: Metrics provider used by kubelet to collect container stats. Values can be 'cadvisor' and 'cri'
+ type: Gauge
+ stabilityLevel: ALPHA
+ labels:
+ - provider
+ componentEndpoints:
+ - component: kubelet
+ endpoint: /metrics
kubelet_pleg_pod_relist_duration_seconds
+- name: pleg_pod_relist_duration_seconds
+ subsystem: kubelet
+ help: Duration in seconds for relisting a single pod in PLEG.
+ type: Histogram
+ stabilityLevel: ALPHA
+ buckets:
+ - 0.005
+ - 0.01
+ - 0.025
+ - 0.05
+ - 0.1
+ - 0.25
+ - 0.5
+ - 1
+ - 2.5
+ - 5
+ - 10
+ componentEndpoints:
+ - component: kubelet
+ endpoint: /metrics
kubelet_pod_watch_events_dropped_total
+- name: pod_watch_events_dropped_total
+ subsystem: kubelet
+ help: Cumulative number of pod watch events dropped.
+ type: Counter
+ stabilityLevel: ALPHA
+ componentEndpoints:
+ - component: kubelet
+ endpoint: /metrics
kubelet_terminated_containers_total
+- name: terminated_containers_total
+ subsystem: kubelet
+ help: Cumulative number of container terminations.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - container_type
+ - exit_code
+ - reason
+ componentEndpoints:
+ - component: kubelet
+ endpoint: /metrics
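Since kubelet_terminated_containers_total carries an exit_code label, one simple use is to total terminations per exit code. The Go sketch below does this over Prometheus text-format lines. The sample lines and their label values are made up for illustration, and the parser assumes samples without timestamps; a real tool would use a proper exposition-format parser.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// exitCodeLabel extracts the exit_code label value from a sample line.
var exitCodeLabel = regexp.MustCompile(`exit_code="([^"]*)"`)

// sumByExitCode totals kubelet_terminated_containers_total samples per exit code.
// Assumption: each sample line ends with its value (no trailing timestamp).
func sumByExitCode(exposition string) map[string]float64 {
	totals := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "kubelet_terminated_containers_total{") {
			continue
		}
		m := exitCodeLabel.FindStringSubmatch(line)
		fields := strings.Fields(line)
		if m == nil || len(fields) < 2 {
			continue
		}
		v, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			continue
		}
		totals[m[1]] += v
	}
	return totals
}

func main() {
	// Made-up sample in Prometheus text format; label values are illustrative only.
	sample := `
kubelet_terminated_containers_total{container_type="container",exit_code="0",reason="Completed"} 12
kubelet_terminated_containers_total{container_type="container",exit_code="137",reason="OOMKilled"} 3
kubelet_terminated_containers_total{container_type="init_container",exit_code="137",reason="Error"} 2
`
	for code, n := range sumByExitCode(sample) {
		fmt.Printf("exit_code=%s total=%g\n", code, n)
	}
}
```

In a monitoring stack the same aggregation would normally be a PromQL sum by exit_code; the sketch only shows what the label makes possible.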
kubelet_websocket_streaming_requests_total
+- name: websocket_streaming_requests_total
+ subsystem: kubelet
+ help: Total number of WebSocket streaming requests (exec/attach/portforward) received by the kubelet.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - subresource
+ componentEndpoints:
+ - component: kubelet
+ endpoint: /metrics
kubernetes_build_info
- name: kubernetes_build_info
help: A metric with a constant '1' value labeled by major, minor, git version, git commit, git tree state, build date, Go version, and compiler from which Kubernetes was built, and platform on which it is running.
type: Gauge
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- build_date
- compiler
- git_commit
- git_tree_state
- git_version
- go_version
- major
- minor
- platform
+ componentEndpoints:
+ - component: cloud-controller-manager
+ endpoint: /metrics
+ - component: kube-apiserver
+ endpoint: /metrics
+ - component: kube-controller-manager
+ endpoint: /metrics
+ - component: kube-proxy
+ endpoint: /metrics
+ - component: kube-scheduler
+ endpoint: /metrics
+ - component: kubelet
+ endpoint: /metrics
latency
+- name: latency
+ type: Summary
+ stabilityLevel: ALPHA
+ labels:
+ - node
+ objectives:
+ 0.5: 0.05
+ 0.75: 0.025
+ 0.9: 0.01
+ 0.99: 0.001
replicaset_controller_stale_sync_skips_total
+- name: stale_sync_skips_total
+ subsystem: replicaset_controller
+ help: Total number of ReplicaSet syncs skipped due to a stale watch cache.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - group
+ - resource
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
resource_manager_allocation_errors_total
+- name: resource_manager_allocation_errors_total
+ help: Number of errors encountered during exclusive resource allocation.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - resource_name
+ - source
+ componentEndpoints:
+ - component: kubelet
+ endpoint: /metrics
resource_manager_allocations_total
+- name: resource_manager_allocations_total
+ help: Number of exclusive resource allocations performed by a resource manager.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - resource_name
+ - source
+ componentEndpoints:
+ - component: kubelet
+ endpoint: /metrics
resource_manager_container_assignments
+- name: resource_manager_container_assignments
+ help: Number of containers with a specific type of resource assignment.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - assignment_type
+ - resource_name
+ componentEndpoints:
+ - component: kubelet
+ endpoint: /metrics
resourcepoolstatusrequest_controller_request_processing_duration_seconds
+- name: request_processing_duration_seconds
+ subsystem: resourcepoolstatusrequest_controller
+ help: Time taken to process a ResourcePoolStatusRequest
+ type: Histogram
+ stabilityLevel: ALPHA
+ labels:
+ - driver_name
+ buckets:
+ - 0.001
+ - 0.002
+ - 0.004
+ - 0.008
+ - 0.016
+ - 0.032
+ - 0.064
+ - 0.128
+ - 0.256
+ - 0.512
+ - 1.024
+ - 2.048
+ - 4.096
+ - 8.192
+ - 16.384
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
resourcepoolstatusrequest_controller_request_processing_errors_total
+- name: request_processing_errors_total
+ subsystem: resourcepoolstatusrequest_controller
+ help: Total number of errors encountered while processing ResourcePoolStatusRequests
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - driver_name
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
resourcepoolstatusrequest_controller_requests_processed_total
+- name: requests_processed_total
+ subsystem: resourcepoolstatusrequest_controller
+ help: Total number of ResourcePoolStatusRequests processed
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - driver_name
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
rest_client_transport_ca_reload_total
+- name: rest_client_transport_ca_reload_total
+ help: Number of times a CA reload is attempted, partitioned by the result and reason for the reload attempt
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - reason
+ - result
+ componentEndpoints:
+ - component: cloud-controller-manager
+ endpoint: /metrics
+ - component: kube-apiserver
+ endpoint: /metrics
+ - component: kube-controller-manager
+ endpoint: /metrics
+ - component: kube-proxy
+ endpoint: /metrics
+ - component: kube-scheduler
+ endpoint: /metrics
+ - component: kubelet
+ endpoint: /metrics
rest_client_transport_cache_gc_calls_total
+- name: rest_client_transport_cache_gc_calls_total
+ help: 'Number of times a GC cleanup attempts to delete a transport cache entry, partitioned by the result: deleted, skipped'
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - result
+ componentEndpoints:
+ - component: cloud-controller-manager
+ endpoint: /metrics
+ - component: kube-apiserver
+ endpoint: /metrics
+ - component: kube-controller-manager
+ endpoint: /metrics
+ - component: kube-proxy
+ endpoint: /metrics
+ - component: kube-scheduler
+ endpoint: /metrics
+ - component: kubelet
+ endpoint: /metrics
rest_client_transport_cert_rotation_gc_calls_total
+- name: rest_client_transport_cert_rotation_gc_calls_total
+ help: Number of times a cert rotation goroutine cancel func is called via GC cleanup of the associated transport
+ type: Counter
+ stabilityLevel: ALPHA
+ componentEndpoints:
+ - component: cloud-controller-manager
+ endpoint: /metrics
+ - component: kube-apiserver
+ endpoint: /metrics
+ - component: kube-controller-manager
+ endpoint: /metrics
+ - component: kube-proxy
+ endpoint: /metrics
+ - component: kube-scheduler
+ endpoint: /metrics
+ - component: kubelet
+ endpoint: /metrics
rest_client_transport_create_calls_total
- name: rest_client_transport_create_calls_total
- help: 'Number of calls to get a new transport, partitioned by the result of the operation hit: obtained from the cache, miss: created and added to the cache, uncacheable: created and not cached'
+ help: 'Number of calls to get a new transport, partitioned by the result of the operation hit: obtained from the cache, miss: created and added to the cache, miss-gc: recreated and added back to the cache after being garbage collected, uncacheable: created and not cached'
type: Counter
stabilityLevel: ALPHA
labels:
- result
+ componentEndpoints:
+ - component: cloud-controller-manager
+ endpoint: /metrics
+ - component: kube-apiserver
+ endpoint: /metrics
+ - component: kube-controller-manager
+ endpoint: /metrics
+ - component: kube-proxy
+ endpoint: /metrics
+ - component: kube-scheduler
+ endpoint: /metrics
+ - component: kubelet
+ endpoint: /metrics
route_controller_route_sync_total
+- name: route_sync_total
+ subsystem: route_controller
+ help: A metric counting the amount of times routes have been synced with the cloud provider.
+ type: Counter
+ stabilityLevel: ALPHA
+ componentEndpoints:
+ - component: cloud-controller-manager
+ endpoint: /metrics
running_managed_controllers
- name: running_managed_controllers
help: Indicates where instances of a controller are currently running
type: Gauge
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- manager
- name
+ componentEndpoints:
+ - component: cloud-controller-manager
+ endpoint: /metrics
+ - component: kube-apiserver
+ endpoint: /metrics
+ - component: kube-controller-manager
+ endpoint: /metrics
+ - component: kube-proxy
+ endpoint: /metrics
+ - component: kube-scheduler
+ endpoint: /metrics
+ - component: kubelet
+ endpoint: /metrics
scheduler_dra_bindingconditions_allocations_total
+- name: dra_bindingconditions_allocations_total
+ subsystem: scheduler
+ help: Number of allocations using devices with BindingConditions, counted per driver per scheduling attempt
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - driver
+ - profile
+ - status
+ componentEndpoints:
+ - component: kube-scheduler
+ endpoint: /metrics
scheduler_dra_bindingconditions_wait_duration_seconds
+- name: dra_bindingconditions_wait_duration_seconds
+ subsystem: scheduler
+ help: Time in seconds spent waiting for BindingConditions to be satisfied during PreBind.
+ type: Histogram
+ stabilityLevel: ALPHA
+ labels:
+ - driver
+ - profile
+ - status
+ buckets:
+ - 0.1
+ - 0.2
+ - 0.4
+ - 0.8
+ - 1.6
+ - 3.2
+ - 6.4
+ - 12.8
+ - 25.6
+ - 51.2
+ - 102.4
+ - 204.8
+ - 409.6
+ - 819.2
+ componentEndpoints:
+ - component: kube-scheduler
+ endpoint: /metrics
scheduler_goroutines
- name: goroutines
subsystem: scheduler
help: Number of running goroutines split by the work they do such as binding.
type: Gauge
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- operation
+ componentEndpoints:
+ - component: kube-scheduler
+ endpoint: /metrics
scheduler_permit_wait_duration_seconds
- name: permit_wait_duration_seconds
subsystem: scheduler
help: Duration of waiting on permit.
type: Histogram
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- result
buckets:
- 0.001
- 0.002
- 0.004
- 0.008
- 0.016
- 0.032
- 0.064
- 0.128
- 0.256
- 0.512
- 1.024
- 2.048
- 4.096
- 8.192
- 16.384
+ componentEndpoints:
+ - component: kube-scheduler
+ endpoint: /metrics
scheduler_plugin_evaluation_total
- name: plugin_evaluation_total
subsystem: scheduler
help: Number of attempts to schedule pods by each plugin and the extension point (available only in PreFilter, Filter, PreScore, and Score).
type: Counter
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- extension_point
- plugin
- profile
+ componentEndpoints:
+ - component: kube-scheduler
+ endpoint: /metrics
scheduler_podgroup_schedule_attempts_total
+- name: podgroup_schedule_attempts_total
+ subsystem: scheduler
+ help: Number of attempts to schedule pod group, by the result. 'unschedulable' means a pod group could not be scheduled, while 'error' means an internal scheduler problem.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - profile
+ - result
+ componentEndpoints:
+ - component: kube-scheduler
+ endpoint: /metrics
scheduler_podgroup_scheduling_algorithm_duration_seconds
+- name: podgroup_scheduling_algorithm_duration_seconds
+ subsystem: scheduler
+ help: Pod group scheduling algorithm latency in seconds
+ type: Histogram
+ stabilityLevel: ALPHA
+ buckets:
+ - 0.001
+ - 0.002
+ - 0.004
+ - 0.008
+ - 0.016
+ - 0.032
+ - 0.064
+ - 0.128
+ - 0.256
+ - 0.512
+ - 1.024
+ - 2.048
+ - 4.096
+ - 8.192
+ - 16.384
+ componentEndpoints:
+ - component: kube-scheduler
+ endpoint: /metrics
scheduler_podgroup_scheduling_attempt_duration_seconds
+- name: podgroup_scheduling_attempt_duration_seconds
+ subsystem: scheduler
+ help: Pod group scheduling attempt latency in seconds (scheduling algorithm + binding)
+ type: Histogram
+ stabilityLevel: ALPHA
+ labels:
+ - profile
+ - result
+ buckets:
+ - 0.001
+ - 0.002
+ - 0.004
+ - 0.008
+ - 0.016
+ - 0.032
+ - 0.064
+ - 0.128
+ - 0.256
+ - 0.512
+ - 1.024
+ - 2.048
+ - 4.096
+ - 8.192
+ - 16.384
+ componentEndpoints:
+ - component: kube-scheduler
+ endpoint: /metrics
scheduler_unschedulable_pods
- name: unschedulable_pods
subsystem: scheduler
help: The number of unschedulable pods broken down by plugin name. A pod will increment the gauge for all plugins that caused it to not schedule and so this metric have meaning only when broken down by plugin.
type: Gauge
- stabilityLevel: ALPHA
+ stabilityLevel: BETA
labels:
- plugin
- profile
+ componentEndpoints:
+ - component: kube-scheduler
+ endpoint: /metrics
statefulset_controller_stale_sync_skips_total
+- name: stale_sync_skips_total
+ subsystem: statefulset_controller
+ help: Total number of StatefulSet syncs skipped due to a stale watch cache.
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - group
+ - resource
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
volume_operation_errors_total
+- name: volume_operation_errors_total
+ help: Total volume operation errors
+ type: Counter
+ stabilityLevel: ALPHA
+ labels:
+ - operation_name
+ - plugin_name
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics
volume_operation_total_errors
- name: volume_operation_total_errors
help: Total volume operation errors
type: Counter
+ deprecatedVersion: 1.36.0
stabilityLevel: ALPHA
labels:
- operation_name
- plugin_name
+ componentEndpoints:
+ - component: kube-controller-manager
+ endpoint: /metrics