Kubernetes 1.36: SIG-Instrumentation Changes

Introduction

This article picks out and summarizes the SIG-Instrumentation-related changes and the metrics changes from the Kubernetes v1.36 changelog.

Changes by kind

Upgrade Notes

  • kube-controller-manager: the metric volume_operation_total_errors was renamed to volume_operation_errors_total. Dashboards and alerts that still use the old metric name need to be updated. (#136399)

    • In the metrics diff, the old metric is marked with deprecatedVersion: 1.36.0 and the new metric is added.
  • etcd_bookmark_counts was renamed to etcd_bookmark_total. Existing monitoring configuration needs to be updated here as well. (#136483)

    • The old metric is deprecated, and the replacement is a new counter metric.
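
As a rough aid for the migration described above, here is a minimal sketch that rewrites the two deprecated names inside PromQL expression strings. The function name migrateExpr and the sample expressions are made up for illustration; only the metric names come from the changelog.

```go
package main

import (
	"fmt"
	"regexp"
)

// renames maps the metric names deprecated in v1.36 to their replacements,
// as listed in the upgrade notes.
var renames = map[string]string{
	"volume_operation_total_errors": "volume_operation_errors_total",
	"etcd_bookmark_counts":          "etcd_bookmark_total",
}

// metricToken matches identifier-like tokens so that only whole names are
// rewritten, never substrings of longer names.
var metricToken = regexp.MustCompile(`[a-zA-Z_:][a-zA-Z0-9_:]*`)

// migrateExpr returns expr with every deprecated metric name replaced.
// (Hypothetical helper; it does not parse PromQL, it only swaps tokens.)
func migrateExpr(expr string) string {
	return metricToken.ReplaceAllStringFunc(expr, func(tok string) string {
		if replacement, ok := renames[tok]; ok {
			return replacement
		}
		return tok
	})
}

func main() {
	// Hypothetical expressions; substitute the ones from your own rules.
	exprs := []string{
		`sum(rate(volume_operation_total_errors[5m]))`,
		`etcd_bookmark_counts{resource="pods"}`,
	}
	for _, e := range exprs {
		fmt.Println(migrateExpr(e))
	}
}
```

Note that a pure rename is not always enough: according to the metrics diff, etcd_bookmark_counts is a Gauge while etcd_bookmark_total is a Counter, so expressions that read the value directly probably need to be reviewed by hand (for example, wrapped in rate()).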

Deprecation

  • volume_operation_total_errors is deprecated as of 1.36 in favor of volume_operation_errors_total. (#136399)
  • etcd_bookmark_counts is deprecated as of 1.36 in favor of etcd_bookmark_total. (#136483)

API Changes

  • config.k8s.io/flagz and config.k8s.io/statusz were promoted to v1beta1. (#137174, #137173)

    • The structured / versioned responses that were v1alpha1 in 1.35 are now beta in 1.36.
  • /flagz and /statusz can now return responses in YAML format. (#135309)

    • In addition to JSON, YAML is now available as a machine-readable format.
  • Requests to /flagz and /statusz are now measured by apiserver_request_total and apiserver_request_duration_seconds. (#137021)

    • The group / version labels reflect the API version selected through content negotiation.
  • Manifest-based admission control configuration (KEP-5793) was added as alpha. (#137346)

    • Along with it, the apiserver_manifest_admission_config_controller_* metrics were added:
      • apiserver_manifest_admission_config_controller_last_config_info
      • apiserver_manifest_admission_config_controller_automatic_reloads_total
      • apiserver_manifest_admission_config_controller_automatic_reload_last_timestamp_seconds
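
To make the content negotiation for /flagz and /statusz a little more concrete, here is a minimal sketch that prepares (but does not send) a request for the YAML form of the versioned response. The layout of the Accept header (v=, g=, as=) is an assumption modeled on the form used for the v1alpha1 responses, and the server address and helper names are hypothetical, so check the v1.36 documentation before relying on it.

```go
package main

import (
	"fmt"
	"net/http"
)

// acceptFor builds an Accept header value for a versioned z-page response.
// ASSUMPTION: the parameter layout follows the v1alpha1 convention; the
// exact value accepted by v1.36 is not confirmed by the changelog.
func acceptFor(format, kind string) string {
	return fmt.Sprintf("application/%s;v=v1beta1;g=config.k8s.io;as=%s", format, kind)
}

// newZPageRequest prepares a GET request for a z-page without sending it.
func newZPageRequest(baseURL, path, format, kind string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+path, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", acceptFor(format, kind))
	return req, nil
}

func main() {
	// Hypothetical API server address; authentication is omitted here.
	req, err := newZPageRequest("https://127.0.0.1:6443", "/statusz", "yaml", "Statusz")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("Accept:", req.Header.Get("Accept"))
}
```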

Features

  • Prometheus native histogram support can now be enabled in kube-apiserver, kube-controller-manager, and kube-scheduler. (KEP-5808: Native Histogram Support for Kubernetes Metrics, #136763, #137779, #137466)

  • A variety of existing metrics were promoted from alpha to beta. (#136314, #136086, #136368, #136154, #136155, #136178, #136367, #135522)

    • apiserver_storage_events_received_total
    • apiserver_watch_list_duration_seconds
    • EndpointSlice-related metrics
    • component-base-related metrics (kubernetes_build_info, running_managed_controllers, and so on)
    • scheduler-related metrics
    • HPA-related metrics
    • Job controller-related metrics
    • workqueue-related metrics
  • New informer-related metrics were added. (KEP-4346: Informer Metrics, #135782, #137419, #137101)

    • informer_queued_items
    • informer_store_resource_version
    • informer_processing_latency_seconds
  • Metrics for automatic CA reload and TLS cache GC were added to k8s.io/client-go/transport. (#132922, #136355)

    • rest_client_transport_ca_reload_total
    • rest_client_transport_cache_gc_calls_total
    • rest_client_transport_cert_rotation_gc_calls_total
  • Several metrics were added on the kubelet side as well. (#137453, #137780, #137719)

    • kubelet_terminated_containers_total tracks the number of terminated containers per exit code. (#137453)
    • kubelet_websocket_streaming_requests_total counts the exec / attach / portforward requests received by the kubelet.
    • kubelet_metrics_provider indicates whether container stats are collected from cadvisor or cri.
  • For controllers, metrics were added that count syncs skipped because of a stale watch cache. (KEP-5647: Stale Controller Mitigation)

    • daemonset_controller_stale_sync_skips_total (#134937)
    • job_controller_stale_sync_skips_total (#137210)
    • replicaset_controller_stale_sync_skips_total (#137212)
    • statefulset_controller_stale_sync_skips_total (#137254)
      • When the write made during the previous sync has not yet been observed in the watch cache, the sync is deferred and the number of such deferrals is recorded.
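
The stale-sync behavior described for the *_stale_sync_skips_total metrics can be pictured with the following minimal sketch. It is not the controllers' actual code: the staleGuard type, its methods, and the idea of comparing resource versions as plain integers are all assumptions made for illustration.

```go
package main

import "fmt"

// staleGuard remembers, per object key, the resource version produced by the
// controller's most recent write. All names here are hypothetical.
type staleGuard struct {
	lastWritten map[string]uint64
	skips       int // stands in for a *_stale_sync_skips_total counter
}

func newStaleGuard() *staleGuard {
	return &staleGuard{lastWritten: make(map[string]uint64)}
}

// recordWrite notes the resource version returned by a successful write.
func (g *staleGuard) recordWrite(key string, rv uint64) {
	g.lastWritten[key] = rv
}

// shouldSync reports whether the cache has caught up with our own last write.
// If it has not, the sync is deferred and the skip counter is incremented.
func (g *staleGuard) shouldSync(key string, cacheRV uint64) bool {
	if want, ok := g.lastWritten[key]; ok && cacheRV < want {
		g.skips++
		return false
	}
	return true
}

func main() {
	g := newStaleGuard()
	g.recordWrite("default/web", 120)

	fmt.Println(g.shouldSync("default/web", 118), g.skips) // false 1 (cache still behind)
	fmt.Println(g.shouldSync("default/web", 120), g.skips) // true 1 (cache caught up)
}
```

A steadily growing skip count would suggest that the controller's cache is lagging behind its own writes, which is the situation the new counters are meant to surface.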

Documentation

  • Component and endpoint information was added to the auto-generated metrics reference documentation. (#136360)

Bug or Regression

  • apiserver_watch_cache_resource_version now truncates the watch cache resource version to its 15 least significant digits. (#137615)
    • The intent appears to be to avoid precision problems when the value is stored in a float64.
    • The same handling is applied to informer_store_resource_version.
float64(resourceVersion % 1000000000000000)
  • A bug was fixed where some metrics recorded values close to 0 instead of the actual latency. (#135749)
    • event_handling_duration_seconds
    • preemption_goroutines_duration_seconds
    • run_podsandbox_duration_seconds
    • store_schedule_results_duration_seconds
package main

import (
	"fmt"
	"time"
)

func main() {
	start := time.Now()

	// --- 1. ❌ INCORRECT WAY: Immediate Argument Evaluation ---
	// The call time.Since(start) is an ARGUMENT to fmt.Println.
	// Go's defer rule states that arguments are evaluated IMMEDIATELY
	// when the defer statement is executed.
	// Result: Reports a time duration close to zero.
	defer fmt.Println("❌ Incorrect Value (Immediate Evaluation):", time.Since(start))

	// --- 2. ✅ CORRECT WAY: Using a Closure/Anonymous Function ---
	// The deferred function is now the anonymous function itself (func() {}).
	// The code inside the closure (time.Since(start)) is NOT evaluated
	// until the function exits and the deferred call is actually made.
	// Result: Reports the true duration of the main function's execution.
	defer func() {
		fmt.Println("✅ Correct Value (Delayed Evaluation):", time.Since(start))
	}()

	// Simulated work so the two deferred calls report visibly different durations.
	time.Sleep(100 * time.Millisecond)
}
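
Going back to the resource version truncation in #137615, the following minimal sketch shows why a raw 64-bit resource version can lose precision as a float64 gauge value, and why the 15-digit remainder does not. The names digits15 and truncatedGauge are illustrative; only the modulo expression itself comes from the change.

```go
package main

import "fmt"

// digits15 is 10^15; the remainder of a division by it keeps the 15 least
// significant decimal digits.
const digits15 = 1_000_000_000_000_000

// truncatedGauge mirrors the expression float64(resourceVersion % 10^15).
func truncatedGauge(resourceVersion uint64) float64 {
	return float64(resourceVersion % digits15)
}

func main() {
	// 2^53 + 1 is the smallest positive integer that float64 cannot represent exactly.
	rv := uint64(1)<<53 + 1

	direct := float64(rv)
	fmt.Println(uint64(direct) == rv) // false: the conversion rounded the value

	truncated := truncatedGauge(rv)
	fmt.Println(uint64(truncated) == rv%digits15) // true: below 2^53, so exact
}
```

Because 10^15 is smaller than 2^53, every truncated value fits exactly in a float64, at the cost of the gauge wrapping around once the resource version grows past 15 digits.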

Other (Cleanup or Flake)

Kubernetes Metrics Changes: v1.35.0 → v1.36.0

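Below is an automatically generated list of the metric differences.
The implementation is available here, and feedback on the display format and the like is welcome: https://github.com/tsuzu/k8s-metrics-changes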

Summary

  • Added: 44 metrics
  • Removed: 0 metrics
  • Updated: 25 metrics
  • Total Changes: 69 metrics
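
For readers curious how counts like these can be derived, here is a minimal sketch of classifying metrics as added, removed, or updated between two releases. It is not the implementation of the tool linked above: the metricDef type and its fields are simplified assumptions.

```go
package main

import "fmt"

// metricDef is a simplified, hypothetical stand-in for one metric entry;
// the real tool may model metrics differently.
type metricDef struct {
	Type           string
	StabilityLevel string
	Help           string
}

// summary holds the per-category counts.
type summary struct{ Added, Removed, Updated int }

// diffMetrics compares two releases keyed by full metric name.
func diffMetrics(oldDefs, newDefs map[string]metricDef) summary {
	var s summary
	for name, def := range newDefs {
		prev, existed := oldDefs[name]
		switch {
		case !existed:
			s.Added++
		case prev != def:
			s.Updated++
		}
	}
	for name := range oldDefs {
		if _, ok := newDefs[name]; !ok {
			s.Removed++
		}
	}
	return s
}

func main() {
	oldDefs := map[string]metricDef{
		"apiserver_storage_events_received_total": {"Counter", "ALPHA", "Number of etcd events received split by kind."},
	}
	newDefs := map[string]metricDef{
		"apiserver_storage_events_received_total": {"Counter", "BETA", "Number of etcd events received split by kind."},
		"etcd_bookmark_total":                     {"Counter", "ALPHA", "Number of etcd bookmarks (progress notify events) split by kind."},
	}
	fmt.Printf("%+v\n", diffMetrics(oldDefs, newDefs)) // {Added:1 Removed:0 Updated:1}
}
```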

Changed Metrics

| Metric Name | Type | Change Type | Stability Level | Description |
|---|---|---|---|---|
| apiserver_impersonation_attempts_duration_seconds | Histogram | Added | ALPHA | |
| apiserver_impersonation_attempts_total | Counter | Added | ALPHA | |
| apiserver_impersonation_authorization_attempts_duration_seconds | Histogram | Added | ALPHA | |
| apiserver_impersonation_authorization_attempts_total | Counter | Added | ALPHA | |
| apiserver_manifest_admission_config_controller_automatic_reload_last_timestamp_seconds | Gauge | Added | ALPHA | |
| apiserver_manifest_admission_config_controller_automatic_reloads_total | Counter | Added | ALPHA | |
| apiserver_manifest_admission_config_controller_last_config_info | Custom | Added | ALPHA | |
| apiserver_peer_discovery_sync_errors_total | Counter | Added | ALPHA | |
| apiserver_peer_proxy_errors_total | Counter | Added | ALPHA | |
| apiserver_rerouted_request_total | Counter | Updated | ALPHA | Help text changed. Added labels: [group, resource, version]. |
| apiserver_storage_events_received_total | Counter | Updated | BETA | Stability level changed from ALPHA to BETA. |
| apiserver_watch_cache_resource_version | Gauge | Updated | ALPHA | Help text changed. |
| apiserver_watch_filtered_events_total | Counter | Added | ALPHA | |
| apiserver_watch_list_duration_seconds | Histogram | Updated | BETA | Stability level changed from ALPHA to BETA. |
| apiserver_watch_shards_total | Gauge | Added | ALPHA | |
| apiserver_websocket_streaming_requests_total | Counter | Added | ALPHA | |
| daemonset_controller_stale_sync_skips_total | Counter | Added | ALPHA | |
| endpoint_slice_controller_desired_endpoint_slices | Gauge | Updated | BETA | Stability level changed from ALPHA to BETA. |
| endpoint_slice_controller_endpoints_added_per_sync | Histogram | Updated | BETA | Stability level changed from ALPHA to BETA. |
| endpoint_slice_controller_endpoints_desired | Gauge | Updated | BETA | Stability level changed from ALPHA to BETA. |
| endpoint_slice_controller_endpoints_removed_per_sync | Histogram | Updated | BETA | Stability level changed from ALPHA to BETA. |
| endpoint_slice_controller_num_endpoint_slices | Gauge | Updated | BETA | Stability level changed from ALPHA to BETA. |
| endpoint_slice_controller_services_count_by_traffic_distribution | Gauge | Updated | BETA | Stability level changed from ALPHA to BETA. |
| etcd_bookmark_counts | Gauge | Updated | ALPHA | Marked as deprecated in version 1.36.0. |
| etcd_bookmark_total | Counter | Added | ALPHA | |
| horizontal_pod_autoscaler_controller_metric_computation_duration_seconds | Histogram | Updated | BETA | Stability level changed from ALPHA to BETA. |
| horizontal_pod_autoscaler_controller_metric_computation_total | Counter | Updated | BETA | Stability level changed from ALPHA to BETA. |
| horizontal_pod_autoscaler_controller_reconciliation_duration_seconds | Histogram | Updated | BETA | Stability level changed from ALPHA to BETA. |
| horizontal_pod_autoscaler_controller_reconciliations_total | Counter | Updated | BETA | Stability level changed from ALPHA to BETA. |
| informer_processing_latency_seconds | Histogram | Added | ALPHA | |
| informer_queued_items | Gauge | Added | ALPHA | |
| informer_store_resource_version | Gauge | Added | ALPHA | |
| job_controller_pod_failures_handled_by_failure_policy_total | Counter | Updated | BETA | Stability level changed from ALPHA to BETA. |
| job_controller_stale_sync_skips_total | Counter | Added | ALPHA | |
| job_controller_terminated_pods_tracking_finalizer_total | Counter | Updated | BETA | Stability level changed from ALPHA to BETA. |
| kubelet_memory_qos_node_memory_low_bytes | Gauge | Added | ALPHA | |
| kubelet_memory_qos_node_memory_min_bytes | Gauge | Added | ALPHA | |
| kubelet_metrics_provider | Gauge | Added | ALPHA | |
| kubelet_pleg_pod_relist_duration_seconds | Histogram | Added | ALPHA | |
| kubelet_pod_watch_events_dropped_total | Counter | Added | ALPHA | |
| kubelet_terminated_containers_total | Counter | Added | ALPHA | |
| kubelet_websocket_streaming_requests_total | Counter | Added | ALPHA | |
| kubernetes_build_info | Gauge | Updated | BETA | Stability level changed from ALPHA to BETA. |
| latency | Summary | Added | ALPHA | |
| replicaset_controller_stale_sync_skips_total | Counter | Added | ALPHA | |
| resource_manager_allocation_errors_total | Counter | Added | ALPHA | |
| resource_manager_allocations_total | Counter | Added | ALPHA | |
| resource_manager_container_assignments | Counter | Added | ALPHA | |
| resourcepoolstatusrequest_controller_request_processing_duration_seconds | Histogram | Added | ALPHA | |
| resourcepoolstatusrequest_controller_request_processing_errors_total | Counter | Added | ALPHA | |
| resourcepoolstatusrequest_controller_requests_processed_total | Counter | Added | ALPHA | |
| rest_client_transport_ca_reload_total | Counter | Added | ALPHA | |
| rest_client_transport_cache_gc_calls_total | Counter | Added | ALPHA | |
| rest_client_transport_cert_rotation_gc_calls_total | Counter | Added | ALPHA | |
| rest_client_transport_create_calls_total | Counter | Updated | ALPHA | Help text changed. |
| route_controller_route_sync_total | Counter | Added | ALPHA | |
| running_managed_controllers | Gauge | Updated | BETA | Stability level changed from ALPHA to BETA. |
| scheduler_dra_bindingconditions_allocations_total | Counter | Added | ALPHA | |
| scheduler_dra_bindingconditions_wait_duration_seconds | Histogram | Added | ALPHA | |
| scheduler_goroutines | Gauge | Updated | BETA | Stability level changed from ALPHA to BETA. |
| scheduler_permit_wait_duration_seconds | Histogram | Updated | BETA | Stability level changed from ALPHA to BETA. |
| scheduler_plugin_evaluation_total | Counter | Updated | BETA | Stability level changed from ALPHA to BETA. |
| scheduler_podgroup_schedule_attempts_total | Counter | Added | ALPHA | |
| scheduler_podgroup_scheduling_algorithm_duration_seconds | Histogram | Added | ALPHA | |
| scheduler_podgroup_scheduling_attempt_duration_seconds | Histogram | Added | ALPHA | |
| scheduler_unschedulable_pods | Gauge | Updated | BETA | Stability level changed from ALPHA to BETA. |
| statefulset_controller_stale_sync_skips_total | Counter | Added | ALPHA | |
| volume_operation_errors_total | Counter | Added | ALPHA | |
| volume_operation_total_errors | Counter | Updated | ALPHA | Marked as deprecated in version 1.36.0. |

Detailed Changes

apiserver_impersonation_attempts_duration_seconds

+- name: attempts_duration_seconds
+  subsystem: impersonation
+  namespace: apiserver
+  help: Latency of impersonation attempts in seconds split by mode and decision.
+  type: Histogram
+  stabilityLevel: ALPHA
+  labels:
+    - decision
+    - mode
+  buckets:
+    - 0.001
+    - 0.002
+    - 0.004
+    - 0.008
+    - 0.016
+    - 0.032
+    - 0.064
+    - 0.128
+    - 0.256
+    - 0.512
+    - 1.024
+    - 2.048
+    - 4.096
+    - 8.192
+    - 16.384
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics
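
The bucket boundaries in this diff form a doubling series: 15 values starting at 0.001, each twice the previous one. The minimal sketch below regenerates that series. The helper exponentialBuckets is written here for illustration, in the style of the similarly named helper in the Prometheus Go client, and is not taken from the Kubernetes source.

```go
package main

import "fmt"

// exponentialBuckets returns count upper bounds, beginning at start and
// multiplying by factor at each step. (Illustrative helper.)
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, 0, count)
	bound := start
	for i := 0; i < count; i++ {
		buckets = append(buckets, bound)
		bound *= factor
	}
	return buckets
}

func main() {
	// 0.001 s up to 16.384 s, matching the 15 boundaries listed in the diff.
	for _, b := range exponentialBuckets(0.001, 2, 15) {
		fmt.Println(b)
	}
}
```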

apiserver_impersonation_attempts_total

+- name: attempts_total
+  subsystem: impersonation
+  namespace: apiserver
+  help: Total number of impersonation attempts split by mode and decision.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - decision
+    - mode
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

apiserver_impersonation_authorization_attempts_duration_seconds

+- name: authorization_attempts_duration_seconds
+  subsystem: impersonation
+  namespace: apiserver
+  help: Latency of authorization checks made by the impersonation handler in seconds split by mode and decision.
+  type: Histogram
+  stabilityLevel: ALPHA
+  labels:
+    - decision
+    - mode
+  buckets:
+    - 0.001
+    - 0.002
+    - 0.004
+    - 0.008
+    - 0.016
+    - 0.032
+    - 0.064
+    - 0.128
+    - 0.256
+    - 0.512
+    - 1.024
+    - 2.048
+    - 4.096
+    - 8.192
+    - 16.384
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

apiserver_impersonation_authorization_attempts_total

+- name: authorization_attempts_total
+  subsystem: impersonation
+  namespace: apiserver
+  help: Total number of authorization checks made by the impersonation handler split by mode and decision.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - decision
+    - mode
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

apiserver_manifest_admission_config_controller_automatic_reload_last_timestamp_seconds

+- name: automatic_reload_last_timestamp_seconds
+  subsystem: manifest_admission_config_controller
+  namespace: apiserver
+  help: Timestamp of the last automatic reload of admission manifest configuration split by status, plugin, and apiserver identity.
+  type: Gauge
+  stabilityLevel: ALPHA
+  labels:
+    - apiserver_id_hash
+    - plugin
+    - status
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

apiserver_manifest_admission_config_controller_automatic_reloads_total

+- name: automatic_reloads_total
+  subsystem: manifest_admission_config_controller
+  namespace: apiserver
+  help: Total number of automatic reloads of admission manifest configuration split by status, plugin, and apiserver identity.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - apiserver_id_hash
+    - plugin
+    - status
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

apiserver_manifest_admission_config_controller_last_config_info

+- name: apiserver_manifest_admission_config_controller_last_config_info
+  help: Information about the last applied admission manifest configuration with hash as label, split by plugin and apiserver identity.
+  type: Custom
+  stabilityLevel: ALPHA
+  labels:
+    - plugin
+    - apiserver_id_hash
+    - hash
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

apiserver_peer_discovery_sync_errors_total

+- name: peer_discovery_sync_errors_total
+  subsystem: apiserver
+  help: Total number of errors encountered while syncing discovery information from a peer kube-apiserver
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - type
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

apiserver_peer_proxy_errors_total

+- name: peer_proxy_errors_total
+  subsystem: apiserver
+  help: Total number of errors encountered while proxying requests to a peer kube apiserver
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - group
+    - resource
+    - type
+    - version
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

apiserver_rerouted_request_total

 - name: rerouted_request_total
   subsystem: apiserver
-  help: Total number of requests that were proxied to a peer kube apiserver because the local apiserver was not capable of serving it
+  help: '`Total number of requests that were proxied to a peer kube-apiserver because the local apiserver was not capable of serving it, broken down by ''group'', ''version'', and ''resource'' indicating the GVR of the request. If all three are empty (""), the request is a discovery request.`'
   type: Counter
   stabilityLevel: ALPHA
   labels:
     - code
+    - group
+    - resource
+    - version
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

apiserver_storage_events_received_total

 - name: storage_events_received_total
   subsystem: apiserver
   help: Number of etcd events received split by kind.
   type: Counter
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - group
     - resource
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

apiserver_watch_cache_resource_version

 - name: resource_version
   subsystem: watch_cache
   namespace: apiserver
-  help: Current resource version of watch cache broken by resource type.
+  help: Current resource version of watch cache broken by resource type. This is truncated to the 15 least significant digits.
   type: Gauge
   stabilityLevel: ALPHA
   labels:
     - group
     - resource
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

apiserver_watch_filtered_events_total

+- name: watch_filtered_events_total
+  namespace: apiserver
+  help: Counter of events filtered out by shard selector during watch dispatch, broken by resource type.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - group
+    - resource
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

apiserver_watch_list_duration_seconds

 - name: watch_list_duration_seconds
   subsystem: apiserver
   help: Response latency distribution in seconds for watch list requests broken by group, version, resource and scope.
   type: Histogram
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - group
     - resource
     - scope
     - version
   buckets:
     - 0.05
     - 0.1
     - 0.2
     - 0.4
     - 0.6
     - 0.8
     - 1
     - 2
     - 4
     - 6
     - 8
     - 10
     - 15
     - 20
     - 30
     - 45
     - 60
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

apiserver_watch_shards_total

+- name: watch_shards_total
+  namespace: apiserver
+  help: Number of active sharded watch connections broken by resource type.
+  type: Gauge
+  stabilityLevel: ALPHA
+  labels:
+    - group
+    - resource
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

apiserver_websocket_streaming_requests_total

+- name: websocket_streaming_requests_total
+  subsystem: apiserver
+  help: Total number of WebSocket streaming requests (exec/attach/portforward) routed by the API server, labeled by subresource and proxy_type. proxy_type is proxied_to_kubelet when the kubelet handles the request directly; otherwise translated_at_apiserver.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - proxy_type
+    - subresource
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

daemonset_controller_stale_sync_skips_total

+- name: stale_sync_skips_total
+  subsystem: daemonset_controller
+  help: Total number of DaemonSet syncs skipped due to a stale watch cache.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - group
+    - resource
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

endpoint_slice_controller_desired_endpoint_slices

 - name: desired_endpoint_slices
   subsystem: endpoint_slice_controller
   help: Number of EndpointSlices that would exist with perfect endpoint allocation
   type: Gauge
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

endpoint_slice_controller_endpoints_added_per_sync

 - name: endpoints_added_per_sync
   subsystem: endpoint_slice_controller
   help: Number of endpoints added on each Service sync
   type: Histogram
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   buckets:
     - 2
     - 4
     - 8
     - 16
     - 32
     - 64
     - 128
     - 256
     - 512
     - 1024
     - 2048
     - 4096
     - 8192
     - 16384
     - 32768
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

endpoint_slice_controller_endpoints_desired

 - name: endpoints_desired
   subsystem: endpoint_slice_controller
   help: Number of endpoints desired
   type: Gauge
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

endpoint_slice_controller_endpoints_removed_per_sync

 - name: endpoints_removed_per_sync
   subsystem: endpoint_slice_controller
   help: Number of endpoints removed on each Service sync
   type: Histogram
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   buckets:
     - 2
     - 4
     - 8
     - 16
     - 32
     - 64
     - 128
     - 256
     - 512
     - 1024
     - 2048
     - 4096
     - 8192
     - 16384
     - 32768
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

endpoint_slice_controller_num_endpoint_slices

 - name: num_endpoint_slices
   subsystem: endpoint_slice_controller
   help: Number of EndpointSlices
   type: Gauge
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

endpoint_slice_controller_services_count_by_traffic_distribution

 - name: services_count_by_traffic_distribution
   subsystem: endpoint_slice_controller
   help: Number of Services using some specific trafficDistribution
   type: Gauge
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - traffic_distribution
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

etcd_bookmark_counts

 - name: etcd_bookmark_counts
   help: Number of etcd bookmarks (progress notify events) split by kind.
   type: Gauge
+  deprecatedVersion: 1.36.0
   stabilityLevel: ALPHA
   labels:
     - group
     - resource
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

etcd_bookmark_total

+- name: etcd_bookmark_total
+  help: Number of etcd bookmarks (progress notify events) split by kind.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - group
+    - resource
+  componentEndpoints:
+    - component: kube-apiserver
+      endpoint: /metrics

horizontal_pod_autoscaler_controller_metric_computation_duration_seconds

 - name: metric_computation_duration_seconds
   subsystem: horizontal_pod_autoscaler_controller
   help: The time(seconds) that the HPA controller takes to calculate one metric. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. The label 'error' should be either 'spec', 'internal', or 'none'. The label 'metric_type' corresponds to HPA.spec.metrics[*].type
   type: Histogram
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - action
     - error
     - metric_type
   buckets:
     - 0.001
     - 0.002
     - 0.004
     - 0.008
     - 0.016
     - 0.032
     - 0.064
     - 0.128
     - 0.256
     - 0.512
     - 1.024
     - 2.048
     - 4.096
     - 8.192
     - 16.384
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

horizontal_pod_autoscaler_controller_metric_computation_total

 - name: metric_computation_total
   subsystem: horizontal_pod_autoscaler_controller
   help: Number of metric computations. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. Also, the label 'error' should be either 'spec', 'internal', or 'none'. The label 'metric_type' corresponds to HPA.spec.metrics[*].type
   type: Counter
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - action
     - error
     - metric_type
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

horizontal_pod_autoscaler_controller_reconciliation_duration_seconds

 - name: reconciliation_duration_seconds
   subsystem: horizontal_pod_autoscaler_controller
   help: The time(seconds) that the HPA controller takes to reconcile once. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. Also, the label 'error' should be either 'spec', 'internal', or 'none'. Note that if both spec and internal errors happen during a reconciliation, the first one to occur is reported in `error` label.
   type: Histogram
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - action
     - error
   buckets:
     - 0.001
     - 0.002
     - 0.004
     - 0.008
     - 0.016
     - 0.032
     - 0.064
     - 0.128
     - 0.256
     - 0.512
     - 1.024
     - 2.048
     - 4.096
     - 8.192
     - 16.384
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

horizontal_pod_autoscaler_controller_reconciliations_total

 - name: reconciliations_total
   subsystem: horizontal_pod_autoscaler_controller
   help: Number of reconciliations of HPA controller. The label 'action' should be either 'scale_down', 'scale_up', or 'none'. Also, the label 'error' should be either 'spec', 'internal', or 'none'. Note that if both spec and internal errors happen during a reconciliation, the first one to occur is reported in `error` label.
   type: Counter
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - action
     - error
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

informer_processing_latency_seconds

+- name: processing_latency_seconds
+  subsystem: informer
+  help: Time taken to process events after popping from the queue.
+  type: Histogram
+  stabilityLevel: ALPHA
+  labels:
+    - group
+    - name
+    - resource
+    - version
+  buckets:
+    - 0.001
+    - 0.005
+    - 0.01
+    - 0.025
+    - 0.05
+    - 0.1
+    - 0.25
+    - 0.5
+    - 1
+    - 2.5
+    - 5
+    - 10
+  componentEndpoints:
+    - component: cloud-controller-manager
+      endpoint: /metrics
+    - component: kube-apiserver
+      endpoint: /metrics
+    - component: kube-controller-manager
+      endpoint: /metrics
+    - component: kube-proxy
+      endpoint: /metrics
+    - component: kube-scheduler
+      endpoint: /metrics
+    - component: kubelet
+      endpoint: /metrics

informer_queued_items

+- name: queued_items
+  subsystem: informer
+  help: Number of items currently queued in the FIFO.
+  type: Gauge
+  stabilityLevel: ALPHA
+  labels:
+    - group
+    - name
+    - resource
+    - version
+  componentEndpoints:
+    - component: cloud-controller-manager
+      endpoint: /metrics
+    - component: kube-apiserver
+      endpoint: /metrics
+    - component: kube-controller-manager
+      endpoint: /metrics
+    - component: kube-proxy
+      endpoint: /metrics
+    - component: kube-scheduler
+      endpoint: /metrics
+    - component: kubelet
+      endpoint: /metrics

informer_store_resource_version

+- name: store_resource_version
+  subsystem: informer
+  help: The 15 least significant digits of the resource version of the store.
+  type: Gauge
+  stabilityLevel: ALPHA
+  labels:
+    - group
+    - name
+    - resource
+    - version
+  componentEndpoints:
+    - component: cloud-controller-manager
+      endpoint: /metrics
+    - component: kube-apiserver
+      endpoint: /metrics
+    - component: kube-controller-manager
+      endpoint: /metrics
+    - component: kube-proxy
+      endpoint: /metrics
+    - component: kube-scheduler
+      endpoint: /metrics
+    - component: kubelet
+      endpoint: /metrics

job_controller_pod_failures_handled_by_failure_policy_total

 - name: pod_failures_handled_by_failure_policy_total
   subsystem: job_controller
   help: |-
     `The number of failed Pods handled by failure policy with
     			respect to the failure policy action applied based on the matched
     			rule. Possible values of the action label correspond to the
     			possible values for the failure policy rule action, which are:
     			"FailJob", "Ignore" and "Count".`
   type: Counter
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - action
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

job_controller_stale_sync_skips_total

+- name: stale_sync_skips_total
+  subsystem: job_controller
+  help: Total number of Job syncs skipped due to a stale watch cache.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - group
+    - resource
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

job_controller_terminated_pods_tracking_finalizer_total

 - name: terminated_pods_tracking_finalizer_total
   subsystem: job_controller
   help: |-
     `The number of terminated pods (phase=Failed|Succeeded)
     that have the finalizer batch.kubernetes.io/job-tracking
     The event label can be "add" or "delete".`
   type: Counter
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - event
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

kubelet_memory_qos_node_memory_low_bytes

+- name: memory_qos_node_memory_low_bytes
+  subsystem: kubelet
+  help: Total cgroup v2 memory.low in bytes for Burstable pods. This memory is soft-reserved and may be reclaimed under extreme pressure.
+  type: Gauge
+  stabilityLevel: ALPHA
+  componentEndpoints:
+    - component: kubelet
+      endpoint: /metrics

kubelet_memory_qos_node_memory_min_bytes

+- name: memory_qos_node_memory_min_bytes
+  subsystem: kubelet
+  help: Total cgroup v2 memory.min in bytes for Guaranteed pods. This memory is hard-reserved and never reclaimed by the kernel.
+  type: Gauge
+  stabilityLevel: ALPHA
+  componentEndpoints:
+    - component: kubelet
+      endpoint: /metrics

kubelet_metrics_provider

+- name: metrics_provider
+  subsystem: kubelet
+  help: Metrics provider used by kubelet to collect container stats. Values can be 'cadvisor' and 'cri'
+  type: Gauge
+  stabilityLevel: ALPHA
+  labels:
+    - provider
+  componentEndpoints:
+    - component: kubelet
+      endpoint: /metrics

kubelet_pleg_pod_relist_duration_seconds

+- name: pleg_pod_relist_duration_seconds
+  subsystem: kubelet
+  help: Duration in seconds for relisting a single pod in PLEG.
+  type: Histogram
+  stabilityLevel: ALPHA
+  buckets:
+    - 0.005
+    - 0.01
+    - 0.025
+    - 0.05
+    - 0.1
+    - 0.25
+    - 0.5
+    - 1
+    - 2.5
+    - 5
+    - 10
+  componentEndpoints:
+    - component: kubelet
+      endpoint: /metrics

kubelet_pod_watch_events_dropped_total

+- name: pod_watch_events_dropped_total
+  subsystem: kubelet
+  help: Cumulative number of pod watch events dropped.
+  type: Counter
+  stabilityLevel: ALPHA
+  componentEndpoints:
+    - component: kubelet
+      endpoint: /metrics

kubelet_terminated_containers_total

+- name: terminated_containers_total
+  subsystem: kubelet
+  help: Cumulative number of container terminations.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - container_type
+    - exit_code
+    - reason
+  componentEndpoints:
+    - component: kubelet
+      endpoint: /metrics

kubelet_websocket_streaming_requests_total

+- name: websocket_streaming_requests_total
+  subsystem: kubelet
+  help: Total number of WebSocket streaming requests (exec/attach/portforward) received by the kubelet.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - subresource
+  componentEndpoints:
+    - component: kubelet
+      endpoint: /metrics

kubernetes_build_info

 - name: kubernetes_build_info
   help: A metric with a constant '1' value labeled by major, minor, git version, git commit, git tree state, build date, Go version, and compiler from which Kubernetes was built, and platform on which it is running.
   type: Gauge
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - build_date
     - compiler
     - git_commit
     - git_tree_state
     - git_version
     - go_version
     - major
     - minor
     - platform
+  componentEndpoints:
+    - component: cloud-controller-manager
+      endpoint: /metrics
+    - component: kube-apiserver
+      endpoint: /metrics
+    - component: kube-controller-manager
+      endpoint: /metrics
+    - component: kube-proxy
+      endpoint: /metrics
+    - component: kube-scheduler
+      endpoint: /metrics
+    - component: kubelet
+      endpoint: /metrics

latency

+- name: latency
+  type: Summary
+  stabilityLevel: ALPHA
+  labels:
+    - node
+  objectives:
+    0.5: 0.05
+    0.75: 0.025
+    0.9: 0.01
+    0.99: 0.001

replicaset_controller_stale_sync_skips_total

+- name: stale_sync_skips_total
+  subsystem: replicaset_controller
+  help: Total number of ReplicaSet syncs skipped due to a stale watch cache.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - group
+    - resource
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

resource_manager_allocation_errors_total

+- name: resource_manager_allocation_errors_total
+  help: Number of errors encountered during exclusive resource allocation.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - resource_name
+    - source
+  componentEndpoints:
+    - component: kubelet
+      endpoint: /metrics

resource_manager_allocations_total

+- name: resource_manager_allocations_total
+  help: Number of exclusive resource allocations performed by a resource manager.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - resource_name
+    - source
+  componentEndpoints:
+    - component: kubelet
+      endpoint: /metrics

resource_manager_container_assignments

+- name: resource_manager_container_assignments
+  help: Number of containers with a specific type of resource assignment.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - assignment_type
+    - resource_name
+  componentEndpoints:
+    - component: kubelet
+      endpoint: /metrics

resourcepoolstatusrequest_controller_request_processing_duration_seconds

+- name: request_processing_duration_seconds
+  subsystem: resourcepoolstatusrequest_controller
+  help: Time taken to process a ResourcePoolStatusRequest
+  type: Histogram
+  stabilityLevel: ALPHA
+  labels:
+    - driver_name
+  buckets:
+    - 0.001
+    - 0.002
+    - 0.004
+    - 0.008
+    - 0.016
+    - 0.032
+    - 0.064
+    - 0.128
+    - 0.256
+    - 0.512
+    - 1.024
+    - 2.048
+    - 4.096
+    - 8.192
+    - 16.384
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics
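
The fifteen bounds above form a doubling series that starts at 1 ms and ends at 16.384 s. The sketch below reproduces that series. It assumes the buckets come from an exponential generator (similar in spirit to `prometheus.ExponentialBuckets` in client_golang), which the diff itself does not confirm. The Python helper `exponential_buckets` is hypothetical.

```python
def exponential_buckets(start: float, factor: float, count: int) -> list[float]:
    """Return `count` upper bounds: start, start*factor, start*factor**2, ..."""
    if count < 1:
        raise ValueError("count must be positive")
    if start <= 0:
        raise ValueError("start must be positive")
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    return [start * factor**i for i in range(count)]


# 0.001 s doubled 14 times gives the 15 bounds listed in the diff.
processing_buckets = exponential_buckets(0.001, 2, 15)
```

Because the largest finite bound is 16.384 s, any request slower than that is only counted in the implicit `+Inf` bucket.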

resourcepoolstatusrequest_controller_request_processing_errors_total

+- name: request_processing_errors_total
+  subsystem: resourcepoolstatusrequest_controller
+  help: Total number of errors encountered while processing ResourcePoolStatusRequests
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - driver_name
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

resourcepoolstatusrequest_controller_requests_processed_total

+- name: requests_processed_total
+  subsystem: resourcepoolstatusrequest_controller
+  help: Total number of ResourcePoolStatusRequests processed
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - driver_name
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

rest_client_transport_ca_reload_total

+- name: rest_client_transport_ca_reload_total
+  help: Number of times a CA reload is attempted, partitioned by the result and reason for the reload attempt
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - reason
+    - result
+  componentEndpoints:
+    - component: cloud-controller-manager
+      endpoint: /metrics
+    - component: kube-apiserver
+      endpoint: /metrics
+    - component: kube-controller-manager
+      endpoint: /metrics
+    - component: kube-proxy
+      endpoint: /metrics
+    - component: kube-scheduler
+      endpoint: /metrics
+    - component: kubelet
+      endpoint: /metrics

rest_client_transport_cache_gc_calls_total

+- name: rest_client_transport_cache_gc_calls_total
+  help: 'Number of times a GC cleanup attempts to delete a transport cache entry, partitioned by the result: deleted, skipped'
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - result
+  componentEndpoints:
+    - component: cloud-controller-manager
+      endpoint: /metrics
+    - component: kube-apiserver
+      endpoint: /metrics
+    - component: kube-controller-manager
+      endpoint: /metrics
+    - component: kube-proxy
+      endpoint: /metrics
+    - component: kube-scheduler
+      endpoint: /metrics
+    - component: kubelet
+      endpoint: /metrics

rest_client_transport_cert_rotation_gc_calls_total

+- name: rest_client_transport_cert_rotation_gc_calls_total
+  help: Number of times a cert rotation goroutine cancel func is called via GC cleanup of the associated transport
+  type: Counter
+  stabilityLevel: ALPHA
+  componentEndpoints:
+    - component: cloud-controller-manager
+      endpoint: /metrics
+    - component: kube-apiserver
+      endpoint: /metrics
+    - component: kube-controller-manager
+      endpoint: /metrics
+    - component: kube-proxy
+      endpoint: /metrics
+    - component: kube-scheduler
+      endpoint: /metrics
+    - component: kubelet
+      endpoint: /metrics

rest_client_transport_create_calls_total

 - name: rest_client_transport_create_calls_total
-  help: 'Number of calls to get a new transport, partitioned by the result of the operation hit: obtained from the cache, miss: created and added to the cache, uncacheable: created and not cached'
+  help: 'Number of calls to get a new transport, partitioned by the result of the operation hit: obtained from the cache, miss: created and added to the cache, miss-gc: recreated and added back to the cache after being garbage collected, uncacheable: created and not cached'
   type: Counter
   stabilityLevel: ALPHA
   labels:
     - result
+  componentEndpoints:
+    - component: cloud-controller-manager
+      endpoint: /metrics
+    - component: kube-apiserver
+      endpoint: /metrics
+    - component: kube-controller-manager
+      endpoint: /metrics
+    - component: kube-proxy
+      endpoint: /metrics
+    - component: kube-scheduler
+      endpoint: /metrics
+    - component: kubelet
+      endpoint: /metrics

route_controller_route_sync_total

+- name: route_sync_total
+  subsystem: route_controller
+  help: A metric counting the amount of times routes have been synced with the cloud provider.
+  type: Counter
+  stabilityLevel: ALPHA
+  componentEndpoints:
+    - component: cloud-controller-manager
+      endpoint: /metrics

running_managed_controllers

 - name: running_managed_controllers
   help: Indicates where instances of a controller are currently running
   type: Gauge
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - manager
     - name
+  componentEndpoints:
+    - component: cloud-controller-manager
+      endpoint: /metrics
+    - component: kube-apiserver
+      endpoint: /metrics
+    - component: kube-controller-manager
+      endpoint: /metrics
+    - component: kube-proxy
+      endpoint: /metrics
+    - component: kube-scheduler
+      endpoint: /metrics
+    - component: kubelet
+      endpoint: /metrics

scheduler_dra_bindingconditions_allocations_total

+- name: dra_bindingconditions_allocations_total
+  subsystem: scheduler
+  help: Number of allocations using devices with BindingConditions, counted per driver per scheduling attempt
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - driver
+    - profile
+    - status
+  componentEndpoints:
+    - component: kube-scheduler
+      endpoint: /metrics

scheduler_dra_bindingconditions_wait_duration_seconds

+- name: dra_bindingconditions_wait_duration_seconds
+  subsystem: scheduler
+  help: Time in seconds spent waiting for BindingConditions to be satisfied during PreBind.
+  type: Histogram
+  stabilityLevel: ALPHA
+  labels:
+    - driver
+    - profile
+    - status
+  buckets:
+    - 0.1
+    - 0.2
+    - 0.4
+    - 0.8
+    - 1.6
+    - 3.2
+    - 6.4
+    - 12.8
+    - 25.6
+    - 51.2
+    - 102.4
+    - 204.8
+    - 409.6
+    - 819.2
+  componentEndpoints:
+    - component: kube-scheduler
+      endpoint: /metrics
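
A histogram observation is counted in every bucket whose upper bound (`le`) is greater than or equal to the observed value, so the smallest such bound decides the resolution of a wait time. The sketch below looks up that bound for the buckets listed above. The helper `smallest_bucket` is hypothetical.

```python
import bisect
import math

# Upper bounds in seconds, copied from the diff above.
WAIT_BUCKETS = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8,
                25.6, 51.2, 102.4, 204.8, 409.6, 819.2]


def smallest_bucket(value: float, bounds=WAIT_BUCKETS) -> float:
    """Return the smallest upper bound >= value, or +Inf when none fits."""
    index = bisect.bisect_left(bounds, value)
    return bounds[index] if index < len(bounds) else math.inf


short_wait = smallest_bucket(0.15)   # falls into the 0.2 s bucket
long_wait = smallest_bucket(1000.0)  # exceeds every finite bound
```

A wait of 0.15 s lands in the 0.2 s bucket, and anything longer than 819.2 s (about 13.7 minutes) is visible only in `+Inf`.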

scheduler_goroutines

 - name: goroutines
   subsystem: scheduler
   help: Number of running goroutines split by the work they do such as binding.
   type: Gauge
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - operation
+  componentEndpoints:
+    - component: kube-scheduler
+      endpoint: /metrics

scheduler_permit_wait_duration_seconds

 - name: permit_wait_duration_seconds
   subsystem: scheduler
   help: Duration of waiting on permit.
   type: Histogram
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - result
   buckets:
     - 0.001
     - 0.002
     - 0.004
     - 0.008
     - 0.016
     - 0.032
     - 0.064
     - 0.128
     - 0.256
     - 0.512
     - 1.024
     - 2.048
     - 4.096
     - 8.192
     - 16.384
+  componentEndpoints:
+    - component: kube-scheduler
+      endpoint: /metrics

scheduler_plugin_evaluation_total

 - name: plugin_evaluation_total
   subsystem: scheduler
   help: Number of attempts to schedule pods by each plugin and the extension point (available only in PreFilter, Filter, PreScore, and Score).
   type: Counter
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - extension_point
     - plugin
     - profile
+  componentEndpoints:
+    - component: kube-scheduler
+      endpoint: /metrics

scheduler_podgroup_schedule_attempts_total

+- name: podgroup_schedule_attempts_total
+  subsystem: scheduler
+  help: Number of attempts to schedule pod group, by the result. 'unschedulable' means a pod group could not be scheduled, while 'error' means an internal scheduler problem.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - profile
+    - result
+  componentEndpoints:
+    - component: kube-scheduler
+      endpoint: /metrics

scheduler_podgroup_scheduling_algorithm_duration_seconds

+- name: podgroup_scheduling_algorithm_duration_seconds
+  subsystem: scheduler
+  help: Pod group scheduling algorithm latency in seconds
+  type: Histogram
+  stabilityLevel: ALPHA
+  buckets:
+    - 0.001
+    - 0.002
+    - 0.004
+    - 0.008
+    - 0.016
+    - 0.032
+    - 0.064
+    - 0.128
+    - 0.256
+    - 0.512
+    - 1.024
+    - 2.048
+    - 4.096
+    - 8.192
+    - 16.384
+  componentEndpoints:
+    - component: kube-scheduler
+      endpoint: /metrics

scheduler_podgroup_scheduling_attempt_duration_seconds

+- name: podgroup_scheduling_attempt_duration_seconds
+  subsystem: scheduler
+  help: Pod group scheduling attempt latency in seconds (scheduling algorithm + binding)
+  type: Histogram
+  stabilityLevel: ALPHA
+  labels:
+    - profile
+    - result
+  buckets:
+    - 0.001
+    - 0.002
+    - 0.004
+    - 0.008
+    - 0.016
+    - 0.032
+    - 0.064
+    - 0.128
+    - 0.256
+    - 0.512
+    - 1.024
+    - 2.048
+    - 4.096
+    - 8.192
+    - 16.384
+  componentEndpoints:
+    - component: kube-scheduler
+      endpoint: /metrics

scheduler_unschedulable_pods

 - name: unschedulable_pods
   subsystem: scheduler
   help: The number of unschedulable pods broken down by plugin name. A pod will increment the gauge for all plugins that caused it to not schedule and so this metric have meaning only when broken down by plugin.
   type: Gauge
-  stabilityLevel: ALPHA
+  stabilityLevel: BETA
   labels:
     - plugin
     - profile
+  componentEndpoints:
+    - component: kube-scheduler
+      endpoint: /metrics

statefulset_controller_stale_sync_skips_total

+- name: stale_sync_skips_total
+  subsystem: statefulset_controller
+  help: Total number of StatefulSet syncs skipped due to a stale watch cache.
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - group
+    - resource
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

volume_operation_errors_total

+- name: volume_operation_errors_total
+  help: Total volume operation errors
+  type: Counter
+  stabilityLevel: ALPHA
+  labels:
+    - operation_name
+    - plugin_name
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics

volume_operation_total_errors

 - name: volume_operation_total_errors
   help: Total volume operation errors
   type: Counter
+  deprecatedVersion: 1.36.0
   stabilityLevel: ALPHA
   labels:
     - operation_name
     - plugin_name
+  componentEndpoints:
+    - component: kube-controller-manager
+      endpoint: /metrics
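
Since the old name is only deprecated in 1.36, dashboards and alert expressions can be migrated ahead of its removal. The sketch below rewrites the metric name inside a PromQL expression string. The helper `migrate_expr` and the sample expressions are illustrative assumptions. It is a plain text substitution, not a PromQL parser, so the results should be reviewed before being applied.

```python
import re

# Old name -> new name, taken from the rename described in this article.
RENAMES = {
    "volume_operation_total_errors": "volume_operation_errors_total",
}

# Characters that may appear in a Prometheus metric name (including ':').
_NAME_CHARS = r"A-Za-z0-9_:"


def migrate_expr(expr: str, renames=RENAMES) -> str:
    """Replace whole metric names only, leaving longer identifiers untouched."""
    for old, new in renames.items():
        pattern = rf"(?<![{_NAME_CHARS}]){re.escape(old)}(?![{_NAME_CHARS}])"
        expr = re.sub(pattern, new, expr)
    return expr


alert_expr = 'sum(rate(volume_operation_total_errors{plugin_name="x"}[5m]))'
migrated = migrate_expr(alert_expr)

# A recording-rule name that merely contains the old name is left as-is.
recording_rule = migrate_expr("job:volume_operation_total_errors:rate5m")
```

The lookarounds keep names such as the recording rule above unchanged, since those are defined by the user and need a separate decision.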