Repairing a Ceph PG stuck in activating+remapped
Current situation
Ceph environment
Host: Ubuntu 22.04 LTS x1 (configured with cephadm to run as a single node)
OSDs: 9
Ceph Version: 18.2.1 reef
Ceph alerts (critical)
- CephHealthError
- CephPGsInactive
- CephPGUnavilableBlockingIO
ceph health detail
# ceph health detail
HEALTH_ERR full ratio(s) out of order; Reduced data availability: 1 pg inactive; 1 slow ops, oldest one blocked for 173867 sec, osd.1 has slow ops; too many PGs per OSD (277 > max 250)
[ERR] OSD_OUT_OF_ORDER_FULL: full ratio(s) out of order
backfillfull_ratio (0.75) < nearfull_ratio (0.85), increased
[WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive
pg 15.2 is stuck inactive for 2d, current state activating+remapped, last acting [1,4,0,5,6,3]
[WRN] SLOW_OPS: 1 slow ops, oldest one blocked for 173867 sec, osd.1 has slow ops
[WRN] TOO_MANY_PGS: too many PGs per OSD (277 > max 250)
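To look more closely at the PG named in the output above, something like the following could be used. This is only a rough sketch: `stuck_pgs` is a hypothetical helper (not part of Ceph) that pulls PG IDs out of `ceph health detail` lines of the form "pg <id> is stuck ...", and the `ceph` calls are skipped when the CLI is not available.

```shell
# Hypothetical helper: print the IDs of PGs reported as stuck.
# Reads `ceph health detail` output on stdin and matches lines like
# "pg 15.2 is stuck inactive for 2d, ...".
stuck_pgs() {
  awk '$1 == "pg" && $3 == "is" && $4 == "stuck" { print $2 }'
}

# Only talk to the cluster when the ceph CLI is actually installed.
if command -v ceph >/dev/null 2>&1; then
  for pg in $(ceph health detail | stuck_pgs); do
    # Dump the peering/recovery state of each stuck PG
    ceph pg "$pg" query
  done
fi
```

For the output above this would run `ceph pg 15.2 query`, whose result usually shows what the PG is waiting on.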
[ERR] OSD_OUT_OF_ORDER_FULL: full ratio(s) out of order
This error has already been resolved with the help of the following information.
OSD_OUT_OF_ORDER_FULL
The utilization thresholds for nearfull, backfillfull, full, and/or failsafe_full are not ascending. In particular, the following pattern is expected: nearfull < backfillfull, backfillfull < full, and full < failsafe_full.
To adjust these utilization thresholds, run the following commands:
ceph osd set-nearfull-ratio <ratio>
ceph osd set-backfillfull-ratio <ratio>
ceph osd set-full-ratio <ratio>
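As a concrete illustration of the commands above, here is a minimal sketch that checks the ordering before applying anything. The values 0.85 / 0.90 / 0.95 are the Ceph defaults and are used purely as an example, not the values actually set on this cluster; `ratios_ascending` is a hypothetical helper, and the failsafe_full threshold is left out for brevity.

```shell
# Hypothetical helper: succeed only if nearfull < backfillfull < full.
ratios_ascending() {
  awk -v n="$1" -v b="$2" -v f="$3" \
    'BEGIN { exit !(n + 0 < b + 0 && b + 0 < f + 0) }'
}

# Example values (Ceph defaults); adjust for the actual cluster.
NEARFULL=0.85
BACKFILLFULL=0.90
FULL=0.95

if ratios_ascending "$NEARFULL" "$BACKFILLFULL" "$FULL"; then
  # Only touch the cluster when the ceph CLI is actually installed.
  if command -v ceph >/dev/null 2>&1; then
    ceph osd set-nearfull-ratio "$NEARFULL"
    ceph osd set-backfillfull-ratio "$BACKFILLFULL"
    ceph osd set-full-ratio "$FULL"
    # Confirm the values now stored in the OSD map
    ceph osd dump | grep ratio
  fi
else
  echo "ratios are not ascending; nothing changed" >&2
fi
```

The combination reported in the health output (backfillfull 0.75, nearfull 0.85) would fail this check, which matches the OSD_OUT_OF_ORDER_FULL error.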
For now, try restarting Ceph once
sudo systemctl restart ceph.target
After the restart, the PG state changed from
activating+remapped
to
active+remapped+backfilling and active+remapped+backfill_wait.
The critical items also disappeared from the Ceph alerts.
I will leave it overnight and see how it goes.
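Instead of checking by hand, the backfill could be watched with a small polling loop. This is a sketch under assumptions: `count_backfill` is a hypothetical helper that simply counts lines mentioning "backfill", the 60-second interval is arbitrary, and the loop is skipped when the `ceph` CLI is not available.

```shell
# Hypothetical helper: count stdin lines whose PG state mentions backfill
# (covers both "backfilling" and "backfill_wait").
count_backfill() {
  # grep -c exits non-zero on zero matches, so mask that with `|| true`
  grep -c 'backfill' || true
}

# Only poll when the ceph CLI is actually installed.
if command -v ceph >/dev/null 2>&1; then
  while :; do
    n=$(ceph pg dump pgs_brief 2>/dev/null | count_backfill)
    echo "$(date '+%F %T') PGs still backfilling or waiting: $n"
    [ "$n" -eq 0 ] && break
    sleep 60
  done
fi
```

The loop exits once no PG reports a backfill state; `ceph -s` gives a similar summary for a one-off check.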
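After being left alone, the PG problem resolved itself.
My guess is that some I/O problem on osd.1 caused the PG to get stuck.
I plan to deal with the TOO_MANY_PGS alert in a separate scrap.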