🤺
Debian11 で Pacemeker と SCSIフェンシング
前提
https://zenn.dev/mnod/articles/490386c39bc8d3 の環境を利用します。
イニシエータ側で sda として認識されているデバイスをフェンシングデバイスとします。
# ls -l /dev/disk/by-id/ | grep sda
lrwxrwxrwx 1 root root 9 Aug 9 19:23 scsi-36001405de80b906d5904c9d841819705 -> ../../sda
lrwxrwxrwx 1 root root 9 Aug 9 19:23 wwn-0x6001405de80b906d5904c9d841819705 -> ../../sda
準備
必要なパッケージをインストールします。
# apt install --no-install-recommends fence-agents sg3-utils
設定
設定します。
# pcs stonith create scsi-shooter fence_scsi pcmk_host_list="debian11-0 debian11-1" devices=/dev/disk/by-id/wwn-0x6001405de80b906d5904c9d841819705 meta provides=unfencing
# pcs stonith config scsi-shooter
Resource: scsi-shooter (class=stonith type=fence_scsi)
Attributes: devices=/dev/disk/by-id/wwn-0x6001405de80b906d5904c9d841819705 pcmk_host_list="debian11-0 debian11-1"
Meta Attrs: provides=unfencing
Operations: monitor interval=60s (scsi-shooter-monitor-interval-60s)
結果を確認します。
# pcs status
Cluster name: web_cluster
Cluster Summary:
* Stack: corosync
* Current DC: debian11-0 (version 2.0.5-ba59be7122) - partition with quorum
* Last updated: Thu Aug 10 17:05:33 2023
* Last change: Thu Aug 10 16:57:40 2023 by root via cibadmin on debian11-0
* 2 nodes configured
* 14 resource instances configured
Node List:
* Online: [ debian11-0 debian11-1 ]
Full List of Resources:
* ClusterIP (ocf::heartbeat:IPaddr2): Started debian11-0
* WebServer (systemd:nginx): Started debian11-0
* Clone Set: drbd_r0-clone [drbd_r0] (promotable):
* Masters: [ debian11-0 ]
* Slaves: [ debian11-1 ]
* fs_drbd_r0 (ocf::heartbeat:Filesystem): Started debian11-0
* Clone Set: drbd_r1-clone [drbd_r1] (promotable):
* Masters: [ debian11-0 debian11-1 ]
* Clone Set: shared_fs-clone [shared_fs]:
* Started: [ debian11-0 debian11-1 ]
* Clone Set: locking-clone [locking]:
* Stopped: [ debian11-0 debian11-1 ]
* scsi-shooter (stonith:fence_scsi): Started debian11-1
Failed Resource Actions:
* o2cb_monitor_0 on debian11-0 'not installed' (5): call=60, status='complete', exitreason='Setup problem: couldn't find command: /usr/sbin/ocfs2_controld.pcmk', last-rc-change='2023-08-10 17:04:01 -04:00', queued=0ms, exec=16ms
* o2cb_monitor_0 on debian11-1 'not installed' (5): call=47, status='complete', exitreason='Setup problem: couldn't find command: /usr/sbin/ocfs2_controld.pcmk', last-rc-change='2023-08-10 17:05:28 -04:00', queued=0ms, exec=23ms
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
動作テスト
下記をセカンダリ系で実行します。
# pcs stonith fence debian11-0
Node: debian11-0 fenced
リソースがセカンダリ系へ移動しました。
# pcs status
Cluster name: web_cluster
Cluster Summary:
* Stack: corosync
* Current DC: debian11-1 (version 2.0.5-ba59be7122) - partition with quorum
* Last updated: Thu Aug 10 17:07:19 2023
* Last change: Thu Aug 10 16:57:40 2023 by root via cibadmin on debian11-0
* 2 nodes configured
* 14 resource instances configured
Node List:
* Online: [ debian11-1 ]
* OFFLINE: [ debian11-0 ]
Full List of Resources:
* ClusterIP (ocf::heartbeat:IPaddr2): Started debian11-1
* WebServer (systemd:nginx): Started debian11-1
* Clone Set: drbd_r0-clone [drbd_r0] (promotable):
* Masters: [ debian11-1 ]
* Stopped: [ debian11-0 ]
* fs_drbd_r0 (ocf::heartbeat:Filesystem): Started debian11-1
* Clone Set: drbd_r1-clone [drbd_r1] (promotable):
* Masters: [ debian11-1 ]
* Stopped: [ debian11-0 ]
* Clone Set: shared_fs-clone [shared_fs]:
* Started: [ debian11-1 ]
* Stopped: [ debian11-0 ]
* Clone Set: locking-clone [locking]:
* Stopped: [ debian11-0 debian11-1 ]
* scsi-shooter (stonith:fence_scsi): Started debian11-1
Failed Resource Actions:
* o2cb_monitor_0 on debian11-1 'not installed' (5): call=47, status='complete', exitreason='Setup problem: couldn't find command: /usr/sbin/ocfs2_controld.pcmk', last-rc-change='2023-08-10 17:05:28 -04:00', queued=0ms, exec=23ms
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
# df | grep /mnt
/dev/drbd0 1324760 32 1239180 1% /mnt
マスタ系で確認すると、pacemaker サービスが強制停止させられていました。
# systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/lib/systemd/system/pacemaker.service; disabled; vendor preset: enabled)
Active: inactive (dead) since Thu 2023-08-10 17:07:10 EDT; 2min 47s ago
Docs: man:pacemakerd
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html
Process: 608 ExecStart=/usr/sbin/pacemakerd -f (code=exited, status=100)
Main PID: 608 (code=exited, status=100)
CPU: 6.472s
Aug 10 17:07:10 debian11-0 pacemakerd[608]: notice: Stopping pacemaker-fenced
Aug 10 17:07:10 debian11-0 pacemaker-fenced[610]: notice: Caught 'Terminated' signal
Aug 10 17:07:10 debian11-0 pacemakerd[608]: notice: Stopping pacemaker-based
Aug 10 17:07:10 debian11-0 pacemaker-based[609]: notice: Caught 'Terminated' signal
Aug 10 17:07:10 debian11-0 pacemaker-based[609]: notice: Disconnected from Corosync
Aug 10 17:07:10 debian11-0 pacemaker-based[609]: notice: Disconnected from Corosync
Aug 10 17:07:10 debian11-0 pacemakerd[608]: notice: Shutdown complete
Aug 10 17:07:10 debian11-0 pacemakerd[608]: notice: Shutting down and staying down after fatal error
Aug 10 17:07:10 debian11-0 systemd[1]: pacemaker.service: Succeeded.
Aug 10 17:07:10 debian11-0 systemd[1]: pacemaker.service: Consumed 6.472s CPU time.
戻し
セカンダリ系で下記を実行します。
# pcs cluster start debian11-0
debian11-0: Starting Cluster...
マスタ系がクラスタに復帰し、リソースもマスタ系へ移動しました。
# pcs status
Cluster name: web_cluster
Cluster Summary:
* Stack: corosync
* Current DC: debian11-1 (version 2.0.5-ba59be7122) - partition with quorum
* Last updated: Thu Aug 10 17:10:52 2023
* Last change: Thu Aug 10 16:57:40 2023 by root via cibadmin on debian11-0
* 2 nodes configured
* 14 resource instances configured
Node List:
* Online: [ debian11-0 debian11-1 ]
Full List of Resources:
* ClusterIP (ocf::heartbeat:IPaddr2): Started debian11-0
* WebServer (systemd:nginx): Started debian11-0
* Clone Set: drbd_r0-clone [drbd_r0] (promotable):
* Masters: [ debian11-0 ]
* Slaves: [ debian11-1 ]
* fs_drbd_r0 (ocf::heartbeat:Filesystem): Started debian11-0
* Clone Set: drbd_r1-clone [drbd_r1] (promotable):
* Masters: [ debian11-0 debian11-1 ]
* Clone Set: shared_fs-clone [shared_fs]:
* Started: [ debian11-0 debian11-1 ]
* Clone Set: locking-clone [locking]:
* Stopped: [ debian11-0 debian11-1 ]
* scsi-shooter (stonith:fence_scsi): Started debian11-1
Failed Resource Actions:
* o2cb_monitor_0 on debian11-0 'not installed' (5): call=51, status='complete', exitreason='Setup problem: couldn't find command: /usr/sbin/ocfs2_controld.pcmk', last-rc-change='2023-08-10 17:10:48 -04:00', queued=0ms, exec=21ms
* o2cb_monitor_0 on debian11-1 'not installed' (5): call=47, status='complete', exitreason='Setup problem: couldn't find command: /usr/sbin/ocfs2_controld.pcmk', last-rc-change='2023-08-10 17:05:28 -04:00', queued=0ms, exec=23ms
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
マスタ系で確認すると、もちろん、pacemaker サービスが起動していました。
# pcs status
Cluster name: web_cluster
Cluster Summary:
* Stack: corosync
* Current DC: debian11-1 (version 2.0.5-ba59be7122) - partition with quorum
* Last updated: Thu Aug 10 17:12:26 2023
* Last change: Thu Aug 10 16:57:40 2023 by root via cibadmin on debian11-0
* 2 nodes configured
* 14 resource instances configured
Node List:
* Online: [ debian11-0 debian11-1 ]
Full List of Resources:
* ClusterIP (ocf::heartbeat:IPaddr2): Started debian11-0
* WebServer (systemd:nginx): Started debian11-0
* Clone Set: drbd_r0-clone [drbd_r0] (promotable):
* Masters: [ debian11-0 ]
* Slaves: [ debian11-1 ]
* fs_drbd_r0 (ocf::heartbeat:Filesystem): Started debian11-0
* Clone Set: drbd_r1-clone [drbd_r1] (promotable):
* Masters: [ debian11-0 debian11-1 ]
* Clone Set: shared_fs-clone [shared_fs]:
* Started: [ debian11-0 debian11-1 ]
* Clone Set: locking-clone [locking]:
* Stopped: [ debian11-0 debian11-1 ]
* scsi-shooter (stonith:fence_scsi): Started debian11-1
Failed Resource Actions:
* o2cb_monitor_0 on debian11-0 'not installed' (5): call=51, status='complete', exitreason='Setup problem: couldn't find command: /usr/sbin/ocfs2_controld.pcmk', last-rc-change='2023-08-10 17:10:48 -04:00', queued=0ms, exec=21ms
* o2cb_monitor_0 on debian11-1 'not installed' (5): call=47, status='complete', exitreason='Setup problem: couldn't find command: /usr/sbin/ocfs2_controld.pcmk', last-rc-change='2023-08-10 17:05:28 -04:00', queued=0ms, exec=23ms
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Discussion