🤺

Debian11 で Pacemeker と SCSIフェンシング

2023/08/11に公開

前提

https://zenn.dev/mnod/articles/490386c39bc8d3 の環境を利用します。
イニシエータ側で sda として認識されているデバイスをフェンシングデバイスとします。

# ls -l /dev/disk/by-id/ | grep sda
lrwxrwxrwx 1 root root 9 Aug  9 19:23 scsi-36001405de80b906d5904c9d841819705 -> ../../sda
lrwxrwxrwx 1 root root 9 Aug  9 19:23 wwn-0x6001405de80b906d5904c9d841819705 -> ../../sda

準備

必要なパッケージをインストールします。

# apt install --no-install-recommends fence-agents sg3-utils

設定

設定します。

# pcs stonith create scsi-shooter fence_scsi pcmk_host_list="debian11-0 debian11-1" devices=/dev/disk/by-id/wwn-0x6001405de80b906d5904c9d841819705 meta provides=unfencing

# pcs stonith config scsi-shooter
 Resource: scsi-shooter (class=stonith type=fence_scsi)
  Attributes: devices=/dev/disk/by-id/wwn-0x6001405de80b906d5904c9d841819705 pcmk_host_list="debian11-0 debian11-1"
  Meta Attrs: provides=unfencing
  Operations: monitor interval=60s (scsi-shooter-monitor-interval-60s)

結果を確認します。

# pcs status
Cluster name: web_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: debian11-0 (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Thu Aug 10 17:05:33 2023
  * Last change:  Thu Aug 10 16:57:40 2023 by root via cibadmin on debian11-0
  * 2 nodes configured
  * 14 resource instances configured

Node List:
  * Online: [ debian11-0 debian11-1 ]

Full List of Resources:
  * ClusterIP   (ocf::heartbeat:IPaddr2):        Started debian11-0
  * WebServer   (systemd:nginx):         Started debian11-0
  * Clone Set: drbd_r0-clone [drbd_r0] (promotable):
    * Masters: [ debian11-0 ]
    * Slaves: [ debian11-1 ]
  * fs_drbd_r0  (ocf::heartbeat:Filesystem):     Started debian11-0
  * Clone Set: drbd_r1-clone [drbd_r1] (promotable):
    * Masters: [ debian11-0 debian11-1 ]
  * Clone Set: shared_fs-clone [shared_fs]:
    * Started: [ debian11-0 debian11-1 ]
  * Clone Set: locking-clone [locking]:
    * Stopped: [ debian11-0 debian11-1 ]
  * scsi-shooter        (stonith:fence_scsi):    Started debian11-1

Failed Resource Actions:
  * o2cb_monitor_0 on debian11-0 'not installed' (5): call=60, status='complete', exitreason='Setup problem: couldn't find command: /usr/sbin/ocfs2_controld.pcmk', last-rc-change='2023-08-10 17:04:01 -04:00', queued=0ms, exec=16ms
  * o2cb_monitor_0 on debian11-1 'not installed' (5): call=47, status='complete', exitreason='Setup problem: couldn't find command: /usr/sbin/ocfs2_controld.pcmk', last-rc-change='2023-08-10 17:05:28 -04:00', queued=0ms, exec=23ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

動作テスト

下記をセカンダリ系で実行します。

# pcs stonith fence debian11-0
Node: debian11-0 fenced

リソースがセカンダリ系へ移動しました。

# pcs status
Cluster name: web_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: debian11-1 (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Thu Aug 10 17:07:19 2023
  * Last change:  Thu Aug 10 16:57:40 2023 by root via cibadmin on debian11-0
  * 2 nodes configured
  * 14 resource instances configured

Node List:
  * Online: [ debian11-1 ]
  * OFFLINE: [ debian11-0 ]

Full List of Resources:
  * ClusterIP   (ocf::heartbeat:IPaddr2):        Started debian11-1
  * WebServer   (systemd:nginx):         Started debian11-1
  * Clone Set: drbd_r0-clone [drbd_r0] (promotable):
    * Masters: [ debian11-1 ]
    * Stopped: [ debian11-0 ]
  * fs_drbd_r0  (ocf::heartbeat:Filesystem):     Started debian11-1
  * Clone Set: drbd_r1-clone [drbd_r1] (promotable):
    * Masters: [ debian11-1 ]
    * Stopped: [ debian11-0 ]
  * Clone Set: shared_fs-clone [shared_fs]:
    * Started: [ debian11-1 ]
    * Stopped: [ debian11-0 ]
  * Clone Set: locking-clone [locking]:
    * Stopped: [ debian11-0 debian11-1 ]
  * scsi-shooter        (stonith:fence_scsi):    Started debian11-1

Failed Resource Actions:
  * o2cb_monitor_0 on debian11-1 'not installed' (5): call=47, status='complete', exitreason='Setup problem: couldn't find command: /usr/sbin/ocfs2_controld.pcmk', last-rc-change='2023-08-10 17:05:28 -04:00', queued=0ms, exec=23ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

# df | grep /mnt
/dev/drbd0       1324760      32   1239180   1% /mnt

マスタ系で確認すると、pacemaker サービスが強制停止させられていました。

# systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
     Loaded: loaded (/lib/systemd/system/pacemaker.service; disabled; vendor preset: enabled)
     Active: inactive (dead) since Thu 2023-08-10 17:07:10 EDT; 2min 47s ago
       Docs: man:pacemakerd
             https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html
    Process: 608 ExecStart=/usr/sbin/pacemakerd -f (code=exited, status=100)
   Main PID: 608 (code=exited, status=100)
        CPU: 6.472s

Aug 10 17:07:10 debian11-0 pacemakerd[608]:  notice: Stopping pacemaker-fenced
Aug 10 17:07:10 debian11-0 pacemaker-fenced[610]:  notice: Caught 'Terminated' signal
Aug 10 17:07:10 debian11-0 pacemakerd[608]:  notice: Stopping pacemaker-based
Aug 10 17:07:10 debian11-0 pacemaker-based[609]:  notice: Caught 'Terminated' signal
Aug 10 17:07:10 debian11-0 pacemaker-based[609]:  notice: Disconnected from Corosync
Aug 10 17:07:10 debian11-0 pacemaker-based[609]:  notice: Disconnected from Corosync
Aug 10 17:07:10 debian11-0 pacemakerd[608]:  notice: Shutdown complete
Aug 10 17:07:10 debian11-0 pacemakerd[608]:  notice: Shutting down and staying down after fatal error
Aug 10 17:07:10 debian11-0 systemd[1]: pacemaker.service: Succeeded.
Aug 10 17:07:10 debian11-0 systemd[1]: pacemaker.service: Consumed 6.472s CPU time.

戻し

セカンダリ系で下記を実行します。

# pcs cluster start debian11-0
debian11-0: Starting Cluster...

マスタ系がクラスタに復帰し、リソースもマスタ系へ移動しました。

# pcs status
Cluster name: web_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: debian11-1 (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Thu Aug 10 17:10:52 2023
  * Last change:  Thu Aug 10 16:57:40 2023 by root via cibadmin on debian11-0
  * 2 nodes configured
  * 14 resource instances configured

Node List:
  * Online: [ debian11-0 debian11-1 ]

Full List of Resources:
  * ClusterIP   (ocf::heartbeat:IPaddr2):        Started debian11-0
  * WebServer   (systemd:nginx):         Started debian11-0
  * Clone Set: drbd_r0-clone [drbd_r0] (promotable):
    * Masters: [ debian11-0 ]
    * Slaves: [ debian11-1 ]
  * fs_drbd_r0  (ocf::heartbeat:Filesystem):     Started debian11-0
  * Clone Set: drbd_r1-clone [drbd_r1] (promotable):
    * Masters: [ debian11-0 debian11-1 ]
  * Clone Set: shared_fs-clone [shared_fs]:
    * Started: [ debian11-0 debian11-1 ]
  * Clone Set: locking-clone [locking]:
    * Stopped: [ debian11-0 debian11-1 ]
  * scsi-shooter        (stonith:fence_scsi):    Started debian11-1

Failed Resource Actions:
  * o2cb_monitor_0 on debian11-0 'not installed' (5): call=51, status='complete', exitreason='Setup problem: couldn't find command: /usr/sbin/ocfs2_controld.pcmk', last-rc-change='2023-08-10 17:10:48 -04:00', queued=0ms, exec=21ms
  * o2cb_monitor_0 on debian11-1 'not installed' (5): call=47, status='complete', exitreason='Setup problem: couldn't find command: /usr/sbin/ocfs2_controld.pcmk', last-rc-change='2023-08-10 17:05:28 -04:00', queued=0ms, exec=23ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

マスタ系で確認すると、もちろん、pacemaker サービスが起動していました。

# pcs status
Cluster name: web_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: debian11-1 (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Thu Aug 10 17:12:26 2023
  * Last change:  Thu Aug 10 16:57:40 2023 by root via cibadmin on debian11-0
  * 2 nodes configured
  * 14 resource instances configured

Node List:
  * Online: [ debian11-0 debian11-1 ]

Full List of Resources:
  * ClusterIP   (ocf::heartbeat:IPaddr2):        Started debian11-0
  * WebServer   (systemd:nginx):         Started debian11-0
  * Clone Set: drbd_r0-clone [drbd_r0] (promotable):
    * Masters: [ debian11-0 ]
    * Slaves: [ debian11-1 ]
  * fs_drbd_r0  (ocf::heartbeat:Filesystem):     Started debian11-0
  * Clone Set: drbd_r1-clone [drbd_r1] (promotable):
    * Masters: [ debian11-0 debian11-1 ]
  * Clone Set: shared_fs-clone [shared_fs]:
    * Started: [ debian11-0 debian11-1 ]
  * Clone Set: locking-clone [locking]:
    * Stopped: [ debian11-0 debian11-1 ]
  * scsi-shooter        (stonith:fence_scsi):    Started debian11-1

Failed Resource Actions:
  * o2cb_monitor_0 on debian11-0 'not installed' (5): call=51, status='complete', exitreason='Setup problem: couldn't find command: /usr/sbin/ocfs2_controld.pcmk', last-rc-change='2023-08-10 17:10:48 -04:00', queued=0ms, exec=21ms
  * o2cb_monitor_0 on debian11-1 'not installed' (5): call=47, status='complete', exitreason='Setup problem: couldn't find command: /usr/sbin/ocfs2_controld.pcmk', last-rc-change='2023-08-10 17:05:28 -04:00', queued=0ms, exec=23ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
GitHubで編集を提案

Discussion