iTranslated by AI

The content below is an AI-generated translation. This is an experimental feature, and may contain errors. View original article
🛋️

Brahms: Random Node Sampling in Dynamic Networks (2008)

に公開
2

Brahms: Byzantine Resilient Random Membership Sampling

Brahms [1] is a distributed algorithm that can randomly sample (extract) arbitrary nodes in a dynamic network where nodes can join and leave freely.

Membership Sampling

As an overall sketch, the Gossipping module collects node IDs circulating in the network and continuously updates a Sampler vector with those node IDs. Over time, the state of the Sampler vector becomes identical to "a set of node IDs chosen randomly from the network." In a stable network with no joins or leaves, once all node IDs have permeated the network, the Sampler vector of every node will be equivalent to a set of node IDs sampled randomly from the network (the sample set differs per node).

Overview

In environments like P2P or large-scale networks where it is difficult to recognize or connect to every other node, it is necessary to select nodes that will form part of your network instead. In such cases, Brahms can be used to sample nodes without bias for connection targets, task delegation, creating tunnels (like Tor), or collecting statistics. Furthermore, a key feature of Brahms is its ability to function even when the network contains Byzantine faulty nodes (nodes that act maliciously).

System Model

Brahms assumes the following system:

  • All nodes join/leave the network at any timing.
  • Every node on the network has a unique ID.
  • The node ID of the sender of a received message can be accurately determined.
    • However, authentication (prior exchange of public keys or distribution of certificates for identity proof) is not required.
  • Intermediaries cannot tamper with messages exchanged between peers.

Sampling Algorithm

Node IDs are diffused through the network via the Brahms Gossip protocol. Nodes maintain samples of node IDs by continuously updating a Sampler vector with observed node IDs (note that this is not sampling against an arbitrary snapshot of the network). Since the network changes dynamically as nodes join and leave, the samples maintained by Brahms also change over time.

min-wise independent permutations

Brahms's sampling is performed by a state machine called a "Sampler" that follows the logic: "calculate a uniformly random value based on the node ID, and maintain the ID that has the smallest value observed so far as the current sample value q." A single Sampler holds only one ID. Therefore, by aggregating n independent Samplers into a Sampler vector \mathcal{S}, it is possible to sample up to n node IDs uniformly at random. This is based on an algorithm called min-wise independent permutations[2].

Expressing this sampling in Python code, the Sampler's process is as follows (pseudo-code is also shown later). You can see that after calling next() for all node IDs in the network, q holds one node ID chosen randomly from among them.

import hashlib
import secrets

class Sampler:
  def __init__(self):
    self.q = None
    self.salt = secrets.token_bytes(16)

  def h(self, id):
    return hashlib.sha256(id + self.salt).digest()

  def next(self, id):
    if self.q is None or self.h(id) < self.h(self.q):
      self.q = id

  def sample(self):
    return self.q
  • Note: An ID randomization function is a "hash function for which the distribution of output (hash values) for given inputs is guaranteed to be uniform." Empirically, hash functions like SHA-256 or BLAKE2 can be expected to behave this way, but it is not formally known to what extent they provide this quality.

Randomization Function

The randomization function h outputs a uniformly random value for a given input ID. Additionally, h is idempotent, outputting the same value for the same ID. Therefore, a hash function that is guaranteed to allow its output values to be used as random numbers can be applied (note that not all hash algorithms guarantee this).

h needs to randomly change what value is output for the same ID for each Sampler (and each time it is initialized). Therefore, it can actually be regarded as a hash function with a hyperparameter like salt.

This is equivalent to using a hash function randomly selected from a set of hash functions with certain mathematical properties, which minimizes collisions with other Samplers (this technique is called Universal Hashing). While the paper considers such collisions negligible and excludes them from discussion, they may occur more frequently when the number of samples is small.

Health Checks

To confirm that the node q held as the minimum value by each Sampler is active, health checks are performed periodically using methods such as ping. If q is determined to be no longer active, that Sampler is initialized and restarts with q=\perp (meaning null) and a new randomization function h.

Gossiping Algorithm

Generally, push and pull are used together for information transmission via Gossiping. Push is an operation to diffuse the information one possesses to adjacent nodes. However, since information might not reach the edges of the network due to the failure of relaying nodes with push alone, each node periodically issues a pull request to an arbitrary node to check if its own state is up to date.

Separately from the Sampler vector \mathcal{S}, Brahms maintains a view \mathcal{V}, which is a set of at most m node IDs that the node currently recognizes. This view \mathcal{V} can also be thought of as the current set of neighbor nodes. In the initial state (since communication would be impossible if \mathcal{V}=\emptyset), a set \mathcal{V}_0 of node IDs obtained from known node ID sets or super-peers is used.

The flow of Brahms's Gossiping process is as follows:

  1. Randomly select \alpha m node IDs from \mathcal{V} and send a push_request.
  2. Randomly select \beta m node IDs from \mathcal{V} and send a pull_request.
  3. Wait for responses.
  4. Let the set of sender node IDs of received push_requests be \mathcal{V}_{\rm push}.
  5. Send a pull_reply containing \mathcal{V} to the sender node IDs of received pull_requests.
  6. Let the union of the node ID sets contained in received pull_replys be \mathcal{V}_{\rm pull} (ignore pull_replys from those to whom no pull_request was sent).
  7. If the number of node IDs in \mathcal{V}_{\rm push} is \alpha m or less, and neither \mathcal{V}_{\rm push} nor \mathcal{V}_{\rm pull} is empty, the union of the following sets becomes the new view \mathcal{V}:
    • A set of \alpha m node IDs chosen randomly from \mathcal{V}_{\rm push}.
    • A set of \beta m node IDs chosen randomly from \mathcal{V}_{\rm pull}.
    • A set of node IDs held by \gamma m Samplers chosen randomly from \mathcal{S}.
  8. Update the Sampler vector \mathcal{S} with the newly observed \mathcal{V}_{\rm push} and \mathcal{V}_{\rm pull}.

\alpha, \beta, and \gamma are values in the range of 0 to 1 that represent the influence of push, pull, and history on the view, respectively, as ratios. Since \alpha, \beta, \gamma \gt 0 and \alpha + \beta + \gamma = 1, \mathcal{V} is composed of at most m IDs.

The states and data flow held by a Brahms node, along with the pseudo-code for Sampling and Gossiping that execute them, are summarized in the following figure.

Attack Resilience

In a network where Byzantine faulty nodes do not exist and only benign failures like crash failures are assumed, Brahms functions correctly. Once sufficient time has passed and each Sampler vector \mathcal{S} has collected active node IDs on the network, \mathcal{S} converges to a set holding node IDs sampled randomly from the network.

The problem arises in cases where Byzantine faulty nodes with malicious intent are present. The goal of an attacker against Brahms is to fill a node's Sampler vector \mathcal{S} or view \mathcal{V} with IDs of faulty nodes (poisoning), thereby isolating specific nodes from the network or reducing the overall efficiency of the network.

Push Flooding Attack

In general Gossiping, a push flooding attack poisons the entire system by propagating a large amount of incorrect information via push. In such a scenario, correct information in the system is only transmitted via pull, and the proportion of correct information in the system decays exponentially.

On the other hand, Brahms's push_request is a restricted transmission that can only convey its own node ID. Since it's impossible to diffuse a large number of faulty node IDs through push, attacking with a method like push flooding becomes difficult.

Balanced Attack

In a Balanced attack, malicious nodes send push_requests to a wide area of the network to ensure that malicious nodes are included in the views of all nodes. In such a situation, as time passes, the ratio of faulty IDs circulating in the system converges to a certain fixed value. By adjusting the parameters, this fixed value can be made smaller than 1. Therefore, while a Balanced attack can reduce the overall efficiency of the network, whether it can partition the network or isolate specific nodes... depends on the tuning.

Targeted Attack

A targeted attack attempts to take over a node's view \mathcal{V} by concentrating a large number of push_requests from attack nodes onto a single normal node. However, in the Brahms algorithm, to prevent the view \mathcal{V} from being poisoned by a large number of push_requests, the view \mathcal{V} is not updated if the number of push_requests received in a single round exceeds its expected value \alpha m.

Since Brahms's push can only transmit the node's own ID, it is difficult for a single attacker to use this blocking behavior to hinder view updates for all correct nodes.

However, in cases where there are many attackers in the network (for example, about 10% of m), it seems possible to stop view updates with this blocking function by continuously sending push_requests to all normal nodes. In this case, the update of the Sampler vector \mathcal{S} will still progress through node ID collection via pull (though efficiency will decrease). However, there is a possibility that new participating nodes will not be discovered, and as the number of leaving nodes increases over time, the view \mathcal{V} and Sampler vector \mathcal{S} may become useless.

In the absence of an attack, this blocking behavior does not function in most rounds, and the view \mathcal{V} is updated correctly.

History Feedback

Brahms can mitigate targeted attacks, but it cannot prevent them entirely. If a targeted attack succeeds in including a faulty ID in the view, pull requests to that ID will result in even more faulty IDs. Once a view is poisoned, pushes to normal nodes decrease, and the overall system update stagnates.

In Brahms, by incorporating node IDs originating from Sampler vectors that were observed and health-checked in the past into the view \mathcal{V} with a ratio \gamma, it becomes difficult for the view to be completely poisoned and for the node to become isolated.

Personal Impressions

Application to Private Networks: Even in a small-to-medium-sized network with approximately N nodes where Byzantine failures are not assumed, applying Brahms with n=m=N might allow for the construction of a "nearly" fully connected network that supports dynamic joining and leaving of nodes.

Sybil Attack Resistance: In an open network with no restrictions on participation, it is not difficult to make one physical computer appear as 1000 nodes. Brahms appears to allow such a Sybil attack to hinder view updates for all (or some) normal nodes. Without additional restrictive designs (such as an authorized participation system or requiring proof-of-work for pushes) or something that can be "covered by operations," I feel that application to anonymous open environments like P2P might be difficult. Of course, this does not apply to networks that only assume benign arbitrary failures.

Combined Use with Other Probabilistic Data Structures: For example, applying node IDs collected by Gossiping not only to the Sampler vector but also to HyperLogLog allows for obtaining an estimate of the number of nodes connected to the network. Probabilistic data structures like HLL are often used in large-scale data processing. Considering Gossiping as a data source for stream data processing, it feels like it would be compatible with probabilistic data structures such as min-wise independent permutations and HyperLogLog.

脚注
  1. BORTNIKOV, Edward, et al. Brahms: Byzantine resilient random membership sampling. In: Proceedings of the twenty-seventh ACM symposium on Principles of distributed computing. 2008. p. 145-154. ↩︎

  2. BRODER, Andrei Z., et al. Min-wise independent permutations. In: Proceedings of the thirtieth annual ACM symposium on Theory of computing. 1998. p. 327-336. ↩︎

GitHubで編集を提案

Discussion

waddy_uwaddy_u

素敵な記事をありがとうございます、このレイヤのアルゴリズムは普段目にすることがないので興味深く読みました。

1点、脚注のリンクがうまくいってないようです。

  • 脚注本体の引用符>はなくてよさそうです
  • [^1]: のように、コロンを付けていただくとうまくリンクされると思います。

修正を検討いただければと思います!


Brahms [^1]

[^1]: BORTNIKOV, Edward, et al. Brahms: Byzantine resilient random membership sampling. In: Proceedings of the twenty-seventh ACM symposium on Principles of distributed computing. 2008. p. 145-154.


Brahms [1]

脚注
  1. BORTNIKOV, Edward, et al. Brahms: Byzantine resilient random membership sampling. In: Proceedings of the twenty-seventh ACM symposium on Principles of distributed computing. 2008. p. 145-154. ↩︎