
MAC address collisions in a large-scale Kubernetes cluster

Published on 2021/10/13 (about 4,100 characters)

Summary

A MAC address collision occurred while we were using ovs-cni (https://github.com/k8snetworkplumbingwg/ovs-cni/issues/206). The production environment consisted of 100 worker nodes and 24 VLANs, with 1000 interfaces per VLAN.

Let's start with the conclusion: if you use Kubernetes "normally", the collision probability is at most 0.0639 (%), so there is no need to worry about it. If you want to know what "normally" means and how this figure is calculated, please read on.

About MAC address

It is commonly explained that MAC addresses are "addresses unique to hardware" and therefore do not collide. However, in Kubernetes clusters, since we use a lot of virtual interfaces, we also use automatically generated MAC addresses. In this article, we will consider the probability of collision of automatically generated MAC addresses.

How to calculate probability of collision

First, we will formulate an equation to calculate the collision probability. I will try to write it in an easy-to-understand manner, but if you are not good at mathematics, you can skip it.

As a simple example, consider rolling a six-sided die three times.

A collision means that at least two of the rolls show the same number.
The total number of possible roll sequences is 6^{3} = 216.
The number of sequences in which all rolls are different is 6! / (6 - 3)! = 6 \times 5 \times 4 = 120.
So the probability that at least two rolls show the same number is (6^{3} - 6! / (6 - 3)!) / 6^{3} = 96 / 216 = 0.444...,
or about 44.4 (%).
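
The dice figure can be checked by brute force. The following is a minimal sketch that enumerates every roll sequence; the function name is only an illustration and does not come from ovs-cni.

```python
from fractions import Fraction
from itertools import product


def dice_collision_probability(sides: int = 6, rolls: int = 3) -> Fraction:
    """Enumerate every roll sequence and count those with a repeated face."""
    total = 0
    collisions = 0
    for outcome in product(range(1, sides + 1), repeat=rolls):
        total += 1
        if len(set(outcome)) < rolls:  # at least two rolls show the same face
            collisions += 1
    return Fraction(collisions, total)


p = dice_collision_probability()
print(p, float(p))  # 96/216 reduces to 4/9, i.e. about 0.444
```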

Now let's generalize this. If we write the number of possible values as possibility and the number of trials as trials (with trials no larger than possibility), the probability of collision can be written as (possibility^{trials} - (possibility! / (possibility - trials)!)) / possibility^{trials}.
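
For MAC-sized numbers the factorials in this formula are far too large to evaluate directly. One way around this, sketched below as an assumption about how one might compute it rather than as the method actually used for this article, is to rewrite the no-collision probability as the product of (1 - i / possibility) for i from 0 to trials - 1 and to sum its logarithms.

```python
import math


def collision_probability(possibility: int, trials: int) -> float:
    """Probability that at least two of `trials` uniform draws coincide."""
    if trials > possibility:
        return 1.0  # pigeonhole principle: a collision is certain
    # log P(no collision) = sum over i of log(1 - i / possibility)
    log_no_collision = sum(
        math.log1p(-i / possibility) for i in range(trials)
    )
    # 1 - exp(x), computed accurately even when x is very close to 0
    return -math.expm1(log_no_collision)


print(collision_probability(6, 3))  # the dice example, about 0.444
```

Using log1p and expm1 keeps precision when each term is tiny, which is the case when possibility is as large as 2^{46}.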

Probability of collision

Now we will use the above equation to calculate the probability. A MAC address is 48 bits long, but the effective MAC address space was only 24 bits because ovs-cni used the format 02:00:00:XX:XX:XX, in which only the last three octets vary. So possibility = 2^{24} = 16,777,216. And trials = 24 \times 1000 = 24,000 because there are 24 VLANs, each with 1000 interfaces. Applying this to the above equation, we get a collision probability of more than 99.9 (%), which indicates that the collision was inevitable. As you may have noticed, the cause of the collision is the MAC address format 02:00:00:XX:XX:XX. Therefore, we can say that this is a bug specific to ovs-cni.
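
The result can also be observed empirically. The sketch below is a small simulation that assumes the last three octets are drawn uniformly at random; the actual generator in ovs-cni is not shown in this article, and the helper names are hypothetical.

```python
import random


def random_mac_24bit(rng: random.Random) -> str:
    """Random MAC in the 02:00:00:XX:XX:XX format (24 random bits)."""
    suffix = rng.getrandbits(24)
    return "02:00:00:{:02x}:{:02x}:{:02x}".format(
        (suffix >> 16) & 0xFF, (suffix >> 8) & 0xFF, suffix & 0xFF
    )


def count_duplicates(trials: int, rng: random.Random) -> int:
    """Generate `trials` addresses and count how many repeat an earlier one."""
    seen = set()
    duplicates = 0
    for _ in range(trials):
        mac = random_mac_24bit(rng)
        if mac in seen:
            duplicates += 1
        seen.add(mac)
    return duplicates


# 24 VLANs x 1000 interfaces; a run without any duplicate is extremely unlikely.
print(count_duplicates(24_000, random.Random(0)))
```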

After this bug was reported, a fix changed the MAC address format to 0A:58:XX:XX:XX:XX, which extended the MAC address space to 32 bits. So possibility = 2^{32} = 4,294,967,296. trials does not change. Applying this to the above equation, the collision probability is 6.48 (%), which is quite an improvement.
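
As a sanity check, the same figure follows from the standard birthday-problem approximation, which is not part of the original calculation. Here N stands for possibility and n for trials.

```latex
P(\text{collision})
  = 1 - \prod_{i=0}^{n-1}\left(1 - \frac{i}{N}\right)
  \approx 1 - \exp\left(-\frac{n(n-1)}{2N}\right)

N = 2^{32},\quad n = 24{,}000:\qquad
\frac{n(n-1)}{2N} = \frac{287{,}988{,}000}{4{,}294{,}967{,}296} \approx 0.06705,
\qquad
1 - e^{-0.06705} \approx 0.06485
```

That is roughly 6.5 (%), in line with the value above.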

In addition, we stopped allocating MAC addresses ourselves and let the SetupVeth() function do the allocation instead. This is equivalent to using the ip link add command, and it allowed us to extend the MAC address space to 46 bits. The space is 2 bits smaller than the full 48-bit address length because the following 2 bits are fixed. Each bit is described in the Wikipedia article https://en.wikipedia.org/wiki/MAC_address.

unicast/multicast bit = 0
globally unique/locally administered bit = 1
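
To make the two fixed bits concrete, here is a minimal sketch that generates a random locally administered unicast address. It is an illustration only and is not the code used by SetupVeth() or by the kernel; the function name is hypothetical.

```python
import random


def random_local_unicast_mac(rng: random.Random) -> str:
    """Random 48-bit MAC with the two low bits of the first octet fixed."""
    octets = [rng.getrandbits(8) for _ in range(6)]
    octets[0] &= 0xFE  # clear bit 0: unicast (not multicast)
    octets[0] |= 0x02  # set bit 1: locally administered
    return ":".join(f"{octet:02x}" for octet in octets)


print(random_local_unicast_mac(random.Random()))
```

The remaining 46 bits are free, which is where the 2^{46} address space comes from.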

Expanding to 46 bits gives possibility = 2^{46} = 70,368,744,177,664, and the probability of collision drops to 0.000409(%).

This solves the problem that occurred with ovs-cni, but consider the case where more interfaces are added. The Kubernetes documentation on large clusters recommends no more than 300,000 containers in total.

https://kubernetes.io/docs/setup/best-practices/cluster-large/

Following this guideline, with
possibility = 2^{46} = 70,368,744,177,664
trials = 300,000
the probability of collision is 0.0639 (%).
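
All four scenarios in this article can be compared side by side. The sketch below uses the closed-form birthday approximation 1 - exp(-trials (trials - 1) / (2 possibility)) rather than the exact formula, so treat the output as an estimate; the function name and scenario labels are chosen here for illustration.

```python
import math


def approx_collision_probability(bits: int, trials: int) -> float:
    """Birthday approximation for `trials` draws from a 2**bits space."""
    possibility = 2 ** bits
    return -math.expm1(-trials * (trials - 1) / (2 * possibility))


scenarios = [
    ("02:00:00:XX:XX:XX", 24, 24_000),
    ("0A:58:XX:XX:XX:XX", 32, 24_000),
    ("SetupVeth()", 46, 24_000),
    ("SetupVeth(), 300,000 containers", 46, 300_000),
]

for label, bits, trials in scenarios:
    p = approx_collision_probability(bits, trials)
    print(f"{label:34s} {bits:2d} bits  {trials:7d} trials  {100 * p:.6f} %")
```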

So for now, it seems that we don't need to worry about MAC address collisions, but if there are any changes in the best practices for the number of containers due to improvements in Kubernetes performance, etc., we will need to calculate again.
