MAC address collisions in large-scale Kubernetes clusters
Summary
A MAC address collision occurred in production when using ovs-cni (https://github.com/k8snetworkplumbingwg/ovs-cni/issues/206). The production environment consisted of 100 worker nodes, 24 VLANs, and 1000 interfaces per VLAN.
To start with the conclusion: if you use Kubernetes "normally", the collision probability is at most about 0.0639%, so there is no need to worry about it. If you want to know what "normal" means and how this figure is calculated, please read on.
About MAC address
MAC addresses are commonly described as "addresses unique to hardware" that therefore do not collide. However, Kubernetes clusters use a lot of virtual interfaces, and these get automatically generated MAC addresses. In this article, we will consider the probability that automatically generated MAC addresses collide.
How to calculate the probability of collision
First, we will formulate an equation for the collision probability. I will try to keep it easy to follow, but if you are not comfortable with mathematics, you can skip this section.
As a simple example, consider rolling a six-sided die three times.
A collision means that at least two of the rolls show the same number.
The number of all possible patterns of rolls is
$$6^3 = 216$$
The number of patterns in which all rolls are different is
$$6 \times 5 \times 4 = 120$$
So, the probability that at least two rolls show the same number is
$$1 - \frac{120}{216} = \frac{96}{216} \approx 0.444$$
That is about 44.4%.
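The dice example is small enough to check by brute force. The following sketch enumerates every ordered outcome of three rolls and counts those in which a number repeats; the variable names are only illustrative.

```python
from itertools import product

# Enumerate every ordered outcome of rolling a six-sided die three times.
outcomes = list(product(range(1, 7), repeat=3))

# An outcome contains a collision when at least two rolls show the same face.
collisions = [o for o in outcomes if len(set(o)) < 3]

total = len(outcomes)                     # 6 ** 3
all_different = total - len(collisions)   # 6 * 5 * 4
probability = len(collisions) / total

print(total, all_different, round(probability, 4))  # 216 120 0.4444
```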
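Generalize this. If we write the number of possible values as $n$ (6 for the die) and the number of trials as $k$ (3 rolls), the probability of at least one collision is
$$p(n, k) = 1 - \frac{n (n-1) \cdots (n-k+1)}{n^k} = 1 - \frac{n!}{(n-k)!\, n^k}$$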
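As a sketch of the general case, the function below computes the probability that k values drawn uniformly at random from n possibilities contain at least one duplicate. The names collision_probability, n and k are chosen here for illustration. It sums logarithms instead of multiplying factorials so that it stays accurate when n is as large as a MAC address space, and it assumes every value is equally likely.

```python
import math

def collision_probability(n: int, k: int) -> float:
    """Probability that k uniform random draws from n values contain a duplicate."""
    if k > n:
        return 1.0  # more draws than values: a collision is certain
    # log P(all different) = sum over i of log(1 - i/n), for i = 0 .. k-1
    log_all_different = sum(math.log1p(-i / n) for i in range(k))
    # 1 - exp(x), computed without losing precision when x is close to 0
    return -math.expm1(log_all_different)

def collision_probability_approx(n: int, k: int) -> float:
    """Common approximation 1 - exp(-k(k-1) / (2n)), valid when k is much smaller than n."""
    return -math.expm1(-k * (k - 1) / (2 * n))

dice = collision_probability(6, 3)
print(round(dice, 4))  # 0.4444, matching the dice example
```

The approximation is usually close enough when k is tiny compared with n, which is the situation for MAC addresses.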
Probability of collision
Now we will use the above equation to calculate the probability. A MAC address is 48 bits long, but the effective address space was only 24 bits because ovs-cni used the format 02:00:00:XX:XX:XX for MAC addresses. So the number of possible addresses was $2^{24}$ (16,777,216).
After this bug, a fix was made to change the MAC address format to 0A:58:XX:XX:XX:XX, which extended the MAC address space to 32 bits, that is, $2^{32}$ (4,294,967,296) possible addresses.
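To get a feel for the difference between the 24-bit and 32-bit formats, the sketch below evaluates the collision probability for both. The interface counts are assumptions derived from the environment described in the summary: 1,000 interfaces if only addresses within one VLAN can collide, or 24,000 if all interfaces across the 24 VLANs are counted together. Which of these matches the original calculation is not stated here.

```python
import math

def collision_probability(n: int, k: int) -> float:
    """Probability that k uniform random draws from n values contain a duplicate."""
    if k > n:
        return 1.0
    log_all_different = sum(math.log1p(-i / n) for i in range(k))
    return -math.expm1(log_all_different)

# Assumed interface counts (hypothetical, taken from the summary's environment).
k_per_vlan = 1_000       # interfaces in a single VLAN
k_total = 24 * 1_000     # all interfaces across 24 VLANs

results = {}
for bits in (24, 32):
    for k in (k_per_vlan, k_total):
        results[(bits, k)] = collision_probability(2 ** bits, k)
        print(f"{bits}-bit space, {k} interfaces: {results[(bits, k)]:.4%}")
```

Under either assumption, the 24-bit space gives a collision probability that is far from negligible, which is consistent with the collision having been observed in production.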
In addition, we stopped allocating MAC addresses ourselves and let the SetupVeth() function do the allocation. This is equivalent to using the ip link add command, and it allowed us to extend the MAC address space to 46 bits. The space is 2 bits smaller than the full 48-bit length because the following 2 bits are fixed. Descriptions of each bit can be found on Wikipedia: https://en.wikipedia.org/wiki/MAC_address.
unicast/multicast bit = 0
globally unique/locally administered bit = 1
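The two fixed bits are the two least significant bits of the first octet. The sketch below illustrates an address with those bits fixed and the remaining 46 bits random. It is only an illustration of the bit layout, not the code that SetupVeth() or the kernel runs, and the function name is hypothetical.

```python
import random

def random_local_unicast_mac(rng: random.Random) -> str:
    """Return a random MAC address with the unicast and locally administered bits set as above."""
    octets = [rng.randrange(256) for _ in range(6)]
    # First octet, bit 0: unicast/multicast bit, forced to 0 (unicast).
    # First octet, bit 1: globally unique/locally administered bit, forced to 1.
    octets[0] = (octets[0] & 0b11111100) | 0b00000010
    return ":".join(f"{o:02x}" for o in octets)

mac = random_local_unicast_mac(random.Random())
print(mac)
```

Because only 2 of the 48 bits are fixed, there are 2^46 distinct addresses of this form.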
Expanding to 46 bits gives $2^{46}$ (70,368,744,177,664) possible addresses.
This solves the problem that occurred with ovs-cni, but consider the case where more interfaces are added. The Kubernetes best practices for large clusters recommend no more than 300,000 containers in total.
Following this limit, with 300,000 interfaces drawing from $2^{46}$ addresses, the probability of collision is about 0.0639%.
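The 0.0639% figure can be reproduced with a few lines of Python. This is a sketch that assumes one MAC address per container and addresses drawn uniformly at random from the 46-bit space; the function name is illustrative.

```python
import math

def collision_probability(n: int, k: int) -> float:
    """Probability that k uniform random draws from n values contain a duplicate."""
    if k > n:
        return 1.0
    log_all_different = sum(math.log1p(-i / n) for i in range(k))
    return -math.expm1(log_all_different)

address_space = 2 ** 46   # 48 bits minus the 2 fixed bits
containers = 300_000      # Kubernetes large-cluster guideline

p = collision_probability(address_space, containers)
print(f"{p * 100:.4f}%")  # 0.0639%
```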
So for now, it seems that we don't need to worry about MAC address collisions, but if there are any changes in the best practices for the number of containers due to improvements in Kubernetes performance, etc., we will need to calculate again.
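If the recommended limits change, the calculation can also be run in the other direction: given a collision probability you are willing to accept, estimate how many interfaces the address space can hold. The sketch below inverts the usual approximation p ≈ 1 - exp(-k² / (2n)). The 1% threshold is a hypothetical value chosen only for illustration.

```python
import math

def interfaces_for_probability(bits: int, target: float) -> int:
    """Approximate number of random addresses at which the collision
    probability reaches target (0 < target < 1) in a space of 2**bits."""
    n = 2 ** bits
    # Solve target = 1 - exp(-k^2 / (2n)) for k.
    return math.ceil(math.sqrt(2 * n * -math.log1p(-target)))

threshold = 0.01  # hypothetical acceptable collision probability (1%)
k_limit = interfaces_for_probability(46, threshold)
print(k_limit)
```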
Discussion