Translated by AI
Expanding Ceph Clusters in Rook
This article is a late entry for the 11th day of the Rook and Friends, Cloud Native Storage Advent Calendar 2020.
In Rook/Ceph, there is a type of cluster called a PVC-based cluster, which creates OSDs on top of Kubernetes (hereafter K8s) PersistentVolumes (hereafter PVs). Put simply, increasing the value of the storageClassDeviceSets[].count field, as described in the link, increases the number of OSDs. A PVC-based cluster makes it easy to expand the cluster both by scaling out and by scaling up.
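To make the relationship between count and the number of OSDs concrete, here is a minimal sketch of the storage part of a PVC-based CephCluster, written as a Python dict that mirrors the YAML manifest. It is only a fragment (a real CephCluster also needs fields such as the Ceph image and mon settings), and the set name "set1", the StorageClass "gp2", and the size "10Gi" are hypothetical placeholders.

```python
# Sketch: the storage part of a PVC-based CephCluster as a Python dict that
# mirrors the YAML manifest. "set1", "gp2", and "10Gi" are hypothetical values.


def make_device_set(name: str, count: int, size: str, storage_class: str) -> dict:
    """Build one entry of spec.storage.storageClassDeviceSets."""
    return {
        "name": name,
        "count": count,  # one OSD (backed by its own PVC) per unit of count
        "volumeClaimTemplates": [
            {
                "metadata": {"name": "data"},
                "spec": {
                    "resources": {"requests": {"storage": size}},
                    "storageClassName": storage_class,
                    "volumeMode": "Block",
                    "accessModes": ["ReadWriteOnce"],
                },
            }
        ],
    }


def osd_count(cluster: dict) -> int:
    """Total OSDs requested: the sum of count over all device sets."""
    sets = cluster["spec"]["storage"]["storageClassDeviceSets"]
    return sum(s["count"] for s in sets)


cluster = {
    "apiVersion": "ceph.rook.io/v1",
    "kind": "CephCluster",
    "metadata": {"name": "rook-ceph", "namespace": "rook-ceph"},
    "spec": {"storage": {"storageClassDeviceSets": [make_device_set("set1", 3, "10Gi", "gp2")]}},
}

print(osd_count(cluster))  # 3
```

Raising count from 3 to 5 in this dict would correspond to asking Rook for two more OSDs in that device set.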
First, let's look at scaling out. When you increase the count mentioned above, the K8s administrator would normally have to prepare the PVs on which the new OSDs will be created. However, with a CSI driver that supports Dynamic Provisioning, K8s automatically creates the PVs for the new OSDs as soon as the count is increased.
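As a sketch of how the scale-out step could be triggered from a script, the function below builds an RFC 6902 JSON Patch that replaces the count of one device set. The device set index is an assumption about your manifest, and the function name is hypothetical. A JSON Patch is used rather than a merge patch because a JSON merge patch replaces a list wholesale, so a merge patch would have to resend every device set.

```python
import json


def scale_out_patch(set_index: int, new_count: int) -> list:
    """Build a JSON Patch that sets storageClassDeviceSets[set_index].count."""
    if set_index < 0 or new_count < 1:
        raise ValueError("set_index must be >= 0 and new_count must be >= 1")
    return [
        {
            "op": "replace",
            "path": f"/spec/storage/storageClassDeviceSets/{set_index}/count",
            "value": new_count,
        }
    ]


# Example: grow the first device set to 5 OSDs.
print(json.dumps(scale_out_patch(0, 5)))
```

The printed JSON could then be passed to `kubectl patch cephcluster <name> -n <namespace> --type json -p '<patch>'`; with Dynamic Provisioning in place, the new PVs are created without any manual preparation.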
Many CSI drivers support Dynamic Provisioning; common examples are cloud block storage services such as Amazon EBS. For local volumes, TopoLVM provides Dynamic Provisioning as well.
Next, scaling up. Each entry in storageClassDeviceSets has a volumeClaimTemplates field, and increasing the storage request inside it (spec.resources.requests.storage) automatically expands both the OSDs and their underlying PVs. This requires a CSI driver that supports Volume Expansion.
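The scale-up step can be sketched the same way: a JSON Patch that raises the storage request of one PVC template (in the CephCluster CRD the templates live in a list named volumeClaimTemplates). Because a PVC can be grown but never shrunk, the sketch refuses any size that is not a strict increase. The `to_bytes` helper is a hypothetical, deliberately simplified quantity parser (integer amounts with binary suffixes only), not the real Kubernetes quantity parser, and the template index is an assumption about your manifest. On the K8s side, the StorageClass also needs `allowVolumeExpansion: true` for the expansion to be accepted.

```python
import json

# Simplified quantity parser: integer amounts with binary suffixes (Ki..Ei)
# or plain bytes only. Decimal suffixes and fractions are unsupported here.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}


def to_bytes(quantity: str) -> int:
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def scale_up_patch(set_index: int, template_index: int, current: str, new: str) -> list:
    """JSON Patch that raises the storage request of one volumeClaimTemplate."""
    # PVCs can only grow, so refuse anything that is not a strict increase.
    if to_bytes(new) <= to_bytes(current):
        raise ValueError(f"{new} is not larger than {current}")
    path = (
        f"/spec/storage/storageClassDeviceSets/{set_index}"
        f"/volumeClaimTemplates/{template_index}/spec/resources/requests/storage"
    )
    return [{"op": "replace", "path": path, "value": new}]


print(json.dumps(scale_up_patch(0, 0, "10Gi", "20Gi")))
```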
By combining capacity monitoring of the Ceph cluster with the features above, it wouldn't be too difficult to add capacity automatically, by scaling out, scaling up, or both, when the cluster starts running low on free space. Monitoring currently has to be handled manually, but there are proposals to record the cluster's free capacity in the CephCluster custom resource and to expand the cluster automatically based on that data.
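To illustrate what such automation might decide, here is a hypothetical expansion policy; it is not a Rook feature and does not implement the proposals mentioned above. It assumes the raw used capacity in GiB comes from your own monitoring (for example, read from `ceph df`), that every OSD in the device set has the same size, and that the 75% trigger and 60% target are arbitrary example thresholds. It prefers scaling up until a per-OSD size cap is reached, then falls back to scaling out.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str        # "none", "scale_up", or "scale_out"
    count: int         # desired storageClassDeviceSets[].count
    osd_size_gib: int  # desired per-OSD PVC size in GiB


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def plan_expansion(used_gib: int, count: int, osd_size_gib: int, max_osd_size_gib: int,
                   threshold_pct: int = 75, target_pct: int = 60) -> Decision:
    """Hypothetical policy: expand once raw usage reaches threshold_pct,
    sizing the cluster so that usage drops to at most target_pct."""
    if not (0 < target_pct < threshold_pct <= 100) or count < 1 or osd_size_gib < 1:
        raise ValueError("invalid parameters")
    total_gib = count * osd_size_gib  # assumes all OSDs in the set are the same size
    if used_gib * 100 < threshold_pct * total_gib:
        return Decision("none", count, osd_size_gib)
    needed_gib = ceil_div(used_gib * 100, target_pct)  # capacity that meets target_pct
    if osd_size_gib < max_osd_size_gib:
        # Prefer scaling up: grow every OSD, capped at max_osd_size_gib.
        new_size = min(max_osd_size_gib, ceil_div(needed_gib, count))
        if new_size * count >= needed_gib:
            return Decision("scale_up", count, new_size)
    # Otherwise scale out at the current OSD size.
    return Decision("scale_out", ceil_div(needed_gib, osd_size_gib), osd_size_gib)


print(plan_expansion(used_gib=24, count=3, osd_size_gib=10, max_osd_size_gib=100))
```

When a partial scale-up cannot reach the target on its own, this sketch simply scales out at the current size to stay simple; a real policy might combine both. The resulting count or size could then be applied to the CephCluster with a patch like the ones above.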
Once these proposed features are realized, a Ceph cluster built in the cloud will be able to expand automatically as much as the vendor's infrastructure and the user's budget allow. While some might say, "Why bother building that in a cloud environment...", I personally find it interesting because it's cool and full of potential. If I had infinite money, I'd love to try setting the count to something like 100 million.