Designing a Runner Architecture to Avoid Shared VM Constraints in Workflows
Conclusion
If your internal CI meets the following conditions, giving each runner its own VM is a compelling option:
- Parallel test runs use resources that can conflict on the same host, such as TCP ports.
- Creating and recreating VMs is easy, thanks to technologies like KubeVirt.
- The costs and operational burden associated with an increased number of VMs are acceptable.
Prerequisites
Previous article: https://zenn.dev/n_mug/articles/30f345fab6eb5c
Because of the internal constraints described in my previous article, my team runs self-hosted runners on KubeVirt VMs. This article therefore assumes that VM-based runners are a given.
Issues with Shared VMs
We initially adopted a shared VM architecture, with multiple runners on the same VM. As operations continued, we saw cases where runners executed tests in parallel and failed because they competed for the same fixed TCP port.
Such problems can often be resolved by revisiting the workflow design.
For port conflicts, for example, you could randomize port numbers or design the workflow so that only the conflicting parts run serially.
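As a rough illustration of the port-randomization workaround, the bash sketch below picks a random port in a range and skips ports that already accept connections on 127.0.0.1.

The function name `pick_free_port` is hypothetical, as is the default range of 20000 to 29999. It relies on bash's `/dev/tcp` redirection. It is also not race-free: another runner on the same VM can still take the port between the check and the moment the test binds it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a random TCP port in [lo, hi] that nothing is
# listening on at 127.0.0.1 right now. Not race-free: another process can
# still bind the port after this check and before the test does.
pick_free_port() {
  local lo=${1:-20000} hi=${2:-29999} tries=50 port
  while [ "$tries" -gt 0 ]; do
    port=$(( lo + RANDOM % (hi - lo + 1) ))
    # bash's /dev/tcp redirection tries to connect; success means the port is taken.
    if ! (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
      echo "$port"
      return 0
    fi
    tries=$(( tries - 1 ))
  done
  return 1  # no free port found after 50 attempts
}

# Usage (hypothetical test command):
# PORT=$(pick_free_port) && ./run-integration-tests --port "$PORT"
```

Every test that binds a port has to adopt a helper like this, which is exactly the CI-specific knowledge the following paragraphs argue against pushing onto developers.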
These fixes work, but they require developers to be aware of CI environment constraints, which degrades the developer experience.
To avoid this, I wanted to bring the environment closer to the experience of GitHub-hosted runners, where each job runs in a fresh VM of its own.
Proposed Solution: Separation of Execution Environments
In the shared VM setup, each runner had its own working directory, but host resources could not be separated.
As a result, conflicts between file edits in the working directories could be avoided, but conflicts over resources such as port numbers were inevitable.
One workaround is to serialize the tests. However, as mentioned above, this forces users to be aware of CI constraints, which is undesirable.
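For comparison, serializing only the conflicting step across runners on the same VM could look like the sketch below. It uses util-linux `flock(1)` on a lock file, and assumes that all runners on the VM see the same `/tmp`.

The lock path and the wrapper name `run_serialized` are hypothetical. Every workflow that touches the port has to opt in to the wrapper, which illustrates the burden described above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: serialize one port-bound step across every runner
# on the same VM. Assumes the runners share /tmp, so they contend for the
# same lock file.
LOCK_FILE=${LOCK_FILE:-/tmp/ci-port-8080.lock}

run_serialized() {
  # flock(1) creates the file if needed, waits for an exclusive lock,
  # runs the command, and releases the lock when the command exits.
  flock "$LOCK_FILE" "$@"
}

# Usage (hypothetical test script):
# run_serialized ./scripts/integration-test.sh
```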
As a lasting solution, I therefore considered giving each runner its own VM. This separates host resources and makes such conflicts much easier to avoid.
Pros / Cons
Shared VMs
- 👍
  - Easier to keep the number of VMs low.
- 👎
  - Host resources are shared, which can lead to port conflicts.
  - Workflow design must account for avoiding host resource conflicts.
Separate VMs per runner
- 👍
  - Host resource issues such as port conflicts are easier to avoid, which brings the experience closer to that of GitHub-hosted runners.
- 👎
  - Increased administrative burden.
  - Because the number of VMs grows, idle costs and operational load need to be estimated.
  - Adoption is difficult without an easy way to create VMs.
  - Standardization and automation are prerequisites, so that neither the initial setup nor later updates have to be done by hand.
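A back-of-the-envelope calculation gives a first feel for the idle cost. With `runStrategy: Always`, each runner VM keeps its resource requests reserved even while no job is running.

The sketch below multiplies the per-VM values used in the manifest sample (8 CPUs, 16Gi memory, 50Gi disk) by the number of runners. The function name is hypothetical, and KubeVirt's own per-VM overhead (for example the virt-launcher pod) is not included.

```shell
#!/usr/bin/env bash
# Hypothetical estimate of resources held by N always-on runner VMs.
# Defaults are the per-VM values from the manifest sample; KubeVirt's
# per-VM overhead (virt-launcher pod, etc.) is not included.
estimate_reserved() {
  local runners=$1 cpu=${2:-8} mem_gi=${3:-16} disk_gi=${4:-50}
  echo "cpu=$(( runners * cpu )) memory=$(( runners * mem_gi ))Gi disk=$(( runners * disk_gi ))Gi"
}

# Usage: estimate_reserved 4
```

For four runners, `estimate_reserved 4` prints `cpu=32 memory=64Gi disk=200Gi`. That capacity is held whether or not any job is running.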
Manifest Sample
As mentioned above, giving each runner its own VM brings the experience closer to that of GitHub-hosted runners, but it increases the effort required for environment setup. In practice, KubeVirt lets you consolidate the initial setup into the VM definition and the cloud-config, which greatly reduces the differences between runners. The sample consists of three files:
- vm.yml: VirtualMachine definition for KubeVirt
- cloud-config.yml: User creation, package installation, and runner setup
- volume.yml: DataVolume definition for the OS image
Note: Before using these samples in production, load the hard-coded or masked values (such as the repository URL and the registration token) from environment variables or Secrets as appropriate.
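One way to follow this advice for the registration token is to keep a placeholder in the template and substitute it at deploy time. The sketch below makes two assumptions:
- You replace `--token '***'` in cloud-config.yml with the hypothetical placeholder `__RUNNER_TOKEN__`.
- You export the real token as `RUNNER_TOKEN`, which is also a hypothetical name.

The `gh api` call in the comments uses GitHub's REST endpoint for creating a repository registration token. `OWNER/REPO` stands in for your repository.

```shell
#!/usr/bin/env bash
# Hypothetical: cloud-config.yml carries the placeholder __RUNNER_TOKEN__
# in place of the masked --token value, and the real token is provided
# via the RUNNER_TOKEN environment variable at deploy time.
render_cloud_config() {
  local template=$1
  # Fail early instead of rendering an empty token.
  : "${RUNNER_TOKEN:?RUNNER_TOKEN must be set}"
  # Assumes the token contains no sed-special characters (| & \).
  sed "s|__RUNNER_TOKEN__|${RUNNER_TOKEN}|g" "$template"
}

# Usage (OWNER/REPO are placeholders; requires admin access to the repository):
# export RUNNER_TOKEN=$(gh api -X POST repos/OWNER/REPO/actions/runners/registration-token --jq .token)
# render_cloud_config cloud-config.yml > rendered-cloud-config.yml
# kubectl create secret generic test-runner-cloud-config --from-file=userdata=rendered-cloud-config.yml
```

Registration tokens are short-lived (GitHub documents a one-hour expiry). Render the Secret shortly before creating the VM.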
vm.yml:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: test-runner
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        resources:
          requests:
            cpu: "8"
            memory: "16Gi"
      volumes:
        - name: test-runner
          persistentVolumeClaim:
            # PVC created by the DataVolume in volume.yml
            claimName: test-runner
        - cloudInitNoCloud:
            # Include networkData if necessary. Omitted here.
            secretRef:
              name: test-runner-cloud-config
          name: cloudinitdisk
cloud-config.yml:
#cloud-config
users:
  - name: actions-runner
    shell: /bin/bash
    # cloud-init creates the home directory by default (no_create_home: false)
packages:
  - ca-certificates
  - curl
  - gnupg
  - lsb-release
  - git
  # List other necessary packages as needed
runcmd:
  # Add Docker's apt repository (runcmd runs as root, so sudo is not needed)
  - |
    set -e
    install -m 0755 -d /etc/apt/keyrings
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    chmod a+r /etc/apt/keyrings/docker.gpg
    echo "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
  - apt-get update -y
  - apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
  - systemctl enable --now docker
  - usermod -aG docker actions-runner
  # Download and unpack the GitHub Actions runner
  - |
    set -e
    mkdir -p /opt/actions-runner/runner
    chown -R actions-runner:actions-runner /opt/actions-runner
  - |
    set -e
    curl -fL -o /opt/actions-runner/runner.tar.gz https://github.com/actions/runner/releases/download/v2.334.0/actions-runner-linux-x64-2.334.0.tar.gz
    chown actions-runner:actions-runner /opt/actions-runner/runner.tar.gz
    tar xzf /opt/actions-runner/runner.tar.gz -C /opt/actions-runner/runner/
    chown -R actions-runner:actions-runner /opt/actions-runner/runner
  # Register the runner as actions-runner, then install and start it as a service
  - |
    set -e
    su - actions-runner -c "
      set -e
      cd /opt/actions-runner/runner
      ./config.sh --unattended --url 'https://github.com/***' --token '***' --name 'test-runner' --labels 'hoge,fuga'
    "
    cd /opt/actions-runner/runner
    ./svc.sh install actions-runner
    ./svc.sh start
volume.yml:
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: test-runner
spec:
  source:
    http:
      url: "https://example.com/os-template.qcow2"
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: "50Gi"
To deploy, apply the manifests in order, then wait for cloud-init to finish and check its output:
$ kubectl apply -f volume.yml
$ kubectl create secret generic test-runner-cloud-config --from-file=userdata=cloud-config.yml
$ kubectl apply -f vm.yml
$ virtctl ssh vmi/test-runner -- cloud-init status --wait
$ virtctl ssh vmi/test-runner -- tail -50 /var/log/cloud-init-output.log
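This layout needs one DataVolume, one Secret, and one VirtualMachine per runner. It can help to stamp out copies of the samples rather than edit them by hand.

The sketch below assumes two things:
- The three sample files are in one directory.
- Every occurrence of `test-runner` should become the new runner's name.

That covers the VM name, the PVC, the Secret name, and the `--name` passed to config.sh. `render_runner` and the `out/<name>/` layout are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the three sample manifests into out/<name>/,
# replacing the placeholder "test-runner" with a unique runner name.
# Assumes the name contains no "/" (it is used inside a sed expression).
render_runner() {
  local name=$1 src_dir=${2:-.} out_dir=${3:-out}
  local f
  mkdir -p "$out_dir/$name"
  for f in vm.yml volume.yml cloud-config.yml; do
    # Also rewrites test-runner-cloud-config -> <name>-cloud-config.
    sed "s/test-runner/${name}/g" "$src_dir/$f" > "$out_dir/$name/$f"
  done
}

# Usage: three runners, applied with the same commands as above.
# for i in 1 2 3; do
#   name="runner-$i"
#   render_runner "$name"
#   kubectl apply -f "out/$name/volume.yml"
#   kubectl create secret generic "$name-cloud-config" --from-file=userdata="out/$name/cloud-config.yml"
#   kubectl apply -f "out/$name/vm.yml"
# done
```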
Summary
With shared VMs, host resources are shared even when working directories are separated. If parallel tests compete for resources such as TCP ports, the workflows themselves have to work around those constraints.
To avoid imposing those constraints on users, we moved to a configuration with a separate VM per runner.
As a result, the port conflicts were resolved, and the experience moved closer to that of GitHub-hosted runners.
On the other hand, this configuration brings its own costs, such as more VMs to manage and higher idle costs.
It therefore requires infrastructure on which VMs are easy to create, such as KubeVirt, and a way to standardize the initial setup with tools like cloud-config.
In other words, this is also a question of who takes responsibility: whether to impose constraints like resource conflicts on users, or to have the infrastructure side shoulder the operational costs.
It means a bit more work for administrators, but if you are struggling with DinD issues or port conflicts, I encourage you to give it a try.