From Simple to HA: A Learning Path for Kubernetes on Apple Silicon
Most Kubernetes tutorials stop at kubectl get nodes. Here’s the structured path from a 6-VM simple cluster to an 11-VM production-grade HA setup — with working code at every level.
Why most Kubernetes learning paths are broken
There are two extremes in Kubernetes education. On one end: minikube, kind, and Docker Desktop — single-node setups that abstract away everything interesting. You learn kubectl commands but nothing about how the cluster actually works. On the other end: “Kubernetes the Hard Way” by Kelsey Hightower — brilliant but intimidating, and it drops you into the deep end with no intermediate steps.
What’s missing is the middle. A structured path that starts simple enough to build confidence but complex enough to teach real concepts, then progressively adds the production patterns that actually matter — HA control planes, proper PKI, etcd clustering, bastion architecture. Each level builds on the previous one, and at every step there’s working code you can run.
That’s what this project provides. Six GitHub repos across three virtualization tools (UTM, Vagrant, OrbStack) at two complexity levels (Simple and HA). Same architecture, same Ansible automation, same component versions — the only variables are the virtualization layer and the complexity level. This post maps the learning path through all of it.
Two levels, three tools, six repos
Every repo in this project shares the same foundation: Kubernetes 1.32.0 installed from raw binaries (no kubeadm), HashiCorp Vault for PKI certificate management, Ansible for automation, and Ubuntu 24.04 ARM64 as the base OS. The difference is scope.
| Tool | Simple (6 VMs) | HA (11 VMs) |
|---|---|---|
| UTM | k8s-utm-simple | k8s-utm-ha |
| Vagrant | k8s-vagrant-simple | k8s-vagrant-ha |
| OrbStack | k8s-orbstack-simple | k8s-orbstack-ha |
Not sure which tool to pick? The UTM vs Vagrant vs OrbStack comparison covers deployment times, resource consumption, networking differences, and when each tool makes sense. Short version: OrbStack for the easiest start with the lowest resource footprint, UTM for maximum production realism, Vagrant for declarative infrastructure-as-code.
Simple cluster: learn the fundamentals (6 VMs)
Start here. The simple setup is deliberately constrained (one master, one etcd node, two workers), but it’s not a toy. Every simple cluster includes a dedicated HashiCorp Vault server for PKI, a jump/bastion server, and Kubernetes installed the hard way from raw binaries. That is already more than most homelab tutorials cover.
What you’re building
| VM | Role | What you learn |
|---|---|---|
| vault | PKI & Secrets | Certificate management, Vault operations, PKI hierarchy |
| jump | Bastion / Ansible | Bastion pattern, SSH ProxyJump, Ansible automation |
| etcd-1 | Key-value store | etcd basics, TLS configuration, data storage |
| master-1 | Control plane | API server, controller-manager, scheduler — how they connect |
| worker-1/2 | Worker nodes | kubelet, kube-proxy, containerd, pod scheduling |
Concepts you’ll understand after Level 1
How Kubernetes components connect. The API server is the hub — everything talks to it. The controller-manager and scheduler connect as clients. Kubelets on worker nodes register with it. etcd sits behind it as the data store. Understanding this topology is fundamental, and the simple setup makes it visible because each component runs on a separate, identifiable VM.
Why certificates matter. Even the simple cluster uses Vault PKI with a 3-tier CA hierarchy — Root CA, Intermediate CA, and leaf CAs for Kubernetes, etcd, and the front proxy. Every connection between components is authenticated with TLS certificates. You’ll see firsthand what happens when a certificate is wrong, expired, or signed by the wrong CA.
What “the hard way” actually means. No kubeadm, no abstractions. Every binary is downloaded individually. Every systemd unit file is written from scratch. Every kubeconfig is generated with explicit certificate references. When something breaks, you know exactly which config file to check because you wrote it.
The bastion pattern. Your Mac only connects to the jump server. From jump, you reach every other node. This is how production environments restrict access — a single hardened entry point instead of every node being directly SSH-accessible.
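As a minimal sketch of the bastion pattern, the helper below prints the ssh command that reaches an internal node through the jump server using OpenSSH's -J (ProxyJump) flag. The user name `ubuntu` is an assumption, not something the repos are confirmed to use, and the sketch assumes the lab key is already loaded in ssh-agent or configured in ~/.ssh/config, because options given on the command line apply to the destination host rather than to the jump host.

```shell
# Hypothetical values: the user name is an assumption; the host names
# follow the VM table above.
LAB_USER="ubuntu"
JUMP_HOST="jump"

# Print the ssh command that reaches an internal node through the bastion.
# -J is OpenSSH's ProxyJump flag: the Mac connects to the jump server, and
# the jump server relays the connection to the target node.
ssh_via_jump() {
  echo "ssh -J ${LAB_USER}@${JUMP_HOST} ${LAB_USER}@$1"
}
```

Calling `ssh_via_jump master-1` prints the command to run. The same hop can be made permanent with a `ProxyJump jump` line under the relevant `Host` entries in ~/.ssh/config.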
Ansible as infrastructure automation. The entire deployment is driven by Ansible playbooks and roles. You’ll learn how idempotent tasks work, how inventory files map to real machines, and how roles encapsulate reusable automation. Every playbook can be run multiple times safely — the second run changes nothing.
Deployment times (Simple)
From cold start to kubectl get nodes showing all nodes Ready.
What’s missing from your simple cluster
The simple cluster works. Pods deploy, services route, kubectl responds. But hold it up against a production checklist and it fails on several critical items. Understanding why it fails is exactly where the most valuable learning happens.
Single point of failure: the control plane
One master node. Reboot it and kubectl stops responding. No new pods get scheduled. Existing workloads on workers keep running but can’t be managed, scaled, or healed. In production, losing the control plane means losing all operational capability.
Single point of failure: etcd
One etcd node stores the entire cluster state: every pod, service, secret, configmap, and RBAC policy. A disk failure without a backup means total data loss, and a process crash takes the data store offline until etcd restarts. No quorum, no consensus, no fault tolerance. With a single node you also never see how Raft leader election works.
No load balancer for the API server
Every kubeconfig points directly to master-1’s IP. If you add a second master later, clients don’t know about it. There’s no abstraction layer between API server clients and the actual API server instances.
Only two workers
Two workers means limited scheduling decisions. Pod anti-affinity, topology spread constraints, and node failure scenarios are harder to explore. Three workers give the scheduler meaningful choices.
For the full production-readiness audit — including certificate rotation, etcd mutual TLS, network policies, and monitoring — see Why Your Homelab K8s Cluster Isn’t Production-Ready (And How to Fix It).
HA cluster: production patterns on your laptop (11 VMs)
Level 2 fixes every gap from Level 1 and adds five new VMs. The architecture goes from “works” to “would survive a basic production review.” Here’s what changes and — more importantly — why each change matters.
What’s added in HA
| New VM | Role | Why it exists |
|---|---|---|
| haproxy | API server load balancer | Abstracts away individual master IPs. All clients point to HAProxy. If a master dies, traffic routes to the survivor within seconds. |
| master-2 | Second control plane | Eliminates single point of failure. Both masters run identical components. Controller-manager and scheduler use leader election: only one instance is active, and the standby takes over within seconds once the leader’s lease expires. |
| etcd-2 | Second etcd node | Three etcd nodes form a Raft consensus cluster. Quorum requires a majority (2 of 3), so the cluster tolerates one node failure. You learn leader election, log replication, and what happens during a network partition. |
| etcd-3 | Third etcd node | Same as etcd-2: the third member is what makes a 2-of-3 quorum possible. |
| worker-3 | Third worker | Meaningful scheduling: pod anti-affinity, topology spread, and realistic node failure scenarios with workload redistribution. |
The full 11-VM architecture
| VM | Role | Simple | HA |
|---|---|---|---|
| haproxy | Load balancer | — | ✓ |
| vault | PKI & Secrets | ✓ | ✓ |
| jump | Bastion / Ansible | ✓ | ✓ |
| etcd-1 | etcd | ✓ | ✓ |
| etcd-2/3 | etcd cluster | — | ✓ |
| master-1 | Control plane | ✓ | ✓ |
| master-2 | Control plane | — | ✓ |
| worker-1/2 | Worker nodes | ✓ | ✓ |
| worker-3 | Worker node | — | ✓ |
Deployment times (HA)
From cold start to full 11-VM HA cluster with all nodes Ready and Calico CNI installed.
The concepts that make HA meaningful
Moving from Simple to HA isn’t just adding more VMs. Each new component introduces a concept that matters in production. Here’s a primer on the three most important ones.
etcd quorum and the Raft consensus protocol
A single etcd node is a database. Three etcd nodes are a distributed consensus cluster. The difference is fundamental.
etcd uses the Raft protocol to maintain consistency across nodes. One node is elected leader — all writes go through it. The leader replicates each write to the followers, and the write is only committed once a majority (quorum) acknowledges it. With 3 nodes, quorum is 2. This means one node can fail completely and the cluster continues operating normally.
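This is why the magic number is 3, not 2. A 2-node etcd cluster has a quorum of 2, so both nodes must be healthy. That makes two nodes worse than one for availability: there are now two machines whose failure halts writes. Three nodes, five nodes, seven nodes: always odd numbers, because quorum is floor(n/2) + 1, which gives 3→2, 5→3, 7→4. An even count adds a member without adding tolerance (4 nodes need 3 and still survive only one failure).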
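The quorum arithmetic is easy to check with a few lines of shell. This is only a sketch of the majority formula; the function names `quorum` and `tolerated` are made up for illustration.

```shell
# quorum N: smallest majority of an N-member cluster.
# Shell arithmetic uses integer division, so N / 2 is already floor(N/2).
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated N: how many members can fail while a majority still remains.
tolerated() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the numbers for a few cluster sizes, including the even ones.
for n in 1 2 3 4 5 7; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated "$n")"
done
```

The rows for 2 and 4 members are the interesting ones: each tolerates exactly as many failures as the odd size just below it (0 and 1), which is why etcd clusters are sized at 3, 5, or 7.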
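In the HA setup, you can test this yourself. SSH to the jump server, stop etcd on one node (sudo systemctl stop etcd), and confirm that kubectl get nodes still answers. Start it again before stopping another: with two of three members down, quorum is lost and the cluster stops accepting writes.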
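To watch the failure test from the etcd side, something like the following could be run from the jump server. It is a sketch under assumptions: the certificate paths are hypothetical placeholders (use wherever the Ansible roles actually place the etcd client certificates), and 2379 is etcd's default client port.

```shell
# Build a comma-separated etcdctl endpoint list from host names.
# 2379 is etcd's default client port.
etcd_endpoints() {
  out=""
  for host in "$@"; do
    out="${out:+$out,}https://${host}:2379"
  done
  echo "$out"
}

# Hypothetical certificate paths; adjust to your deployment.
ETCD_CACERT="/etc/etcd/pki/ca.crt"
ETCD_CERT="/etc/etcd/pki/client.crt"
ETCD_KEY="/etc/etcd/pki/client.key"

# Ask every member for its health over mutual TLS.
# Defined here but not called, because it needs a running cluster.
etcd_health() {
  ETCDCTL_API=3 etcdctl \
    --endpoints="$(etcd_endpoints etcd-1 etcd-2 etcd-3)" \
    --cacert="$ETCD_CACERT" --cert="$ETCD_CERT" --key="$ETCD_KEY" \
    endpoint health
}
```

Run `etcd_health` before and after stopping a member. With one node stopped, the expectation is that its endpoint is reported unhealthy while the other two still respond.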
All three etcd nodes communicate using mutual TLS. Both peer and server certificates are signed by the dedicated etcd CA, separate from the Kubernetes CA. The CA separation is covered in the Vault PKI deep dive.
HAProxy and API server load balancing
In the simple setup, every kubeconfig points to https://master-1:6443. Add a second master and clients don’t know about it.
HAProxy sits in front of both masters as a TCP load balancer on port 6443. Every client points to https://haproxy:6443. HAProxy distributes connections round-robin and health-checks each master, so a master that stops answering is taken out of rotation.
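A TCP pass-through setup like this takes only a few lines of HAProxy configuration. The function below prints an illustrative fragment; it is not the file the repos generate. The section names `k8s-api` and `k8s-masters` are invented, and the sketch assumes the master host names resolve from the haproxy VM.

```shell
# Print an illustrative HAProxy fragment for the Kubernetes API server.
# mode tcp forwards the TLS stream untouched, so the API server still
# terminates TLS and sees the client certificates.
render_haproxy_cfg() {
  cat <<'EOF'
frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-masters

backend k8s-masters
    mode tcp
    balance roundrobin
    option tcp-check
    server master-1 master-1:6443 check
    server master-2 master-2:6443 check
EOF
}
```

The `check` keyword on each server line is what enables the health check. Without it, HAProxy would keep sending connections to a dead master. The fragment would sit alongside the usual `global` and `defaults` sections, which carry the timeouts.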
Only one controller-manager and one scheduler are active at any time via leader election. The API server itself doesn’t need leader election — both instances serve requests simultaneously.
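Leader election is visible from kubectl. Current Kubernetes releases record the active instance in a Lease object in the kube-system namespace, named after the component. A small sketch follows; the helper names are made up, and it assumes the holder identity has the usual `<hostname>_<uuid>` form.

```shell
# Strip the "_<uuid>" suffix from a holder identity such as
# "master-1_3f2a9c1e", leaving just the host name.
holder_node() {
  echo "${1%%_*}"
}

# Read the current holder of a leader-election lease.
# Defined here but not called, because it needs a running cluster.
# $1 is the lease name: kube-controller-manager or kube-scheduler.
current_leader() {
  kubectl -n kube-system get lease "$1" -o jsonpath='{.spec.holderIdentity}'
}
```

On a live HA cluster, `holder_node "$(current_leader kube-controller-manager)"` shows which master is active. Stop the controller-manager on that master and run it again a little later to see the other master take the lease.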
Vault PKI: the 3-tier CA hierarchy
Both Simple and HA clusters use a 3-tier Certificate Authority hierarchy with separate CAs for Kubernetes, etcd, and the front proxy. HA makes the consequences of good certificate design more visible.
The certificate count grows from about 15 in Simple to over 25 in HA. The Vault PKI deep dive covers the full hierarchy and the three Ansible roles that automate it.
Same components, same versions, every repo
All six repos share identical component versions: Kubernetes 1.32.0, etcd 3.5.12, containerd 1.7.24, runc 1.2.4, Calico CNI 3.28.0, Vault 1.15.4, Ubuntu 24.04 (Noble) ARM64.
The Ansible roles are also shared. Improvements to any role benefit all six repos immediately.
Which virtualization tool at each level
The full comparison post covers this in depth.
Starting out? Use OrbStack Simple.
Lowest barrier to entry. 6 VMs use ~10 GB disk. Your Mac stays cool.
Ready for HA? Your Mac’s RAM decides.
32 GB or less: OrbStack HA. 48 GB or more: UTM HA or Vagrant HA.
Want infrastructure-as-code?
Vagrant. Everything in a Vagrantfile. See the Vagrant deep dive.
Get running in 5 minutes
```shell
# Install OrbStack from orbstack.dev, then:
ssh-keygen -t ed25519 -f ~/.ssh/k8slab.key -C "k8s-homelab" -N ""
git clone https://github.com/labitlearnit/k8s-orbstack-simple-homelab.git
cd k8s-orbstack-simple-homelab
bash scripts/k8s-orbstack-simple-homelab.sh
```
Questions, corrections, or want to share how you’re using these repos?
labitlearnit@gmail.com