OrbStack Gotchas — Shared-Kernel Surprises When Running K8s

OrbStack is the lightest way to run Kubernetes on Apple Silicon — shared kernel, instant machine creation, minimal resource consumption. But “lightest” comes with its own set of surprises. This post covers what breaks, what confuses, and what behaves differently when building K8s clusters with OrbStack on an M-series Mac, drawn from real deployment experience with both the Simple cluster and the full 11-machine HA deployment.

Each gotcha follows the same format: what happens, why it happens, and how to fix it. For gotchas that apply to all three tools (UTM, Vagrant, OrbStack), see the HA-Specific Gotchas post. For UTM and Vagrant-specific issues, see the UTM Gotchas and Vagrant Gotchas posts.

Gotcha #1: Swap Can’t Be Disabled — And That’s Fine

What happens: You run swapoff -a inside an OrbStack machine, but swap stays active and free -h still shows swap space. Historically, the kubelet refused to start when it detected swap, so the cluster never came up.

Why it happens: OrbStack machines share the host kernel, and OrbStack uses zram swap that is managed at the host level. There is no per-machine swap configuration, so you can’t disable it: it’s not your kernel to configure. The swapoff -a command appears to succeed but has no lasting effect.

How to fix it: Configure the kubelet to tolerate swap instead of trying to disable it. Kubernetes has supported running with swap since v1.22 (alpha), with the feature graduating to beta in v1.28 and reaching GA in v1.34. This project uses Kubernetes 1.32.0, where the NodeSwap feature gate is beta and enabled by default.

The kubelet configuration sets:

# kubelet-config.yaml
failSwapOn: false

This tells the kubelet to start normally even with swap present. The default swap behavior (NoSwap) means Kubernetes workloads won’t actually use swap — they just won’t be blocked from starting because of it. This isn’t a workaround; it’s the intended configuration for environments where swap can’t be turned off at the host level.

On UTM and Vagrant, VMs have their own kernels and swapoff -a works as expected. This gotcha is unique to OrbStack’s shared-kernel architecture.
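
To confirm you are in the "swap is on, kubelet tolerates it" state described above, a quick check could look like the sketch below. Both function names are hypothetical helpers, not part of the deploy scripts, and the kubelet config path is whatever your setup uses.

```shell
#!/bin/sh
# swap_active: reads /proc/swaps-formatted text on stdin and succeeds
# (exit 0) if at least one swap device is listed after the header line.
swap_active() {
    awk 'NR > 1 && NF > 0 { found = 1 } END { exit (found ? 0 : 1) }'
}

# kubelet_tolerates_swap FILE: succeeds if the kubelet config file
# contains a "failSwapOn: false" line. FILE is supplied by the caller.
kubelet_tolerates_swap() {
    grep -Eq '^[[:space:]]*failSwapOn:[[:space:]]*false[[:space:]]*$' "$1"
}
```

Inside a machine you might run swap_active < /proc/swaps and then kubelet_tolerates_swap against your kubelet-config.yaml. If the first succeeds and the second fails, the kubelet is likely to refuse to start.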

Gotcha #2: kube-proxy conntrack Permission Denied

What happens: kube-proxy starts but logs errors about failing to set net.netfilter.nf_conntrack_max or conntrack.maxPerCore. You see “permission denied” in the kube-proxy logs when it tries to modify sysctl parameters.

Why it happens: The shared kernel means certain sysctl parameters can’t be modified from within an OrbStack machine because they would affect every machine at once. Conntrack settings are kernel-wide: changing them from inside one machine would change them for all machines that share OrbStack’s Linux kernel. The kernel correctly denies the write.

How to fix it: Set conntrack.maxPerCore to 0 in the kube-proxy configuration, which tells kube-proxy to skip the sysctl modification and use whatever the host kernel provides:

# kube-proxy-config.yaml
conntrack:
  maxPerCore: 0
  min: 0

Setting it to 0 doesn’t mean “zero connections allowed” — it means “don’t try to set this value, use the kernel default.” The OrbStack host kernel has a perfectly reasonable conntrack limit already configured. The Ansible roles handle this automatically, but if you’re configuring kube-proxy manually, this is a required change for OrbStack.
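
To make the "0 means skip" rule concrete, here is a small sketch of how the target value is derived from maxPerCore, min, and the core count. It is an illustration of the documented behavior as I understand it, not kube-proxy's actual code, and conntrack_target is a hypothetical name.

```shell
#!/bin/sh
# conntrack_target MAX_PER_CORE MIN CORES
# Prints the nf_conntrack_max value kube-proxy would try to set, or
# "skip" when maxPerCore is 0 and the kernel value is left untouched
# (in which case min is ignored as well).
conntrack_target() {
    max_per_core=$1
    min=$2
    cores=$3
    if [ "$max_per_core" -le 0 ]; then
        echo skip
        return 0
    fi
    scaled=$((max_per_core * cores))
    if [ "$scaled" -gt "$min" ]; then
        echo "$scaled"
    else
        echo "$min"
    fi
}
```

With kube-proxy's usual defaults (32768 per core, minimum 131072), conntrack_target 32768 131072 8 prints 262144, which is the write the shared kernel rejects. conntrack_target 0 0 8 prints skip.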

Gotcha #3: The Dual-IP Problem — hostname -I Returns the Wrong IP

What happens: A Kubernetes component (kubelet, etcd, API server) binds to the wrong IP address. Running hostname -I inside an OrbStack machine returns two IPs, and the first one might not be the static IP you configured via cloud-init.

Why it happens: OrbStack assigns two IP addresses to each machine’s eth0 interface: the static IP from your cloud-init config (192.168.139.x) and a dynamic IP that OrbStack uses internally for its networking layer. Unlike Vagrant, which puts the two IPs on separate interfaces (eth0 and eth1), OrbStack stacks both on the same interface. hostname -I returns both, and which one is listed first is not guaranteed.

How to fix it: The same principle as the Vagrant dual-NIC problem: never rely on auto-detection. Always specify bind addresses explicitly in every component configuration:

# Check both IPs on eth0
ip addr show eth0
# You'll see two inet entries on the same interface
# Verify which IP the Ansible inventory is using
cat ~/k8s-orbstack-ha-homelab/ansible/inventory/homelab.yml
# All IPs should be 192.168.139.x addresses
# If a component bound to the wrong IP, check its config
ss -tlnp | grep 6443 # API server
ss -tlnp | grep 2379 # etcd
ss -tlnp | grep 10250 # kubelet

The Ansible playbooks handle this by always specifying the exact bind address rather than relying on auto-detection. If you’re adding custom services to the cluster, remember this dual-IP behavior and bind explicitly.
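
If you script a custom service and need the static address, one option is to filter the hostname -I output by the subnet prefix instead of taking the first entry. A minimal sketch, with pick_static_ip as a hypothetical helper and the second address in the example invented for illustration:

```shell
#!/bin/sh
# pick_static_ip "ADDRS" PREFIX
# ADDRS is the space-separated output of `hostname -I`; PREFIX is the
# first three octets of the static subnet (192.168.139 in this post).
# Prints the first address inside that subnet, or returns 1 if none match.
pick_static_ip() {
    for addr in $1; do
        case "$addr" in
            "$2".*)
                echo "$addr"
                return 0
                ;;
        esac
    done
    return 1
}
```

For example, pick_static_ip "198.19.249.7 192.168.139.21" 192.168.139 prints 192.168.139.21 regardless of the order the addresses come back in. Inside a machine you would pass "$(hostname -I)" as the first argument.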

Gotcha #4: File Copy to OrbStack Machines Is Slow

What happens: The Ansible deployment phase takes longer than expected. Specifically, tasks that distribute binaries (Kubernetes binaries, etcd tarball, containerd tarball) across 11 machines are noticeably slower than the same tasks on UTM or Vagrant.

Why it happens: File transfer to OrbStack machines is slower than to full QEMU VMs. This is a known characteristic of OrbStack’s I/O path. Each binary distribution involves SCP/rsync from the jump server to 10 other machines, and the per-file overhead adds up when you’re distributing ~500 MB of binaries across 11 machines.

How to fix it: You can’t eliminate this overhead. It is the primary reason OrbStack HA (7m 26s) is over a minute slower than UTM HA (6m 13s) despite machines starting almost instantly. The deploy script minimizes it by pre-caching binaries on the jump server, and re-runs can skip the distribution step:

# Pre-caching all binaries on the jump server first
# This happens once, then Ansible distributes from jump to nodes
# over the fast local network
# The alternative — having each machine download from the internet —
# would be even slower since all 11 machines share the same
# OrbStack network path to the Mac's internet connection
# To skip the binary distribution on re-runs:
./scripts/k8s-orbstack-ha-homelab.sh --from-step 7 # Resume after caching

If speed is your top priority, UTM gives you the fastest total deployment time (6m 13s). OrbStack’s advantage is elsewhere — machine creation speed, memory efficiency, and daily driver convenience.
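
For the record, the gap between the two timings quoted above works out as follows. This is only the arithmetic on the numbers in this post; to_seconds is a throwaway helper.

```shell
#!/bin/sh
# to_seconds "Xm Ys": converts a duration like "7m 26s" to seconds.
to_seconds() {
    echo "$1" | awk '{ m = $1; s = $2; sub(/m$/, "", m); sub(/s$/, "", s); print m * 60 + s }'
}

orbstack=$(to_seconds "7m 26s")   # 446
utm=$(to_seconds "6m 13s")        # 373
echo "OrbStack HA is $((orbstack - utm)) seconds slower than UTM HA"
```

That is 73 seconds, or 1m 13s, on an end-to-end run.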

Gotcha #5: VS Code Terminal Can’t Reach OrbStack IPs

What happens: SSH commands to OrbStack machines fail from VS Code’s integrated terminal with “Connection refused” or “No route to host,” but the same commands work perfectly from macOS Terminal.app or iTerm2.

Why it happens: VS Code’s integrated terminal may not inherit the correct network routing for OrbStack’s static IP subnet (192.168.139.0/24). This appears to be related to how VS Code handles network interfaces and DNS resolution internally, particularly when OrbStack’s network configuration is set up through its own networking layer rather than standard macOS interfaces.

How to fix it: Use macOS Terminal.app or iTerm2 for cluster management instead of VS Code’s integrated terminal:

# Test from VS Code terminal
ssh jump # If this fails...
# Try from macOS Terminal
ssh jump # ...and this works, use Terminal for cluster work
# Or use VS Code's remote SSH extension to connect to jump
# and run commands from there

This gotcha doesn’t affect UTM or Vagrant since their networking goes through standard macOS vmnet interfaces that VS Code handles correctly.

Gotcha #6: OrbStack Machines Are Not VMs — Implications

What happens: You try to do something that requires kernel-level access (load a custom kernel module, modify kernel parameters, use a different kernel version) and it doesn’t work. Or you’re troubleshooting a network policy issue and the behavior doesn’t match what you’d see on a real VM.

Why it happens: OrbStack machines share the host kernel (6.17.8-orbstack). They’re lightweight Linux environments, not full VMs with their own kernels. For 95% of Kubernetes learning — deployments, services, RBAC, Helm, monitoring, CI/CD — this is indistinguishable from a full VM. The 5% where differences surface includes: custom kernel module loading, kernel-level security policies (AppArmor/SELinux), low-level syscall behavior, and network namespace isolation edge cases.

How to fix it: You don’t fix this — you understand it and work within it. For the vast majority of K8s learning and experimentation, OrbStack behaves identically to full VMs. If you hit an edge case that requires real kernel isolation, switch to UTM or Vagrant for that specific investigation — the same Ansible roles work across all three tools.

# Check the kernel version inside an OrbStack machine
uname -r
# Returns something like: 6.17.8-orbstack
# This is the shared host kernel, not a per-machine kernel
# Check if a kernel module is available
lsmod | grep overlay
# If the module isn't loaded and you can't modprobe it,
# that's a shared-kernel limitation

Gotcha #7: orb create Silently Fails with Bad Cloud-Init

What happens: orb create ubuntu:noble machine-name completes successfully, but the machine doesn’t have the expected hostname, user, SSH key, or static IP. Everything from the cloud-init config was silently ignored.

Why it happens: If the cloud-init YAML has a syntax error (bad indentation, missing colons, invalid YAML), orb create creates the machine anyway but skips the cloud-init configuration. There’s no error message indicating the cloud-init config was invalid. This is more forgiving than UTM (where a bad cloud-init ISO prevents boot) but also more dangerous because the machine appears to work.

How to fix it: Validate your cloud-init YAML before passing it to orb create:

# Validate YAML syntax
python3 -c "import yaml; yaml.safe_load(open('cloud-init/jump.yaml'))"
# After machine creation, verify cloud-init ran
orb run jump -- cloud-init status
# Should say "done" with no errors
# Check if the k8s user was created
orb run jump -- id k8s
# If "no such user", cloud-init didn't apply
# Check cloud-init logs for details
orb run jump -- cat /var/log/cloud-init-output.log | tail -30

If cloud-init didn’t run, delete the machine (orb delete machine-name), fix the YAML, and recreate. The deploy script’s cloud-init configs are tested and working — this gotcha mainly hits when you’re customizing configs.
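
A cheap pre-flight check can catch two common mistakes before the YAML parser even runs: a missing #cloud-config first line (which cloud-init expects on cloud-config user data) and tab characters in indentation (which YAML does not allow). The sketch below is a hypothetical helper, not part of the deploy script, and it complements the YAML syntax check above rather than replacing it.

```shell
#!/bin/sh
# cloud_config_precheck FILE
# Prints "ok" and succeeds if FILE starts with "#cloud-config" and has
# no tab characters in leading indentation; otherwise prints the problem
# and returns 1.
cloud_config_precheck() {
    file=$1
    if [ "$(head -n 1 "$file")" != "#cloud-config" ]; then
        echo "missing #cloud-config header"
        return 1
    fi
    # Match lines whose leading whitespace contains a tab.
    if grep -q "^ *$(printf '\t')" "$file"; then
        echo "tab character found in indentation"
        return 1
    fi
    echo ok
}
```

Run it as cloud_config_precheck cloud-init/jump.yaml before orb create. A clean result still doesn't prove the config is semantically valid, so keep the post-creation checks.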

Gotcha #8: OrbStack Network Prefix Changes Between Installations

What happens: You reinstall OrbStack or upgrade to a new version, and the network prefix changes from 192.168.139 to a different subnet. Existing cloud-init configs, Ansible inventory files, and /etc/hosts entries all reference the old subnet.

Why it happens: OrbStack’s subnet is configured in its settings and can change between installations. The deploy script auto-detects it using orb config show | grep network.subnet4, but hardcoded references in cloud-init configs or manually edited inventory files won’t update automatically.

How to fix it: The deploy script handles this by auto-detecting the prefix and generating all configs dynamically. If you’ve hardcoded IPs anywhere, update them after an OrbStack reinstall:

# Check the current OrbStack subnet
orb config show | grep network.subnet4
# If it changed, re-run the full deploy script
# It will auto-detect the new prefix and regenerate everything
bash scripts/k8s-orbstack-ha-homelab.sh
# Or destroy and recreate if the old machines have wrong IPs
bash scripts/destroy-vms.sh
bash scripts/k8s-orbstack-ha-homelab.sh
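
If you maintain hand-edited files and want to see what a prefix change would touch, the idea can be sketched like this. Both helpers are hypothetical, this is not how the deploy script itself is written, and the new prefix in the example is made up.

```shell
#!/bin/sh
# subnet_prefix CIDR
# Prints the first three octets of an IPv4 CIDR such as 192.168.139.0/24.
subnet_prefix() {
    echo "$1" | awk -F'[./]' 'NF >= 4 { print $1 "." $2 "." $3 }'
}

# rewrite_prefix OLD NEW
# Filters stdin to stdout, replacing every "OLD." with "NEW.".
# Dots in OLD are escaped so they match literally.
rewrite_prefix() {
    old=$(echo "$1" | sed 's/\./\\./g')
    sed "s/${old}\\./$2./g"
}
```

For example, subnet_prefix 192.168.139.0/24 prints 192.168.139, and echo "192.168.139.10 jump" | rewrite_prefix 192.168.139 192.168.215 prints 192.168.215.10 jump. Feeding it the value from orb config show assumes that line has a "key: value" shape, which is worth checking on your install first.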

Quick Reference: OrbStack Diagnostics

When something goes wrong with OrbStack machines, these commands help narrow down the issue:

# List all OrbStack machines
orb list
# Check machine status
orb info jump
# Run a command inside a machine without SSH
orb run jump -- hostname -I
# Check both IPs on eth0
orb run jump -- ip addr show eth0
# Check cloud-init status
orb run jump -- cloud-init status
# Check swap status (it will always show active)
orb run worker-1 -- free -h
# Check kube-proxy logs for conntrack errors
ssh jump 'kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=20'
# Check kernel version (shared host kernel)
orb run jump -- uname -r
# Check OrbStack network config
orb config show | grep network
# Destroy a single machine
orb delete machine-name
# Destroy all machines
bash scripts/destroy-vms.sh

Where to Go Next

These gotchas cover the OrbStack-specific issues. For problems that hit during the Ansible deployment phase — Vault seal/unseal, certificate SANs, etcd quorum, Calico initialization — see the HA-Specific Gotchas post, which covers cross-tool issues that apply regardless of whether you’re running UTM, Vagrant, or OrbStack.

For the full deployment walkthrough, see the OrbStack HA deep dive. For the full roadmap from simple to HA, see From Simple to HA: A Learning Path for Kubernetes on Apple Silicon.

Big tech, small lab. One reel at a time.

Questions, corrections, or want to share how you’re using these repos?

labitlearnit@gmail.com
