Every post in this series — the UTM deep dive, the Vagrant walkthrough, and the OrbStack guide — mentions a “3-tier PKI CA hierarchy with HashiCorp Vault.” This post is the one that explains what that actually means, why Kubernetes needs it, and how three Ansible roles automate the entire thing. These roles are common across all three projects — UTM, Vagrant, and OrbStack use the exact same code.
Most Kubernetes tutorials skip proper certificate management entirely. They either use kubeadm (which hides the CA behind auto-generated self-signed certs) or generate everything with openssl in a flat structure — one CA for everything, no hierarchy, no rotation capability. That works for a quick demo, but it teaches the wrong mental model for production.
This project takes a different approach: a dedicated HashiCorp Vault server manages the entire PKI lifecycle through a proper CA chain. Three Ansible roles handle it end to end — bootstrapping Vault, configuring the PKI hierarchy, and issuing certificates to every component across the cluster. The full source is on GitHub: ansible/roles
Why Three CAs, Not One?
Kubernetes uses TLS certificates for authentication and encryption between every component. The API server talks to etcd over mutual TLS. Kubelets authenticate to the API server with client certificates. The aggregation layer (metrics server, custom API servers) uses a separate trust chain called the front proxy. If all of these used the same CA and that CA’s private key were compromised, an attacker could mint a certificate for any identity — forge an API server cert, create a fake kubelet, or impersonate the controller manager. Even without a full CA key compromise, a single CA means there’s no cryptographic boundary between trust domains: the etcd cluster, the Kubernetes control plane, and the API aggregation layer all share the same root of trust, so a misconfiguration in one domain’s certificate issuance could affect another.
The solution is CA separation — three leaf CAs, each with a specific scope:
| CA | Scope | Vault Mount | What It Signs |
|---|---|---|---|
| Kubernetes CA | API server, control plane, kubelets | pki_kubernetes | kube-apiserver, controller-manager, scheduler, kubelet, kube-proxy, admin, service accounts |
| etcd CA | etcd cluster communication | pki_etcd | etcd server certs, peer certs, client certs, healthcheck client |
| Front Proxy CA | API aggregation layer | pki_front_proxy | front-proxy-client |
Each CA is chained to a shared intermediate, which chains to a root. The root CA sits at the top, signs nothing directly, and acts purely as the trust anchor. The intermediate CA signs the three leaf CAs. The leaf CAs issue the actual component certificates. This is the same pattern used by public CAs like Let’s Encrypt — root offline, intermediate active, leaf issuers purpose-scoped.
The Full CA Hierarchy
```
Root CA (pki_root) — 365-day TTL, pathlen:2
└── Intermediate CA (pki_int) — 180-day TTL, pathlen:1
    ├── Kubernetes CA (pki_kubernetes) — 90-day TTL, pathlen:0
    │   ├── kube-apiserver
    │   ├── kube-controller-manager
    │   ├── kube-scheduler
    │   ├── kubelet (per worker node, server + client certs)
    │   ├── kube-proxy
    │   ├── admin
    │   └── service-accounts (signing keypair)
    ├── etcd CA (pki_etcd) — 90-day TTL, pathlen:0
    │   ├── etcd server (per-node, with node-specific SANs)
    │   ├── etcd peer (per-node, for inter-node communication)
    │   ├── etcd client (for API server access)
    │   └── etcd healthcheck client
    └── Front Proxy CA (pki_front_proxy) — 90-day TTL, pathlen:0
        └── front-proxy-client
```
The TTL design is intentional. The root CA has the longest lifetime (365 days) because rotating it means redistributing trust to every component. The intermediate CA has a shorter TTL (180 days) because replacing it only requires re-signing the leaf CAs. The leaf CAs have the shortest TTL (90 days) — they’re the most active issuers and the most likely attack surface. The pathlen constraints enforce the hierarchy: pathlen:2 on root means it can sign CAs that themselves sign CAs (two levels below), pathlen:1 on intermediate means it can sign leaf CAs but not deeper, and pathlen:0 on leaf CAs means they can only sign end-entity certificates.
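The pathlen constraints described above can be reproduced with plain openssl. The following is a throwaway sketch (CA names made up, keys discarded in a temp directory) that builds the same three-level shape and verifies that the leaf CA chains back to the root:

```shell
# Build a disposable 3-tier chain in a temp dir
set -e
dir=$(mktemp -d) && cd "$dir"

# Root CA: self-signed, may sign two levels of CAs below it
openssl req -new -newkey rsa:2048 -nodes -subj "/CN=Lab Root CA" \
    -keyout root.key -out root.csr 2>/dev/null
echo "basicConstraints=critical,CA:TRUE,pathlen:2" > root.ext
openssl x509 -req -in root.csr -signkey root.key -days 365 \
    -extfile root.ext -out root.crt 2>/dev/null

# Intermediate CA: signed by root, may sign one more level of CAs
openssl req -new -newkey rsa:2048 -nodes -subj "/CN=Lab Intermediate CA" \
    -keyout int.key -out int.csr 2>/dev/null
echo "basicConstraints=critical,CA:TRUE,pathlen:1" > int.ext
openssl x509 -req -in int.csr -CA root.crt -CAkey root.key -CAcreateserial \
    -days 180 -extfile int.ext -out int.crt 2>/dev/null

# Leaf CA: signed by intermediate, may only sign end-entity certs
openssl req -new -newkey rsa:2048 -nodes -subj "/CN=Kubernetes CA" \
    -keyout k8s.key -out k8s.csr 2>/dev/null
echo "basicConstraints=critical,CA:TRUE,pathlen:0" > k8s.ext
openssl x509 -req -in k8s.csr -CA int.crt -CAkey int.key -CAcreateserial \
    -days 90 -extfile k8s.ext -out k8s.crt 2>/dev/null

# The leaf CA chains back to the root through the intermediate
openssl verify -CAfile root.crt -untrusted int.crt k8s.crt   # prints "k8s.crt: OK"
```

Vault does the equivalent internally; the point of the sketch is that pathlen is an X.509 property any verifier enforces, not a Vault-specific mechanism.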
The Three Ansible Roles
The Vault PKI automation is split across three Ansible roles, each handling a distinct phase. These roles are shared identically across the UTM, Vagrant, and OrbStack projects — the Ansible code doesn’t know or care which virtualization layer created the machines.
| Role | Playbook | Purpose |
|---|---|---|
| vault-bootstrap | vault-full-setup.yml | Install Vault, initialize with Shamir, unseal, save credentials |
| vault-pki-setup | vault-full-setup.yml | Configure the 5 PKI secrets engines + roles + CA chain |
| k8s-certs | k8s-certs.yml | Issue and deploy certificates to all cluster nodes |
Both vault-bootstrap and vault-pki-setup are called from the same playbook (vault-full-setup.yml), which runs on the vault host. The k8s-certs role runs from a separate playbook targeting all cluster nodes.
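As a sketch, the composition might look like this (the role names come from the repo; the host group and everything else here is assumed):

```yaml
# vault-full-setup.yml — plausible shape, not the repo's exact playbook
- name: Bootstrap Vault and build the PKI hierarchy
  hosts: vault
  become: true
  roles:
    - vault-bootstrap    # install, init, unseal
    - vault-pki-setup    # mount engines, build CA chain, define roles
```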
Role 1: vault-bootstrap
This role takes a fresh Ubuntu machine and turns it into a running, unsealed Vault server. The sequence:
Install Vault binary. The role downloads HashiCorp Vault 1.15.4 for ARM64 (or uses a pre-cached copy from the jump server if available). The binary is placed in /usr/local/bin/ and given executable permissions. A vault system user and group are created for process isolation.
Configure Vault server. The Vault configuration file is templated to /etc/vault.d/vault.hcl. Key settings include the storage backend (file-based at /opt/vault/data), the listener address (0.0.0.0:8200 — accessible from all VMs), UI enabled, and TLS disabled for internal lab use. A systemd unit is created so Vault starts on boot and can be managed with standard service commands.
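A minimal vault.hcl matching those settings might look like this (a sketch, not the repo's exact template):

```hcl
# /etc/vault.d/vault.hcl
ui = true

storage "file" {
  path = "/opt/vault/data"
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = true   # lab-only: Vault's own listener runs without TLS
}
```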
Initialize Vault. The role runs vault operator init with Shamir’s Secret Sharing parameters: 5 key shares and a threshold of 3. This means Vault generates 5 unseal keys, and any 3 of them are needed to unseal the vault after a restart. The initialization output — all 5 unseal keys plus the root token — is saved to .vault-credentials/vault-init.json on the jump server. This is critical: lose these keys and the vault data is permanently inaccessible.
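The 3-of-5 behavior comes from Shamir's Secret Sharing: the secret is the constant term of a random degree-2 polynomial, each share is a point on it, and any 3 points determine the polynomial. A toy illustration (not Vault's actual implementation, which splits the key protecting its storage barrier):

```python
import random

PRIME = 2**127 - 1  # arithmetic happens in a finite field

def split(secret, shares=5, threshold=3):
    """Split a secret into `shares` points on a random polynomial
    of degree threshold-1 whose constant term is the secret."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    def eval_at(x):
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, eval_at(x)) for x in range(1, shares + 1)]

def combine(points):
    """Recover the constant term via Lagrange interpolation at x=0.
    Any `threshold` points suffice; fewer reveal nothing."""
    secret = 0
    for xi, yi in points:
        num = den = 1
        for xj, _ in points:
            if xj != xi:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

secret = 123456789
parts = split(secret)                                      # 5 shares
print(combine(parts[:3]) == secret)                        # True
print(combine([parts[0], parts[3], parts[4]]) == secret)   # True
```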
Unseal Vault. Using the first 3 unseal keys from the saved credentials, the role unseals the vault. Each vault operator unseal call provides one key share, and after the third, the vault transitions from sealed to unsealed and begins serving requests.
After this role completes, Vault is running, unsealed, and ready for PKI configuration. The root token is available for the next role to authenticate with.
Role 2: vault-pki-setup
This is the most complex role. It builds the entire 5-engine PKI hierarchy inside Vault using the Vault HTTP API. Every step is an API call wrapped in an Ansible uri module task.
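Each such step looks roughly like this as a uri task. The endpoint and request body follow Vault's sys/mounts HTTP API; the variable names are assumptions for illustration:

```yaml
- name: Enable the pki_kubernetes secrets engine
  ansible.builtin.uri:
    url: "http://{{ vault_addr }}/v1/sys/mounts/pki_kubernetes"
    method: POST
    headers:
      X-Vault-Token: "{{ vault_root_token }}"
    body_format: json
    body:
      type: pki
      config:
        max_lease_ttl: "2160h"   # 90 days
    status_code: [200, 204]      # Vault returns 204 on success
```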
Enable PKI secrets engines. Five separate PKI secrets engines are mounted, each at its own path with its own max TTL:
| Engine Path | Purpose | Max TTL |
|---|---|---|
| pki_root | Root CA | 365 days |
| pki_int | Intermediate CA | 180 days |
| pki_kubernetes | Kubernetes leaf CA | 90 days |
| pki_etcd | etcd leaf CA | 90 days |
| pki_front_proxy | Front proxy leaf CA | 90 days |
Generate the Root CA. The role calls Vault’s pki_root/root/generate/internal endpoint to create a self-signed root certificate. The common name is set to something descriptive like “K8s Lab Root CA”. The key type is RSA 4096-bit. The root CA private key is generated and stored within Vault’s encrypted storage backend (the file backend at /opt/vault/data in this project). The key is protected by Vault’s barrier encryption and is never exposed through any API endpoint — there is no Vault API call that returns the root private key. CRL and issuing URLs are configured so downstream CAs can publish revocation information.
Generate and sign the Intermediate CA. A CSR (Certificate Signing Request) is generated at the pki_int mount. This CSR is then submitted to the root CA (pki_root/root/sign-intermediate) for signing. The signed certificate is imported back into the pki_int engine. This establishes the chain: Root → Intermediate.
Generate and sign the three leaf CAs. The same CSR → sign → import pattern repeats three times, once for each leaf CA. Each leaf CA generates its CSR, submits it to the intermediate CA for signing, and imports the signed certificate. After this step, the full chain is established: Root → Intermediate → Kubernetes/etcd/Front Proxy.
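One round of that CSR → sign → import pattern, expressed with the Vault CLI instead of the HTTP API, would look something like this (mount names from this post; the CN and TTL are illustrative, and the commands assume a running, unsealed Vault):

```shell
# 1. Generate a CSR at the intermediate mount (the private key never leaves Vault)
vault write -field=csr pki_int/intermediate/generate/internal \
    common_name="K8s Lab Intermediate CA" > pki_int.csr

# 2. Have the root CA sign it (4320h = 180 days)
vault write -field=certificate pki_root/root/sign-intermediate \
    csr=@pki_int.csr format=pem_bundle ttl=4320h > pki_int.crt

# 3. Import the signed certificate back into the intermediate mount
vault write pki_int/intermediate/set-signed certificate=@pki_int.crt
```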
Create PKI roles. This is where the certificate issuance policy is defined. Each leaf CA gets one or more Vault PKI roles that specify exactly what certificates can be issued from it. For example, the Kubernetes CA might have a role that allows certificates with specific SANs (Subject Alternative Names), key usages (digital signature, key encipherment), and extended key usages (server auth, client auth). The etcd CA has separate roles for server certificates (with node-specific SANs), peer certificates (for etcd-to-etcd communication), and client certificates (for the API server to connect to etcd).
The PKI roles are the policy enforcement layer. Even if someone has the Vault token, they can only issue certificates that match the role’s constraints. A role configured for etcd server certs won’t allow issuing a certificate with a Kubernetes API server SAN. This is defense in depth — the CA separation prevents cross-domain impersonation, and the roles prevent within-domain abuse.
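A sketch of what such a role definition could look like (the role name, domains, and TTL here are illustrative, not taken from the repo; the parameters themselves are Vault's PKI role options):

```shell
vault write pki_etcd/roles/etcd-server \
    allowed_domains="etcd-1,etcd-2,etcd-3" \
    allow_bare_domains=true \
    allow_subdomains=false \
    allow_ip_sans=true \
    server_flag=true \
    client_flag=false \
    key_usage="DigitalSignature,KeyEncipherment" \
    ext_key_usage="ServerAuth" \
    max_ttl="2160h"
```

A token holder can now mint etcd server certificates for those three hostnames and nothing else; a request for an API server SAN is rejected at issuance time.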
Role 3: k8s-certs
With the PKI infrastructure in place, the k8s-certs role issues actual certificates for every component and deploys them to the correct nodes. This role runs with forks=12 in Ansible, distributing certificates to all nodes in parallel.
For each certificate, the role calls Vault’s pki_{ca}/issue/{role} endpoint, which generates a fresh key pair, creates a certificate signed by the appropriate CA, and returns the certificate, private key, and CA chain. These are then written to standardized paths on each node.
etcd certificates are deployed to /etc/etcd/pki/ on each etcd node. Each node gets its own server certificate with SANs that include the node’s IP address and hostname, a peer certificate for inter-node replication, and a client certificate. Additionally, an etcd client certificate is deployed to the master nodes so kube-apiserver can authenticate to etcd.
Kubernetes certificates go to /etc/kubernetes/pki/. The API server certificate includes SANs for all master IPs, the HAProxy VIP, the cluster service IP (10.96.0.1), and relevant hostnames. The controller manager, scheduler, and admin each get their own client certificates. A service account signing keypair is generated for token creation. Per-node kubelet certificates are deployed to each worker node.

Note that in this hard-way setup, masters don’t run kubelet — the control plane components (kube-apiserver, controller-manager, scheduler) run directly as systemd services, not as pods managed by kubelet. This is why master nodes don’t appear in kubectl get nodes output. In a kubeadm-based cluster, masters do run kubelet (which manages the control plane as static pods), so they show up as nodes. Here, only workers register with the API server.
Front proxy certificates also go to /etc/kubernetes/pki/ on the master nodes. The front-proxy-client cert is used by the API server when proxying requests to extension API servers (like metrics-server). It’s signed by a completely separate CA, so extension API servers can independently validate that requests are coming from the real API server.
Every certificate, key, and CA bundle is deployed with correct file ownership and permissions — private keys are 0600, certificates are 0644, and directories are owned by the service user.
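Putting the issue-and-deploy steps together, a hedged sketch of the pattern (the endpoint shape and response fields are Vault's pki issue API; the role name, variables, and use of the kubelet cert as the example are assumptions):

```yaml
- name: Issue a kubelet certificate from the Kubernetes CA
  ansible.builtin.uri:
    url: "http://{{ vault_addr }}/v1/pki_kubernetes/issue/kubelet"
    method: POST
    headers:
      X-Vault-Token: "{{ vault_token }}"
    body_format: json
    body:
      common_name: "system:node:{{ inventory_hostname }}"
      ttl: "2160h"
  register: issued

- name: Write the private key with restrictive permissions
  ansible.builtin.copy:
    content: "{{ issued.json.data.private_key }}\n"
    dest: /etc/kubernetes/pki/kubelet.key
    mode: "0600"

- name: Write the certificate
  ansible.builtin.copy:
    content: "{{ issued.json.data.certificate }}\n"
    dest: /etc/kubernetes/pki/kubelet.crt
    mode: "0644"
```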
How the Certificates Flow
Here’s the concrete flow of which component uses which certificate from which CA, and why:
kube-apiserver → etcd: The API server connects to etcd as a client. It presents the etcd client certificate (signed by the etcd CA) and verifies the etcd server’s cert against the etcd CA bundle. This is mutual TLS — both sides authenticate. Because the etcd client cert is signed by the etcd CA (not the Kubernetes CA), a stolen Kubernetes component cert can’t be used to connect to etcd directly.
kubelet → kube-apiserver: Each kubelet (running on worker nodes) connects to the API server (via HAProxy) using its kubelet client certificate (signed by the Kubernetes CA). The API server verifies this against the Kubernetes CA bundle. The kubelet also serves its own HTTPS endpoint for log streaming and exec — for this it uses its kubelet server certificate, also signed by the Kubernetes CA.
etcd ↔ etcd (peer): Each etcd node connects to every other etcd node for Raft consensus. Both sides present etcd peer certificates (signed by the etcd CA) and verify the other’s cert against the same CA. This is a closed loop — only certificates signed by the etcd CA can participate in the cluster.
API aggregation (front proxy): When the API server proxies a request to an extension API server (e.g., metrics-server), it presents the front-proxy-client certificate (signed by the Front Proxy CA). The extension API server is configured to trust the Front Proxy CA. This is separate from the main Kubernetes CA so that extension API servers can differentiate between direct client requests and proxied requests from the API server.
Why Vault Instead of openssl or cfssl?
Many Kubernetes-the-hard-way tutorials use openssl or cfssl to generate certificates. These tools work, but they have significant limitations for anything beyond a one-time setup:
No built-in rotation. With openssl, rotating a certificate means manually regenerating it, redistributing it to every node that uses it, and restarting the affected services. With Vault, the same API call that issued the original cert issues a replacement — the Ansible role just runs again.
No revocation. If a certificate is compromised, openssl has no way to revoke it without manually maintaining a CRL (Certificate Revocation List). Vault has built-in CRL support — revoke a certificate via the API and all clients that check the CRL will reject it.
No audit trail. Vault logs every certificate issuance with timestamps, requesting identity, and certificate details. With openssl, you have whatever your shell history shows.
No policy enforcement. Vault PKI roles enforce what certificates can be issued — allowed domains, maximum TTLs, required SANs. With openssl, anyone with the CA key can sign anything.
Key isolation. When using Vault’s internal key type, CA private keys are stored within Vault’s encrypted storage and are never exposed through any API endpoint. You can issue certificates signed by the root CA, but you cannot retrieve the root CA’s private key — Vault simply doesn’t offer an API for that. The keys do reside on disk (inside the storage backend), but they’re protected by Vault’s barrier encryption, which requires the unseal keys to access. With openssl, the root key sits directly on a filesystem as a plain PEM file, often with permissions that are too broad.
The tradeoff is complexity. Vault is another service to deploy, initialize, and maintain. For a homelab, that’s actually a benefit — you learn to operate Vault, which is a production skill. For a throwaway demo, openssl is faster.
The Vault Unseal Problem
One operational reality of this setup: Vault seals itself on every restart. When you reboot the vault VM (or destroy and recreate the cluster), Vault starts in a sealed state and won’t serve any requests until 3 of the 5 unseal keys are provided.
The deploy script handles this automatically during initial setup. But for day-to-day use, a vault-unseal helper function is defined in the jump server’s .bashrc. It reads the unseal keys from the saved credentials JSON and applies them in sequence. Just SSH to jump and run vault-unseal.
```shell
# After a VM restart, from the jump server:
vault-unseal

# Verify Vault is unsealed:
vault status
```
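The helper itself is likely just a loop over the saved keys. A plausible shape (the actual function in .bashrc may differ; this version assumes jq is installed and that the credentials file is the JSON produced by vault operator init -format=json):

```shell
vault-unseal() {
  local creds="$HOME/.vault-credentials/vault-init.json"
  for i in 0 1 2; do
    # init output stores the key shares under .unseal_keys_b64
    vault operator unseal "$(jq -r ".unseal_keys_b64[$i]" "$creds")"
  done
}
```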
In production, you’d use auto-unseal with a cloud KMS (AWS KMS, GCP Cloud KMS, Azure Key Vault) so the vault unseals itself on restart. For a homelab, manual unsealing with saved keys is a reasonable tradeoff that teaches you how the seal mechanism works.
Certificate Paths on Each Node
After the k8s-certs role runs, certificates are deployed to standardized locations across the cluster:
```
# etcd nodes (/etc/etcd/pki/)
etcd-server.crt / etcd-server.key        # Server cert (node-specific SANs)
etcd-peer.crt / etcd-peer.key            # Peer cert (inter-node)
etcd-ca.crt                              # etcd CA bundle

# Master nodes (/etc/kubernetes/pki/)
kube-apiserver.crt / kube-apiserver.key  # API server cert
etcd-client.crt / etcd-client.key        # For API server → etcd auth
front-proxy-client.crt / .key            # API aggregation
sa.key / sa.pub                          # Service account signing
ca.crt                                   # Kubernetes CA bundle
front-proxy-ca.crt                       # Front proxy CA bundle
etcd-ca.crt                              # etcd CA (for API server to verify etcd)
# Note: no kubelet certs — masters don't run kubelet in this hard-way setup

# Worker nodes (/etc/kubernetes/pki/)
kubelet.crt / kubelet.key                # Kubelet server+client cert
ca.crt                                   # Kubernetes CA bundle
kube-proxy.crt / kube-proxy.key          # kube-proxy client cert
```
The CA bundles are the important detail. Each node only gets the CA bundles it needs to validate the connections it makes. Workers get the Kubernetes CA bundle (to validate API server connections) but not the etcd CA bundle (workers never talk to etcd). Masters get all three CA bundles because the API server connects to etcd, serves the Kubernetes API, and proxies to extension API servers.
Common Across UTM, Vagrant, and OrbStack
The most important architectural decision in this project: the Vault PKI roles are completely decoupled from the virtualization layer. The same vault-bootstrap, vault-pki-setup, and k8s-certs roles run on all three platforms without any changes. The roles don’t reference any VM-specific IPs or hostnames — those come from the Ansible inventory, which is the only file that changes between UTM (192.168.64.x), Vagrant (192.168.105.x), and OrbStack (192.168.139.x).
This separation is why the project can exist as three repos with shared Ansible code. Change the inventory file, and the same PKI infrastructure deploys across fundamentally different virtualization layers. It’s also why improvements to the PKI automation — adding certificate rotation, tightening role constraints, extending TTLs — benefit all three projects simultaneously.
```
# The only difference between the three projects' PKI:
#   UTM inventory:      192.168.64.x
#   Vagrant inventory:  192.168.105.x
#   OrbStack inventory: 192.168.139.x
#
# The roles, playbooks, and PKI structure are identical.
```
Verifying the Setup
After deployment, you can verify the PKI hierarchy directly from the jump server using the Vault CLI:
```shell
# Check Vault status
vault status

# List all PKI mounts
vault secrets list | grep pki

# View the root CA certificate
vault read pki_root/cert/ca

# List roles on the Kubernetes CA
vault list pki_kubernetes/roles

# Issue a test certificate (optional)
vault write pki_kubernetes/issue/k8s-server \
    common_name="test.kubernetes.local" \
    ttl="24h"
```
You can also verify from any cluster node that the certificates chain correctly:
```shell
# On a master node — verify the API server cert chains to the root
openssl verify -CAfile /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/kube-apiserver.crt

# On an etcd node — verify the server cert chains to the etcd CA
openssl verify -CAfile /etc/etcd/pki/etcd-ca.crt /etc/etcd/pki/etcd-server.crt

# Inspect the issuer on the API server cert (should be the Kubernetes leaf CA)
openssl x509 -in /etc/kubernetes/pki/kube-apiserver.crt -text -noout | grep -A2 Issuer
```
What This Teaches You
Building a 3-tier CA with Vault for Kubernetes isn’t just about getting TLS to work. It teaches several production-relevant concepts:
CA hierarchy design — why root CAs should be offline (or at least isolated), why intermediates exist, and how pathlen constraints enforce trust boundaries. This is the same architecture that secures HTTPS on the public internet.
Blast radius containment — if the etcd CA is compromised, only etcd certificates need rotation. The Kubernetes API and front proxy are unaffected. This principle applies to any system with separate trust domains.
Vault operations — initializing, unsealing, managing secrets engines, defining roles, issuing certificates via API. These are daily tasks for infrastructure teams running Vault in production.
Ansible-driven PKI automation — how to wrap API calls in idempotent Ansible tasks, how to manage secrets (unseal keys, tokens) safely, and how to distribute certificates across a cluster without manual SCP.
Kubernetes TLS internals — which component talks to which, what certificate flags matter (key usage, extended key usage, SANs), and why certain connections require mutual TLS while others use one-way verification.
Running It Yourself
The Vault PKI roles are part of every project in the series. Pick the virtualization tool that fits your setup and clone the repo:
```shell
# UTM (full VMs, fastest HA deployment)
git clone https://github.com/labitlearnit/k8s-utm-ha-homelab.git

# Vagrant (declarative, Vagrantfile-driven)
git clone https://github.com/labitlearnit/k8s-vagrant-ha-homelab.git

# OrbStack (lightweight, lowest resource usage)
git clone https://github.com/labitlearnit/k8s-orbstack-ha-homelab.git
```
The deploy scripts handle everything — Vault bootstrap, PKI setup, and certificate issuance are all automated. After deployment, the deploy script adds a host entry for vault to the Mac’s /etc/hosts, so you can open http://vault:8200 directly in your Mac’s browser to explore the Vault UI and inspect the PKI hierarchy. For cluster access, ssh jump from your Mac gives you kubectl and the Vault CLI.
The full Ansible roles are under ansible/roles/ in every repo. Read them — they’re well-commented and each task has a descriptive name. Understanding what the automation does is the whole point of building it the hard way.
Wrapping Up
A 3-tier CA hierarchy with separate Kubernetes, etcd, and front proxy CAs isn’t overkill — it’s how production clusters should manage certificates. Vault makes it automatable, auditable, and rotatable. The three Ansible roles (vault-bootstrap, vault-pki-setup, k8s-certs) turn what would be a manual, error-prone process into a repeatable pipeline that works identically across UTM, Vagrant, and OrbStack.
If certificates in Kubernetes have ever felt like magic — or worse, like an obstacle to be bypassed with --insecure-skip-tls-verify — this project is designed to make them understandable. Every certificate has a reason. Every CA has a scope. Every trust boundary has a purpose.
The source code for all three projects is on GitHub: UTM · Vagrant · OrbStack. Star the repos if you find them useful, and feel free to open issues or PRs.
Big tech, small lab. One reel at a time.