etcd stores the entire state of your Kubernetes cluster. Every deployment, every secret, every config map, every node registration: all of it lives in etcd. If someone can read or write to etcd without authorization, they own your cluster. This is why the etcd clusters in this project don't stop at regular one-way TLS. They enforce mutual TLS (mTLS) on every connection, including the connections between etcd nodes themselves.
This post explains how mTLS works between etcd peers in the homelab clusters, why peer certificates are separate from server and client certificates, what the systemd unit flags actually do, and how to verify the whole thing is working. This is a deep dive into one specific part of the security architecture covered in Production-Grade Kubernetes Security on a Homelab Budget. For the full CA hierarchy behind these certificates, see the Vault PKI deep dive.
TLS vs. Mutual TLS
Regular TLS (like HTTPS) is one-way authentication. The server presents a certificate, the client verifies it, and the connection is encrypted. The server has no idea who the client is — any client that trusts the server’s CA can connect.
Mutual TLS adds the second direction. The client also presents a certificate, and the server verifies it. Both sides know exactly who they’re talking to. If either side presents an invalid certificate (expired, wrong CA, revoked), the connection is refused.
For etcd, this distinction is critical. Without mTLS, anyone who can reach etcd’s port (2379 for client traffic, 2380 for peer traffic) could connect and read the cluster state. With mTLS, the connecting party must present a certificate signed by the etcd CA — and the etcd CA is separate from the Kubernetes CA, so even valid Kubernetes component certificates can’t authenticate to etcd. This CA separation is covered in depth in Why Kubernetes Needs Three Separate CAs.
The Three Certificate Types in etcd
etcd uses three distinct certificate types, each serving a different purpose. All three are issued by the same CA — the dedicated etcd CA (pki_etcd in Vault) — but they’re separate certificates with different SANs, key usages, and deployment locations.
Server certificates identify the etcd node to incoming clients. When the Kubernetes API server connects to etcd on port 2379, the etcd node presents its server certificate. The API server verifies it against the etcd CA bundle. Each etcd node has its own server certificate with SANs specific to that node — its hostname (etcd-1, etcd-2, etcd-3) and its IP address. The server certificate is also used for the healthcheck endpoint.
Peer certificates are used exclusively for etcd-to-etcd communication on port 2380. When etcd-1 connects to etcd-2 for Raft log replication or leader election (see Understanding etcd Quorum for how Raft works), both nodes present their peer certificates and both verify the other’s. This is the mTLS handshake — peer-to-peer, both sides authenticated. Peer certificates have the same node-specific SANs as server certificates, but they’re a separate certificate with a separate private key.
Client certificates are presented by services that connect to etcd as clients. The most important client is the Kubernetes API server, which uses an etcd client certificate to read and write cluster state. The healthcheck client certificate is used by monitoring and health probes. Client certificates don’t have node-specific SANs — they identify the client’s role, not a specific machine.
Here’s where each certificate lives on disk after the k8s-certs Ansible role deploys them:
# On each etcd node (/etc/etcd/pki/)
etcd-server.crt / etcd-server.key # Server cert — presented to clients on port 2379
etcd-peer.crt / etcd-peer.key # Peer cert — presented to other etcd nodes on port 2380
etcd-ca.crt # etcd CA bundle — used to verify incoming certs
healthcheck-client.crt / .key # Healthcheck client cert
# On each master node (/etc/kubernetes/pki/)
etcd-client.crt / etcd-client.key # Client cert — API server uses this to connect to etcd
etcd-ca.crt # etcd CA bundle — API server uses this to verify etcd
Notice that masters get the etcd client certificate and the CA bundle, but not the server or peer certificates. Workers get neither — they never talk to etcd directly. This is the principle of least privilege applied to certificate distribution.
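The layout above can be turned into a quick presence check. This is a minimal sketch, not part of the k8s-certs role: the function names expected_pki_files and missing_pki_files are made up for illustration, the file names are copied from the listing above, and healthcheck-client.key is assumed to be what the abbreviated ".key" entry stands for.

```shell
#!/usr/bin/env bash
# Hypothetical helper: which etcd PKI files should exist for a given node role.
# File names come from the listing above; nothing here is part of the real role.
expected_pki_files() {
  case "$1" in
    etcd)
      printf '%s\n' etcd-server.crt etcd-server.key \
                    etcd-peer.crt etcd-peer.key \
                    etcd-ca.crt \
                    healthcheck-client.crt healthcheck-client.key ;;
    master)
      printf '%s\n' etcd-client.crt etcd-client.key etcd-ca.crt ;;
    worker)
      : ;;  # workers never talk to etcd, so they get no etcd material
    *)
      echo "unknown role: $1" >&2
      return 2 ;;
  esac
}

# Print every expected file that is absent from the given directory.
# $1 = role (etcd | master | worker), $2 = directory to inspect
missing_pki_files() {
  local role="$1" dir="$2" f
  for f in $(expected_pki_files "$role"); do
    [ -f "$dir/$f" ] || echo "$f"
  done
}
```

On an etcd node, missing_pki_files etcd /etc/etcd/pki should print nothing if the deployment is complete. On a master, the equivalent call would be missing_pki_files master /etc/kubernetes/pki.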
Why Peer Certificates Are Separate
A common question: why not use the same certificate for both server and peer connections? The etcd node is the same machine in both cases.
The answer is about trust boundaries and the principle of least privilege. Peer certificates authenticate nodes to each other for Raft consensus. Server certificates authenticate nodes to clients for data access. These are different trust relationships.
If you used the same certificate for both, and that certificate were compromised, the attacker could both impersonate the node to clients (serving potentially modified data) and participate in the Raft consensus (injecting writes, disrupting leader election). With separate certificates, a compromised server certificate lets an attacker impersonate the node to clients, but they can’t join the peer mesh. A compromised peer certificate lets them participate in replication, but they can’t serve the client API. Neither scenario is good, but each limits the blast radius.
There’s also a practical reason: the --peer-client-cert-auth flag enables a separate verification path for peer connections. etcd checks certificates on peer connections against --peer-trusted-ca-file, independently of the --trusted-ca-file check applied to client connections. In this setup both flags point at the same etcd CA, but the two paths are configured separately, which gives you finer-grained control over who can do what.
The systemd Unit Flags
The etcd systemd unit in the homelab clusters (UTM, Vagrant, OrbStack) has a dense set of TLS-related flags. Here’s what each one does.
Client-facing TLS (port 2379):
--cert-file=/etc/etcd/pki/etcd-server.crt
--key-file=/etc/etcd/pki/etcd-server.key
--trusted-ca-file=/etc/etcd/pki/etcd-ca.crt
--client-cert-auth=true
--cert-file and --key-file are the server certificate and private key. etcd presents these to any client connecting on port 2379. --trusted-ca-file is the CA bundle used to verify client certificates. --client-cert-auth=true is the flag that enables mutual TLS — without it, etcd would accept any TLS connection without verifying the client’s identity. With it, the client must present a certificate signed by the trusted CA.
This is the mTLS enforcement for the API server → etcd connection. The API server’s --etcd-certfile and --etcd-keyfile flags point to the etcd client certificate, which is signed by the same etcd CA that --trusted-ca-file trusts.
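For the API server side of that connection, a rough sketch of the matching flags is below. The flag names (--etcd-servers, --etcd-cafile, --etcd-certfile, --etcd-keyfile) are standard kube-apiserver options, and the paths follow the master-node layout listed earlier. The helper name apiserver_etcd_flags is hypothetical, and the real manifests in the project may be assembled differently.

```shell
#!/usr/bin/env bash
# Build the etcd-related kube-apiserver flags for a list of etcd node IPs.
# Paths follow the on-disk layout described earlier; adjust to your setup.
apiserver_etcd_flags() {
  local pki=/etc/kubernetes/pki servers="" ip
  for ip in "$@"; do
    # Join the client URLs with commas, always https and always port 2379.
    servers="${servers:+$servers,}https://${ip}:2379"
  done
  printf '%s\n' \
    "--etcd-servers=${servers}" \
    "--etcd-cafile=${pki}/etcd-ca.crt" \
    "--etcd-certfile=${pki}/etcd-client.crt" \
    "--etcd-keyfile=${pki}/etcd-client.key"
}

# The three etcd nodes used throughout this post.
apiserver_etcd_flags 192.168.64.21 192.168.64.22 192.168.64.23
```

The point to notice is the symmetry: --etcd-cafile is how the API server verifies etcd's server certificate, while --etcd-certfile and --etcd-keyfile are what etcd verifies against its own --trusted-ca-file.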
Peer-facing TLS (port 2380):
--peer-cert-file=/etc/etcd/pki/etcd-peer.crt
--peer-key-file=/etc/etcd/pki/etcd-peer.key
--peer-trusted-ca-file=/etc/etcd/pki/etcd-ca.crt
--peer-client-cert-auth=true
--peer-cert-file and --peer-key-file are the peer certificate and private key. When etcd-1 connects to etcd-2 for Raft replication, etcd-1 presents this certificate. --peer-trusted-ca-file is the CA bundle used to verify incoming peer connections. --peer-client-cert-auth=true is the flag that enforces mutual TLS on peer connections — without it, any node that can reach port 2380 could join the replication stream.
Notice that both the client and peer configurations use the same CA file (etcd-ca.crt). That’s because all etcd certificates — server, peer, and client — are issued by the same etcd CA. The CA is shared, but the certificates are distinct. The separation is at the certificate level (different keys, different SANs, different extended key usages), not at the CA level.
The full set of TLS flags in context:
# From the etcd systemd unit (simplified for clarity)
ExecStart=/usr/local/bin/etcd \
  --name=etcd-1 \
  --data-dir=/var/lib/etcd \
  --listen-client-urls=https://192.168.64.21:2379,https://127.0.0.1:2379 \
  --advertise-client-urls=https://192.168.64.21:2379 \
  --listen-peer-urls=https://192.168.64.21:2380 \
  --initial-advertise-peer-urls=https://192.168.64.21:2380 \
  --initial-cluster=etcd-1=https://192.168.64.21:2380,etcd-2=https://192.168.64.22:2380,etcd-3=https://192.168.64.23:2380 \
  --initial-cluster-token=etcd-cluster \
  --initial-cluster-state=new \
  --cert-file=/etc/etcd/pki/etcd-server.crt \
  --key-file=/etc/etcd/pki/etcd-server.key \
  --trusted-ca-file=/etc/etcd/pki/etcd-ca.crt \
  --client-cert-auth=true \
  --peer-cert-file=/etc/etcd/pki/etcd-peer.crt \
  --peer-key-file=/etc/etcd/pki/etcd-peer.key \
  --peer-trusted-ca-file=/etc/etcd/pki/etcd-ca.crt \
  --peer-client-cert-auth=true
The URLs are all https:// — there’s no plaintext HTTP listener. The --listen-client-urls includes both the node’s IP and localhost (so local health checks work). The --initial-cluster flag lists all three peers with their https:// peer URLs. Step 15 in the UTM HA deep dive covers this systemd unit in the context of the full deployment flow.
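Since missing a single flag either breaks the cluster or silently weakens it, a small lint over the unit file can be handy. The following is only a sketch under stated assumptions: check_etcd_tls_flags is a made-up name, it expects the flag=value spelling used in this post (a bare --client-cert-auth without =true would be reported as missing), and it does plain text matching rather than parsing systemd syntax.

```shell
#!/usr/bin/env bash
# Read an etcd unit file (or any text containing the ExecStart flags) on stdin
# and report TLS problems. Prints one line per problem, nothing if clean.
check_etcd_tls_flags() {
  local unit flag
  unit="$(cat)"

  # Certificate, key, and CA paths must be present for both listeners.
  for flag in --cert-file --key-file --trusted-ca-file \
              --peer-cert-file --peer-key-file --peer-trusted-ca-file; do
    printf '%s\n' "$unit" | grep -qF -e "${flag}=" || echo "missing ${flag}"
  done

  # These two flags are what turn one-way TLS into mutual TLS.
  for flag in --client-cert-auth --peer-client-cert-auth; do
    printf '%s\n' "$unit" | grep -qF -e "${flag}=true" \
      || echo "${flag} is not set to true"
  done

  # Any http:// URL would mean a plaintext listener or peer address.
  if printf '%s\n' "$unit" | grep -qF 'http://'; then
    echo "plaintext http:// URL found"
  fi
}
```

A typical invocation would be check_etcd_tls_flags < /etc/systemd/system/etcd.service, where the unit path is an assumption about where the file lives on your nodes. The double-dash prefix keeps the client and peer flags apart: --cert-file= does not match inside --peer-cert-file=, because only one dash precedes "cert" there.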
The mTLS Handshake Step by Step
When etcd-1 connects to etcd-2 on port 2380 for Raft replication, here’s exactly what happens:
1. TCP connection. etcd-1 opens a TCP connection to etcd-2:2380.
2. TLS handshake begins. etcd-2 (the server side of this connection) sends its peer certificate (etcd-peer.crt) to etcd-1.
3. etcd-1 verifies etcd-2’s certificate. etcd-1 checks that etcd-2’s peer certificate is signed by the trusted CA (--peer-trusted-ca-file), that the certificate hasn’t expired, and that the SANs match the address being connected to (etcd-2’s IP or hostname).
4. etcd-2 requests a client certificate. Because --peer-client-cert-auth=true is set, etcd-2 sends a CertificateRequest message asking etcd-1 to authenticate itself.
5. etcd-1 sends its peer certificate. etcd-1 presents its own peer certificate (etcd-peer.crt) to etcd-2.
6. etcd-2 verifies etcd-1’s certificate. Same checks: signed by the trusted CA, not expired, SANs valid.
7. Session keys established. Both sides generate session keys for symmetric encryption. All subsequent traffic on this connection is encrypted and authenticated.
If any verification step fails, the connection is refused. No fallback to unencrypted, no warning-and-continue. etcd simply won’t talk to a peer it can’t verify.
Verifying mTLS on Your Homelab
These commands work on any HA cluster (UTM, Vagrant, OrbStack). SSH to the jump server first.
Verify that mTLS is enforced — try connecting without certs:
# This should fail: no client certificate provided
# (-S makes curl print the error even though -s silences progress output)
ssh etcd-1 'curl -sS https://127.0.0.1:2379/health --cacert /etc/etcd/pki/etcd-ca.crt'
# curl: (56) OpenSSL SSL_read: error:0A000412:SSL routines::sslv3 alert bad certificate
# Without --cacert, it also fails (can't verify server cert)
ssh etcd-1 'curl -sS https://127.0.0.1:2379/health'
# curl: (60) SSL certificate problem: unable to get local issuer certificate
Connect with valid certificates — mTLS in action:
# This succeeds: client cert + CA bundle provided
ssh etcd-1 'curl -s https://127.0.0.1:2379/health \
  --cacert /etc/etcd/pki/etcd-ca.crt \
  --cert /etc/etcd/pki/healthcheck-client.crt \
  --key /etc/etcd/pki/healthcheck-client.key'
# {"health":"true","reason":""}
Verify peer connectivity with etcdctl:
# Check the status of every member across the cluster
ssh etcd-1 'ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/pki/etcd-ca.crt \
  --cert=/etc/etcd/pki/healthcheck-client.crt \
  --key=/etc/etcd/pki/healthcheck-client.key \
  endpoint status --cluster -w table'
# Output shows all 3 nodes, their Raft terms, and leader status
# Every endpoint uses https://, so no plaintext connections exist
Inspect the peer certificate details:
# View the peer cert's SANs
ssh etcd-1 'openssl x509 -in /etc/etcd/pki/etcd-peer.crt -text -noout | grep -A4 "Subject Alternative"'
# Shows entries such as DNS:etcd-1 and IP Address:192.168.64.21
# Verify the peer cert chains to the etcd CA
ssh etcd-1 'openssl verify -CAfile /etc/etcd/pki/etcd-ca.crt /etc/etcd/pki/etcd-peer.crt'
# /etc/etcd/pki/etcd-peer.crt: OK
# Compare: the server cert has the same SANs but is a different certificate
ssh etcd-1 'openssl x509 -in /etc/etcd/pki/etcd-server.crt -serial -noout'
ssh etcd-1 'openssl x509 -in /etc/etcd/pki/etcd-peer.crt -serial -noout'
# Different serial numbers, so different certificates
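The SAN output from the openssl x509 -text command above can also be checked mechanically against the address a node advertises in --initial-cluster. What follows is a minimal sketch with hypothetical helper names (peer_host_for, san_contains). It assumes the comma-separated "DNS:..., IP Address:..." format that openssl prints, IPv4 or hostname peer URLs, and a sample SAN line written by hand rather than captured from a real node.

```shell
#!/usr/bin/env bash
# Extract the host part of one member's peer URL from an --initial-cluster value.
# $1 = member name (e.g. etcd-1), $2 = value of --initial-cluster
peer_host_for() {
  printf '%s\n' "$2" | tr ',' '\n' \
    | sed -n "s|^$1=https://\(.*\):[0-9]*\$|\1|p"
}

# Succeed if NAME appears as an exact entry in a SAN line.
# $1 = SAN line as printed by openssl, $2 = IP or hostname to look for
san_contains() {
  [ -n "$2" ] || return 1
  printf '%s\n' "$1" | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
          -e 's/^DNS://' -e 's/^IP Address://' \
    | grep -qxF -e "$2"
}

initial_cluster='etcd-1=https://192.168.64.21:2380,etcd-2=https://192.168.64.22:2380,etcd-3=https://192.168.64.23:2380'
san_line='DNS:etcd-1, IP Address:192.168.64.21'   # sample data, not real output

host="$(peer_host_for etcd-1 "$initial_cluster")"
if san_contains "$san_line" "$host"; then
  echo "etcd-1: SAN covers $host"
else
  echo "etcd-1: SAN does NOT cover $host"
fi
```

The exact-match comparison matters: a substring test would wrongly accept 192.168.64.2 against a certificate issued for 192.168.64.21.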
Verify cross-CA rejection — prove Kubernetes certs can’t authenticate to etcd:
# Try to verify an etcd-issued cert against the Kubernetes CA
# (the etcd client cert is used here because it is the etcd-issued cert present on masters)
ssh master-1 'openssl verify -CAfile /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/etcd-client.crt'
# Error: unable to get local issuer certificate
# This proves etcd certs are from a different CA than Kubernetes certs
# A compromised Kubernetes component cert can't be used to access etcd
What Breaks When You Get It Wrong
Understanding the failure modes is as important as understanding the happy path. Here’s what happens with common mTLS misconfigurations:
Wrong CA bundle. If you point --peer-trusted-ca-file at the Kubernetes CA instead of the etcd CA, peer connections fail immediately. Each node presents its peer certificate (signed by the etcd CA), but the other node tries to verify it against the Kubernetes CA — different trust chain, verification fails. The etcd cluster can’t form, and the Kubernetes API server has no backend store. Symptom: etcdctl endpoint health shows all endpoints unreachable, and the API server logs show connection refused or certificate signed by unknown authority.
Missing --peer-client-cert-auth=true. Without this flag, etcd accepts peer connections without verifying the connecting node’s certificate. The listening node still presents its certificate, so the connection is encrypted, but the connecting side is never asked to prove its identity: any host that can reach port 2380 can open a peer connection. In a homelab, this might not seem dangerous, but in production it means a compromised machine on the same network could join the etcd cluster and participate in consensus.
Expired certificates. When a peer certificate expires, the mTLS handshake fails with a certificate has expired error. The affected node drops out of the Raft cluster. With 3 nodes, losing 1 to an expired cert leaves you with 2 — still quorum (see Understanding etcd Quorum). Losing 2 to expired certs drops you below quorum and the cluster stops accepting writes. This is why certificate rotation matters — and why the Vault-based approach (re-run the k8s-certs playbook to reissue) is so much simpler than manually regenerating with openssl.
SAN mismatch. If a peer certificate’s SANs don’t include the IP address or hostname that other nodes use in --initial-cluster, the TLS handshake fails with a hostname verification error. The connecting node sees a valid certificate signed by the right CA, but the name doesn’t match — so it rejects the connection. This is the most common issue when manually configuring etcd, and it’s why the Vault PKI roles generate certificates with node-specific SANs from the Ansible inventory.
Swapped server and peer certificates. If you accidentally put the server cert in the --peer-cert-file path (or vice versa), the connection may work if both certificates have the same SANs and compatible key usages. But you’ve broken the separation that limits blast radius. In practice, the extended key usage flags (server auth vs. client auth) may cause the handshake to fail depending on the TLS library’s strictness.
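The quorum arithmetic behind the expired-certificate scenario is easy to sanity-check. This sketch uses the standard Raft majority rule, floor(N/2) + 1; the function names quorum_size and has_quorum are illustrative only.

```shell
#!/usr/bin/env bash
# Quorum for a cluster of N members is floor(N/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Succeed if the cluster still has quorum after some members drop out.
# $1 = total members, $2 = members lost (for example to expired peer certs)
has_quorum() {
  [ $(( $1 - $2 )) -ge "$(quorum_size "$1")" ]
}

for lost in 0 1 2; do
  if has_quorum 3 "$lost"; then
    state="accepts writes"
  else
    state="no quorum, writes stop"
  fi
  echo "3 members, $lost lost: $state"
done
# prints:
# 3 members, 0 lost: accepts writes
# 3 members, 1 lost: accepts writes
# 3 members, 2 lost: no quorum, writes stop
```

To find out how close a certificate is to causing this, openssl x509 -checkend SECONDS -noout -in CERT exits non-zero if the certificate expires within the given window, which makes it a reasonable building block for a rotation reminder.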
How Vault Generates These Certificates
The etcd certificates are issued by the k8s-certs Ansible role, which calls Vault’s PKI API for each certificate type. The Vault PKI deep dive covers the full role, but here’s the etcd-specific flow:
For each etcd node, the role issues three certificates from the pki_etcd engine:
# Server certificate (conceptually; the Ansible role uses the Vault HTTP API)
vault write pki_etcd/issue/etcd-server \
  common_name="etcd-1" \
  ip_sans="192.168.64.21,127.0.0.1" \
  alt_names="etcd-1,localhost" \
  ttl="720h"
# Peer certificate
vault write pki_etcd/issue/etcd-peer \
  common_name="etcd-1" \
  ip_sans="192.168.64.21" \
  alt_names="etcd-1" \
  ttl="720h"
# Client certificate (deployed to master nodes for API server access)
vault write pki_etcd/issue/etcd-client \
  common_name="kube-apiserver-etcd-client" \
  ttl="720h"
Each Vault PKI role (etcd-server, etcd-peer, etcd-client) has constraints that control what certificates it can issue — allowed common names, IP SANs, DNS SANs, key usages, and maximum TTLs. Even if someone obtained the Vault token, they could only issue certificates that match the role’s policy. A role configured for etcd server certs won’t issue a certificate with a Kubernetes API server SAN.
The role deploys the resulting certificates, private keys, and CA bundles to /etc/etcd/pki/ on each node with the correct file permissions — private keys at 0600, certificates at 0644. This runs identically across UTM, Vagrant, and OrbStack — same role, different inventory IPs.
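Those permissions are worth spot-checking after a deploy. Here is a minimal sketch that flags any .key file that is not 0600 and any .crt file that is not 0644. The name audit_pki_modes is hypothetical, and the check reads the mode string from ls -ld rather than relying on a particular stat implementation.

```shell
#!/usr/bin/env bash
# Report files in a PKI directory whose mode differs from the expectation
# stated above: *.key -> rw------- (0600), *.crt -> rw-r--r-- (0644).
# Prints one line per offending file, nothing if everything matches.
audit_pki_modes() {
  local dir="$1" f mode want
  for f in "$dir"/*.key "$dir"/*.crt; do
    [ -f "$f" ] || continue              # skip patterns that matched nothing
    case "$f" in
      *.key) want='-rw-------' ;;
      *)     want='-rw-r--r--' ;;
    esac
    mode="$(ls -ld "$f" | cut -c1-10)"   # first 10 characters are the mode
    [ "$mode" = "$want" ] || echo "$f: $mode (expected $want)"
  done
}
```

Running audit_pki_modes /etc/etcd/pki on an etcd node should print nothing. Note that this only looks at mode bits; file ownership (which user the key belongs to) would need a separate check.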
mTLS in the Broader Security Architecture
etcd peer mTLS is one layer in the project’s defense-in-depth security model. To tamper with etcd data, an attacker would need to:
1. Get past the bastion. SSH to the jump server, then SSH to an etcd node. No direct path from the Mac to any etcd node exists.
2. Obtain valid etcd client certificates. Even with SSH access to an etcd node, you need a client certificate and its private key to authenticate to the etcd API. The private keys are deployed with 0600 permissions, readable only by the service user.
3. Present certificates from the right CA. The etcd CA is separate from the Kubernetes CA. A Kubernetes component certificate (even the admin cert) can’t authenticate to etcd.
Each layer is independent. Bypass one and you still face the others. This is the same layered approach used in production Kubernetes environments — network segmentation, certificate-based authentication, CA separation, and least-privilege access.
Key Takeaways
Mutual TLS between etcd peers is what prevents unauthorized nodes from joining the consensus cluster and participating in Raft replication. The three certificate types (server, peer, client) serve distinct trust relationships, even though they’re all signed by the same etcd CA. The systemd flags --peer-cert-file, --peer-key-file, --peer-trusted-ca-file, and --peer-client-cert-auth are the configuration surface that enables all of this — miss any one of them and you’ve either broken the cluster or weakened its security.
The best way to internalize this is to run the verification commands on your homelab cluster. Check the certificates, try connecting without them, verify the CA chains, look at the serial numbers. Build an HA cluster (UTM, Vagrant, or OrbStack) and explore the TLS layer firsthand. The Learning Path maps the progression from Simple clusters (where you first see the components) to HA clusters (where the full mTLS architecture is deployed).
Big tech, small lab. One reel at a time.
Questions, corrections, or want to share how you’re using these repos?
labitlearnit@gmail.com