Building a Homelab Kubernetes Platform (Part 1: Ansible)

For years I've run a homelab with services I actually use myself — Vaultwarden for passwords, AWX for automation, Harbor as my own container registry, a handful of other things on top. The problem I always had was keeping track of everything. Which VM was hosting what, which config had drifted from the last known-good state, how to rebuild when something inevitably broke.

That's where Ansible comes in. It's a tool I've come to love, and in this homelab it plays two roles at once:

My IaC — Infrastructure as Code. It takes three clean Rocky Linux VMs to a working Kubernetes cluster with one command, repeatably.
My CaC — Configuration as Code. It deploys every platform service on top of that cluster from a Helm chart and a Jinja values template, with all the config pulled from a single vars file.

I chose it because it's what I'm most familiar with — I use it heavily at work, and that experience is honestly what gave me the idea for this lab in the first place. Most of what you'll see here follows the pattern Ansible teaches: don't repeat yourself, make things dynamic, make things reusable. Helm's templated values are what makes that possible on the Kubernetes side.

Over time the lab turned into a proper little platform — Cilium CNI with Hubble, a shared Gateway API fronting every app, cert-manager signing internal certs, Harbor for images, AWX for automation, Kyverno for policy, Vaultwarden for secrets, kube-prometheus-stack for metrics, Falco for runtime security, and a Cloudflare Tunnel for the one or two things that need to be public. Every bit of it lives in a single Git repo. If I nuke the VMs tomorrow, I can rebuild from clean snapshots by cloning the repo, filling in my secrets, and running the playbooks. That's the whole story.

This post is the first in a three-part series. Here I'm going to walk through how the repo is organized, the one reusable Ansible role that deploys every Helm-based service, and finish with a concrete end-to-end: deploying Vaultwarden. Later parts will cover the cluster bootstrap (Cilium, Gateway API, cert-manager) and the observability + security stack (Prometheus, Falco).

If you want the visual first — here's the full architecture diagram. Hover any arrow to trace a flow; click any box to see what the service does and how it wires into everything else.

What Ansible brings to this setup

Three things in particular carry a lot of the weight here:

It runs code on remote machines over SSH. That's the whole OS-prep phase: installing containerd, disabling swap, patching sysctls, joining nodes to the cluster with kubeadm. Agentless means I'm not running another daemon on every node just to configure them.
The kubernetes.core collection is solid. kubernetes.core.helm, kubernetes.core.k8s, and kubernetes.core.k8s_info handle 95% of what I do against the API server. I almost never drop down to kubectl apply from a playbook.
Vault. ansible-vault lets me keep an encrypted YAML file sitting next to the plaintext config in the same repo. No sops, no external secret manager, no out-of-band steps. Rotating a secret is "edit, re-encrypt, git push." That honestly might not be the best approach, but it works for me. I am definitely researching more options, like integrating a secrets manager, but that's for the future.
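
To give a feel for the kubernetes.core modules mentioned above, here is a minimal sketch of a read-only play built on kubernetes.core.k8s_info. It is not taken from the repo. It assumes a working kubeconfig and the kubernetes Python client on the controller, and the play name and the _nodes variable are made up for illustration.

```yaml
---
- name: List cluster nodes from the controller (illustrative)
  hosts: localhost
  connection: local
  gather_facts: false

  tasks:
    # k8s_info only reads from the API server; it never changes anything
    - name: Fetch all Node objects
      kubernetes.core.k8s_info:
        api_version: v1
        kind: Node
      register: _nodes

    # The module returns matching objects under the "resources" key
    - name: Show node names
      ansible.builtin.debug:
        msg: "{{ _nodes.resources | map(attribute='metadata.name') | list }}"
```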

Repo layout

Here's the whole thing:

k8s-kickstart/
├── ansible.cfg                              # roles_path, inventory, SSH defaults
├── inventory                                # control_plane + workers (FQDNs)
├── site.yml                                 # top-level play: roles for each group
│
├── group_vars/all/
│   ├── config.yml                           # non-sensitive vars
│   └── secrets.yml                          # ansible-vault encrypted
│
├── k8s-kickstart/                           # role: bootstrap RHEL node → kubeadm cluster
├── k8s-helm-deploy/                         # role: install Cilium via Helm
├── helm_chart/                              # role: deploy any Helm chart (reusable)
│   └── templates/                           #   one values template per service
│
└── playbooks/
    ├── configure-coredns.yml                # patch CoreDNS with internal zone
    ├── configure-node-dns.yml               # dnsmasq on control plane as a port-53 front-end
    ├── deploy-local-path-provisioner.yml
    ├── deploy-cert-manager.yml
    ├── deploy-metallb.yml
    ├── deploy-gateway-api.yml
    ├── deploy-awx.yml
    ├── deploy-harbor.yml
    ├── deploy-kyverno.yml
    ├── deploy-vaultwarden.yml
    ├── deploy-prometheus-stack.yml
    ├── deploy-falco.yml
    └── deploy-cloudflared.yml

A few things worth calling out:

site.yml is the cluster bootstrap only. It runs the k8s-kickstart role against every node (OS prep, kubeadm) and k8s-helm-deploy against the control plane (Cilium CNI). Everything else — every application, every piece of the platform — is a standalone playbook under playbooks/.
There are no per-environment vars. group_vars/all/ is the one source of truth. If I had a staging cluster I'd use a separate inventory and a separate config.yml, but for a single homelab that's overkill.
Every deploy playbook calls the same role — helm_chart. That's the piece I want to dig into next, because once you understand it, every playbook in the repo is ~20 lines of "here's the chart, here are the values."
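
As a rough sketch, a site.yml matching the description above could look like the following. The role names and the control_plane group come from the repo layout. The play names and become: true are assumptions, not a copy of the real file.

```yaml
---
# Play 1: OS prep and kubeadm on every node
- name: Prepare nodes and bootstrap the kubeadm cluster
  hosts: all
  become: true            # assumption: OS-level changes need root
  roles:
    - k8s-kickstart

# Play 2: install the CNI from the control plane only
- name: Install Cilium via Helm
  hosts: control_plane
  roles:
    - k8s-helm-deploy
```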

The helm_chart role

This is the one reusable bit. Every platform service I deploy — AWX, Harbor, Kyverno, Vaultwarden, Prometheus, Falco, Cloudflared — goes through this role. I wrote it once, and now adding a new Helm-based service is copy-paste-edit.

Here's helm_chart/defaults/main.yml:

---
# Required vars: helm_chart_name, helm_chart_ref, helm_repo_name, helm_repo_url

helm_release_name: "{{ helm_chart_name }}"
helm_namespace:    "{{ helm_chart_name }}"
helm_create_namespace: true
helm_chart_version:    ""
helm_release_state:    present
helm_wait:             true
helm_wait_timeout:     "10m"

# Inline values dict
helm_values: {}

# Plain values files (passed through as-is)
helm_values_files: []

# Jinja-templated values files — rendered before being passed to helm
# Each item is a string path to a .j2 file in the role's templates/
helm_values_templates: []

# Where rendered templates land (on the Ansible controller)
helm_render_dir: "{{ playbook_dir }}/.rendered/{{ helm_release_name }}"

# Workloads to rollout-restart after the helm release is applied (only
# when helm reports changed=true). Kubernetes does not re-read Secrets /
# ConfigMaps in running pods, so a values change that only touches a
# Secret (e.g. UI credentials consumed via envFrom) is invisible until
# the pod restarts. Each item: {kind, name, namespace?}.
helm_rollout_restart: []

And the task file (helm_chart/tasks/main.yml), in order:

Assert the required vars are defined.
If any Jinja templates are passed in, render them into .rendered/<release>/. This is how every service gets its values file: I write a .yml.j2 that references vault vars and config vars, and the role renders it with the current inventory's context before handing the path to helm.
Combine the static values files with the rendered templates into a single list.
Add the Helm repository.
Run kubernetes.core.helm — this is the actual deploy, registered as _helm_release so we can check whether anything changed.
If helm_rollout_restart is non-empty and _helm_release.changed, patch each listed workload with a fresh kubectl.kubernetes.io/restartedAt annotation to force a rollout.
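
The first four steps can be sketched roughly as below. This is an illustration under stated assumptions, not the role's actual task file: it assumes the play runs on the controller (hosts: localhost), and the register name _helm_rendered_templates is hypothetical. The variable names come from defaults/main.yml, and _helm_values_files_combined is the name the deploy task consumes.

```yaml
- name: Assert the required vars are defined
  ansible.builtin.assert:
    that:
      - helm_chart_name is defined
      - helm_chart_ref is defined
      - helm_repo_name is defined
      - helm_repo_url is defined
    fail_msg: "helm_chart_name, helm_chart_ref, helm_repo_name and helm_repo_url are required"

- name: Ensure the render directory exists
  ansible.builtin.file:
    path: "{{ helm_render_dir }}"
    state: directory
    mode: "0700"
  when: helm_values_templates | length > 0

# Each .j2 is rendered with the current inventory context (vault vars included)
- name: Render Jinja values templates
  ansible.builtin.template:
    src: "{{ item }}"
    dest: "{{ helm_render_dir }}/{{ item | basename | regex_replace('\\.j2$', '') }}"
    mode: "0600"
  loop: "{{ helm_values_templates }}"
  register: _helm_rendered_templates   # hypothetical name

# Static files first, then rendered templates
- name: Combine static values files with rendered templates
  ansible.builtin.set_fact:
    _helm_values_files_combined: >-
      {{ helm_values_files
         + (_helm_rendered_templates.results | map(attribute='dest') | list) }}

- name: Add the Helm repository
  kubernetes.core.helm_repository:
    name: "{{ helm_repo_name }}"
    repo_url: "{{ helm_repo_url }}"
```

Helm gives later values files precedence, so with this ordering a rendered template overrides a static file on conflicting keys.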

The relevant chunk of tasks/main.yml:

- name: Deploy Helm release ({{ helm_release_name }})
  kubernetes.core.helm:
    name:               "{{ helm_release_name }}"
    chart_ref:          "{{ helm_chart_ref }}"
    chart_version:      "{{ helm_chart_version | default(omit, true) }}"
    release_namespace:  "{{ helm_namespace }}"
    create_namespace:   "{{ helm_create_namespace }}"
    state:              "{{ helm_release_state }}"
    wait:               "{{ helm_wait }}"
    wait_timeout:       "{{ helm_wait_timeout }}"
    values:             "{{ helm_values }}"
    values_files:       "{{ _helm_values_files_combined }}"
  register: _helm_release

# Only restart when helm actually applied a change — a no-op upgrade means
# Secrets/ConfigMaps are identical, so there's nothing stale to pick up.
- name: Rollout-restart workloads so refreshed Secrets/ConfigMaps are picked up
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: apps/v1
      kind: "{{ item.kind }}"
      metadata:
        name:      "{{ item.name }}"
        namespace: "{{ item.namespace | default(helm_namespace) }}"
      spec:
        template:
          metadata:
            annotations:
              kubectl.kubernetes.io/restartedAt: "{{ lookup('pipe', 'date -u +%Y-%m-%dT%H:%M:%SZ') }}"
  loop: "{{ helm_rollout_restart }}"
  when:
    - helm_rollout_restart | length > 0
    - _helm_release.changed | default(false)

That last task is the scar from a real incident. I was updating the Falco UI credentials (the password lives in the vault, falcosidekick-ui reads it from a Secret via envFrom), and I could not for the life of me figure out why my new password wasn't working. The Secret in the cluster had the right value. The pod's env vars had the old value. envFrom is read once, at pod start — Kubernetes does not re-read it when the underlying Secret changes. So now any chart that consumes config via envFrom declares its workloads in helm_rollout_restart and they get bounced on every changed deploy.

The _helm_release.changed gate matters. A no-op Helm upgrade means nothing has actually changed in the values, so there's no stale config to pick up — no point bouncing pods. This keeps the role idempotent.
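
For the Falco UI case, the declaration might look like the fragment below. The item shape ({kind, name, namespace?}) comes from the role defaults. The workload name and namespace are assumptions that depend on your release name, so check them against what the chart actually creates.

```yaml
# Passed as a role var in the Falco deploy playbook (illustrative values)
helm_rollout_restart:
  - kind: Deployment
    name: falco-falcosidekick-ui   # hypothetical; depends on the release name
    namespace: falco               # optional; falls back to helm_namespace
```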

End-to-end: deploying Vaultwarden

Let's look at a real service. Vaultwarden is a Bitwarden-compatible password manager; I run it in the cluster and publish it over a Cloudflare Tunnel (a public password manager behind the Gateway API isn't a great idea). The chart is guerzon/vaultwarden.

1. Variables

Non-sensitive settings in group_vars/all/config.yml:

vaultwarden_namespace: vaultwarden
vaultwarden_chart_version: "0.31.6"
vaultwarden_hostname: "vault.{{ cluster_domain }}"
vaultwarden_storage_class: local-path
vaultwarden_data_size: "5Gi"
# Domain Vaultwarden advertises to clients — MUST match the public URL
# users type in browsers, otherwise Bitwarden clients refuse to log in.
vaultwarden_domain: "https://{{ vaultwarden_hostname }}"

And the one secret, in the vault-encrypted group_vars/all/secrets.yml:

vaultwarden_admin_token: "<argon2 hash generated by vaultwarden/server>"

2. Values template

helm_chart/templates/vaultwarden.yml.j2:

---
image:
  tag: latest

# ADMIN_TOKEN gates the /admin UI — stored in Ansible vault
adminToken:
  value: "{{ vaultwarden_admin_token }}"

# DOMAIN must match the public-facing URL (Bitwarden clients validate it).
domain: "{{ vaultwarden_domain }}"

# Trust X-Forwarded-For so rate limits see real client IPs (cloudflared
# adds it; the Gateway preserves it for internal requests).
ipHeader: "X-Forwarded-For"

service:
  type: ClusterIP
  port: 80

storage:
  data:
    name: data
    size:       "{{ vaultwarden_data_size }}"
    class:      "{{ vaultwarden_storage_class }}"
    accessMode: ReadWriteOnce

ingress:
  enabled: false   # cloudflared hits the Service directly

Every {{ ... }} in that template is just a variable lookup against the inventory context at render time. The role renders it, the rendered file lands in .rendered/vaultwarden/vaultwarden.yml under the playbook directory, and Helm gets a plain values file. No plaintext secrets are committed to the repo (provided .rendered/ stays out of Git), and there's no custom plumbing.
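
To make the rendering concrete, here is roughly what the rendered values file would contain, with the template's comments omitted. It assumes cluster_domain is set to example.internal, a made-up value. The other values come from the config vars shown earlier, and the admin token stays a placeholder.

```yaml
---
image:
  tag: latest

adminToken:
  value: "<argon2 hash from the vault>"

domain: "https://vault.example.internal"   # from the assumed cluster_domain

ipHeader: "X-Forwarded-For"

service:
  type: ClusterIP
  port: 80

storage:
  data:
    name: data
    size:       "5Gi"
    class:      "local-path"
    accessMode: ReadWriteOnce

ingress:
  enabled: false
```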

3. Playbook

playbooks/deploy-vaultwarden.yml is the glue:

---
- name: Deploy Vaultwarden
  hosts: localhost
  connection: local
  gather_facts: false

  pre_tasks:
    - name: Create vaultwarden namespace
      kubernetes.core.k8s:
        name: "{{ vaultwarden_namespace }}"
        api_version: v1
        kind: Namespace
        state: present

  roles:
    - role: helm_chart
      vars:
        helm_repo_name:        guerzon
        helm_repo_url:         https://guerzon.github.io/vaultwarden
        helm_chart_name:       vaultwarden
        helm_chart_ref:        guerzon/vaultwarden
        helm_chart_version:    "{{ vaultwarden_chart_version }}"
        helm_namespace:        "{{ vaultwarden_namespace }}"
        helm_create_namespace: false
        helm_values_templates:
          - templates/vaultwarden.yml.j2

That's the whole deploy: a handful of variables passed to the role. The role handles rendering the values template, adding the chart repo, running the Helm upgrade, waiting for the release to be ready, and (if we'd listed any) rollout-restarting workloads.

4. Running it

ansible-playbook playbooks/deploy-vaultwarden.yml --ask-vault-pass

After that:

The Vaultwarden pod is up with a PVC on local-path.
In the Cloudflare dashboard, I add a public hostname pointing vault.<my-domain> at http://vaultwarden.vaultwarden.svc.cluster.local:80. That's the only manual step, and it's a one-time dashboard click.
The cloudflared connector (deployed separately by deploy-cloudflared.yml) picks up the route automatically and I can log in from my phone.

The whole pattern (config vars, values template, short playbook) is how every service in this cluster gets deployed. Falco's playbook is about 40 lines instead because it has post-deploy HTTPRoute wiring, but the core deploy is the same shape.
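
To show the same shape applied to a second service, here is a sketch of what a Kyverno playbook could look like. The chart repo URL is Kyverno's public Helm repository. The kyverno_chart_version variable and the templates/kyverno.yml.j2 file are hypothetical names, not taken from the repo. It leans on the role defaults, so helm_namespace falls back to the chart name and the namespace is created by Helm.

```yaml
---
- name: Deploy Kyverno
  hosts: localhost
  connection: local
  gather_facts: false

  roles:
    - role: helm_chart
      vars:
        helm_repo_name:     kyverno
        helm_repo_url:      https://kyverno.github.io/kyverno/
        helm_chart_name:    kyverno
        helm_chart_ref:     kyverno/kyverno
        helm_chart_version: "{{ kyverno_chart_version }}"   # hypothetical var in config.yml
        helm_values_templates:
          - templates/kyverno.yml.j2                          # hypothetical template
```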

What's in Parts 2 and 3

Part 2 will cover the actual cluster bootstrap — the k8s-kickstart role that takes three clean Rocky Linux 10 VMs to a working kubeadm cluster, then adds Cilium with kube-proxy replacement, Hubble, and Gateway API. That's where the interesting decisions are: why skip addon/kube-proxy, why pin Cilium to experimental Gateway API CRDs, how the single wildcard cert + shared Gateway pattern saves you from per-app ingress nightmares.

Part 3 will walk through the observability and security layer — kube-prometheus-stack, Falco with the modern-eBPF driver, Kyverno for policy, and how I ended up needing the rollout-restart mechanic in the first place.

If you want to steal the pattern, the template version of the repo — vars stripped out, .example files for your secrets and inventory, a CONFIGURATION.md inventorying every tunable — is up on GitHub: FrancoCarrera1/k8s-kickstart-template. Fork it, drop in your own vars, and point it at your nodes. Questions, corrections, or "have you considered X instead of Y" takes — leave a comment.

Bonus: Tailscale and HA

One thing I left out of the main write-up is that I'm doing all of this over Tailscale, which is built on top of the WireGuard protocol (a VPN I've self-hosted in the past using PiVPN). It's not required, but I wanted to mention that I'm using it in this lab. The QUICKSTART.md in the GitHub repo goes over it. I'm bringing it up because it's a new piece of technology I've come across and so far I LOVE IT. MagicDNS is amazing, and the setup was super simple.

Another thing I'd like to point out: I'm actually using Tailscale for HA, so I have 2 extra nodes somewhere else joined to the cluster. In this project I'm using a Windows VPS with Hyper-V, but that is not a requirement. If you wanted, you could use 3 bare-metal servers running K8s. You just have to make sure that the control-plane node with CoreDNS is one of the DNS servers your computer uses, so it can route you to the Gateway API mappings of your platform services.
