This is a follow-up to the earlier post detailing my homelab. See Part 1 here.

PS: I realize the title mentions 2024H1, but this post is being shared in 2025. I assure you that not much has changed in my homelab over the past six months — just as my tendency to procrastinate remains consistent.

In the previous post, I went over the hardware components which make up the homelab. Here, I will go deeper into the software bits which make the magic happen.

As mentioned earlier, all IaC related to this post lives in this repo: shikharbhardwaj/infra.

Software overview

Given my familiarity with Kubernetes from my day job as a Software Engineer specializing in the Infrastructure space, most of the services running in the Homelab are hosted as Kubernetes applications.

Exceptions to this are services which have technical limitations or complexities when hosted in the Kubernetes world. A few examples:

  • Jellyfin: needs hardware-accelerated encoding, which requires iGPU passthrough (see the sketch after this list).
  • TrueNAS: needs the HBA passed through to work properly; it is also a prerequisite for the cluster to work.
  • Omada SDN controller: This is a prerequisite for the cluster to work.
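
For reference, this is roughly how the iGPU gets exposed to the Jellyfin LXC container on Proxmox. This is a minimal sketch of the common bind-mount approach, assuming a hypothetical container ID of 200; the actual config in my lab may differ.

# Added to /etc/pve/lxc/200.conf (illustrative container ID)
# Allow access to the DRI devices and bind-mount /dev/dri into the container
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir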

Infrastructure components

Hypervisor

All the bare-metal nodes run Proxmox VE, which uses KVM for virtual machines and LXC (Linux Containers) for container-based virtualization.

Some key reasons to virtualize things instead of running everything directly on bare metal:

  1. Backup, restore, and migration of applications between cluster nodes are handled well by tools built into Proxmox (a few example commands follow after this list).
  2. Major software upgrades are easier with things like VM templates and the option to restore to the last working state in one click if needed.
  3. Turnkey LXC containers, which have good integration into Proxmox VE, are great time savers.
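
To make this concrete, here is a rough sketch of the kinds of Proxmox CLI operations involved. The VM IDs, snapshot names, and storage targets below are illustrative placeholders, not my actual values.

# Snapshot a VM before a risky change, and roll back if it goes wrong
~ qm snapshot 101 pre-upgrade
~ qm rollback 101 pre-upgrade

# Back up a VM to a storage target
~ vzdump 101 --storage local --mode snapshot

# Clone a new VM from a prepared template
~ qm clone 9000 104 --name new-node --full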

As mentioned in the previous post, each of the 3 nodes in the cluster runs a VM which acts as a Kubernetes node. Besides these 3 VMs, there are a few others needed for auxiliary services and the NAS software, which I will detail below.

Here’s a list of the VMs and containers currently running in the cluster:

Node     Type  Name              Usage
arete    VM    tenzing-01        Kubernetes node
perseus  VM    tenzing-02        Kubernetes node
thor     VM    tenzing-03        Kubernetes node
arete    VM    omada-controller  SDN controller
arete    LXC   jellyfin          Jellyfin host
thor     VM    truenas-01        NAS host
thor     VM    orion             Maintenance VM

Here’s an overview of the Proxmox cluster.

Kubernetes cluster

I use K3s to set up a simple 3-node Kubernetes cluster. I went with K3s as it offered the best tradeoff between being lightweight and having the features I needed.

Here’s an excellent summary from a blog post I read while making this choice.

Feature                             minikube                            kind                   k3s
runtime                             VM                                  container              native
supported architectures             AMD64                               AMD64                  AMD64, ARMv7, ARM64
supported container runtime         Docker, CRI-O, containerd, gvisor   Docker                 Docker, containerd
startup time (initial / following)  5:19 / 3:15                         2:48 / 1:06            0:15 / 0:15
memory requirements                 2GB                                 8GB (Windows, MacOS)   512 MB
requires root?                      no                                  no                     yes (rootless is experimental)
multi-cluster support               yes                                 yes                    no (can be achieved using containers)
multi-node support                  no                                  yes                    yes
project page                        minikube                            kind                   k3s
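
For reference, getting a K3s cluster up is close to a one-liner per node. This is a minimal sketch assuming a single server with agents joining it; my actual setup may be configured differently, and the address/token values are placeholders.

# On the first node: install the K3s server
~ curl -sfL https://get.k3s.io | sh -

# Read the join token from the server
~ sudo cat /var/lib/rancher/k3s/server/node-token

# On the remaining nodes: join as agents
~ curl -sfL https://get.k3s.io | K3S_URL=https://<SERVER IP>:6443 K3S_TOKEN=<NODE TOKEN> sh -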

TrueNAS

TrueNAS is an open source NAS OS based on OpenZFS. All persistent volumes in the Kubernetes cluster are backed by a TrueNAS instance, exported over NFS or iSCSI.

I use democratic-csi to connect TrueNAS to the Kubernetes cluster. There are two installations of the provided Helm chart, each providing one StorageClass: one for NFS shares and one for iSCSI shares.
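
As a rough sketch, the two installs look something like the following. The release names, namespace, and values file names are placeholders for illustration; the real configuration (TrueNAS API credentials, pool paths, share options, etc.) lives in the values files.

~ helm repo add democratic-csi https://democratic-csi.github.io/charts/
~ helm repo update

# One release per StorageClass: NFS-backed and iSCSI-backed
~ helm upgrade --install zfs-nfs democratic-csi/democratic-csi \
    --namespace democratic-csi --create-namespace \
    --values freenas-nfs-values.yaml
~ helm upgrade --install zfs-iscsi democratic-csi/democratic-csi \
    --namespace democratic-csi \
    --values freenas-iscsi-values.yaml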

Initially, I set this up with NFS as the only StorageClass since it is simpler than iSCSI, but some applications (e.g. those relying on an embedded SQLite DB) ran into problems when using NFS.

In all such cases, only one pod was supposed to access the underlying DB, which should be fine according to this SQLite FAQ, but in practice I ran into multiple issues where applications would slow down or get stuck making DB queries, with errors like “database is locked”. One possible explanation is that the application itself accesses the SQLite DB from multiple threads/processes, which does not play nicely with NFS locking.

Any persistent volume created on the Kubernetes cluster has a corresponding dataset and share created within TrueNAS using the TrueNAS API (this is the functionality implemented by democratic-csi). Each NFS dataset is backed up nightly to Backblaze B2. I don’t have a good backup strategy for the iSCSI zvols right now; that is something I still have to look at. Here’s the structure of the datasets in TrueNAS:

└── main-pool
    ├── ix-applications # TrueNAS internal dataset
    └── live
        ├── baykal      # Main SMB/NFS share for media and backups
        ├── vostok      # Kubernetes NFS PVs
        └── winnipeg    # Kubernetes iSCSI PVs
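
To illustrate the mapping, creating a PVC against the NFS-backed StorageClass results in a new dataset (and NFS share) under the vostok branch above. The StorageClass and claim names below are hypothetical examples, not taken from my actual manifests.

# Hypothetical PVC; democratic-csi provisions a matching dataset + NFS share
~ cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: truenas-nfs
  resources:
    requests:
      storage: 5Gi
EOF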

Omada SDN

Omada is TP-Link’s Software-Defined Networking (SDN) platform. The promise is: using compatible devices and a “controller”, one can store all the networking configurations in software and recreate/restore them on a new device or site as needed. Another similar system is Ubiquiti’s UniFi. A key difference between the two is the cost of the devices, and correspondingly the quality of the products (both hardware and software).

As my system is relatively simple, and the cost of a UniFi gateway is approximately 5x that of the TP-Link device, I decided to go with Omada. So far, no complaints.

Portainer on a DigitalOcean droplet

As mentioned in the previous post, I have a small, cheap DigitalOcean droplet which acts as a reverse proxy to allow fast, direct connections for devices outside of my local network.

Additionally, it runs a Portainer instance which hosts some other software:

  • Uptime Kuma: Uptime monitoring and status page
  • netdata: Hardware monitoring
  • authelia: SSO for web services

Applications

As mentioned above, all the applications are deployed as Kubernetes artifacts. The two main packaging tools I have used are:

  • Kustomize, with its overlays and patches to deploy slightly different variants of the same app.
  • Helm, for applications which were already well packaged within Helm.

Eventually, I would want to move everything over to Helm to manage CD for all apps via ArgoCD. But for now, I have an on-merge GitHub Action that deploys the Kubernetes manifests to the cluster.

Most of this setup is optimized for quickly trying out a new app and having it available at an address like myapp.cluster.domain.com, so it might not have the best practices as defaults in most places.

Kustomize

The most recent list of apps deployed via kustomize is here.

I have scripts and Makefiles set up to allow for the following workflow when deploying a new app:

# Bootstrap a new app
~ ./tools/bootstrap --template-vars image=<IMAGE TAG> <APP NAME>

# Deploy the app
# NOTE: needs a local Bitwarden vault to be unlocked if referencing any secrets
~ make deploy app=<APP NAME> variant=<VARIANT>
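
Under the hood, a deploy like this typically boils down to rendering the kustomize overlay and applying it to the cluster, roughly along these lines. The sketch below is a simplified approximation rather than the actual Makefile contents, and the directory layout is illustrative.

# Roughly what `make deploy` boils down to (simplified)
~ kustomize build apps/<APP NAME>/overlays/<VARIANT> | kubectl apply -f -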

Helm

Helm has emerged as the de facto package manager for Kubernetes applications. Coupled with tools like ArgoCD, it makes customizing and managing application installations relatively straightforward.

For the homelab, I have some software installed manually via helm install and others managed in ArgoCD, shown below.
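
For the ArgoCD-managed apps, registering a new application looks roughly like the following. This is a generic sketch using the argocd CLI; the repo URL placeholder, path, and namespace are illustrative and not taken from my actual setup.

# Register an app from the infra repo with ArgoCD (illustrative names/paths)
~ argocd app create myapp \
    --repo <INFRA REPO URL> \
    --path k8s/apps/myapp \
    --dest-server https://kubernetes.default.svc \
    --dest-namespace myapp \
    --sync-policy automated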

Maintenance

One key missing piece I have to get right here is a way to do unattended maintenance of most things. Given that the base system has remained relatively unchanged for more than a year at this point, I think now is a good time to start investing in tooling to make maintenance as hands-free as possible.

Here’s a non-exhaustive list of things that need to be kept relatively up to date for various reasons (security, bugs, support windows for unattended upgrades, etc.).

Infrastructure

  • Proxmox VE
  • TrueNAS
  • VM templates (Ubuntu; see the sketch after this list)
  • K3s (Kubernetes versions)
  • Portainer
  • Omada SDN
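
One common building block for keeping the Ubuntu VM templates and VMs patched is the stock unattended-upgrades mechanism for OS security updates. This is a generic sketch, not something currently wired up in the lab.

# Enable automatic security updates on an Ubuntu VM
~ sudo apt-get install -y unattended-upgrades
~ sudo dpkg-reconfigure -plow unattended-upgrades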

Applications

Upgrades for Helm charts are relatively straightforward (as long as the upgrade does not involve a breaking change).

Kustomize templates, on the other hand, would require some more custom tooling to make this happen smoothly. I have looked at solutions like Watchtower and Diun, but I would prefer something that goes through the IaC instead of overriding images directly in the cluster.

Conclusion

With this, we’ve come to the end of this virtual tour of my homelab! I will probably write more detailed posts on specific pieces as I build them going forward.