Andy Morrell

Building My Homelab Docker Stack

At some point the collection of services I rely on daily stopped being a hobby and became infrastructure. I’m not entirely sure when that happened. It probably had something to do with realising that every photo I’d taken in the last decade was locked behind a subscription, every book I owned lived in someone else’s app, and the only DNS I could trust was one I ran myself. Self-hosting scratched all of those itches at once, and the result is a NAS running a stack of Docker containers that handles media, DNS, authentication, CI/CD, and monitoring for my whole home network.

The Hardware

The machine is built around an AMD Ryzen Threadripper 2970WX, a 24-core workstation CPU that was already overkill when it was new and is genuinely never the bottleneck for anything running here. It’s paired with 64 GB of RAM, which sounds like a lot until you’ve got a dozen containers running alongside a PhotoPrism AI indexing job.

The GPU is an NVIDIA RTX 2080. It earns its place primarily through Jellyfin’s NVENC hardware transcoding, but it’s sitting on 8 GB of VRAM and there are other things I’d like to do with it over time.

Storage is layered across several tiers:

  • 5x 8 TB SATA HDDs — four data drives pooled via mergerfs, one dedicated parity drive for SnapRAID
  • Cache SSD — faster SATA solid state for working data and container config
  • Secondary NVMe — hot data, database files, and services that benefit from low latency
  • Root NVMe — the OS drive, kept separate from everything else

The GPU was the game-changer for media. Without it, Jellyfin transcodes in software and hammers the CPU; with NVENC, streams to multiple devices at once barely register.

From the containers' point of view, those drives are presented as three named mounts:

  • /mnt/cache/ for fast working data: config files, thumbnails, database files, app state
  • /mnt/storage/ for the main library: photos, downloads, books
  • /mnt/drobo/ for the media archive, mounted read-only in most containers

That tiering isn’t accidental. Hot data stays on NVMe, cold data lives on spinning rust, and containers only get write access to what they actually need.

Storage: mergerfs and SnapRAID

The four SATA data drives are XFS-formatted and pooled together using mergerfs, which presents them as a single mount point to the rest of the system. From the perspective of every container and every compose file, there’s one storage pool, not four separate drives to keep track of. mergerfs handles the distribution of writes across the underlying drives transparently. The fifth drive is the parity drive for SnapRAID.
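
A mergerfs pool is typically set up with a single fstab line. This is a sketch, not the actual entry — the member mount names (/mnt/disk1 and so on) and the pool target are assumptions; the option string follows the pattern from the mergerfs documentation.

```conf
# Hypothetical /etc/fstab entry: glob the four data-drive mounts into one pool.
# category.create=mfs writes new files to the member with the most free space.
/mnt/disk* /mnt/storage fuse.mergerfs cache.files=partial,dropcacheonclose=true,category.create=mfs 0 0
```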

SnapRAID is not traditional RAID. It doesn’t mirror writes in real time. Instead, it takes a snapshot of the data across your drives and computes parity from that snapshot. If a drive fails, you can recover the lost data by running a restore against the remaining drives and the parity. The trade-off is that anything written between the last sync and a drive failure is unprotected. For a media library that changes infrequently and where losing a day of new additions is acceptable, that’s a reasonable trade.

snapraid-runner runs via cron daily, automating the sync and scrub cycle. Sync updates the parity file to reflect any changes since the last run; scrub checks a percentage of the data against the existing parity to catch silent corruption. Running it daily means the parity is never more than 24 hours stale, and the scrub pass catches bit rot before it becomes a problem.
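
The layout described above maps to a snapraid.conf along these lines. The member mount names and content-file placement here are assumptions for illustration; only the four-data-plus-one-parity shape comes from the article.

```conf
# Hypothetical snapraid.conf for four data drives and one parity drive.
# Parity lives on the dedicated fifth drive.
parity /mnt/parity1/snapraid.parity

# Content files (the metadata index) on more than one drive,
# so the index itself survives a single-drive failure.
content /mnt/disk1/snapraid.content
content /mnt/disk2/snapraid.content

# The individual data members -- not the pooled mergerfs mount.
data d1 /mnt/disk1/
data d2 /mnt/disk2/
data d3 /mnt/disk3/
data d4 /mnt/disk4/

exclude *.tmp
```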

The result is four drives that look like one, with parity protection and daily verification, without the write penalty or complexity of a real-time RAID solution.

From Host Services to Containers

Until recently, both Nginx and Jellyfin ran directly on the host. Nginx was installed as a system package and managed with systemd; Jellyfin was the same. It worked, but it was awkward to update, awkward to reconfigure, and the config files lived in system directories rather than somewhere version-controlled and reproducible.

The move to containers happened in one go, motivated by wanting to make the whole stack easier to manage long-term. Jellyfin became a compose service with its GPU passthrough, volume mounts, and network config all declared in one file. Nginx was replaced by Nginx Proxy Manager running in its own container, which added a proper web UI for SSL cert management and proxy host configuration — things that previously meant editing nginx config files by hand and running nginx -t and hoping for the best.

The immediate benefit was that updating either service is now docker compose pull && docker compose up -d. The config is in the repository. Rollback is possible. There’s no drift between what’s running and what’s documented. It’s the same argument for containerising anything, but when you apply it to services you rely on daily, the improvement is tangible.

One Compose File Per Service

The one decision that’s made everything easier to reason about is keeping each service in its own directory with its own docker-compose.yaml. There’s no monolithic compose file. Each service can be started, stopped, updated, and debugged independently without touching anything else. cd jellyfin && docker compose up -d is the full deployment. It makes the repository readable at a glance and means a broken update to one container can’t cascade into an unplanned restart of unrelated services.
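
The repository layout this implies looks roughly like the tree below. The top-level directory name is hypothetical; the service names are the ones discussed in this post.

```text
stack/
├── authelia/docker-compose.yaml
├── beszel/docker-compose.yaml
├── jellyfin/docker-compose.yaml
├── kavita/docker-compose.yaml
├── npm/docker-compose.yaml
├── photoprism/docker-compose.yaml
├── pihole/docker-compose.yaml
└── registry/docker-compose.yaml
```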

The trade-off is that services on separate bridge networks can’t reach each other by container name. Where services genuinely depend on each other (Pi-hole and Unbound share a dns_network bridge, PhotoPrism and MariaDB are co-located in the same file), they share a network. Otherwise, containers communicate via host.docker.internal or fixed port assignments. It’s slightly more verbose, but a lot more deliberate.

Media Stack

Jellyfin is the centrepiece. It runs with runtime: nvidia and NVIDIA_VISIBLE_DEVICES=all, which hands it full access to NVENC hardware encoding. The media library lives on /mnt/drobo mounted read-only; config and cache land on /mnt/cache/jellyfin/. Hardware transcoding means multiple simultaneous streams barely touch the CPU. It’s exposed on port 8096 and sits behind the reverse proxy on its own subdomain.
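
Put together, the Jellyfin service looks roughly like this. The image tag and the exact config/cache subpaths are assumptions; the GPU runtime, port, and read-only media mount are as described above.

```yaml
services:
  jellyfin:
    image: jellyfin/jellyfin:latest
    runtime: nvidia                      # NVENC via the NVIDIA container runtime
    environment:
      NVIDIA_VISIBLE_DEVICES: all
    ports:
      - "8096:8096"
    volumes:
      - /mnt/cache/jellyfin/config:/config
      - /mnt/cache/jellyfin/cache:/cache
      - /mnt/drobo:/media:ro             # library mounted read-only
    restart: unless-stopped
```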

PhotoPrism is where the photo library lives, backed by MariaDB 11. The originals are on /mnt/storage/Photos; thumbnails, sidecars, and search indexes go to /mnt/cache/photoprism/. PhotoPrism’s AI features (face recognition, object classification, scene detection) all run on TensorFlow on the CPU, and they’re all enabled. The MariaDB instance is tuned conservatively: a 128 MB InnoDB buffer pool, READ-COMMITTED isolation, and a healthcheck that Compose uses before PhotoPrism is allowed to start. Imports land in a staging directory under /mnt/cache/photoprism/imports/ before being moved into originals, which keeps the main library clean.
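
The healthcheck gating works through Compose's depends_on conditions. This sketch shows the mechanism; credentials are omitted and the healthcheck details are illustrative (the official MariaDB image ships a healthcheck.sh for exactly this).

```yaml
services:
  mariadb:
    image: mariadb:11
    command: --innodb-buffer-pool-size=128M --transaction-isolation=READ-COMMITTED
    healthcheck:
      test: ["CMD", "healthcheck.sh", "--connect", "--innodb_initialized"]
      interval: 10s
      retries: 5
  photoprism:
    image: photoprism/photoprism:latest
    depends_on:
      mariadb:
        condition: service_healthy   # hold PhotoPrism until the DB is ready
    volumes:
      - /mnt/storage/Photos:/photoprism/originals
      - /mnt/cache/photoprism:/photoprism/storage
```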

Kavita handles the e-book and comics library, pointing at /mnt/storage/Books. Its config lives on NVMe (/hdd/nvme2/kavita/config) for faster library scans. Simple, fast, does exactly what it says.

DNS: Pi-hole and Unbound

Pi-hole handles local DNS with ad-blocking. Rather than pointing it at an upstream provider like 1.1.1.1 or 8.8.8.8, I’m running Unbound as the recursive resolver. Pi-hole’s upstream is unbound#53, and Unbound talks directly to authoritative nameservers. No third-party DNS provider sits between my network and the root servers.

Both containers share a dns_network bridge so Pi-hole can reach Unbound by container name. Pi-hole binds port 53 on both TCP and UDP. The NET_ADMIN and NET_RAW capabilities are there because Pi-hole needs raw socket access for DHCP and low-level network operations. The dns_network bridge keeps the two containers together without exposing them to anything else on the host.
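
As a compose sketch (shown in one file for brevity — the point is the shared bridge and the capabilities), with image tags and the Unbound image choice as assumptions:

```yaml
services:
  unbound:
    image: mvance/unbound:latest        # a common community image; an assumption
    networks: [dns_network]
  pihole:
    image: pihole/pihole:latest
    environment:
      # Pi-hole resolves "unbound" by container name over the shared bridge.
      # (Pre-v6 variable name; newer images use FTLCONF_dns_upstreams.)
      PIHOLE_DNS_: "unbound#53"
    ports:
      - "53:53/tcp"
      - "53:53/udp"
    cap_add:
      - NET_ADMIN   # DHCP and low-level network operations
      - NET_RAW
    networks: [dns_network]
networks:
  dns_network:
    driver: bridge
```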

Security: Authelia SSO

Every service sits behind Nginx Proxy Manager, and most of them sit behind Authelia as well. Authelia provides SSO with TOTP-based two-factor auth, and the access policy is tiered deliberately.

The default_policy in the Authelia config is deny, which means anything not explicitly listed is blocked. Services are then carved into two groups. The bypass list covers things that have their own auth or are designed for frequent access: Jellyfin, Kavita, PhotoPrism, and the Authelia portal itself. Everything else requires authentication, and for the sensitive stuff (Portainer, Gitea, Beszel, NPM, Pi-hole, the Docker registry, Healthchecks), that means full TOTP two-factor.
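
In Authelia's configuration.yml that tiering looks something like this. The subdomain names are placeholders, not the real hostnames; the policy structure matches what's described above.

```yaml
access_control:
  default_policy: deny          # anything not listed below is blocked
  rules:
    # Services with their own auth are let straight through
    - domain: jellyfin.sweetffa.com
      policy: bypass
    - domain: photos.sweetffa.com
      policy: bypass
    # Sensitive admin surfaces require full TOTP two-factor
    - domain: portainer.sweetffa.com
      policy: two_factor
    - domain: git.sweetffa.com
      policy: two_factor
    # Everything else needs at least a login
    - domain: "*.sweetffa.com"
      policy: one_factor
```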

Sessions are scoped to the .sweetffa.com cookie domain with a short expiration and inactivity timeout, with a “remember me” option for convenience. Brute-force protection is enabled. User storage is file-based with Argon2id password hashing; session and storage state live in SQLite on /mnt/cache/authelia/. Notifications go out via Gmail SMTP.

Reverse Proxy: Nginx Proxy Manager

NPM is the front door. It holds ports 80 and 443, terminates SSL (Let’s Encrypt certs stored at /mnt/cache/npm/letsencrypt/), and routes subdomain traffic to the right container by port. The admin UI runs on a separate high port, accessible on the local network but not part of the public-facing config.

Services use high, non-conflicting ports internally, so NPM can always reach them through the host’s address without needing every container on a shared network. It’s a simple and predictable pattern.

CI/CD: Gitea, the Runner, and a Private Registry

This is the part I’m most pleased with. Gitea is the self-hosted Git server, running on git.sweetffa.com. Alongside it runs a Gitea act runner (the gitea/act_runner image), which listens for workflow jobs, binds the Docker socket, and executes CI pipelines. Runner config and data are on NVMe at /hdd/nvme2/gitea/runner/.

The private Docker registry runs registry:2 on port 5000 with htpasswd authentication. Images built during CI get pushed here and pulled from here during deployment. No DockerHub dependency for private images, no rate limit surprises.
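
A registry:2 deployment with htpasswd auth comes down to a few environment variables. The volume paths and realm name here are assumptions; the env variable names are the registry's documented ones.

```yaml
services:
  registry:
    image: registry:2
    ports:
      - "5000:5000"
    environment:
      REGISTRY_AUTH: htpasswd
      REGISTRY_AUTH_HTPASSWD_REALM: Registry
      REGISTRY_AUTH_HTPASSWD_PATH: /auth/htpasswd   # created with: htpasswd -Bbn user pass
    volumes:
      - /mnt/cache/registry/auth:/auth
      - /mnt/cache/registry/data:/var/lib/registry
    restart: unless-stopped
```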

This site (andymorrell.net) deploys through this pipeline. The Gitea runner builds the Astro site, packages it into a Docker image, pushes to the registry, and the NAS pulls and runs the updated container. The whole thing runs on hardware I control, with credentials I manage, with no third-party CI service in the loop.
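
A Gitea Actions workflow for that build-and-push step could look like the sketch below. The registry host, secret names, and image name are all placeholders — only the overall shape (checkout, build, push to the private registry) comes from the article.

```yaml
name: deploy
on:
  push:
    branches: [main]
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build the site image
        run: docker build -t registry.local:5000/andymorrell-net:latest .
      - name: Push to the private registry
        env:
          REGISTRY_USER: ${{ secrets.REGISTRY_USER }}
          REGISTRY_PASS: ${{ secrets.REGISTRY_PASS }}
        run: |
          echo "$REGISTRY_PASS" | docker login registry.local:5000 \
            -u "$REGISTRY_USER" --password-stdin
          docker push registry.local:5000/andymorrell-net:latest
```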

Monitoring: Beszel and Healthchecks

Beszel is the system monitoring dashboard. The agent uses the henrygd/beszel-agent-nvidia image specifically, which enables NVML-based GPU metrics alongside the standard CPU/memory/network stats. It has access to all eight block devices: six SATA drives (/dev/sda through /dev/sdf) and both NVMe drives (/dev/nvme0 and /dev/nvme1). The SYS_RAWIO capability is needed for S.M.A.R.T. data on SATA, and SYS_ADMIN covers the NVMe S.M.A.R.T. queries. Extra filesystems (mnt/drobo, mnt/cache, hdd/nvme2) are tracked for disk usage on top of the default root.
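
A rough sketch of what that agent service looks like (not the actual file — the hub key is a placeholder and the device list is abridged):

```yaml
services:
  beszel-agent:
    image: henrygd/beszel-agent-nvidia:latest   # NVML-enabled variant for GPU metrics
    network_mode: host
    cap_add:
      - SYS_RAWIO   # S.M.A.R.T. queries on the SATA drives
      - SYS_ADMIN   # S.M.A.R.T. queries on the NVMe drives
    devices:
      - /dev/sda:/dev/sda       # ...through /dev/sdf, plus both NVMe devices
      - /dev/nvme0:/dev/nvme0
    environment:
      KEY: "<hub public key>"
      EXTRA_FILESYSTEMS: mnt/drobo,mnt/cache,hdd/nvme2
    restart: unless-stopped
```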

Healthchecks covers uptime monitoring and alerting. Each service in the stack pings a unique Healthchecks URL on a schedule, and if a ping is missed, an alert goes out via Gmail SMTP. It’s a simple heartbeat model and it works well for services that don’t have native health endpoints.
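
The heartbeat model is usually wired up by appending a ping to each scheduled job, so the ping only fires on success. These crontab lines are hypothetical — the Healthchecks URL and check UUIDs are placeholders.

```conf
# Ping fires only if snapraid-runner exits successfully.
0 3 * * *  snapraid-runner && curl -fsS -m 10 --retry 3 https://healthchecks.local/ping/<snapraid-uuid>
# Plain heartbeat: a missed ping within the grace period triggers an alert.
*/5 * * * * curl -fsS -m 10 https://healthchecks.local/ping/<heartbeat-uuid>
```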

What’s Still Rough

The Gitea compose file is empty. Gitea itself runs but I haven’t committed the compose file yet, which is a bit embarrassing given the whole point of that service is version control. That’s on the list.

PhotoPrism’s AI indexing is all on CPU, and it shows. Running a full library scan on a large photo collection is slow. GPU acceleration for TensorFlow would fix it, but PhotoPrism’s GPU support is still maturing and I haven’t wanted to take the instability risk yet.

The monitoring setup catches downtime but doesn’t yet do much with trends or alerting on degraded performance. Beszel has the data; I just haven’t wired up thresholds and notifications in a way I’m happy with.

None of that is a complaint. This stack has been running solidly, the Authelia SSO layer has saved me from re-entering credentials constantly, and having a GPU in the NAS continues to feel slightly ridiculous in the best possible way.

What’s Next

I’m not entirely sure. The stack covers everything I actually need right now, which is a strange feeling.

The RTX 2080 has 8 GB of VRAM and it spends most of its time doing nothing outside of Jellyfin. Running a small local LLM is the obvious next step — something like a quantised Llama or Mistral model via Ollama, privately hosted, no API keys, no data leaving the house. The VRAM is enough for a capable 7B or 8B model at a decent quantisation level, and having that sitting on the local network is genuinely useful rather than a novelty.
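
If it happens, the compose file would look much like Jellyfin's: same GPU passthrough pattern, different service. This is a sketch of a possible future service, not something running today; the model-storage path is an assumption.

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    runtime: nvidia                     # same GPU passthrough as Jellyfin
    environment:
      NVIDIA_VISIBLE_DEVICES: all
    ports:
      - "11434:11434"                   # Ollama's default API port
    volumes:
      - /mnt/cache/ollama:/root/.ollama # model storage; path is an assumption
    restart: unless-stopped
```

Pulling and chatting with a model would then be a docker exec away, e.g. docker exec -it ollama ollama run llama3.1:8b.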

Beyond that: better monitoring thresholds, the missing Gitea compose file, and probably something I haven’t thought of yet that will seem completely obvious in six months.