Why I'm building a Home Lab with K8s on Raspberry Pis (iamsafts.com)
43 points by s3rg4fts 3 months ago | 44 comments



For my home lab I started with k3s on VMs, which I eventually migrated to k3s with embedded etcd for HA. I added Raspberry Pi nodes to force myself to deal with multi-architecture builds of my own code in Jenkins and deployments. Some of the Pis have only WiFi and some Ethernet, so that got me into node affinity for deploying workloads. At some point I added some bare metal Intel machines to the mix. So now, whether it's a Pi, VM, or bare metal machine, all I need to do is a base OS install with an SSH server, add it to my Ansible inventory, and it will be up on my cluster in a few minutes.
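
In case it's useful to anyone, node affinity for that WiFi/Ethernet split looks roughly like this; the network=ethernet label is one you apply yourself (kubectl label node <node> network=ethernet), not a built-in:

    # pod that only schedules onto wired nodes
    apiVersion: v1
    kind: Pod
    metadata:
      name: wired-only-demo
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: network       # custom label, applied per node
                    operator: In
                    values: ["ethernet"]
      containers:
        - name: demo
          image: nginx:alpine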


I've found Raspberry Pis to be too unreliable for my homelab needs due to constant crashing and random performance drops. Instead, I bought two used 2012 Mac Minis with 16GB RAM and 1TB of storage on eBay for about $120 each. I installed Proxmox on them, and it's been a much better experience. It's nice not having to deal with ARM. They look decent and, most importantly, are super quiet, so I can even use them in the living room. Absolutely zero issues with this setup: boring hardware wins.


This sounds interesting. Can you please elaborate on how you are using Proxmox?


I'm using Proxmox primarily to host a mix of VMs and containers that serve various purposes around my home, plus some utility software and local dev & testing servers: HomeAssistant, Plex, Pi-hole, Monica. Backups go to Google Drive once a day.
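
For anyone wanting something similar, a minimal sketch of a daily Google Drive sync with rclone on cron; the remote name "gdrive" is an assumption, and /var/lib/vz/dump is Proxmox's default dump directory:

    # /etc/cron.d/gdrive-backup: push last night's dumps at 03:00
    0 3 * * * root rclone sync /var/lib/vz/dump gdrive:proxmox-backups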


It's certainly fun to mess with the hardware, but my personal preference would be to use VMs.

It's cheaper, in my opinion, to get a refurbished mini desktop from Lenovo or whatever. They're "real" PCs, and I've found the Raspberry Pi hardware limitations to be onerous. It's just really nice to have SATA, NVMe, and PCI Express lanes/slots. For the price of a couple of Pis, you can get a powerful CPU with loads of RAM.


I listened to this kind of preaching and got myself a Lenovo ThinkCentre with a Ryzen 5. What everyone forgets to mention is that these are loud. The fans are small and they spin fast. And the used fanless mini PC market is kind of dry.


Huh, that sucks, I'm sorry you've had a bad experience. I hadn't considered fan noise; the "Lenovo ThinkCentre M900 Tiny" I got is mostly silent... I probably don't stress it very much!


Plenty of fanless Intel Atom/Celeron mini PCs out there, no?

Also, I think you can use ryzenadj to set a very low TDP and disable the fan?
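
A sketch, assuming RyzenAdj is installed; values are in mW, so this caps everything around 8 W, and you'd tune them for your chip:

    # cap sustained / boost / average package power to ~8 W
    sudo ryzenadj --stapm-limit=8000 --fast-limit=8000 --slow-limit=8000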


I use a fanless industrial PC for my desktop. That wasn't my intention when I got it (otherwise I would have gotten something better than a Celeron processor), but I've been happy with it.


As others have said, this depends on your luck and/or the particular model you've got.

My HP EliteDesk 800 G4 has an annoying fan even at the lowest speed. It rattles. The newer G6 model I have at work is quiet.


Assuming an average workload… devices like the m80q, m920q are not loud at all. Maybe the computer needs cleaning or is overheating for some other reason.


My ThinkCentre is completely silent. Maybe I was lucky?


"Silent" to one person can be annoyingly noticeable to another person. You often see this kind of back and forth in discussions of fan-having devices.


The relatively exotic Cortex ARM, along with reliability, challenges with USB & microSD, and (at least here in AU) cost & availability challenges for the 'larger' boards, really put me off heading too eagerly down the Pi path.

A few weeks ago I picked up an ex-enterprise server with 256GB ECC RAM, 24 cores / 48 threads of Xeon, and a handful of disks, for AUD $1000 (~600 USD).

Yes, it's a noisy box, and yes, 2.5" SAS disks are expensive and small, so you want storage separately, but the same applies to an RPi or NUC based lab.

It draws about 100W while idle (and in a home lab environment it takes some effort to make it sweat).

Proxmox installed and runs like a dream.

I'm sticking with Nomad over k8s (or variants) as the container orchestrator across several Debian VMs in the one box, so I'm effectively relying on ECC, dual PSUs, and hardware RAID5 for my ersatz HA.


I am currently streaming my work on migrating HomelabOS from a docker-compose based implementation to a k8s (actually k3s) based implementation. With Longhorn backing it, I'm pretty excited about the possibilities. It will stay as generic as HomelabOS is, so it will be deployable on anything from a Pi on up to big cloud machines.

I won't be doing any 3D printing or anything like that though, so what the author here is doing looks fun in its own right!


Wasn't aware of HomelabOS, looks pretty interesting!

How is your Longhorn performing? I tried setting it up on my nodes, but with Gigabit networking and possibly the Pis' pretty average CPUs, I would get pretty awful performance both on distributed volumes (with replicas etc.) and on strict-local ones (for reasons I haven't yet figured out).

I am now considering using something like https://github.com/rancher/local-path-provisioner, since I mainly intend to use Longhorn for DBs that handle fault tolerance, backups, etc. on their own.
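
For reference, once local-path-provisioner is installed, claiming node-local storage is just a PVC against its default storage class (the claim name and size here are illustrative):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: db-data
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: local-path   # provisioner's default class name
      resources:
        requests:
          storage: 10Gi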


My experience with Longhorn, or Rook/Ceph, or any other StorageClass solution that enables ReadWriteMany is that eventually, despite all the "self-healing" and "highly available" claims, it all explodes at some point.

Volumes become unmountable, pools stay yellow forever, desynchronization happens... Perhaps (most probably, even) I just don't know how to configure them properly, but maybe that is telling of how difficult these solutions are to operate and maintain.

What I took away from this is to try my utmost to never deploy anything that requires shared persistent volumes. If I need something stateful, it needs to speak to a database or S3 backend, or something that handles redundancy at some level other than the filesystem. If I really, really need a local volume, I'll use local-path-provisioner like you said, which means pinning the pod to a single node, but that is a concession I am willing to make to not deal with Ceph/Rook.

Great writeup, it's like I'm reading about my own journey managing kubernetes clusters!

Best of luck


I haven't gotten to performance testing just yet; that should happen pretty soon. Thanks for the local-path-provisioner recommendation, I'll definitely give it a go!


Good luck! This is pretty useful for benchmarks: https://github.com/longhorn/kbench?tab=readme-ov-file


I've done the same thing with Pi 3Bs/4s, basically for the same reasons. I definitely ran into the limitations of the SD cards and set up USB drives and a NAS storage class for better performance. I ended up running my own Docker registry on my NAS and routing all of the images through that before deploying to the k3s cluster. I also hooked up container scanning and automated the deployments through Ansible.
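
For anyone wanting to do the same, k3s can be pointed at a private registry mirror via a registries.yaml on each node; a sketch, with a made-up NAS address:

    # /etc/rancher/k3s/registries.yaml (restart k3s after editing)
    mirrors:
      docker.io:
        endpoint:
          - "https://nas.local:5000"
    configs:
      "nas.local:5000":
        tls:
          insecure_skip_verify: true   # or configure proper certs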

Things I would do differently: use NixOS or bootable containers (CentOS) (side note: a bootable NixOS container would be a killer app) and write my own Helm charts instead of fully customizing my manifests and doing the deployments from Ansible. I'd also recommend against Raspberry Pis for the compute, as the 3s and 4s don't support resource limits (e.g. CPU or RAM limits), and I wasn't able to set up Firecracker containers correctly on the Pis.
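
(For what it's worth, on Raspberry Pi OS the memory cgroup that limits depend on is off by default; the k3s docs suggest enabling it at boot, though your mileage may vary and the file path differs by OS version:)

    # append to the single line in /boot/cmdline.txt, then reboot
    cgroup_memory=1 cgroup_enable=memory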

I'm also exploring hyperconverged infrastructure (HCI), as that seems more like my ultimate goal for homelab stuff.


I think the 3s/4s have a lot of limitations indeed. The 5s are a bit more powerful, so I'd expect the experience to be better with those.

I wasn't aware of NixOS; it looks pretty interesting, but I'm not sure how easy / reliable it'd be to run it on a Pi 5 (https://wiki.nixos.org/wiki/NixOS_on_ARM/Raspberry_Pi_5). I'll be keeping an eye on it though!

As far as Helm vs Ansible, I'm using Ansible to deploy the basics (bootstrapping the control plane & worker nodes, the network plugin), and then everything else is deployed with IaC (Pulumi), which installs Helm releases.
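
To give a flavor, a minimal sketch of what one of those Helm releases looks like in Pulumi (TypeScript); the chart and repo here are just illustrative, not necessarily what I run:

    import * as k8s from "@pulumi/kubernetes";

    // Install a Helm chart onto the current cluster; Pulumi tracks it in its state
    const pihole = new k8s.helm.v3.Release("pihole", {
        chart: "pihole",
        repositoryOpts: {
            repo: "https://mojo2600.github.io/pihole-kubernetes",
        },
        namespace: "pihole",
        createNamespace: true,
    });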


I built mine before the 5 was released. I ended up running Ubuntu Server on the nodes and configuring them all using Ansible playbooks (installing Tailscale, k3s, updates, OS tweaks, etc.). I started looking at Helm, but there is so much inconsistency in community Helm charts. I think writing my own would have been a better approach instead of templatizing my manifests and playbooks and doing it that way; however, it is very easy to stand up a new service (assuming it's only 1-2 pods). If I end up DRY'ing my deployments, it could end up being not too bad as a distinct deployment method from IaC or Helm.

How do you like Pulumi? It seems similar to AWS CDK...


If you're templatizing manifests, kubespray does this pretty well, I think. At least for the basics it's been pretty helpful so far. But indeed, I'm looking into deploying more things with Helm if possible.

Most services I've been using so far offer official Helm charts. But I get your point: it can be cumbersome, and if there isn't an official one, then community charts can be pretty undocumented / hard to work with.

I haven't used CDK, but the concept is definitely similar. I think Pulumi most likely has wider support, since it can bridge Terraform providers; even if a provider isn't available on Pulumi, you can "port it" (although I've never tried that, so I'm not sure how well it works). I like how it stores the state for you, and secrets as well; it saves quite a bit of trouble.


If you're looking at HCI, Proxmox is amazing for homelabbing, with a 3-node cluster or 2 nodes plus a qdevice. I use PBS for automated backups and RAIDZ, and while there's quite a learning curve and initial setup time, actual cluster maintenance is pretty hands-off.

With enterprise gear it becomes outdated and you replace it at your leisure, whereas consumer equipment dies and you need to replace it. The disadvantage of enterprise gear is the noise and power consumption, so it's a tradeoff you need to consider.


I've heard of Proxmox before, will definitely check it out! What is PBS? I have RAIDZ but have not explored backup options just yet.


PBS is the Proxmox Backup Server; basically it does differential backups of your VMs and integrity checks, and allows you to back up to multiple different servers. Proxmox does have a basic backup system that can save the VM images to disk, but I typically recommend the full PBS if you're building a cluster.

One interesting thing you can do is run PBS as a giant VM and back up or migrate the whole thing, just as you can run a small NAS as a VM.


I've been using a Raspberry Pi as the Kubernetes master, with a few Intel NUCs as the workers, and I've loved it. The whole setup is small and cheap to run; I installed everything into a switch-depth rack in my basement. I'm now trying to figure out how to add some GPU to the cluster, and I'm undecided whether to build a switch-depth 3U server or go another route. Most people I've talked to with homelabs are running a full-depth rack, so they can easily accommodate full-size equipment, but I have size constraints so I can't fit something that large.


> These servers were difficult to maintain, monitor, upgrade etc and a number of solutions were developed by system administrators to work with them. It all though felt very cumbersome

There's a lot of irony in saying this about bare metal vis-à-vis k8s. Maintaining individual servers isn't difficult; doing it at scale with high availability and the requirement of upfront investment (hardware, colo, staffing, etc.) is. Doing k8s at scale isn't a walk in the park or a cheap date either, though.


It most definitely is not, you are right! The main difference, IMO, is that by paying that price with k8s I at least have something that resembles an application platform: I can easily ship containers and deploy my app without dealing with hardware as much as I would have in the past deploying to servers.

But, as mentioned, I only see it as a developer. My end product with k8s is closer to my development tools than what I'd have maintaining a series of servers and using other tools to deploy apps onto them.


> does require a fair amount of maintenance

Cargo-culting the processes required for extreme use cases outside of that context is only good for learning to get paid to do it in those contexts in the future. In and of itself it is very silly. For a human person, a process running on an OS on a computer is far, far less work, maintenance, and complexity, with better performance and a longer lifetime.


We'll never find nirvana if we don't try. Maybe someone does really get it slick and right.

Eventually this system will probably become quasi-stable, self-sustaining. And for some folks, replicable & practicable. The author will probably soldier through a good amount of the pain and then it'll tick along. Doom isn't certain. (But it is probable!)


> a process running on an OS on a computer

Such a process can be deployed/backed up/restored in minutes by anyone with basic sysadmin skills and can scale to serve nontrivial user bases while costing much less than SaaS.

Classic sysadmin is seriously underrated.


labs are for experiments


Ugh, good luck.

I tried doing a homelab with six Nvidia Jetson Nanos using k8s, maintained it for about a year, and I have no desire to ever do that again. I ended up just buying a single rack-mount server and using that for two years, and now I've bought a mini gaming computer which I use as a single server. Maintaining the k8s cluster was becoming a second job that actually cost me money, that I enjoyed less and less every day, and it made me dread using any aspect of my server, meaning that when something broke it would take me a long time to muster up the strength to fix anything. My home server runs a Transmission server, Jellyfin, Apache Kafka, Apache Spark, Cassandra, and RabbitMQ, with 32 gigs of RAM, and it works fine.

Distributed systems are cool and they're fun to play with, but the combinatorial explosion of maintenance shouldn't be underestimated. If you're making something that needs to serve 10,000+ users, then it's probably worth it, but homelabs generally aren't that. Generally a homelab situation has like a Plex/Emby/Jellyfin server, a torrent server, a reverse proxy, maybe some kind of message queuing solution, and it generally only has like four concurrent users.

Obviously if your goal is to learn k8s, then doing it with a bunch of Raspberry Pis isn't a bad idea at all; try to have fun doing it. However, I would warn anyone thinking they're going to make their NAS a k8s cluster: you're likely going to regret it. I recommend buying a slightly beefier computer and just installing NixOS or something.


I do dread that exact scenario you're mentioning. I know it's a possibility, but I'm hoping that should that day arrive, I'll be able to put it to rest and will hopefully have learned stuff along the way.


You should absolutely give it a try, and maybe you'll have better luck than me, and it's a fun little experiment if nothing else.

If you get frustrated and don't want to do it anymore, you might also look at Docker Swarm. For homelab stuff, I found it considerably easier and less frustrating to work with than k8s. I'm not sure how well it would work with hundreds of services, but it can still be useful for playing with distributed systems, and it's a bit more intro-friendly in my opinion.
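
For a flavor of it, a Swarm stack file is basically just docker-compose; a minimal sketch (the image and paths are illustrative):

    # deploy with: docker stack deploy -c stack.yml media
    version: "3.8"
    services:
      jellyfin:
        image: jellyfin/jellyfin
        ports:
          - "8096:8096"
        volumes:
          - /srv/media:/media:ro
        deploy:
          replicas: 1
          restart_policy:
            condition: on-failure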

It's something that I think most engineers should try at least once, if nothing else to learn firsthand the annoyances of distributed computing. It's easy to read about these things in textbooks and blog posts and university courses (and you should!), but at least for me they didn't really sink in until I had to fix broken stuff, or deal with weird heartbeat issues, or watch things that should go N times faster actually go slower because I didn't take latency into account, etc.


I have pretty much landed on a similarly simple solution for my homelab: a mini PC with DietPi + Docker + Dockge + a bunch of homelab apps using docker-compose. Important config backups go to an S3-compatible bucket in the cloud, plus photo backups on a connected USB HDD.


A generic plea: please stop using AI-generated images in blog posts that don't add any real value to the article.


I did a less ambitious version of this a while back. Docs here: https://docs.google.com/document/d/12TT49VgyPRSH7F4b_oC5rOv1...

I opted for bundlewrap over Ansible/TF/etc, and keepalived over MetalLB.

I also didn’t add any kind of real storage or networking: they’re just using WiFi.

After I got it set up and working, and was forced to read up on the high-level k8s concepts, I never really did anything much with it again, but the learning was valuable. :-)


OT

Kinda annoying that on this blog post, if I press the up arrow or Page Up key, it takes me to the top of the page instead of scrolling up.

The Page Down and down arrow keys work as expected.

--

Kinda found the bug: when you first open the blog, focus stays on the top site navigation menu, and as long as it's up there, Page Up or the up arrow will take you to the top of the page.

Click anywhere on the page to move focus, and then Page Up and the up arrow work as expected.


I went with bare metal initially and regretted it. VMs are much easier to manage, especially when you don't have much storage space.


And then your network stops working for a couple of days, and you are @#£#&. Or your backups did not work for a couple of months, and you find out the hard way!

I have some hardware for fun and learning, but it is well isolated. Using K8s for home infrastructure is a bad idea.


Whatever technology you use, it doesn't exempt you from monitoring it and making it resilient. K8s isn't different from anything else in that regard.


With K8s on a bunch of Pis, I have dozens of layers, machines, and network connections to monitor.

With a single server, complexity is much, much lower. I installed my server years ago; once a month I check logs and do updates (10 minutes of work). Some time next year I will upgrade to Ubuntu 24.04 and replace the SSD (just in case), about 1 hour of work.

K8s is really horrible when it comes to time efficiency!



