Thanks for the insight. I've tried Proxmox before, and as much as I appreciate what it does, it's mostly a black box. I prefer a more hands-on approach, as long as the docs are comprehensible, and the solution is simple enough to manage without a special degree. Ceph always seemed daunting, and I've mostly ruled it out, which is why these k8s-native solutions are appealing.
Good to know about the performance. This is not critical for my use case, but I guess I'll need to test it out for myself, and see whether it's acceptable.
Ceph is ... well, it's an amazing journey. Especially if you were there when it started, and watched as each release made it more capable & crazier.
From the early days of trying to remember WTF the CRUSH and "rados" acronyms actually meant, and focusing on network and storage concerns [0] ... architectural optimizations [1] ... a Red Hat acquisition ... adjusting to the NVMe boom [2] ... and then Rook, probably one of the first k8s operators trying to somehow operate this underwater beast in a sane manner.
If you are interested in it ... set it up manually once (get the binaries, write the config files, generate keyrings, start the monitors, set up OSDs and RGW, and then point FileZilla or any S3 client at it); there's a rough sketch of those steps after the footnotes. For more production-ish usage there's a great all-in-one docker image: https://quay.io/repository/ceph/ceph?tab=tags .
[0] oh, and don't forget to put the intra-cluster chatter on an internal network, or use a virtual interface and traffic shaping, so there's enough bandwidth for replication and recovery; and then figure out what to do when PGs are not peering and OSDs are not coming up
[1] raw block device support, called BlueStore, which basically creates a GPT partitioned device, with ~4 partitions, stores the object map in LevelDB - and then later RocksDB
[2] SeaStore, using the Seastar framework, SPDK and DPDK, optimizing everything for chunky block I/O in userspace
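For a taste of what "manually" means here, the bootstrap looks roughly like the sketch below. It's heavily abbreviated and everything concrete in it (hostname, IPs, networks, fsid, /dev/sdb) is a placeholder; several steps (mgr, bootstrap-osd keyring, RGW) are left out and exact flags vary by release, so treat the official manual-deployment docs as the source of truth.

    # Abbreviated sketch of a one-node manual Ceph bootstrap.
    # All names, addresses and devices are placeholders; steps like the
    # mgr, bootstrap-osd keyring and RGW are omitted for brevity.
    import pathlib
    import subprocess
    import textwrap
    import uuid

    FSID = str(uuid.uuid4())
    HOST, MON_IP = "ceph1", "10.0.0.11"   # placeholder node name / address

    # Minimal ceph.conf; a separate cluster_network keeps replication and
    # recovery chatter off the public interface (see footnote [0]).
    pathlib.Path("/etc/ceph/ceph.conf").write_text(textwrap.dedent(f"""\
        [global]
        fsid = {FSID}
        mon initial members = {HOST}
        mon host = {MON_IP}
        public network = 10.0.0.0/24
        cluster network = 10.1.0.0/24
    """))

    def run(cmd: str) -> None:
        print("+", cmd)
        subprocess.run(cmd, shell=True, check=True)

    # Keyrings for the monitor and the admin client.
    run("ceph-authtool --create-keyring /tmp/ceph.mon.keyring "
        "--gen-key -n mon. --cap mon 'allow *'")
    run("ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring "
        "--gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' "
        "--cap mds 'allow *' --cap mgr 'allow *'")
    run("ceph-authtool /tmp/ceph.mon.keyring "
        "--import-keyring /etc/ceph/ceph.client.admin.keyring")

    # Initial monitor map and data dir, then start the first monitor.
    run(f"monmaptool --create --add {HOST} {MON_IP} --fsid {FSID} /tmp/monmap")
    run(f"mkdir -p /var/lib/ceph/mon/ceph-{HOST}")
    run(f"ceph-mon --mkfs -i {HOST} --monmap /tmp/monmap "
        "--keyring /tmp/ceph.mon.keyring")
    run(f"systemctl start ceph-mon@{HOST}")

    # One OSD per data disk; /dev/sdb is a placeholder.
    run("ceph-volume lvm create --data /dev/sdb")

Doing it by hand once like this is the fastest way to learn what the mons, keyrings and OSDs actually are before any orchestration hides them.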
> I prefer a more hands-on approach ... which is why these k8s-native solutions are appealing
"K8s-native" here implies Rook, which is in no way hands-on for Ceph.
> as long as ... the solution is simple enough to manage without a special degree
Ceph is not simple, that's my point. I assume you've read the docs [0] already; if not, please do so _in their entirety_ before you use it. There are so many ways to go wrong, from hardware (clock skew due to BIOS-level CPU power management, drives without proper PLP...), to configuration (incorrect PG sizing, inadequate replication and/or EC settings...), and more.
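To make the PG sizing point concrete: the old rule of thumb from the docs was roughly OSDs x 100 divided by your replica size, rounded up to a power of two per pool. The pg_autoscaler mostly handles this nowadays, but you should still understand the arithmetic it's doing. Something like this (the numbers are made up):

    # Back-of-the-envelope PG count, per the classic rule of thumb
    # (total PGs ~= OSDs * 100 / replica size, rounded up to a power of two).
    # Modern clusters lean on the pg_autoscaler, but the idea is the same.

    def suggested_pg_count(num_osds: int, replica_size: int,
                           pgs_per_osd: int = 100) -> int:
        raw = num_osds * pgs_per_osd / replica_size
        # Round up to the next power of two, as the docs recommended.
        power = 1
        while power < raw:
            power *= 2
        return power

    # Example: a small homelab cluster -- 6 OSDs, 3x replication.
    print(suggested_pg_count(6, 3))   # -> 256 (raw value 200, rounded up)

Get that wrong in either direction and you pay for it with uneven data distribution or with per-OSD overhead, which is exactly the kind of decision Ceph quietly expects you to own.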
I'm not trying to dissuade you from tackling this, I'm just saying it is in no way easy or simple. Statements like "k8s-native solutions" always make me cringe, because it usually means you want to use an abstraction without understanding the fundamentals.
To be clear, I have read the docs, set it up on my own, and decided I didn't want to try to manage it. I've run a ZFS pool for a few years on Debian and shifted over to TrueNAS Scale last week; not because I was unable to deal with ZFS' complexity (knock on wood, the only [temporary] data loss I ever had was from an incorrect `rm -rf`, and snapshots fixed that), but because of continual NFS issues. I may yet switch back; I don't know - I just no longer had the time to troubleshoot the data layer. Ceph makes ZFS look like child's play in comparison.
By "k8s-native" I meant solutions that were built from the ground up with k8s in mind. Rook is built on Ceph, and as such requires knowledge and maintenance of both, which seems like much more difficult than managing something like Longhorn.
I think you misunderstood me. I'm not inclined to give Ceph/Rook a try because of its complexity, regardless of the state of its documentation. I was referring to Proxmox in my previous comment, in the sense that I don't need a VM/container manager/orchestrator with a pretty UI. If I'm already running k3s, I can manage the infrastructure via the CLI and IaC, and removing that one layer of abstraction that Proxmox provides is a positive to me. Which is why adding just a storage backend like Longhorn to a k3s cluster seems like the path of least resistance for my use case.
Ultimately, I don't _want_ to deal with the low-level storage details. If I did, I'd probably be managing RAID, ZFS, or even Ceph. For my current NAS I just use a single JBOD server with SnapRAID and MergerFS. This works great for my use case, but I want to have pseudo-HA and better fault tolerance, and experiment with k8s/k3s in the process.
So I'm looking for a k8s-native tool that I can throw a bunch of nodes and disks at, easily configure to serve a few volumes, and have it give me somewhat performant, reliable, and hopefully maintenance-free block- or object-level access. Persistent storage has always been a headache in k8s land, but I'm hoping that user-friendly and capable enough solutions now exist to let me level up my homelab.
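To be concrete about what "easily configure it to serve a few volumes" means to me: with something like Longhorn installed (as far as I know it ships a "longhorn" StorageClass by default), consuming storage should just be an ordinary PVC. A rough sketch with the Python kubernetes client, assuming a working kubeconfig and the default StorageClass name:

    # Sketch: request a replicated Longhorn volume like any other PVC.
    # Assumes Longhorn is installed with its default "longhorn"
    # StorageClass and that ~/.kube/config points at the cluster;
    # the claim name and size are just examples.
    from kubernetes import client, config

    config.load_kube_config()

    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="demo-data"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteOnce"],
            storage_class_name="longhorn",
            resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
        ),
    )

    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="default", body=pvc
    )

If day-to-day use really is that boring, that's exactly the level of abstraction I'm after.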
Also, I feel obligated to ask “are you me?” given your stated homelab journey. I also went from mergerfs + SnapRAID, albeit shifting to ZFS. I also went with k3s, but opted for k3os, which is now dead and thus leaves me needing to shift again (I’m moving to Talos). Finally, everything in my homelab is also in IaC, with VMs built by Packer + Ansible and deployed with Terraform.
Happy to discuss this more at length if you have any questions; my email is in my profile.