The way you handle persistence is by using storage volumes mounted from the outside. Don't put the data in the container, only the software, which can then be replaced. Managing it is then similar to other environments. For updates you don't replace rpm/deb packages and restart; instead you replace the container while keeping the same volume, which should trigger the DBMS-specific update routine. Backups likewise work as they do outside the container world (depending on the DBMS and your exact choice: a filesystem snapshot, a logical backup (dump), or something more or less smart in between).
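A rough sketch of that lifecycle, assuming Docker and the official postgres image (the volume name "pgdata" is made up for this example; note that for some DBMSes the image runs an update routine on start, while e.g. Postgres major-version upgrades need an explicit pg_upgrade, so a minor-version bump is shown here):

```shell
# Create a named volume and start the DBMS with its data inside it:
docker volume create pgdata
docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:15.4

# "Update" by replacing the container but keeping the volume:
docker stop db && docker rm db
docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:15.6

# A logical backup works the same as outside containers:
docker exec db pg_dump -U postgres mydb > backup.sql
```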
No, not the same volume, a volume. Each container can mount an arbitrary number of volumes. It scales as well as your machine scales.
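For example (volume names made up, assuming Docker), one container can mount several volumes, e.g. to separate data, WAL, and backup storage:

```shell
# One container, several independently-managed volumes:
docker run -d --name db \
  -v pgdata:/var/lib/postgresql/data \
  -v pgwal:/pgwal \
  -v backups:/backups \
  postgres:15
```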
A container is nothing but a process with restrictions on which filesystem subtrees it can see, which resources (CPU, memory) it may use, and which networks it can access, plus some tooling to manage self-contained images of directory structures.
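You can see this directly on a Linux host (rough illustration, assuming util-linux is installed and you have root): the "restricted process" part is just namespaces, and the resource part is cgroups, which Docker exposes as flags.

```shell
# A process in its own PID/mount/network namespaces -- inside,
# ps can only see bash (as PID 1) and itself; the host's
# processes are invisible:
sudo unshare --fork --pid --mount-proc --net bash -c 'ps aux'

# The resource restrictions are cgroups; Docker wraps them as flags:
docker run --cpus 2 --memory 512m alpine sh
```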
> I've not done stuff with kubernetes yet though, so I have no idea how it's done there.
Essentially the same, except that K8s gives you a wide variety of storage backend integrations (StorageClasses plus storage providers) which can attach "anything" (local volumes on the node, NFS, NAS, cloud volumes, ...) depending on your local environment and needs.
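A hedged sketch of what that looks like: a PersistentVolumeClaim requests storage from whatever StorageClass the cluster offers ("fast-local" is a made-up class name here; the actual backend behind it is the cluster operator's choice), and a pod or StatefulSet then mounts the claim by name, much like a Docker volume:

```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-local
  resources:
    requests:
      storage: 20Gi
EOF
```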
A lot of people running on-prem k8s clusters have block storage. When I worked on OpenShift it wasn't uncommon for people to run databases in the cluster, backed by their block storage.
If you're running in the cloud, say on AWS EKS, it makes sense to use in-cluster databases for development environments and reserve RDS databases for production/integration, to save on hosting costs.
There is a huge push for doing that. Whether it is, in the abstract, the right thing can be questioned, but many IT departments have decided to standardize on Kubernetes for all datacenter management and push that way, and in some environments (5G networking) it's part of the specified stack.
Saving it on one host's local filesystem doesn't feel particularly production-ready. There is a distributed storage system for Kubernetes called "Longhorn" that I've heard good things about, but I haven't really looked into it much myself. I just run a pair of VMs with a manual primary/replica setup and have never yet needed to fail over to the replica, but I can imagine some sort of fully orchestrated container solution in the future.
I'm just pointing out how it's commonly done. Of course people add things like replication, distributed filesystems, (etc) to the mix to suit their needs. :)
Yep, it seems like the most common answer is “pay exorbitant prices to your cloud provider for a managed SQL database”, but we’ve managed to save a chunk of money running it on our own. I’ve always said that between three engineers (me being one of them), we can form one competent DBA, but our needs are also pretty modest.
This covers one of the issues.