Single dependency stacks (brandur.org)
225 points by jeffreyrogers on Feb 9, 2022 | 107 comments



I'm a big fan of this approach, having built a Django monolith with the standard Celery/RMQ, dabbled in Redis for latency-sensitive things like session caching, and never hit a scale where any of those specialized tools were actually required (despite generating substantial revenue with the company).

One thing to note: if you use pq or another Postgres-as-queue approach, you should be aware of the work you'll need to do to move off it. This pattern lets you do exactly-once processing by consuming your tasks in the same DB transaction where you process the side-effects. In general, when using a separate task queue (RMQ, SQS, etc.) you need to do idempotent processing (at-least-once message semantics). A possible exception is if you use Kafka and use transactional event processing, but it's not serializable isolation.

This is probably a reason in favor of using a Postgres task queue initially since exactly-once is way simpler to build, but just be aware that you're going to need to rethink some of your architectural foundation if/when you need to move to a higher-throughput queue implementation.
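To make the "same transaction" point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for Postgres so it runs anywhere; the table names and `process_one` are made up for illustration. In Postgres the claiming SELECT would typically also use FOR UPDATE SKIP LOCKED so that concurrent workers don't block on the same row.

```python
import sqlite3


def create_queue_schema(conn):
    # Hypothetical schema: a task table plus a table holding the side effects.
    conn.executescript("""
        CREATE TABLE tasks (id INTEGER PRIMARY KEY, payload TEXT NOT NULL);
        CREATE TABLE results (task_id INTEGER PRIMARY KEY, output TEXT NOT NULL);
    """)


def process_one(conn, handler):
    """Consume one task and record its result in a single transaction.

    Returns True if a task was committed, False if the queue was empty or
    the handler failed (in which case the task stays in the queue).
    """
    try:
        with conn:  # commits on success, rolls back if anything raises
            row = conn.execute(
                "SELECT id, payload FROM tasks ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return False
            task_id, payload = row
            conn.execute("DELETE FROM tasks WHERE id = ?", (task_id,))
            conn.execute(
                "INSERT INTO results (task_id, output) VALUES (?, ?)",
                (task_id, handler(payload)),
            )
        return True
    except Exception:
        # The rollback undid the DELETE, so the task is still queued.
        return False
```

Because the delete and the side effect commit together or not at all, a crash never leaves a task half-done. With an external queue the ack and the DB write are two separate commits, which is where the idempotency requirement comes from.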


Your whole first paragraph literally describes our situation exactly, same stack and all. Classic premature optimisation.

It’s made it so clear to me that so much of what you read on HN about the latest and greatest scaling tricks is only relevant to a tiny, tiny fraction of businesses.


Getting good at solving problems with relational databases is a highly underrated skill.

They are really underutilized by many projects.

Not to say any other software is bad, just that keeping the stack simple can help small teams move quickly and not get bogged down fighting fires. Also, the paths to scale up the major RDBMSes are well documented and understood. With a newer service and many interacting systems in your back end you end up having to be a pioneer (which takes time away from implementing new features).


Absolutely. I've also seen my fair share of horrendous home-grown "ETL" programs that waste more time shuffling bytes to and from the database with poor ORM queries in loops, doing work that could be done with a couple of half-decent SQL queries.

Probably the most useful things for me were learning relational algebra in college and being thrown in at the deep end on a team that was very SQL-heavy (notwithstanding attempting to debug Oracle PL/SQL syntax errors while pulling your hair out about a missing closing parenthesis, which turns out not to be the problem).

The usual challenge seems to be fetching data from external services or performing complex business logic that may conditionally load things -- things that can be awkward in procedural SQL. At the end of the day, you're building a messy ad-hoc dependency graph that is being manually executed very inefficiently. It would be better to have your code just describe the dependency graph, treat each value transparently as a promise/future, and then have a separate engine execute it.

Anyhow, something something monads with lipstick, I digress...


ETL using ORM? Really?


Actual ETL systems would not. However, if you have a homegrown pile of scripts fulfilling the ETL role, I would not be surprised if those scripts used an ORM.


Agreed. Are there any resources you recommend for getting good at solving problems with relational databases?


FWIW, setting up a Redis server and setting up some basic caching middleware is pretty straightforward in my experience. Did this at my job a month ago in an afternoon.

I'd say the biggest overhead is adding Redis if you don't already have it, and that addition's difficulty will vary based on how you host Redis. We use Elasticache on AWS, so just a few clicks and set it in the same VPC.

I guess the real question comes down to how you feel about an extra moving part. Redis is probably the part of our system that has had the fewest hiccups (very much set and forget and no resource issues so far), but I can understand the case where you'd rather not add more than a DB.

I'd say it's just as easy to set up as Postgres. Elasticsearch I hear is a pain, though I have no personal experience.


The pain is not initially setting it up, it’s in the ongoing maintenance. Redis is one of the less painful services to support, especially if using a managed version. But I don’t like the trend of defaulting to using Redis without really justifying it.


You're right for sure if the service is small - Redis would be overkill. I guess I'm coming at it from the perspective I'm most used to where we use it for data caching and session data since we have multiple servers, so handling it any other way would be more work.


Redis is a better default-without-thinking than Postgres IMO. Transactions and referential integrity make some big tradeoffs and it's easy to accidentally rely on them without thinking through the consequences. Global serializability is theoretically even worse, but the direct analogy with application-layer threading makes it easier to understand and avoid the pitfalls.


Funny, I think the exact opposite is the case. It's rare that an application needs more throughput than Postgres on modern hardware can provide. Unless you know you actually need that throughput, you're better off defaulting to Postgres and enjoying that referential integrity.


> It's rare that an application needs more throughput than Postgres on modern hardware can provide.

Agreed, but it's extremely common to need more resilience and/or better multi-DC support than Postgres can provide, and if you've implicitly built your datamodel on Postgres' guarantees then you might find it very hard to migrate to something more distributable. Frankly true master-master HA should have been table stakes for any serious datastore for at least the past decade, as soon as it became clear that cloud was a thing or even before that. (Redis also falls down on that front in terms of what it offers itself, but it's much easier to migrate to something more distributable, IME).


From my view, the issue here is not how difficult it is to set up Redis for caching (it is indeed just a couple of clicks/commands), but the new issues one has to deal with when resorting to caching prematurely, instead of making the app fast enough with minimal effort.


Setting up caching usually isn't the problem; it's invalidation and eviction that bite you.


https://www.2ndquadrant.com/en/blog/what-is-select-skip-lock... describes the benefits of the above approach.

Something to bear in mind is that if you have a bug or crash in your task handler that causes a rollback, another worker will likely try to grab the same task again, and you might end up clogging all of your workers trying the same failed tasks over and over again. We use a hybrid approach where a worker takes responsibility for a task atomically using SKIP LOCKED and setting a work-in-progress flag, but actually does the bulk of the work outside of a transaction; you can then run arbitrary cleanup code periodically for things that were set as work-in-progress but abandoned, perhaps putting them into lower-priority queues or tracking how many failures were seen in a row.
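A rough sketch of that hybrid pattern, again using sqlite3 as a stand-in so it is runnable as-is. The column names, the `MAX_ATTEMPTS` budget, and the function names are hypothetical; in Postgres the claim query would add FOR UPDATE SKIP LOCKED.

```python
import sqlite3

MAX_ATTEMPTS = 3  # hypothetical retry budget before a task is parked as dead


def create_task_table(conn):
    # status is one of: pending, in_progress, done, dead
    conn.execute("""
        CREATE TABLE tasks (
            id INTEGER PRIMARY KEY,
            status TEXT NOT NULL DEFAULT 'pending',
            attempts INTEGER NOT NULL DEFAULT 0,
            claimed_at REAL
        )""")
    conn.commit()


def claim(conn, now):
    """Atomically mark one pending task as in progress; the real work happens later."""
    with conn:
        row = conn.execute(
            "SELECT id FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE tasks SET status = 'in_progress', attempts = attempts + 1, "
            "claimed_at = ? WHERE id = ?",
            (now, row[0]),
        )
        return row[0]


def finish(conn, task_id):
    with conn:
        conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))


def reap_abandoned(conn, now, timeout):
    """Periodic cleanup: requeue stale claims, park repeat offenders as dead."""
    cutoff = now - timeout
    with conn:
        conn.execute(
            "UPDATE tasks SET status = 'dead' "
            "WHERE status = 'in_progress' AND claimed_at < ? AND attempts >= ?",
            (cutoff, MAX_ATTEMPTS),
        )
        conn.execute(
            "UPDATE tasks SET status = 'pending' "
            "WHERE status = 'in_progress' AND claimed_at < ?",
            (cutoff,),
        )
```

The attempts counter is what stops one poisoned task from being retried forever; a lower-priority queue could be used in place of the dead status.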

Postgres is absolutely incredible. If you are at anything less than unicorn scale, outside of analytics where columnar stores are better (though Citus apparently now has a solution for this - https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-...), and highly customizable full-text search (it's hard to beat Elastic/Lucene's bitmap handling), there are very few things that other databases can do better if you have the right indices, tuning, and read replicas on your Postgres database.


> just be aware that you're going to need to rethink some of your architectural foundation if/when you need to move to a higher-throughput queue implementation

I haven't seen many, if any, projects that didn't require some architectural rethinking over their lifetime. I have seen more that were arguably over-engineered in the beginning but then never lived long enough to actually benefit from it.

Not saying everyone should use Postgres-as-queue for every project. But for a lot of projects it is going to be much harder to acquire the active user base generating the throughput Postgres can't handle, than doing continuous refactoring of the system to deal with the changing requirements.


Dependencies are not equal in this regard. For example in a corporate context, we have basically 1.5 people in Eastern Europe maintaining Redis for 5,000 engineers. Kafka is more like 15.


For Node folks interested in a postgres-based task queue, I find graphile-worker[1] to be pretty terrific. Docs make it sound like it's only for postgraphile/postgrest but it's great with any Node app IMO.

[1] https://www.npmjs.com/package/graphile-worker


Note that you can approximate rate-limiting in Redis with Postgres' UNLOGGED tables [0]. They're a lot faster than regular tables since they don't write to the WAL. Of course, if you restart your server or if it crashes then you lose the table contents. But for stuff like rate-limiting you probably don't care. And unless you're using some kind of persistence in Redis it happens there also.

I tend to run this kind of stuff on a separate PG server so that the query velocity doesn't affect more biz-critical things.

[0] https://www.postgresql.org/docs/current/sql-createtable.html...


I hadn't seen UNLOGGED tables before, that's a really neat trick, thanks.


Nice, Redis is awesome but definitely something I've seen pulled in prematurely all the time.


> But for stuff like rate-limiting you probably don't care.

I guess you would care in this scenario, otherwise you have cascading failures (something makes pg crash, and the lack of rate limit allows the abuse to continue).

So implementing a rate limiter separate from the rest may make sense too. I like their idea of doing it in memory and keeping the load balanced, as it doesn’t rely on any dependency.


If you’re running a separate pg server, doesn’t that remove the main benefit of having everything under one system?


I think the "single dependency" was moreso about the technology stack, not the number of instances of whatever the technology is.


They mention that one of the reasons is having fewer things that can fail and bring down your system... two separate PG systems doubles the failure chance


If you don't maintain redundancy (i.e. more than one instance), you will be 100% certain to have a service outage at some point in your future.


Sure, but you still have double the chance of failure.. Two distinct PG clusters you depend on are twice as likely to have an outage as one.
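Back-of-the-envelope version, assuming each cluster fails independently and with the same probability over some period (both are assumptions for illustration, not something stated in the thread). The function name is made up.

```python
def p_any_outage(p_single: float, n_clusters: int) -> float:
    """Probability that at least one of n independent clusters has an outage."""
    return 1.0 - (1.0 - p_single) ** n_clusters


# With a 1% chance per cluster, two clusters give 1 - 0.99**2 = 0.0199,
# i.e. just under double. The "twice as likely" rule holds only for small p.
two_clusters = p_any_outage(0.01, 2)
```

So "double" is a good approximation when outages are rare, and it only matters if losing either cluster actually takes the whole system down.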


You don't lose data on a clean restart.


Yes good point! Clean restarts will preserve the data.


Personally staking my own future on this "less is more" approach having seen some serious horror flicks in terms of app/infrastructure stacks the past few years.

What continues to surprise me: a lot of time and money has been or is being wasted on reinventing the wheel (speaking specifically about webdev here).

There are a ton of great tools out there that are battle-tested (e.g., my favorite find of late as someone just digging into hand-rolled—meaning no k8s/docker/etc—infrastructure is haproxy).


I've seen millions of dollars spent on infrastructure to support what Heroku could do for a few thousand a month. Not to mention the egregious dev heartache caused by having to work against it. Anyone who argued it was a waste of time and money just "didn't get it" apparently.

I'm a huge fan of all the cool container stuff, queues, stream processing, all the weird topologies for apps built with node, Golang. I'm with it, kids. But for an MVP, just use Heroku, GAE, a Droplet with SCP for god sakes.

If you need to do something more complicated, growth will tell you.


I worked for a very profitable small internet biz whose entire deployment was git-archive + rsync. I never had to troubleshoot that thing even once. Now it seems like everybody is playing golf trying to see how many AWS services they can use to unpack a gzip on a server.


> If you need to do something more complicated, growth will tell you.

+1. Needing to refactor & scale your infrastructure to enable more growth is almost always a "good problem to have".

You've steadily grown to $100mm revenue and your backend is starting to show it because you prioritized productivity over premature optimizations? Oh no, the world is ending! (said no one ever)


Sorry to nitpick, and I agree with almost the entire statement you made, but I don’t quite understand how Go fits in there. I chose it in large part because it sharply reduces dependencies. Just curious about your inclusion of it in that list.

Totally not trying to get into language wars or anything. Just want to understand your thinking.


I guess it doesn't fit in besides I've seen it be used for microservices where microservices don't apply


Wait until people learn about this hot new technology that revolutionizes web development:

A C# web application with a SQL DB.

That's it. Go home at 5 pm and spend time with your family in your $300k non-coastal-city house.


Working with ASP.NET Core then intermittently jumping into different stacks in other projects is blowing my mind. Lots of "Node wants how many packages to do this?"


Simple manually deployed docker images have been a great win for us.

You get to declare all your dependencies in the docker build. All config is in one .env file.

Installs and roll backs are trivial.

Setting up new dev environments is easy.


What do you mean when you say "manually deployed docker images"?

It could mean that you build the images on one machine, then export the images as tar files, copy those to the destination server, and then import the images.

Or it could mean that you copy the Dockerfile and any necessary context files to the destination server and run the Docker build there.

Or it could mean you still use a Docker registry (Docker Hub, AWS ECR, or a self-hosted registry), but you're manually running docker or docker-compose commands on the destination server instead of using an orchestrator like Kubernetes.

As for me, I've done pretty well with that last option. I still use either an in-house Docker registry or AWS ECR, but I haven't needed anything like Kubernetes yet.


I do the last thing as well! I essentially use docker compose files as glorified installer scripts.

Back in the day I would have installed whatever software comes in the container by hand, then set up a service with systemd / cron or whatever. Nowadays I just set the container to start at system boot.

Works pretty well for me so far and unlike scripts the docker containers so far never failed to just set up and run.


Oh CI builds and pushes to a repo. The CI then generates a docker compose file and a default .env.

Installs are just a matter of copying over the docker compose file, then running docker compose up in detached mode.

Edit: I didn't set this up. But it's amazing to use.


What other examples are there of "single dependency stacks"?

This article is really about the versatility and reliability of postgres.

And I'm all in agreement.

Reminiscent of:

https://www.craigkerstiens.com/2017/04/30/why-postgres-five-...

https://webapp.io/blog/postgres-is-the-answer/

http://rachbelaid.com/postgres-full-text-search-is-good-enou...

http://boringtechnology.club/

As much as HN could lead you astray with the hype of this and that tech, articles like the above are some of the most consistently upvoted on this website.


Also:

https://sive.rs/pg2 - Simplify: move code into database functions

https://sive.rs/pg - PostgreSQL example of self-contained stored procedures

some linked examples: https://github.com/sivers/store/tree/master/store/functions

I like this idea in theory ... although it would cause me to need to know a lot more SQL, which is a powerful but hostile language :-/

I care about factoring stuff out into expressions / functions and SQL fails in that regard ...

https://www.scattered-thoughts.net/writing/against-sql/

It's hard to imagine doing this without a ton of duplication. I have written SQL by hand and there are probably more confusing corners than in shell, which is saying a lot!


Depending on what database server you are on you might not be constrained to just SQL.

Now, PSQL isn't exactly all that friendly of a language either, but it does allow you to somewhat break stuff up into functions and reuse code across different SQL statements.


While it's not exactly the same, Elixir + Phoenix is really all you need for full-stack development.


I think this is the right idea. The pendulum between having as many dependencies as possible and having no dependencies at all has swung way too far toward the 'as many dependencies as possible' side. It is a major PITA when yet another random component breaks. Say that what component A does can be done in B instead, at a cost of, say, three man-weeks. I would say it is worth it. The advantage is that A will never break because it is not there. Note that A might also break three years from now, when everybody who knows anything about A has left the company. B is now used in more places, so people are more likely to have been forced to learn it, and when the emulation of A breaks there is a better chance that people will know what to do. I see mostly advantages.


There's definitely potential to go too far into monolith territory and misinterpret how simple your architecture actually is.

An example: Django backed by Postgres. I tend to view this as 1 architectural unit, i.e. Postgres is wholly embedded in Django. I am under no illusion that I have both a Django project and a PostgreSQL instance. I have a Django-backed-by-Postgres. I can have that PostgreSQL instance be a standalone interface, but that means increasing my architectural units from 1 to 2. Instead, if I want to integrate with Django's raw tables, I'm going to do it on Django's terms (via a custom HTTP API) rather than fighting the ORM over who gets to DBA the database. Bad for performance? No doubt. We'll worry about that when we get there.

Yes, you can run a web app server directly out of Postgres without an additional "app layer" like Django (Crunchy has some cool tools for this). But should you?

To be clear, I'm a big fan of KISS, just skeptical of false minimalism.


Agreed. This quote seems relevant: "Everything should be made as simple as possible, but not simpler."

The article talks about doing rate limiting with Redis and dropping it in favor of handling it on each server node and assuming uniform distribution of requests. Doing that trades precise rate limiting for a simpler architecture.

That may be a good trade-off, but only if you can get away with it. If they were required to have more precise rate-limiting then the simpler architecture would not have been possible.

In my own work, I used Memcached instead of Redis for rate limiting data. The applications were coded to fall back to the per-node rate limiting if Memcached weren't reachable. Memcached may have been another dependency, but it was one of the less troublesome dependencies. I never experienced a problem with it in production. The fallback behavior meant that we didn't even need Memcached in a dev environment.

I guess my point is this: Not all dependencies are as troublesome as others.
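For anyone curious what that fallback might look like, here is a minimal sketch, not the actual code from that project. `shared_incr` is a hypothetical callable standing in for the Memcached (or Redis) increment, and splitting the global limit evenly across nodes is the "uniform distribution" assumption from the article.

```python
import time


class FixedWindowLimiter:
    """Process-local fixed-window counter; state is lost on restart."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.current_window = None
        self.counts = {}

    def allow(self, key):
        window = int(self.clock() // self.window_seconds)
        if window != self.current_window:
            # New window: drop all counts from the previous one.
            self.current_window = window
            self.counts = {}
        used = self.counts.get(key, 0)
        if used >= self.limit:
            return False
        self.counts[key] = used + 1
        return True


def per_node_limit(global_limit, node_count):
    """Split a cluster-wide limit evenly, assuming uniform traffic per node."""
    return max(1, global_limit // node_count)


def allow_request(key, shared_incr, global_limit, local_limiter):
    """Prefer the shared counter; fall back to the local limiter if it is down.

    shared_incr is a hypothetical callable that increments the shared counter
    for key in the current window and returns the new count.
    """
    try:
        return shared_incr(key) <= global_limit
    except ConnectionError:
        return local_limiter.allow(key)
```

The local limiter is less precise (a client pinned to one node gets throttled early, one spread across nodes may get a bit more than the global limit), which is exactly the trade-off being discussed.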


What's the crunchy tool for this?


I believe they're referring to some tools like pg_tileserv which gives you a turnkey tile server on top of PostGIS. As it stands today we don't have anything to automatically run that app from Postgres itself (but stay tuned we might be launching something around that in just a few weeks). Tileserv is in an interesting category like many other turn key APIs or services (like PostgREST or Postgraphile) on top of a database, but I don't view them as different than say running a Django app for example.


I haven't django'd much, but to me a postgrest interface is very much different than having a bespoke app. With postgrest I can think of my app as interfacing with sql over a http translation layer, and things like authorization, data models, etc. are actually in the database.

I'm sure bespoke apps can be written to be as predictable and "database-like" as I found postgrest, but I haven't seen any.


I am so for this. Being the sole developer in my company for the last 10 years, I introduced far too many “moving parts” as it grew and I’m now going through the process of simplifying it all.

I love Redis but it’s next to go, currently used for user sessions and a task queue, both of which Postgres is more than capable of doing [0]. Also, as a mostly Python backend, I want to rip out a couple of Node parts, shrink that Docker image.

0: https://news.ycombinator.com/item?id=21536698


Apparently Tailscale for a long time just used a JSON file as their data storage, and moved from that to SQLite with a hot-swappable backup with Litestream [0], and hey they've done fine with that.

[0]: https://securitycryptographywhatever.buzzsprout.com/1822302/...


I kind of love this idea.

It reminds me of a redis use case we had at a former employer.

We had a cluster with a high double digit number of nodes that delivered a lot of data to various external APIs.

Some of those APIs required some metadata along with the payload. That metadata was large enough that it had made sense to cache it in Redis. But over time, with growth, the cluster got large enough and high-volume enough that just fetching that data from Redis was saturating the 10Gbps NIC on the ElastiCache instance, creating a significant scaling bottleneck. (I don't remember if we moved up to the 25Gbps ones or not.)

But we could have just as easily done a local cache (on disk or something) for this metadata on each node and avoided the cost of the ElastiCache and all the scaling and operational issues. It would have also avoided the network round trips to Redis, and the whole thing probably would have just performed better.


This is why I love Ansible. As a DevOps engineer I do not design or implement complex systems or programs. But I am responsible for the reliability of our systems and infrastructure. And Ansible is just pleasant to use for the same reasons stated by the author:

  - a single package without any additional dependencies
  - no client-side software
  - pure SSH
  - simple playbooks written in only YAML

Focusing on simplicity and maintainability has helped me deliver reliable systems.


Many can even get away with less: sqlite.

One less process to worry about.


I have nothing to disagree with here, but it's worth noting that his company Crunchy Data are themselves a postgres provider. So they, more than most, have the chops and incentive to do a great deal in postgres alone.

https://www.crunchydata.com/


This article makes me embarrassed. Not that they wrote the article, but that it had to be written.

A single web application with a single database is how everything should be built unless necessity demands more complexity.

It's the way people do things in the "uncool" enterprise world every single day.


There are so many benefits of keeping things as simple as possible.

  - easier troubleshooting
  - easier to maintain documentation
  - quicker onboarding of new devs
  - easier to migrate to new hosting if needed
  - quicker to add features (or decide not to add)


Zero dependency is better.

Your database does not need to be a separate process. It can just be a library embedded in your application.

If your application is distributed across several physical servers, then your database can be as well.


I'm interested -- would you be willing to develop on this a little?


For a single server, SQLite, or boltdb[0]

I've never had to scale horizontally. I develop in Go and you can get very far along with just vertical scaling (aka beefier hardware).

Therefore I can't give concrete examples of a distributed db-as-a-library.

But all that you need is to extend the functions that fetch data to not just fetch from disk but from "peers" as well. For this to work you need servers (instances) to know about each other, and as you add more they also get added to their peers - sort of like a bittorrent network. I don't think it's difficult to do.
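A toy illustration of that idea, with peers as in-process objects rather than network calls (so it says nothing about consistency, partitions, or failure handling, which is where the real difficulty lives). The `Node` class and its methods are invented for this sketch.

```python
class Node:
    """A storage node that checks its own data first, then asks its peers."""

    def __init__(self, name):
        self.name = name
        self.local = {}   # stands in for the on-disk store
        self.peers = []   # other Node objects this node knows about

    def join(self, member):
        """Join via an existing member and exchange peer lists with everyone."""
        for node in [member, *member.peers]:
            if node is not self and node not in self.peers:
                self.peers.append(node)
                node.peers.append(self)

    def get(self, key):
        if key in self.local:
            return self.local[key]
        for peer in self.peers:
            # In a real system this would be a network request to the peer.
            if key in peer.local:
                return peer.local[key]
        return None
```

Usage would be something like `b.join(a); c.join(a)`, after which a key stored only on `a` is readable through `c.get(...)`.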

SQLite might not be suited for being distributed (although RQlite[1] claims to have done it).

Making a distributed data storage based on boltdb[0] is probably more feasible.

Whatever the case, there's no reason why a data storage engine can't be a library, even if it's distributed.

[0]: https://github.com/boltdb/bolt

[1]: https://github.com/rqlite/rqlite


rqlite author here.

It's not quite what rqlite does. rqlite is a distributed database, and it uses SQLite as its storage engine. But it's not a drop-in, distributed, version of SQLite (but it's close, as SQLite is basically fully exposed).

https://github.com/rqlite/rqlite/blob/master/DOC/FAQ.md


> normally I’d push all rate limiting to Redis. Here, we rate limit in memory and assume roughly uniform distribution between server nodes.

Dumb question: what do they mean by rate limiting in memory vs via Redis? Does that mean keeping track of request origins+counts using those storage mechanisms, or something else?


You can use an in-memory LRU cache of request origin+count. You can also periodically take that data and do an INCREMENT against your DB to get fairly scalable rate limiting.
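Roughly like this, as a sketch: the class, table, and column names are made up, and sqlite3 stands in for whatever DB you use so the example is self-contained.

```python
import sqlite3
from collections import OrderedDict


def create_counts_table(conn):
    conn.execute(
        "CREATE TABLE request_counts (origin TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    conn.commit()


class LruCounter:
    """Bounded in-memory map of origin -> hits since the last flush."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = OrderedDict()

    def hit(self, origin):
        self.counts[origin] = self.counts.get(origin, 0) + 1
        self.counts.move_to_end(origin)
        if len(self.counts) > self.capacity:
            self.counts.popitem(last=False)  # evict least recently seen origin
        return self.counts[origin]

    def flush(self, conn):
        """Add the local counts to the shared table, then start over."""
        with conn:
            for origin, hits in self.counts.items():
                conn.execute(
                    "INSERT INTO request_counts (origin, hits) VALUES (?, ?) "
                    "ON CONFLICT(origin) DO UPDATE SET hits = hits + excluded.hits",
                    (origin, hits),
                )
        self.counts.clear()
```

Each request only touches process memory; the database sees one batched write per flush interval, at the cost of counts being slightly stale between flushes and evicted origins losing their unflushed hits.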


I'm guessing process-local memory. Like a Python dict or something


This idea is super cool.

I am not sure it necessarily has to be just one single dependency, but keeping the number of dependencies as low as possible makes a lot of sense to me. At least the overhead of introducing any given new dependency should be taken into serious consideration and held against the concrete benefits that will be gained from it.

I wrote a blog post on a very similar subject, with essentially all of the same arguments, but targeted more towards the dependencies and abstractions found within a given system's code structure and application architecture.

If you are interested, you can read it here: https://betterprogramming.pub/avoiding-premature-software-ab...


I’m a big fan of getting rid of the MVC frameworks too. Get down with PostgREST or Postgraphile on your control plane. Use Postgres to manage security, views, migrations, the whole nine yards.

On your data plane.. yeah queues on Postgres work too. Subscribe/notify are decent. Or go with using a streaming replication client to send out events to a queue and decouple downstream stuff like your analysis datastores, notification systems, policy monitoring, what have you.

Postgres can handle a lot. It’s an awesome piece of tech.


> 1 Okay fine, S3 too, but that’s a different animal.

I think people forget that AWS S3 isn't immutable. Unlike an EBS volume, it is impossible to "snapshot" S3 the way you can a database. There are arbitrary global limitations outside the scope of your control, and a dozen different problems with trying to restore or move or version all the things about buckets that aren't objects (although the objects too can be problematic depending on a series of factors).

If you want real simplicity/repeatability/reliability, but have to use S3, host your own internal S3 service. This way you can completely snapshot both the metadata and block devices used by your S3 service, making it actually immutable. Plus you can do things like re-use a bucket name or move it to any region without worrying about global conflicts. (All of that is hard/expensive to do, however, so you should just use AWS S3 and be very careful not to use it in a way that is unreliable)


My guess is that they use S3 mainly for things like backups, where you write once to a brand new key.

I'd be surprised if they were using mutable S3 objects that constantly get updated in-place. They have PostgreSQL for that!


> it is impossible to "snapshot" S3 the way you can a database

FWIW, they added point-in-time S3 snapshots via AWS Backup recently.


Have you seen how complicated it is? https://aws.amazon.com/blogs/storage/point-in-time-restore-f... It also seems to only be object changes, not all bucket changes, so treating the entire bucket as an immutable versioned artifact still isn't possible.


I am sure there is a momentary thrill in achieving minimalism, but alas the world is not so simple anymore. I would refer the OP and the community here to the paper from the creator of PostgreSQL: http://cs.brown.edu/~ugur/fits_all.pdf


I think that paper is making an argument orthogonal to OP. OP is saying Postgres is a good enough solution, that the advantages of simplifying the stack outweigh the disadvantages of using a non-optimal database for basic use cases.


(2005)


Exactly. The title is - “One Size Fits All”: An Idea Whose Time Has Come and Gone

As is so often the case in this industry an idea comes, goes, and comes back around again. Time to reevaluate.


I always start with just MySQL and introduce things as needed - not as guessed. These days I don't work on anything with enough traffic that needs more than that.

An RDBMS is a lot more than just SQL these days, and they offer a lot of good enough solutions to a wide variety of problems.


Completely agreed. Sadly, we’re seeing a ton of developers who are honestly more interested in getting half-baked solutions out the door so they can move on to the next project. We have one customer who runs a huge national project on a few MariaDB servers; technically one of them could run the whole thing, no problem. Another customer is smaller but insists on using Hibernate, which they don’t really know how to use, so they frequently kill the database by generating silly queries. Instead of accepting that maybe they don’t fully understand their chosen stack, they try to “solve” the problem by adding things like Kubernetes and Kafka, complicating everything.

Modern databases, and servers in general, are capable of amazing things, but there’s a shortage of developers with the skills to utilize them.


I wholeheartedly agree with the author that holding all your state within one system has massive benefits. As soon as you have to make your state cross process boundaries yourself, you are going to have a much more difficult time and are exposing yourself to a much larger problem space.

> If we were to need search, we’d use Postgres’ full text search instead of ElasticSearch.

I think this is significantly easier said than done, I don’t really consider those two pieces of tech equivalent. Postgres FTS is pretty good, but I don’t think it really can do a lot of the stuff ES can do well.


This is great, but you might want to have multiple postgreses for the different workloads. DB postgres != rate-limit PG != search PG. It's pretty hard to optimize one DB for every workload.


Counter point: most people operate on workloads so trivial that they don't need optimized.

I think the most important line in the article is the "let's see how far it gets us." It is absolutely trivial to invent situations where an architecture wouldn't work well, or scale, or "be optimal." It's far, far harder to just exist in reality, where most things are boring, and your "bad" architecture is all you ever need.


Why? You can have multiple databases in one instance, running multiple pg instances seems counterproductive?


Replication works across the whole instance. I'm working on a PBX that uses two PostgreSQL instances: 1 for configuration, 1 for call logs. I can replicate the configuration database everywhere and only keep 1 copy of the call logs database.


Multiple databases in postgres fundamentally share the same underlying infrastructure (i.e., WAL), and so do not offer much in terms of scalability or blast-radius protection compared to putting all tables in the same database.


Maybe instance-level configuration?


The next level up of this approach is running everything on one box.


Redis is overused in my opinion. For many requests it does not beat a database for the same amount of money. There can be other reasons for using a cache though. I often hear that people claim that the cache “protects” the database. From my experience it is more common that once the database has problems it spills over to the cache. If then for instance a circuit breaker opens to the cache the database will be smacked senseless.


Often, cache is relied on so much that we are afraid to clear it because no one knows what the impact will be on the database. We now duplicate our data in many cases, have to deal with cache invalidation, and ironically create more risk than protection. Cache should be extremely selective and encapsulated very well.


Most projects I work on use a lot of edge caching but it is not business critical. It is for speed. It is a problem if the design depends on both a cache and a database if the cache is dependent on the database.


I love this concept.

For anyone who’s interested in doing this, but doesn’t want to manage the infrastructure you should check out Supabase. (I’m not affiliated, just a fan.)


I love Postgres and mostly agree, but there’s one caveat: Postgres’s full text indexing is better than most non-Lucene options, but it’s still a toy compared to anything using Lucene (eg ElasticSearch). If you need to support indexed search of documents, especially multi-language corpora, you need to use the right tool for the job and Postgres isn’t it.


operationally, makes sense. but the inevitable moment (if you survive) you need to migrate to something else depending on a different queue system, it'll be a pain to retrofit the code relying on db-level transactions and locks.


> but the inevitable moment (if you survive)

It's probably not inevitable. Simple is fast, and fast can scale you really far.

Sure, if you end up being google-scale then yeah, the world changes. But there's very few companies that large, yours is probably not growing to that size.

Over a decade ago I joined a mid-sized startup and took ownership of a service using MySQL. The first urgent warning I was given was that they had to migrate to Cassandra ASAP because soon MySQL couldn't possibly handle it.

I took a look at the traffic and the growth curve and projected customer adoption. And then put that project on hold, no need yet. Company went on to an IPO, grew a lot, pretty successful in its industry. And years later when I left, it was still going strong on MySQL with no hint of approaching any limitations.


There's a pretty big spread between google-scale and needing to use a "real" queue instead of a RDBMS though. I honestly think you're better off using something like SQS to start with and taking the very minimal ops burden of the extra dependency.


Except it's not inevitable. We have a few 15+ year old profitable projects that are still working fine on RDBMS backed queues.


This only makes sense if the effort to migrate is more than the accumulated effort of working with and maintaining that solution from the start.


The effort to use something like SQS is almost nothing.


The "if you survive" bit is key. If you survive to the point you need to scale like this you will no doubt have more resources available. Do what you can to get going now. Solve future problems as they come.


This is one thing that really appeals to me about postgres - it may not be the best at everything, but it can do it all. Love seeing people actually load testing it before adding another stateful service.


I was a bit disappointed. I thought they were going to implement their entire system using stored procedures. That would be “single dependency”. As it stands it is “all my app tier dependencies and postgres”.


I was worried about how long the initial indexing would take for a recent full text search implementation in Postgres.

Took less than a second on a few hundred thousand rows.

Naive and simple is good enough for now.


my take on this looks similar, but I'll have more going on:

1. kubernetes. 2. postgres. 3. application.

where the kubernetes bit is used for the more integration test side of things.

a lot of machinery can be employed that gets in the way of "wtf just happened".


I'm not sure what qualifies as "stateful", but

> Fewer dependencies to fail and take down the service.

No logging? No metrics? No monitoring? (& yes, you'd think those shouldn't take down the stack if they went offline. And I'd agree. And yet, I've witnessed that failure mode multiple times. In one, a call to Sentry was synchronous & a hard-fail, so when Sentry went down, that service 500'd. In another, syslog couldn't push logs out to the logging service, as that was very down, having been inadvertently deleted by someone who ran "terraform apply", didn't read the plan, & then said "make it so"; syslog then responded to the logging service being down by logging that error to a local file. Repeatedly. As fast as it possibly could. Fill the disk. Disk is full. Service fails.)
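
For the Sentry-style failure, a minimal sketch of the fix is to make the reporting call best-effort so its failure never reaches the request path. Everything here is hypothetical (`report_best_effort`, `handle_request`, and the `send` callable are illustrative names, not any real SDK's API):

```python
import logging
from typing import Callable

logger = logging.getLogger("app")


def report_best_effort(send: Callable[[str], None], message: str) -> bool:
    """Call a non-critical reporter and never let its failure propagate.

    Returns True if the report was sent, False if the reporter failed.
    """
    try:
        send(message)
        return True
    except Exception:
        # Note the failure at debug level only: logging it loudly on every
        # request is how a dead logging backend ends up filling the disk.
        logger.debug("telemetry send failed", exc_info=True)
        return False


def handle_request(order_id: int, send: Callable[[str], None]) -> dict:
    # The business result does not depend on whether reporting succeeded.
    report_best_effort(send, f"processed order {order_id}")
    return {"status": 200, "order_id": order_id}


if __name__ == "__main__":
    def broken_send(message: str) -> None:
        raise ConnectionError("reporting service is down")

    print(handle_request(7, broken_send))
```

This only covers a reporter that raises. A reporter that hangs would also need a timeout or a background queue, which the sketch leaves out.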

I've also seen our alerting provider have an outage during an outage we're having & thus not sending pages for our outage, causing me to ponder and wonder how I'd just rolled a 1 on the SRE d20 and what god did I anger? Also who watches the watchmen?

> A common pitfall is to introduce something like ElasticSearch, only to realize a few months later that no one knows how to run it.

Yeah I've seen that exact pit fallen into.

No DNS? Global Cloudflare outage == fun.

No certificates?

I've seen certs fail so many different ways. Of course not getting renewed, that's your table stakes "welcome to certs!" failure mode. Certs get renewed but an allegedly Semver compatible upgrade changed the defaults, and required extensions don't get included leading to the client rejecting the cert. I've seen a service which watches certs to make sure they don't expire (see the outage earlier in this paragraph!) have an outage (which, b/c it's monitoring, wasn't customer visible) because a tool issued a malformed cert (…by… default…) that the monitor failed to parse (as it was malformed). Oh, and then the LE cross-signing expiration took out an Azure service that wasn't ready for it, a service from a third-party of ours that wasn't ready for it, and our CI system b/c several tools were out of date including an up to date system on Debian that was theoretically "supported"… but still shipped an ancient crypto library riddled with bugs in its path validation.

> Okay fine, S3 too, but that’s a different animal.

Is it? I've seen that have outages too, & bring down a service with it. (There really wasn't a choice there; S3 was the service's backing store, & without it, the service was truly screwed.)

But of course, all this is to say I violently agree with the article's core point: think carefully about each dependency, as they have a very real production cost.

(I've recently been considering changing my title to SRE because I have done very little in the way of SWE recently…)


Metrics feels like another area where you should allow exception. Prometheus is like a tank, I’ve never seen it misbehave and if you set retention short it’s very close to a fire and forget deployment with very little configuration, especially if you only use it to monitor two other static services.

These metrics will be invaluable to tell you how close to the limit you are running your single dependency, to avoid adding premature cache in front or whatnot.

The integration of Prometheus with other tools also makes it worth the extra component, rather than reinventing your own time series in PG.

Logging is a whole different beast. Especially if you even think about Elastic.


> Logging is a whole different beast. Especially if you even think about Elastic.

I think you kinda have to have something like Elastic. (Not necessarily Elastic, but honestly, I've yet to see anything better. Which … isn't great.) As soon as you're dealing with multiple services or multiple instances of a service, you'll want some way to aggregate logs. Even for a single service, on a single instance, some way to search logs (no, grep does not count), and at some point, alert.


Take a look at Grafana Loki. It’s a log aggregator designed for ease of operation. It makes some compromises on query capabilities in exchange, but that’s well worth it as it’s still incredibly powerful.



