Postgres is the greatest.

mrslave · on Feb 9, 2022

Agree. I've been hoping for a PostgreSQL extension with built-in sharding (think Netezza or Teradata). I know this is ambiguous so a low-effort definition is in order: a tightly bound cluster (nodes are aware of each other and share data to fulfill queries) where you specify distribution for a table but there is no explicit rebalancing command. Admins can add nodes and the user is none the wiser (except for improved performance, of course). Cross-node joins work (reasonably) well. I've been watching Citus for a while but - unless I'm misunderstanding - the sharding is a bit more explicit and sometimes manual.

ozgune · on Feb 9, 2022

(Ozgun from Citus / Microsoft)

Hi there, thanks for mentioning Citus. Could you share a bit more about the user experience you're looking for with sharding?

With Citus, you create your Postgres table as-is. If you'd like for the table to be distributed, yes, you'd need to pick a distribution key. You'd do this by calling: SELECT create_distributed_table('postgres-table-name', 'distribution-column');
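To make the idea of a distribution key concrete, here is a rough Python sketch of hash-based row placement. It is not how Citus implements create_distributed_table; the helper name shard_for, the shard count of 32, and the choice of SHA-256 are all assumptions made purely for illustration.

```python
import hashlib

SHARD_COUNT = 32  # assumed value, chosen only for this illustration


def shard_for(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard index (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then bucket it.
    return int.from_bytes(digest[:8], "big") % shard_count


# Toy rows of (distribution column, payload).
rows = [("user-1", "page A"), ("user-2", "page B"), ("user-1", "page C")]

# Group rows by the shard their distribution key hashes to.
placement = {}
for user_id, payload in rows:
    placement.setdefault(shard_for(user_id), []).append((user_id, payload))
```

Rows that share a distribution key always land on the same shard, so a query that filters on that key only needs to touch one shard, while a query on any other column may have to visit all of them.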

We also thought about picking a distribution key on behalf of the user. This however has performance implications, particularly as you add more nodes to the cluster.

azurelake · on Feb 9, 2022

Not only is it not the greatest, it's literally the worst database technology that's in use for the things PgCat is trying to solve. And that's not a knock on PgCat at all (or on Postgres!), it's a knock on the "golden hammer" type worship of Postgres that's been the zeitgeist for the past few years.

The state of the art for sharding, connection scaling, and failover for Postgres is far behind everything else. MySQL has Orchestrator and Vitess, the NewSQL systems are doing lots of interesting stuff with replication and sharding, etc. etc.

edit: Look at the work that Notion had to do as of just 3 months ago to shard Postgres: https://www.notion.so/blog/sharding-postgres-at-notion. Maybe they would make the same choice over again, and that's fine, but doing stone age level work to shard a database in 2021 doesn't jibe with the whole "just use Postgres" idea to me.

dikei · on Feb 9, 2022

Most applications never reach the scale that requires sharding. I'll just use the quote from the Notion blog as an argument against automated sharding solutions like Vitess.

>> Besides introducing needless complexity, an underrated danger of premature sharding is that it can constrain the product model before it has been well-defined on the business side. For example, if a team shards by user and subsequently pivots to a team-focused product strategy, the architectural impedance mismatch can cause significant technical pain and even constrain certain features.

>> During our initial research, we also considered packaged sharding/clustering solutions such as Citus for Postgres or Vitess for MySQL. While these solutions appeal in their simplicity and provide cross-shard tooling out of the box, the actual clustering logic is opaque, and we wanted control over the distribution of our data.

azurelake · on Feb 9, 2022

https://news.ycombinator.com/item?id=30274912

3np · on Feb 9, 2022

When you’re at the point that sharding is an important consideration, "just use X" doesn’t apply for any X anymore (except "the expertise in your own team" etc).

The advice is clearly targeted at those with a scale orders of magnitude smaller than Notion.

I don’t think anyone is claiming "you should just use Postgres" without knowing about your constraints. And if they do, I guess that’s a useful heuristic to know when someone doesn’t know what they’re talking about.

It’s still good advice for the 95%, even if you yourself fall outside of that. Part of the reason you get paid well is to know when advice applies and when it does not, and to not waste energy arguing in either case.

CuriouslyC · on Feb 9, 2022

Postgres has focused on being a stellar database, not on being a stellar exascale application platform. It deserves the love it's getting on the database side, and for most people the weak exascale story isn't a problem.

azurelake · on Feb 9, 2022

Sure, but this post is specifically about a tool that relates to that weak story. In the context of the use cases PgCat is trying to address, Postgres is most definitely not "the greatest". Apparently, that's a controversial statement :)

CuriouslyC · on Feb 9, 2022

I don't think calling out the far end of the postgres scaling story is controversial. On the low end, you can add more hardware or improve queries/indexes, and the mid level works fine with a write master and read replicas, but once you get into multi-master/sharding it's definitely more involved than other systems, and if you just use postgres as a dumb data dump, maybe the value isn't there. It's all about the right tool for the job.
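As a rough illustration of the "write master and read replicas" mid level, here is a minimal Python sketch of statement routing. The Router class and its target_for method are hypothetical names, and the first-keyword check is a deliberately naive assumption; this is not PgCat's actual routing logic.

```python
import itertools


class Router:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin iterator over replicas, or None when there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql: str) -> str:
        # Naive classification by the statement's first keyword (assumption).
        first_word = sql.lstrip().split(None, 1)[0].upper() if sql.strip() else ""
        if first_word == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # Everything else, including empty input, goes to the primary.
        return self.primary
```

A real pooler has to be more careful than this: a SELECT ... FOR UPDATE or a SELECT that calls a function with side effects belongs on the primary, and replicas can lag behind recent writes.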

Of course, there will always be fanboys who take things too far.

whateveracct · on Feb 9, 2022

The Postgres deficiencies you describe can be solved with elbow grease

MySQL's cannot

lmm · on Feb 9, 2022

That's backwards IME. MySQL has bad defaults and bad error reporting, but you can work around these things by being careful about the configuration of every instance and making sure you always use strict modes and check for errors after each statement. Postgres has generally better defaults and better behaviour upfront, but things like true (master-master) HA or particular secondary index behaviour that's needed for some high-performance workloads are just impossible.