Ask HN: What could a modern database do that PostgreSQL and MySQL can't

GeneralMayhem · on Sept 6, 2021

Lots of people are citing cutting-edge bells and whistles for advanced query plans and data types, but after working at $BIGCO and seeing what exists there (and has existed for years), I get frustrated by the lack of ergonomic and operational essentials in the public-cloud options.

1. Horizontal scaling. PG can do sharding or replication, but that's not the same thing. Manual resharding is something that, once you've experienced it, you don't want to do it ever again, especially when things like Spanner/Cockroach exist. Database-level liquid sharding is such a dramatically superior solution that it makes any system that depends on an enumerated set of master instances seem completely obsolete.

2. Strong commit timestamps, again a la Spanner/Cockroach. Globally-ordered commit times and read snapshots aren't something you'll need in every schema, but when you do, they're magical - suddenly, you can use your OLTP system for strong lease assignment (conditioning other transactions on lease validity), you can construct arbitrary application-level transaction semantics (non-transactional read followed by transactional compare-and-swap is pretty powerful in some scenarios), and all sorts of other things.

3. Interface abstraction. Databases that ship language-support drivers should also ship a fake in-memory implementation for testing purposes that supports the same interfaces and options. There's no reason why I should have to start a subprocess and emulate a network connection to tell whether a query functions properly or not, or whether I've set isolation properties correctly for a pipeline.

4. Declarative schemas, and a well-defined update process (that isn't human-written DDL), are essential at any sort of organizational scale. Every time I see an "ALTER TABLE" statement checked into a repository, I tear my hair out.

5. Queue support, where message publishing/consumption is transactional with other read/write operations. You can always build a queue with a time-keyed table, but it's kind of a pain in the ass to get all the edge cases right. It's much better as a first-class concept supporting automatic sharding, retries, batching, and bounded-out-of-order delivery with a global cursor.

yarcob · on Sept 6, 2021

> Horizontal scaling

I think the importance of horizontal scaling is overhyped. 99% of PostgreSQL applications are at a size where a single machine can easily handle the workload.

To enable horizontal scaling, you need to make so many tradeoffs that I don't think it's worth it for most applications.

pmelendez · on Sept 6, 2021

I see people responding to this thread with examples of high-volume workflows that use a small number of machines for the database layer.

I think those examples are missing the point. One of the examples uses a Redis cluster, which is a sign that the DB alone can't support the workflow.

Another example is using a database for certificate issuance. Which I suspect is using one table or multiple that can be shard and it doesn't suffer from lock contention or joints of multiple tables.

I wouldn't be brave enough to count the number of cases where horizontal scaling is needed but I would say it is definitely not zero, especially for read replicas when high availability is needed

GeneralMayhem · on Sept 6, 2021

I'd be willing to believe 99% of applications, yes - for the same reason that Wordpress can claim to drive some ridiculous percentage of websites - but since the larger use cases tend to come from larger organizations, 99% of developers don't work on systems where one instance suffices. There's a reasonably large number of companies for which that's not true, and there's not very many good "next step" options when you're too big for vertical scaling but too small to fund building your own engine.

lazy_kangaroo · on Sept 6, 2021

where did you get the 99% metric ? Most of the companies even with a single saas product have insane amount of data these days.

Not just application data, There is also a whole lot of analytical data collected at every step of the product usage cycle.

traceroute66 · on Sept 6, 2021

> where did you get the 99% metric ? Most of the companies even with a single saas product have insane amount of data these days.

I will just leave this link here ....

https://letsencrypt.org/2021/01/21/next-gen-database-servers...

cutemonster · on Sept 6, 2021

And this one -- StackOverflow, one single live db:

https://stackexchange.com/performance

KingOfCoders · on Sept 6, 2021

From managing databases over the last two decades (starting with Ingres/mSQL), hardware has growing much faster (RAM/CPU/NVMe) than the data needs. I remember the first time we put 2Tb RAM into a machine to handle most analytics in memory.

From my experience TimescaleDB is fast and takes a lot of data in. If you're multi tenant, it's usually easy to shard.

And we do dataware housing on BigQuery, no need to have your own machine and manage the database.

Of course there are people who need unlimited horizontal scaling.

LoriP · on Sept 6, 2021

TimescaleDB also offers a good lot of supplementary functions to the PostgeSQL core product to help with time series data analysis... saves a lot of SQL acrobatics!

jpgvm · on Sept 6, 2021

Generally you don't have your main application database also serve as your data warehouse for lots of reasons.

Generally you offload that to purpose built systems that favor those aforementioned tradeoffs or onto services that run them for you, i.e BigQuery, Snowflake, etc.

Your main application database is unlikely to need sharding unless you really do have a phenomenal amount of customers or you need regional sharding to meet legal requirements about data sovereignty for example.

geon · on Sept 6, 2021

If you do sas, you are already the 1 %.

mosselman · on Sept 6, 2021

You are right, the metric should probably be 99.999999999999999%. 99% is way to conservative.

papaf · on Sept 6, 2021

Two things stick out here that I don't fully understand.

Declarative schemas, and a well-defined update process (that isn't human-written DDL), are essential at any sort of organizational scale.

Isn't this impossible because some schema changes require data migration? A data migration cannot be declaratively automated as far as I know.

Queue support

Why not use a dedicated and feature rich queue such as Rabbit MQ or, if you want to get really fancy, Kafka?

GeneralMayhem · on Sept 6, 2021

>Isn't this impossible because some schema changes require data migration?

In the most general case, sure - although there are workarounds for some specific cases (e.g., including previously-known-as names in the declarative schema to allow automatically planning renames). But 99% of the time, you're adding and removing tables and columns in a way that's very well defined. This is one of those areas where the best solution is to legislate away the hard problems - a tool that covers 98% of schema changes automatically (the provably safe ones), and then fails/requires a human to approve the last 2% is still dramatically better than having humans manually write and sequence change plans all the time.

Data migrations will require human effort, but you can sequence the changes and isolate the parts that need different kinds of work. If you're changing a string to an integer, for instance, you can make it clear in your change history that you (1) add the new column, (2) start dual writes, (3) backfill, (4) move readers to the new column, (5) stop the dual write, (6) drop the old column. You can do that with checked-in migration code, but think about what you end up with at the end - with imperative migrations, you have the clutter from all of that history; with declarative schema definitions, you just have the final state, just like how it works for code.

Declarative schemas also usually come with nice ancillary properties - for instance, they can give you automatic rollbacks, because you can pack up the whole schema definition as a static artifact.

>Why not use a dedicated and feature rich queue such as Rabbit MQ or, if you want to get really fancy, Kafka?

Atomicity. It's really, really powerful to be able to write a row and send or receive a message as a single exactly-once transaction. Imagine a queue of events. The writer writes the event data, and sends a message as a notification that it exists; the consumer will keep some sort of aggregate (count, filtered total, etc.). With a separate queueing system, you have to explicitly manage the timing and error cases bridging different atomicity domains, usually reimplementing some sort of idempotency log on your own. If it's all in one place, you just write a transaction that reads the old value, consumes the queue message, and writes a new value.

throwaway_pdp09 · on Sept 6, 2021

I'm doing a project perhaps relevant to this. Talk of declarative schemas (or what you imply they offer) is very interesting and I'd like to know more but I can't find anything relevant (just magento and sqlalchemy). Indeed, searching for <<<"Declarative schemas" "sql">>> in google gets this very thread on the first results page.

Any links to clear, actionable and reasonably comprehensive examples of these would be most helpful. Obviously abstract statements of the required semantics are also needed, but I also need to see what actual code would look like.

TIA

GeneralMayhem · on Sept 6, 2021

Magento (the new XML-based version, not the old create/update/revert-script version) gives a lot of these properties. Making it part of the database instead of a third-party tool would be better, though - it lets you cover more features, and with deeper integration you can get transactional updates and history logs.

saulrh · on Sept 6, 2021

> Why not use a dedicated and feature rich queue such > as Rabbit MQ or, if you want to get really fancy, Kafka?

If your queue is in the same database as your data, you can do "ack message/mark job as complete" and "commit result of job to database" in the same transaction. That simplifies a lot of the harder parts of things like materialization pipelines or BI systems.

Bubblegum21 · on Sept 6, 2021

Wanted to ask you more about the experiences you wrote. Have some similar thoughts as your later points. Do you have a contact method like an email address or something?

___luigi · on Sept 6, 2021

> .. scaling

Did you try Stolon?. https://github.com/sorintlab/stolon

GeneralMayhem · on Sept 6, 2021

Maybe I'm reading it wrong, but that looks like yet another multi-master high-availability system, which is also an important property for a system to have but more or less unrelated tow hat I was talking about.

As far as open-source add-ons for true horizontal scaling, I think [Citus](https://github.com/citusdata/citus) is the most well-known and sophisticated, but I don't have enough experience with it in production to have a particularly strong opinion yet. It might work quite well, but I still fundamentally doubt that it will ever be as good as a system that was designed to support horizontal sharding from the ground up.

KronisLV · on Sept 6, 2021

> 1. Horizontal scaling. PG can do sharding or replication, but that's not the same thing. Manual resharding is something that, once you've experienced it, you don't want to do it ever again, especially when things like Spanner/Cockroach exist. Database-level liquid sharding is such a dramatically superior solution that it makes any system that depends on an enumerated set of master instances seem completely obsolete.

It feels like the industry has largely failed at this in regards to the traditional RDBMSes. Solutions like TiDB attempt to remedy this in a transparent way, but there are simply too many limitations for it to be a feasible alternative for many projects out there: https://docs.pingcap.com/tidb/stable/mysql-compatibility#uns...

I've only ever seen PostgreSQL/MySQL/MariaDB/Oracle be deployed successfully as single instances in prod that are scaled vertically, because attempts to do otherwise either create issues with performance and resiliency, or make the entire system only have eventual consistency, which is as problematic as it is inevitable when it comes to horizontally scaled and distributed systems.

I don't think that there is a good answer to this, honestly. You will get either horizontal scalability OR data consistency - pick one.

> 3. Interface abstraction. Databases that ship language-support drivers should also ship a fake in-memory implementation for testing purposes that supports the same interfaces and options. There's no reason why I should have to start a subprocess and emulate a network connection to tell whether a query functions properly or not, or whether I've set isolation properties correctly for a pipeline.

That's a good idea, but personally i've seen that almost all abstractions are leaky in one way or another. In the day of containers and highly customizable software that can scale from tens or hundreds of megabytes of memory with a fraction of a CPU core to an entire server with a hundred gigabytes of memory and dozens of CPU cores, it feels somewhat unnecessary to risk these abstractions.

Just make sure that your process for initializing the DB schema and seeding it with data for testing is fully automated and reproducible, and then spin up a disposable database container as a part of your CI process. Either that, or just don't test the actual DB implementation, which i've also seen be done to pretty horrendous results.

Edit: Perhaps my past experience makes me have certain biases - i've also seen manually set up DB instances which eventually leads to there being no good way to test N parallel feature branches, each of which needs a different schema, because that also necessitates N DB instances to be present, half or more of which could in theory break. I don't see any good options apart from containers here, especially when you need extensions which these in-memory implementations wouldn't play nicely with.

GeneralMayhem · on Sept 6, 2021

It depends on what you're testing. Performance properties and version updates, sure, it's hard to find a substitute for containers. But that's not 99% of testing. If the database has a stable interface (for instance, its dialect of SQL+DDL), it can ship a fake implementation of that interface to use in unit tests or toy examples.

KronisLV · on Sept 6, 2021

> If the database has a stable interface (for instance, its dialect of SQL+DDL), it can ship a fake implementation of that interface to use in unit tests or toy examples.

Sometimes the interface just has too large of a surface area for it to be simulated properly.

For example, suppose that you have an Oracle stored procedure, which, when given about 20 parameters, goes through a decision tree in a PL/SQL block that spans a few thousands of lines and along the way uses both DBMS_OUTPUT.PUT_LINE, as well as DBMS_APPLICATION_INFO.SET_SESSION_LONGOPS, any number of methods from DBMS_PREDICTIVE_ANALYTICS as well as any number of other packages that are included in the full release: https://docs.oracle.com/en/database/oracle/oracle-database/1... which may also include logic that makes network requests, sends e-mails or even triggers other pieces of code to be executed.

All of that would be almost impossible to fake well enough because of the scope of functionality that's used, especially if methods are not only called, but their output is also checked and the execution depends on this. Should systems be simpler and do less? Probably, but that's not the objective reality in many projects out there.

That said, i agree that it's a good thing to have if you treat your DB as a data store and not much more, which in my opinion is a pretty good and stress free way to go about things overall. But somehow that's also like saying "Just use an ORM to have your system be DB agnostic!" which never actually works out in practice because of implementation details that leak through.

Chyzwar · on Sept 5, 2021

1. High performance read/write of Scylla/Cassandra with high availability[1]. It has some limitations for OLTP workloads and require careful planning. Postgres without Citus is not really HA and even then companies would often invent custom sharding/cluster management system.

2. Powerful graph query language like Cypher. It might not perform well in real life, but my personal experience left me amazed[2]. There is a number of issues with SQL that could be addressed[3], but current standard is way too much prevalent.

3. Zero impedance mismatch between database and application representation. In database like Smalltalk GemStone it is really seamless experience for developer to write database code and application code[4]. To some extent, MongoDB success can be attributed to this aspect.

4. Datomic temporal capabilities[5]. It is hard to maintain temporal tables in postgres. There are some use cases where you really want query in point of time. Strictly not an OLTP feature, but I can see this be usefully in many scenarios.

  [1] https://www.youtube.com/watch?v=Fo1dPRqbF-Q
  [2] https://www.youtube.com/watch?v=pMjwgKqMzi8&t=726s
  [3] https://www.edgedb.com/blog/we-can-do-better-than-sql
  [4] https://www.youtube.com/watch?v=EyBkLbNlzbM
  [5] https://www.youtube.com/watch?v=7lm3K8zVOdY

Edited: independence -> impedance

charlysl · on Sept 6, 2021

> Zero impedance mismatch between database and application representation

Does this include informacion hiding/encapsulation? (to prevent saved objects' internal representation from being exposed).

Traditional databases don't have an encapsulation mechanism AFAIK, which is one of the reasons for impedance mismatch.

This is important because it is a good practice for client code to make no assumptions about the internal representation, accessing data only via the a public interface.

If it happens to be exposed by the database, the clients can use it in their queries. If the internal representation changed later on, such clients would be broken.

Of course, this can be solved by only allowing data access via, say, well designed restful apis (that don't expose internal details), but this would still provide no guarantees.

How about another reason for impedance mismatch, that of storing objects that belong to a class hierarchy?

takeda · on Sept 6, 2021

> Traditional databases don't have an encapsulation mechanism AFAIK, which is one of the reasons for impedance mismatch.

It actually does, those are views and functions.

The real problem with impedance mismatch is that SQL is declarative (you say what you want and database figures out how to get it) when most programming languages are iterative (you say what should be done).

The issue is that you have two very different languages. For one you have powerful IDE with type checking auto completion and refactoring capabilities, the SQL often is sent as a string and don't have these benefits. The various ORM are attempts to use iterative and object oriented language to access relational objects using a declarative language.

I think JetBrains is addressing the problem the right way. They added Data Grip functionality to their IDEs like PyCharm for example. What it does is that if you connect the IDE to a database and let it download the schema you get the same functionality for the data. Basically it will detect SQL statements in the string and offer the same capability for it as you have with the primary language.

At that point the impedance mismatch no longer feels like a mismatch. You basically have two languages, one to obtain data you need and another to process/present it. You can get database to return exact fields you need for your projects and even the object mapping starts feeling unnecessary.

Why data is stored in a relational way? Because that's most optimal way to store the data and the way it is stored allows multiple applications access the same data differently.

For example with NoSQL you need to know how the data will be used so you correctly plan how it will be stored. If application changes you might need to restructure the entire data.

Ultimately the data is the most important thing businesses and it stays, while applications that use it come and go.

wruza · on Sept 6, 2021

For example with NoSQL you need to know how the data will be used so you correctly plan how it will be stored. If application changes you might need to restructure the entire data.

Honestly, SQL has this problem too, but it presents itself not in the way you store, but in the way you query. There are simple schemas and complex ones, and irrespective of that there are obvious query sets and unplanned ones (i.e. written at runtime as part of the data analysis process). SQL and its autoplanning is required only for complex+unplanned cases, in my opinion. In all other cases I know my data and I’d better walk through the indexes myself rather than writing 4-story queries to satisfy the planner. At the end of the day, nested loops through the indexes is what RDBMS does. There is no declarative magic at the fetch-and-iterate level.

Iow, it would be nice to have “extql” a direct access to indexes and rows, in sort of a way EXPLAIN works, and skip SQL completely.

  function get_items(store_id) {
    for (var item in items.id) {
      var res = item.{name, code}
      var item_id = i.id
      res.price = prices.any({item_id, store_id})?.price
      if (!res.price) continue
      res.props = todict(props.all({item_id}).{name, value})
      yield res // or collect for a bigger picture
    }
  }

This query could be an equivalent of “select from items inner join prices on (store_id, item_id) left join props on (item_id)” but saving space for many props and being much more programmable. Also, it would be nice to have the same engine (sql+extql) at the “client” side, where the inverse problem exists – all your data is nosql, no chance to walk indexes or declare relations.

takeda · on Sept 6, 2021

> Honestly, SQL has this problem too, but it presents itself not in the way you store, but in the way you query. There are simple schemas and complex ones, and irrespective of that there are obvious query sets and unplanned ones (i.e. written at runtime as part of the data analysis process). SQL and its autoplanning is required only for complex+unplanned cases, in my opinion. In all other cases I know my data and I’d better walk through the indexes myself rather than writing 4-story queries to satisfy the planner. At the end of the day, nested loops through the indexes is what RDBMS does. There is no declarative magic at the fetch-and-iterate level.

The thing is that what worked at specific time can change. For example if you have simple join with two tables, let say A and B. You search by column in target A to get value from column in table B. Now if both tables are large then it makes sense to lookup in A by an index, then use foreign key and index to find the row in table B.

Now if A and B have few elements. Even if there is an index on both of them, it actually is faster just to scan one or both tables.

It might be actually more beneficial to ensure that tables are properly analyzed, have right indices and preferences in the query planner are tuned.

If you need to override query planner, you don't have to make sophisticated queries, you can just use this[1] extension. Though if things aren't working right it is either lack of data, mis-configuration or a bug.

[1] http://pghintplan.osdn.jp/pg_hint_plan.html

charlysl · on Sept 6, 2021

> For example with NoSQL you need to know how the data will be used so you correctly plan how it will be stored. If application changes you might need to restructure the entire data.

This point is very important, and well explained in Stonebraker's paper "What Goes Around Comes Around". What is most interesting is that he is actually talking about half a century old pre-relational IMS IBM databases, but they had exactly the same issue, hence the paper's title. Codd invented the relational model after watching how developers struggled with the very problem you mentioned.

Stonebraker famously quipped that "NoSQL really stands for not-yet-SQL".

He also addresses the impedance matching issue in the "OO databases" section; there is actually a lot more to it, and he gives it all an insider's historical perspective.

Scarbutt · on Sept 6, 2021

Your view of what impedance mismatch is doesn't sound accurate. It's not about declarative or imperative or syntax or strings, etc...

It's about data modeling, one models data using relations, the other models data in a hierarchical way (using maps, arrays, objects, etc...). They are two different ways to structure your data, hence the impedance mismatch.

takeda · on Sept 7, 2021

Perhaps I was using the wrong word. I was referring that traditionally things fell a bit off when working with SQL. Because often it wasn't a code, just a bit of strings that you were sending.

Because of that, developers started to abstract that with code and objects that were then populated with data.

With IDEs understanding the SQL that's no longer necessary. I can construct a specific SQL to get the exact structure my program needs. Even if it is hierarchical I can use various jsonb aggregation functions. That's a game changer to me.

bbsimonbb · on Sept 6, 2021

Not just jetbrains. pgtyped does this for postgres and typescript, and queryfirst for c# against sql server/postgres/mysql.

takeda · on Sept 6, 2021

I did not know that, I first encountered that in PyCharm.

I'm glad there's more.

Edit: actually what you mentioned is slightly different. This is what I'm talking about: https://youtu.be/_FlpiNno088?t=2863

vaughan · on Sept 6, 2021

> good practice for client code to make no assumptions about the internal representation

If your internal representation and API start to differ then it adds complexity fast. Its far better to have as close to a 1-1 mapping for your backend and frontend data models as possible.

zozbot234 · on Sept 5, 2021

Postgres comes with the building blocks for both sharding and HA out of the box, and they're extensively discussed in the docs. You don't need proprietary addons other than as pure convenience.

zem · on Sept 5, 2021

don't underestimate the importance of convenience. I'm convinced one of the reasons MySQL had so much more mindshare than postgres back in the day was that it was far easier to get up and running, even if postgres might have been easier to use once everything was set up correctly.

bostik · on Sept 6, 2021

That's funny, I must have been an outlier then.

I've been using Postgres since 1998, and I tried getting MySQL up first. There was more documentation available for the latter, so it should have been simple. Failed. It just didn't work.

Out of frustration I then tried Postgres, because I just wanted a decent database for my project. It was surprisingly easy, I only had to learn about pg_hba.conf to get to a functional state. Everything else was in place out of the box.

I've been a happy user ever since. MySQL may have had the mindshare (thanks to prevalence of LAMP) but everything outside the magic happy path was confusing and fragile.

ahoka · on Sept 6, 2021

No one cares what was in 1998, that's the whole point of this discussion. Postgres devs kept digging their heads into sand for many years, saying that high availability is somehow not the task of the database. In reality only the datastores need to be HA/durable in an otherwise stateless architecture.

BarryMilo · on Sept 6, 2021

I don't even remember choosing MySQL when I started (15 years ago). It was just so dominant, we didn't question it.

Nowadays I would still use it because I assume it is the dumbest database system and that's exactly what I need for my 1-5 user app.

5e92cb50239222b · on Sept 6, 2021

It still has some features over PostgreSQL that pushed me to choose it (actually MariaDB) for a new project about a year ago, namely multi-master replication. Yes, I know, terrible database and horrible feature, but it really helped in that particular domain.

I couldn't find anything decent for postgres, while MariaDB/MySQL have that built-in, with some differences in implementation. Especially for a customer who refuses to pay for his software, because there are some commercial solutions.

jjirsa · on Sept 6, 2021

Sharding is not the same as natural clustering, because eventually you’ll need to reshard and then you’ll be writing a lot more code.

jasfi · on Sept 6, 2021

Postgres has many HA solutions, and it's getting better all the time. Postgres has good performance, benchmarks need to show the various trade-offs systems make so that informed decisions can be made. Feel free to post a link.

The Graph model can be made available through extensions. See AGE: https://age.incubator.apache.org/ They plan to support OpenCypher.

The JSONB type allows for No-SQL like development if that's what you really want.

zeroc8 · on Sept 7, 2021

Just because you can use JSONB, it doesn't mean that it is as easy to work with as it is the case with MongoDB. So if you really need a JSON store, just use Mongo.

pas · on Sept 5, 2021

> independence mismatch

you probably mean impedance mismatch, right?

https://en.wikipedia.org/wiki/Object%E2%80%93relational_impe...

vaughan · on Sept 6, 2021

> Zero impedance mismatch

I see the problem here being that too many manual optimizations need to be done when implementing a schema.

You start with a logical schema (ERD diagram) and then implement it via a physical schema with denormalizations added for efficiency (usually because the relational model has scaling limits with number of joins, or because db migrations are too difficult, or handling unstructured data without introducing a ton of new tables). The db should do the denormalization automatically, allowing the user to interface with their logical schema directly.

Another reason is we can't use SQL in the browser - we have to go through many caching layers and API layers which complicate things.

cwp · on Sept 6, 2021

+1 to the application/database mismatch. GemStone is really amazing. I never got to do commercial work with it, but damn is it a great environment. I've never seen anything like it.

grobbie · on Sept 5, 2021

CockroachDB is getting a lot of interest these days.

It has broad PGSQL language (and also wire I think) compatibility yet has a clustered peer architecture well suited to running in a dynamic environment like cloud or k8s. Nodes can join dynamically and it can survive them leaving dynamically as long as there's a quorum. Data is distributed across the nodes without administrator needing to make any shard rebalance type interventions.

PGSQL is designed for deployment as a single server with replica servers for HA. It's not really designed for horizontal scalability like Cockroach. You can do it - the foreign data wrappers feature and table partitioning can give you poor man's scale out. Or you can use Citus which won itself a FOSS license earlier this year. And there are other Foss and proprietary approaches too.

MySQL is similar - you can do it, like with their recent router feature, but it has been retrofitted, and it's not as fluid as Cockroach. IIRC MySQL router is similar in configuration to Galera - that is, a static config file containing a list of cluster members.

Listen I'm sure that the design approach of Cockroach could be retrofitted to PGSQL and MySQL, but I'm pretty sure that doing a good job of it would be a lot of work.

So in answer to your question, I'm not sure that there's all that much RDBMS can't be made to do. Geospatial, Graph, Timeseries, GPU acceleration. Postgres has it all and often the new stuff comes to Postgres first.

By the way I love MySQL and PostgreSQL, and the amazing extensions for PGSQL make it extra awesome. Many are super mature and make pgsql perfect for many many diverse use cases.

For XXL use cases though, CockroachDB is taking a very interesting new path and I think it's worth watching.

dvtkrlbs · on Sept 5, 2021

Not a database expert but here is my thoughts after using Cockroach DB for a mid sized: - Single cluster mode is really nice for local deployment - WebUI is a cool idea but I wish it had more info - The biggest problem is Postgres compatibility there are some quirks that was annoying examples being. The default integer types having different sizes, cockroachdb having a really nice if not exists clause for create type but having no way to maintain a Postgres compatible way of doing it (i think this one is on postgres only being able to do it with plsql scripts is cumbersome) - For me the neutral one. IT HAS ZERO DOWNTIME SCHEMA CHANGES. But if you are just coming from Postgres and just using a regular migration tool and transactions and schema changes having serious limitations and could end up your database in inconsistent state is scary. (Docs really document the behavior but still I would expect a runtime warning for it.

Supermancho · on Sept 5, 2021

CDB was great until we started doing table creation on the minute and hundreds of inserts on those tables. When we tried to drop tables on a schedule, CDB never could catch up and would generally just crash (3 nodes). I really don't like the magic you have to do with CDB for things that your commodity DBs can be expected do.

vvern · on Sept 5, 2021

CockroachDB dev here. We've gotten a good bit better in the last few versions in terms of schema change stability. We're still not very good at large schemas with more than 10s of thousands of tables but we've got projects under way to fix that which I expect will be in the release in the spring of 22.

I'd like to hear more about the magic.

Aeolun · on Sept 6, 2021

To be fair, while I kind of agree the system should be able to handle it regardless, 10,000s of tables sounds outside the realm of 99.99% of all use cases.

CydeWeys · on Sept 6, 2021

You might be surprised to learn how common the "store the results of the user's query into a temporary table" pattern is.

vvern · on Sept 6, 2021

Temporary tables in cockroach exist, but the implementation was done largely to fulfill compatibility rather than for serious use.

The implementation effectively just creates real tables that get cleaned up; they have all the same durability and distributed state despite not being accessible outside of the current session.

Getting something done here turned out to be a big deal in order to get ORM and driver tests to run, which is extremely high value.

A better implementation would just store the data locally and not involve any of the distributed infrastructure. If we did that, then temp tables wouldn't run into the other schema scalability bottlenecks I'm raising above.

aidenn0 · on Sept 6, 2021

Thanks for all of that information in those 2 posts.

np_tedious · on Sept 6, 2021

Can you tell me more about the use case where you'd create a new table every minute?

valzam · on Sept 6, 2021

Not OP but for analytics use cases it is common to create new tables for each minute, hour, day or whatever granularity you collect data in. This makes it easier to aggregate later, you don't end up with extremely big tables, you can drop a subset of the data without affecting the performance of the table currently being written to etc..

dafelst · on Sept 6, 2021

That sounds like what table row partitioning is for, I thought all the major databases supported that?

sandstrom · on Sept 6, 2021

I agree, CockroachDB is overall very good!

A few things I'm missing from it though:

- Advisory Locks (only exist as a shim/no-op today)

- LISTEN/NOTIFY

- CTEs (Common Table Expressions)

- Introspection and management features (for example pg_locks, pg_stat_activity, pg_backend_pid(), pg_terminate_backend())

They are making good progress though, and more recently they've spent effort on making adapters for well, e.g. their ActiveRecord adapter.

I think this is clever, since the deciding factor for many companies will simply be "Can I easily replace this with PostgreSQL in my Rails/Django/etc. project?"

traceroute66 · on Sept 5, 2021

> It has broad PGSQL language compatibility

Depends how you define broad. :)

Many key features are missing, including but not limited to:

    - UDFs and sprocs.
    - Useful datatypes such as TSTZRANGE
    - More limited constraints

I was recently looking at Cockroach as a PGSQL replacement because of its distributed nature. But the equivalence featureset is still lagging badly, unfortunatley.

rad_gruchalski · on Sept 5, 2021

May I suggest looking at YugabyteDB, which is a distributed SQL database built on actual PostgreSQL 11.2 engine? It's way ahead of CockroachDB in terms of feature support.

YugabyteDB has UDFs, stored procedures, distributed transactions, the range types are working from what I can tell, at least the example from here: https://wiki.postgresql.org/wiki/Extract_days_from_range_typ... works right out of the box, just copy paste. Postgres extensions are of course working. Orafce, postgis (with extra work needed for gin indexes, afair...), custom extensions.

YugabyteBD == Postgres. The query planner, analyzer and executor are all Postgres. Mind you, some features are not readily available because handling them properly in a distributed manner takes effort. Those unsupported features are disabled on the grammar level, before being worked on. But - unsupported features will not corrupt your data. Missing features are enabled very fast.

For example, I have recently contributed foreign data wrapper support: https://github.com/yugabyte/yugabyte-db/pull/9650 (enables postgres_fdw at a usable level) and working on table inheritance now.

Yugabyte is an amazing bit of technology and more people should know about it. By the way - it's Apache 2 with no strings attached.

For clarification: I'm not associated with Yugabyte in any way other than very happy user and contributor. Yugabyte organizes the Distributed SQL Summit later this month: https://distributedsql.org/. Might be a good opportunity to learn more about it. Second clarification: I'll be speaking at the event.

tuckerconnelly · on Sept 5, 2021

Are you still using postgres? Thinking about a similar transition, would really appreciate any shared learnings :)

traceroute66 · on Sept 6, 2021

> Are you still using postgres?

At the moment yes.

The final decision has not been made yet, but I think the reality is we're getting tired of evaluating all these distributed databases that claim Postgres compatibility only to find its the usual clickbait marketing speak.

So the most likely outcome is we're going to stick with Postgres and give the emerging distributed tech a couple more years to pull their socks up. We're not going to go round changing all our Postgres code and schemas just to shoehorn into the limitations of a random definition of "postgres support".

why-el · on Sept 5, 2021

I don't know if anybody from CockroachDB is reading this but their article[1] on serializable transactions is somewhat questionable, as it compares CockroachDB's Serializable to Postgres's Read Committed, which seems to imply CockroachDB is better. Of course, Postgres has serializable as well. The only novelty in the article is that Cockroach DB runs Serializable by default, so I am not sure what that comparison was doing.

[1] https://www.cockroachlabs.com/docs/stable/demo-serializable....

ummonk · on Sept 5, 2021

I don't get that implication from the article at all. What I do get is that the only transaction isolation level available under CockroachDB is serializable. And to show why serializable is valuable, they have to demonstrate using Postgres' default of read committed (because they would be unable to demonstrate this using CockroachDB which doesn't allow any other transaction isolation levels.

why-el · on Sept 6, 2021

Ah, fair reading, thanks. As Postgres supports that level of isolation as well, I was thrown off a bit. but your phrasing makes sense.

AdamProut · on Sept 6, 2021

I don't think CockroachDB supports anything other then serializable isolation level, so of course they are going to be pushing this type of argument. Its an interesting philosophy really. It would be one thing if the performance penalty of running all transactions at serializable isolation level vs weaker isolation levels was reasonably low, but its not (see [1] - yes I know the entire difference in performance here is not due to CDB using serializable transactions for everything, but I'm guessing a very large part of it is). Its kind of like saying the only cars in the world should be ford pintos because we don't trust folks to safely drive faster cars.

[1] https://www.scylladb.com/2021/01/21/cockroachdb-vs-scylla-be...

CRConrad · on Sept 6, 2021

> Its kind of like saying the only cars in the world should be ford pintos because we don't trust folks to safely drive faster cars.

Heh... The Pinto, for safety? https://en.wikipedia.org/wiki/Ford_Pinto#Fuel_system_fires,_...

mirekrusin · on Sept 5, 2021

It always rubs me the wrong way - all those paxos/raft approaches (which are great, but...) simply elect the leader to pick writes. In that sense there is no distribution of computation at all. It's still single target that has to cruch through updates. Replication is just for reads. Are we going to have something better anytime soon? Like real distribution, when you add more servers writes distribute as well?

josephg · on Sept 6, 2021

> Like real distribution, when you add more servers writes distribute as well?

There's a really interesting suggestion in The Mythical Man Month. He suggests that instead of hiring more programmers to work in parallel, maybe we should scale teams by keeping one person writing all the code but have a whole team supporting them.

I don't know how well that works for programming, but with databases I think its a great idea. CPUs are obnoxiously fast. Its IO and memory which are slow. I could imagine making a single CPU core's job to be simply ordering all incoming writes relative to each other using strict serialization counters (in registers / L1 cache). If the stream of writes coming in (and going out) happened over DPDK or something, you could probably get performance on the order of 10m-100m counter updates per second.

Then a whole cluster of computers sit around that core. On one side you have computers feeding it with writes. (And handling the retry logic if the write was speculatively misordered.) And on the other side you have computers taking the firehose of writes, doing fan-out (kafka style), updating indexes, saving everything durably to disk, and so on.

If that would work, you would get full serializable database ordering with crazy fast speeds.

You would hit a hard limit based on the speed of the fastest CPU you can buy, but I can't think of much software on the planet which needs to handle writes at a rate faster than 100m per second. And doesn't have some natural sharding keys anyway. Facebook's analytics engine and CERN are the only two which come to mind.

yencabulator · on Sept 9, 2021

> I could imagine making a single CPU core's job to be simply ordering all incoming writes relative to each other

Calvin is an interesting alternate design that puts "reach global consensus on transaction order" as its first priority, and derives pretty much everything else from that. Don't even need to bottleneck through a single CPU.

http://cs-www.cs.yale.edu/homes/dna/papers/calvin-sigmod12.p...

Like VoltDB, the one huge trade-off is that there's no `BEGIN ... COMMIT` interaction where the client gets to do arbitrary things in the middle. Which would be fine, if programming business logic in the database was saner than with Postgres.

hodgesrm · on Sept 6, 2021

What you suggest would work if you just want to order writes. However, it won't support an ACID transaction model, which a lot of applications expect, or even simpler models where you can read values before you write.

For instance: consider an application that looks at the current value of a counter and adds 1 if the counter is less than 100. You can execute this on the primary (resource bottleneck because you need to see current state and only the primary has the full picture) or do the operation on a local node and try to submit it to the primary for ordering (coordination required over the network, e.g., to ensure data are current).

There are other approaches but they generally result in the same kind of conflicts and consequent slow-downs. Or they limit the operations you can handle, for example by only allowing conflict-free operations. That's what CRDTs do. [1]

[1] https://hal.inria.fr/hal-00932836/file/CRDTs_SSS-2011.pdf

josephg · on Sept 7, 2021

> However, it won't support an ACID transaction model

I think you could add ACID support, while the process doing ordering still not caring about the data. You do something like this:

- Split the keyset into N buckets. Each bucket has an incrementing version number. (The first change is 1, then 2, then 3, and so on). The whole system has a vector clock with a known size (eg [3, 5, 1, 12, etc] with one version per bucket.)

- Each change specifies a validity vector clock - eg "this operation is valid if bucket 3 has version 100 and bucket 20 has version 330". This "validity clock" is configured on a replica, which is actually looking at the data itself. The change also specifies which buckets are updated if the txn is accepted.

- The primary machine only compares bucket IDs. Its job is just to receive validity vector clocks and make the decision of whether the corresponding write is accepted or rejected. If accepted, the set of "write buckets" have their versions incremented.

- If the change is rejected, the secondary waits to get the more recent bucket changes (something caused it to be rejected) and either retries the txn (if the keys actually didn't conflict) or fails the transaction back to the end user application.

So in your example:

> consider an application that looks at the current value of a counter and adds 1 if the counter is less than 100

So the write says "read key X, write key X". Key X is in bucket 12. The replica looks at its known bucket version and says "RW bucket 12 if bucket 12 has version 100". This change is sent to the primary, which compares bucket 12's version. In our case it rejects the txn because another replica had a concurrent write to another key in bucket 12. The replica receives the rejection message, checks if the concurrent change conflicts (it doesn't), then retries saying "RW bucket 12 if bucket 12 has version 101". This time the primary accepts the change, bumps its local counter and announces the change to all replicas (via fan-out).

The primary is just doing compare-and-set on a small known array of integers which fit in L1 cache, so it would be obscenely fast. The trick would be designing the rest of the system to keep replica retries down. And managing to merge the firehose of changes - but because atomicity is guaranteed you could shard pretty easily. And there's lots of ways to improve that anyway - like coalescing txns together on replicas, and so on.

hodgesrm · on Sept 13, 2021

I'm really sorry didn't see your comment earlier. I like your solution but am a little stuck on the limitations.

1.) It requires applications to specify the entire transaction in advance. ACID transactions allow you to begin a transaction, poke around, change something, and commit. You can derive the transaction by running it on a replica, then submitting to the primary, in which case this becomes an implementation of optimistic locking including transaction retries.

2.) As you pointed out the trick is to keep the array small. However, this only works if you have few conflicts. Jim Grey, Pat Helland, and friends pointed this out in 1996. [1] That in turn seems to imply a very large number of buckets for any non-trivial system, which seems like a contradiction to the conditions for high performance. In the limit you would have an ID for every key in the DBMS.

3.) Finally, what about failures? You'll still need a distributed log and leader election in case your fast machine dies. This implies coordination, once again slowing things down to the speed of establishing consensus on the network to commit log records.

Incidentally Galera uses a similar algorithm to what you propose. [2] There are definitely applications where this works.

[1] https://dsf.berkeley.edu/cs286/papers/dangers-sigmod1996.pdf

[2] https://galeracluster.com/library/documentation/certificatio...

brotherofsteel · on Sept 6, 2021

>There's a really interesting suggestion in The Mythical Man Month. He suggests that instead of hiring more programmers to work in parallel, maybe we should scale teams by keeping one person writing all the code but have a whole team supporting them.

Sounds like mob programming!

Nican · on Sept 5, 2021

There are two ways I see databases doing paxos. The basic way, like some databases like Percona, basically is a single raft across the whole database. It can help with high availability, but the database scale is still kind of constrained to the capability of a single writer.

What you really want is databases like CRDB/Yugabyte/TiDB, which are sharding+raft. Tables are sharded into 128MB chunks, and each chunk has their own raft. The database handles transactions, distributed queries, and auto-balancing transparently.

rad_gruchalski · on Sept 5, 2021

Definitely, YugabyteDB falls into the second category. Tables are split into so called tablets. This splitting can be controlled by choosing the hashing algorithm for the primary key (or transparent row key hash, if no primary key for a table exists). Each tablet is replicated and has its own raft group. Different tables can have different replication factors, the number of tablets to split the table into can be selected and modified.

YugabyteDB recently added support for table spaces on steroids where table spaces are allocated to physical nodes in the cluster. This enables geo-replication features where tables or rows can be placed within selected geo locations.

All the data shifting is done transparently by the database.

remram · on Sept 6, 2021

Doesn't CockroachDB work this way as well, with each "range" running a separate Raft? That is what I get from https://www.cockroachlabs.com/docs/stable/architecture/overv...

skyde · on Sept 5, 2021

All those noSQL or newSQL have more than one leader per table they have a distinct leader for each partition.

So if you have 6 server and 2 partition you could have 3 servers for partition number #1 and 3 different servers for partition number #2.

If you want extra performance you could make the server simply store key->value mapping using quorum write like Cassandra is doing but to keep data consistency you still have 2 choice

#1 use Optimistic concurrency (an app performing an update will verify if the data has changed since the app last read that data).

#2 using some kind of Lease, elect one machine to be the only one allowed to write to that partition for some time period.

Option #1 do not give faster transaction throughput but could offer lower tail latency.

Option #2 bring you back to square one of having a leader so you better just use (Paxos/Raf)

FpUser · on Sept 5, 2021

You are asking for a miracle or just simply - breach of the physics laws. The only way to horizontally scale write speed is to shard your data to be written somehow. You can easily do it but then your reading queries have to go to multiple servers to assemble single result. You can't scale both at the same time. There are some in between solutions like eventual read consistency that are relying on traffic at some point easing enough so that the conductor can actually synchronize servers. But if you have a steady stream of read/write requests the only way you really can scale is to tell amazon to sod off, buy yourself big fat multicore / multi CPU server with nice SSD array (preferably of Optane type) and watch your processing power suddenly shoot through the roof while the cost greatly decreases ;)

zozbot234 · on Sept 5, 2021

> There are some in between solutions like eventual read consistency that are relying on traffic at some point easing enough so that the conductor can actually synchronize servers.

You can use CRDT's to give a formally correct semantics to these "inconsistent" scenarios. And they might well be something that's best explored in a not-purely-relational model, since the way they work is pretty unique and hard to square with ordinary relational db's.

FpUser · on Sept 6, 2021

CRDT is just a fancy term for the strategy that still has to eventually merge data. In case of CRDT the data organized / designed in a way that makes it easier. The keyword here is "merging" which by definition kills "infinite" scalability.

You can dance around all you want but you just can't beat laws of nature.

Sure you can always design something that works for your particular case but generic solution is not possible.

mirekrusin · on Sept 6, 2021

Can't you arrange merging into ie. binary tree - in that setup you'd be collapsing merges into single one at the root and cummulative throughput at leaf nodes could be exponentially higher?

riku_iki · on Sept 5, 2021

data is also sharded in those databases

mirekrusin · on Sept 5, 2021

true, thanks for pointing out, but this is a bit cheating isn't it? ie. atomic transactions cross shards flip over, right? it's basically ergonomic equivalent of just using multiple databases?

eis · on Sept 5, 2021

Not 100% sure what you mean by flipping over (fail?) but at least in CRDB you can of course do cross shard transactions. They wont be as fast as a transaction that can be completely handled by a single leader but it works fine and is transparent to the client.

radicalbyte · on Sept 5, 2021

You move over into the world of distributed transactions, which can be really expensive.

Thankfully sharding works great for a large number of applications (or in other cases you can accept eventual consistency).

GeneralMayhem · on Sept 6, 2021

Cross-shard transactions will use some sort of two-phase commit protocol internally. They still work just fine (the extra complexity is transparent to the client), but with somewhat worse performance. Most of the difficulty/expertise involved in using such systems optimally is in designing your schema and access patterns such that you minimize the number of cross-shard transactions.

calmoo · on Sept 5, 2021

CAP theorem. Choose two, it's a fundamental limitation of scaling a database.

j16sdiz · on Sept 6, 2021

CAP theorem. You can't have guarantee for having all three all the time. But you can still have nice thing most of the time.

Even better, sometimes you can change which guarantee you need.

We can do better then "pick two"

pas · on Sept 5, 2021

yes, but no. because in reality what we really want is not true pure AP or CP (or CA, which you can't have anyway)

https://www.youtube.com/watch?v=hUd_9FENShA

(CAP is a very important result about a very strong assumption of consistency: linearizable events, but for that you can't lose any messages [if I remember it correctly], otherwise the system will become inconsistent)

calmoo · on Sept 6, 2021

Thanks for the video, very interesting.

ralusek · on Sept 6, 2021

I've always found CAP theorem to be better described as:

Given the need for Partitions, you must choose between prioritizing Consistency or Availability.

Thaxll · on Sept 5, 2021

Google Spanner does the three.

erik_seaberg · on Sept 6, 2021

> The purist answer is “no” because partitions can happen and in fact have happened at Google, and during some partitions, Spanner chooses C and forfeits A. It is technically a CP system.

— https://cloud.google.com/blog/products/databases/inside-clou...

someelephant · on Sept 5, 2021

Make sure you thoroughly test your multi-region deploys. Last time we tried the system was not stable when a region went down. Also beware of this anti pattern: https://www.cockroachlabs.com/docs/stable/topology-patterns....

bogomipz · on Sept 5, 2021

Interesting. So you followed the advice of deploying to greater than two regions and your DB didn't survive when you lost a region or you only deployed to two and got bit by the anti-pattern?

someelephant · on Sept 6, 2021

Even with >2 when one goes down CDB performed poorly.

alberth · on Sept 5, 2021

Sounds like Mnesia, which has existed for 20+ years.

https://erlang.org/doc/man/mnesia.html

ramchip · on Sept 5, 2021

Mnesia isn't SQL-compatible, can't scale horizontally (each node has a full copy of the DB + writes hit all nodes), needs external intervention to recover from netsplits, and expects a static cluster.

It's pretty much the opposite of what the parent described IMHO. It's meant more like a configuration store than a proper DB. That's how RabbitMQ uses it for instance, it stores metadata in Mnesia and messages in a separate store, but they're working on replacing Mnesia with their own more modern store based on Raft.

toast0 · on Sept 5, 2021

> Mnesia isn't SQL-compatible,

I think this is accurate. There are some references to a masters project to do SQL with mnesia, but afaik, it's not supported or used by anyone.

> can't scale horizontally (each node has a full copy of the DB + writes hit all nodes),

This isn't accurate, each table (or fragment, if you use mnesia_frag) can have a different node list; the schema table does need to be on all the nodes, but hopefully doesn't have a lot of updates (if it does need a lot of updates, probably figure something else out)

> needs external intervention to recover from netsplits,

This is true; but splits and joijs are hookable, so you can do whatever you think is right. I think the OTP/Ericsson team uses mnesia in a pretty limited setting of redundant nodes in the same chasis, so netsplit is pretty catastrophic and they don't handle it. Different teams and tables have different needs, but it would probably be nice to have good options to choose from. There's also no built-in concept of logging changes for a split node to replay later, which makes it harder to do the right thing, even if you only did quorum writes.

> and expects a static cluster.

It's certainly easier to use with a static cluster, especially since schema changes can be slow[1], but you can automate schema changes to match your cluster changes. Depending on how dynamic your cluster is, it might be appropriate to have a static config defined based on 'virtual' node names, so TableX is on db101, but db101 is a name that could be applied to different physical nodes depending on whatever (but hopefully only one node at a time or you'll have a lot of confusion).

[1] The schema changes themselves are generally not actually slow, especially if it's just saying which node gets a copy, ignoring the part where the node actually gets the copy which could take a while for large enough tables or slow enough networks; it's that mnesia won't change the schema until all of the nodes are not currently doing a 'table dump' where up to date tables are written to disk so table change logs on disk can be cleared and those dumps can take a while to complete and any pending schema operations need to wait.

rad_gruchalski · on Sept 5, 2021

That’s a huge oversimplification. Mnesia is distributed but that’s about it. There’s so much more to sql than selecting data. Transactions, user defined functions, stored procedures, custom types…

alberth · on Sept 5, 2021

None of what you detail though is included in what the GP describes as a modern database.

I’m not trying to nitpick but what GP describes aligns to what Mnesia provides.

rad_gruchalski · on Sept 5, 2021

The GP mentioned CockroachDB though. There’s a lot of implicit functionality behind that name. Even just behind mentioning Postgres.

The distributed part of modern sql is fairly recent.

rkarthik007 · on Sept 5, 2021

When building YugabyteDB, we reuse the "upper half" of PostgreSQL just like Amazon Aurora PostgreSQL and hence support most of the functionality in PG (including advanced ones like triggers, stored procedures, full suite of indexes like partial/expression/function indexes, extensions, etc).

We think a lot about this exact question. Here are some of the things YugabyteDB can do as a "modern database" that a PostgreSQL/MySQL cannot (or will struggle to):

* High availability with resilience / zero data loss on failures and upgrades. This is because of the inherent architecture, whereas with traditional leader-follower replication you could lose data and with solutions like Patroni, you can lose availability / optimal utilization of the cluster resources.

* Scaling the database. This includes scaling transactions, connections and data sets *without* complicating the app (like having to read from replicas some times and from the primary other times depending on the query). Scaling connections is also important for lambdas/functions style apps in the cloud, as they could all try to connect to the DB in a short burst.

* Replicating data across regions. Use cases like geo-partitioning, multi-region sync replication to tolerate a region failure without compromising ACID properties. Some folks think this is far fetched - its not. Examples: the recent fire on an OVH datacenter and the Texas snowstorm both caused regional outages.

* Built-in async replication. Typically, async replication of data is "external" to DBs like PG and MySQL. In YugabyteDB, since replication is a first-class feature, it is supported out of the box.

* Follower reads / reads from nearest region with programmatic bounds. So read stale data for a particular query from the local region if the data is no more than x seconds old.

* We recently enhanced the JDBC driver to be cluster aware, eliminating the need to maintain an external load balancer because each node of the cluster is "aware" of the other nodes at all times - including node failures / add / remove / etc.

* Finally, we give users control over how data is distributed across nodes - for example, do you want to preserve ASC/DESC ordering of the PKs or use a HASH based distribution of data.

There are a few others, but this should give an idea.

(Disclosure: I am the cto/co-founder of Yugabyte)

masa331 · on Sept 5, 2021

This title kind of implies that PG or MySQL aren't modern or modern enough which i think is is very wrong. Look what they bring in every update. I think they are quite modern!

Jemaclus · on Sept 6, 2021

Not sure if you knew this or not, but Phil Eaton (the OP) writes databases for fun[1]. This post isn't saying Postgres and MySQL aren't modern, but is more trying to get ideas for what to play around with next.

[1] https://notes.eatonphil.com/database-basics.html

biztos · on Sept 5, 2021

Postgres is very modern and MySQL is also... em... also modern in a different sense. ;-)

I understood the question as:

> What can those newer database systems that are not beholden to RDBMS conventions do for us that Postgres and MySQL (and Oracle and SQL Server and DB2) can not?

CRConrad · on Sept 6, 2021

And the answer is: Not all that much.

kixiQu · on Sept 5, 2021

The title seems fair to me. Anything that's actually used has certain commitments it made earlier in its lifecycle from which it now can't deviate, even if later developments made the commitments problematic.

tmh88j · on Sept 5, 2021

>Anything that's actually used has certain commitments it made earlier in its lifecycle from which it now can't deviate

Perhaps I'm misunderstanding your comment, but MySQL has definitely deprecated and removed features over the years. https://dev.mysql.com/doc/refman/8.0/en/mysql-nutshell.html

kixiQu · on Sept 6, 2021

Sure. There are probably also things they wish they could deprecate, but realistically can't. User expectations they wish they could replace and can't. I think that's the charitable interpretation of things asked for.

jffry · on Sept 5, 2021

I think it's unfair. "Modern" is in no way equivalent to what the thread is really asking, which is what could you do if you built a new database today with all of the knowledge but none of the existing commitments of a large mainstream database.

Retric · on Sept 5, 2021

It’s fair because these databases made trade offs for older ideas presumably for good reasons. The question isn’t what could the have done differently up to now, but rather what new ideas might designers have included if they existed back then.

jffry · on Sept 5, 2021

Modern software development practices and knowledge are orthogonal to feature set / compatibility guarantees. The title asks about the former and the text asks about the latter, and I don't agree it's correct to conflate the two

bawolff · on Sept 6, 2021

Although most of the responses have been for ideas that did exist back then, but have significant trade-offs, both then and now. They're mostly just different approaches to db, not something fundamentally "modern".

mcguire · on Sept 5, 2021

The question is fair if it applies to database systems that aren't actually used?

kixiQu · on Sept 6, 2021

Right, just like how any new system starts off. It's asking about how one might greenfield differently with the OLTP knowledge we have today.

randomdata · on Sept 5, 2021

Postgres was more modern when it used QUEL. SQL was a big a step backwards technically (although beneficial from a marketing standpoint).

lifepillar · on Sept 5, 2021

One thing PostgreSQL would likely not be able to adapt to, at least without significant effort, is dropping MVCC in favor of more traditional locking protocols.

While MVCC is fashionable nowadays, and more or less every platform offers it at least as an option, my experience, and also opinions I have heard from people using SQL Server and similar platforms professionally, is that for true OLTP at least, good ol’ locking-based protocols in practice outperform MVCC-based protocols (when transactions are well programmed).

The “inconvenient truth” [0] that maintaining multiple versions of records badly affects performance might in the future make MVCC less appealing. There’s ongoing research, such as [0], to improve things, but it’s not clear to me at this point that MVCC is a winning idea.

[0] https://dl.acm.org/doi/10.1145/3448016.3452783

zmmmmm · on Sept 5, 2021

I sort of want the opposite. Except for extremely high velocity mutable data, why do we ever drop an old version of any record? I want the whole database to look more like git commits - completely immutable, versionable, every change attributable to a specific commit, connection, client, user.

So much complexity and stress and work at the moment comes from the fear of data loss or corruption. Schema updates, migrations, backups, all the distributed computing stuff where every node has to assume every other node could have mutated the data .... And then there are countless applications full of "history" type tables to reinstate audit trails for the mutable data. It's kind of ridiculous when you think about it.

It all made sense when storage was super expensive but these days all the counter measures we have to implement to deal with mutable state are far more expensive than just using more disk space.

singron · on Sept 6, 2021

If the old versions of records stay where they are, they will start to dominate heap pages and lead to a kind of heap fragmentation. If the records are still indexed, then they will create an enormous index bloat. Both of these will make caches less effective and either require more RAM or IOPS, both of which are scarce in a relational db.

You probably need a drastically different strategy, like moving old records to separate cold storage instead (assuming you might ocassionally want to query it. Otherwise you can just retain your WAL files forever).

zmmmmm · on Sept 6, 2021

Absolutely ... it needs to be designed from the ground up - why I think it fits the question of something a modern database could do that Postgres would struggle with.

5e92cb50239222b · on Sept 6, 2021

> moving old records to separate cold storage

FWIW this is available in ClickHouse (which is an analytics database, though)

https://clickhouse.tech/docs/en/engines/table-engines/merget...

zachmu · on Sept 7, 2021

You should check out dolt, does exactly what you're describing, and is a drop-in MySQL replacement:

https://github.com/dolthub/dolt

throwawayboise · on Sept 5, 2021

The downside of "good ol' locking" is that you can end up with more fights and possibly deadlocks over who gets access and who has to wait.

With Postgres/Oracle MVCC model, readers don't block writers and writers don't block readers.

It's true that an awareness of the data concurrency model, whatever it is, is essential for developers to be able to write transactions that work as they intended.

lifepillar · on Sept 7, 2021

It’s not that conflicts magically disappear if you use MVCC. In some cases, PostgreSQL has to rollback transactions whereas a 2PL-based system would schedule the same transactions just fine. Often, those failures are reported ad “serialization errors”, but the practical result is the same as if a deadlock had occurred. And Postgres deadlocks as well.

atombender · on Sept 5, 2021

This is already being developed. zheap [1] is based on the undo/redo log system that databases such as Oracle use, and will be one of the options once Postgres supports pluggable storage backends.

[1] https://www.cybertec-postgresql.com/en/postgresql-zheap-curr...

ww520 · on Sept 5, 2021

I don't know. Lock-based concurrency control has problem scaling up concurrent access among readers and writers for read consistency. Of course Oracle with MVCC still beats everybody performance wise.

natmaka · on Sept 6, 2021

Does "more traditional locking" mean that the clients request locks? Isn't it already offered by PG with table-level and row-level locks, and also in an extremely flexible way (letting the client developer define the semantics) by pg_advisory_lock?

https://www.postgresql.org/docs/current/explicit-locking.htm...

Andys · on Sept 5, 2021

Having time travel capabilities (eg. cockroachdb) is a really useful side effect of MVCC though. Postgres once had this capability. Need for garbage collection/vacuuming is a downside.

I think it all depends on the pattern of reads, writes, and the types of writes. Mysqls innodb is often faster than postgres but under some usage patterns suffers from significant lock contention. (I have found it gets worse as you add more indexes)

jasonhansel · on Sept 5, 2021

Personally I wish Postgres would add support for optimistic concurrency control (for both row-level and "predicate" locks), which can be a big win for workloads with high throughput and low contention.

flowerlad · on Sept 6, 2021

> good ol’ locking-based protocols in practice outperform MVCC-based protocols

Doesn't make sense to me. Oracle supports MVCC. SQL Server doesn't scale as well as Oracle.

truenindb · on Sept 5, 2021

There is absolutely no reason at all that using a Model View Controller architecture should say anything about your persistence layer. Model View Controller is an abstraction for managing code that generates things that are put on a screen. It says that you should have code that represents underlying data separate from code that represents different ways to show the data to the user and also separate from code that represents different ways for the user to specify how they would like to view or modify the data.

The Model portion of MVC should entirely encapsulate whether you are using a relational database or a fucking abacus to store your state. Obviously serializing and deserializing to an Abacus will negatively impact user experience, but theoretically it may be a more reliable data store than something named after a Czech writer famous for his portrayl of Hofbureaucralypse Now .

nextaccountic · on Sept 5, 2021

What's being discussed is MVCC, or multiversion concurrency control

https://en.wikipedia.org/wiki/Multiversion_concurrency_contr...

estebarb · on Sept 5, 2021

Materialize (https://materialize.com/) is capable of performing SQL operations over a stream, using incremental calculation based on differential dataflow.

CockroachDB is a distributed SQL database, with strong consistency. I think that Yugabyte is similar.

Support for realtime changes, including queries over those changes, is better in other databases, like RethinkDB, OVSDB or Firebase.

Relational is not always the best way to store your data. Sometimes a graph model is better. Or for full text search something specialized is better. In that sense dgraph or elastic search may be better OLTP databases in some cases.

Columnar storage, like used in Vertical or BigQuery have several advantages in processing and data compression. But implementing it in PG or MySQL I think that would require almost a full rewrite.

strken · on Sept 6, 2021

I really want to do streaming queries over updates and inserts in a normal SQL database. Imagine writing

    SUBSCRIBE SELECT customers.id, customers.name 
    FROM EVENTS(customer_purchases) AS cpe
    LEFT OUTER JOIN customers ON cpe.row.customer_id = customers.id
    WHERE cpe.row.product_id = '123abc' AND cpe.type IN ('update', 'insert')

in postgres itself and just getting every incoming purchase for a particular product.

splike · on Sept 6, 2021

Someone with more in depth knowledge can tell me why this isn't a good idea, but I'm pretty sure you can get something very close to this with graphql subscription query against hasura backed with postgres

hchasestevens · on Sept 6, 2021

https://ksqldb.io/ is great for this, assuming you're already using Kafka.

gigatexal · on Sept 6, 2021

I wonder if Materalize gets you close?

strken · on Sept 6, 2021

Materialize excites me, but probably can't be used for the current problems I'm working on, because of the operational overhead we'd be putting on our clients. Streaming SQL queries inside an existing well-understood database would be the killer feature.

zinclozenge · on Sept 5, 2021

1. Automatic backups to an S3 compatible storage, out of the box.

2. Progressive automatic scalability. As load increases or storage runs out, the DB should be able to automatically. NewSQL databases do this already.

3. Tiered storage.

4. Support streaming, stream processing, in memory data structures, etc. I feel like this is one of those weird things but I keep wishing this were possible when I work on side projects or startups. I don't want to have to spin up mysql/postgres, kafka/pulsar, flink/whatever/stream processing, redis, etc separately because that would be prohibitively expense when you're just getting off the ground unless you have VC money. So I find myself wishing I could deploy something that would do all of those things, that could also scale somewhat until the need to break everything into their own infrastructure, and if it was wire compatible with popular projects then that would be perfect. Will it happen? I doubt it, but it would be lovely assuming it worked well.

eitland · on Sept 6, 2021

> 3. Tiered storage.

Do you mean tablespaces? https://www.postgresql.org/docs/10/manage-ag-tablespaces.htm...

They'll allow you to have some parts on database on faster devices for instance.

zinclozenge · on Sept 6, 2021

That does seem neat, but I was actually thinking more like off-loading cold data to S3, or some such. Maybe even having distinctions between (ultra) hot, warm, cold data.

vaughan · on Sept 6, 2021

Very cool - I had no idea this existed. I wonder if this could be useful for using ramdisks when working with temp tables.

eitland · on Sept 6, 2021

Very possible.

I learned about the possibility a good decade ago when working on Oracle. For high performance use we used to have two areas:

- one small SSD-backed area where new inserts happened (Server class SSD was new and relatively expensive back then.)

- one (or more) areas for permanent storage, based on ordinary disks with good storage/price

Then we had processes that processed data from the SSD Pool and wrote them to long term storage asynchronously.

Edit: I should say I was very happy to find the same feature on Postgres, Oracle is a real hassle in more than one way.

jatone · on Sept 7, 2021

yes, ramdisks are great. =) I've used that approach to replace redis with a faster version on postgres. =)

vaughan · on Sept 6, 2021

> 4. Support streaming

I feel the same. Streaming is so lacking in most dbs today. RethinkDB's `.changes()` was really cool. I wonder if PSQL will eventually do it, or whether a new DB will take over. Everyone went to Mongo then ran back to PSQL, but maybe after lessons-learned there is room for a new db optimized for in-memory usage and with great first-class streaming support.

jd_mongodb · on Sept 6, 2021

MongoDB has supported a streaming API since version 3.6

https://docs.mongodb.com/manual/changeStreams/

vaughan · on Sept 6, 2021

But not relational...

tlack · on Sept 5, 2021

Great list. For #4, you should take a look at Erlang or Elixir - they have "just enough" concurrency with streaming primitives and a functional style that makes it easy to use. They make a lot of "bolt on" stuff you need for a regular startup (Redis, Celery, etc) superfluous.

dkh · on Sept 6, 2021

Could you briefly elaborate on this? Are you suggesting the right structures within Elixir/Erlang are both concurrent and safe enough to negate the need for these things at some level? (And are you referring to things like OTP, or more general than that?)

EDIT: For context, I'm familiar with the concurrency model and with how fairly bulletproof "processes" are in their context, but had never considered putting these to use in lieu of a Redis cache or certain other datastore use-cases. (My brief foray into Elixir was, however, when looking to improve reliability of a high-volume messaging system and various task queues attached to it, so that use-case I am at least aware of)

tlack · on Sept 6, 2021

I wouldn't say Elixir is highly concurrent in the sense that you'll get something like a CRDT out of the box. You'll still have to design all of your distribution logic.

I'm saying that the tools provided in that toolbox - like processes, OTP and ETS - let you build in-memory or disk backed structures that live inside the concern of the main application and are more responsive to your specific needs.

For instance, let's say you need a scoreboard that persists in between page refreshes.

With Node.js, you're left in a "hmm - not sure" space where you need to cobble together all the different pieces. At the end of the day you'll be facing cache consistency issues, problems saving state, abstraction leaks and communication overhead.

In Elixir, it's trivial to use pubsub to collect events, a GenServer to store that state or act as a broker, and ETS to perform simple queries against it. It's like you're still given just a box of tools, but all of the tools work together better for dynamic, live, complex applications.

Your mileage may vary! Let me know if I can help.

zinclozenge · on Sept 6, 2021

I am actually super familiar with Erlang and Elixir and I agree with your assessment, but with the caveat that if you go that route you end up with your own hand-rolled solution that won't be wire/API compatible with the most used OS projects. Or in the case of Erlang/Elixir you'll be running it in process! But it is great for that reason.

rc_mob · on Sept 5, 2021

i mean it took me just a couple hours to write script to backup postgres db to s3. not a lot of work

musingsole · on Sept 6, 2021

And it's work that gets replicated again and again

jake_morrison · on Sept 5, 2021

VoltDB is a good example of rethinking relational databases for for modern technology. Traditional databases assume that everything is stored on disk, with a bit of RAM available for cacheing. VoltDB assumes that everything is stored in RAM first, using checkpointing to disk and replication to get durability.

https://www.usenix.org/legacy/events/lisa11/tech/slides/ston...

Have a look at Michael Stonebraker, he's a database genius, and keeps starting "NewSQL" companies which use the relational model, but specialized for different applications, e.g. column stores.

https://en.wikipedia.org/wiki/Michael_Stonebraker

winrid · on Sept 5, 2021

Neat. Storing everything in ram and checkpointing to disk is how a lot of game servers work, too. :)

hestefisk · on Sept 6, 2021

Volt is indeed very cool. Michael Stonebraker wrote part of the original Ingres code, didn’t he?

thom · on Sept 5, 2021

I realise I'm straying a bit from core OLTP stuff but also I think removing the historical need for a separate OLAP database is something modern systems should address. Off the top of my head:

1) Incremental materialized view maintenance, à la Materialize (bonus points for supporting even gnarly bits of SQL like window functions).

2) Really ergonomic and scalable pub/sub of some sort, à la RethinkDB.

3) Fine tuned control over query plans if I want it.

4) Probably very deep Apache Arrow integration.

JohnBooty · on Sept 5, 2021

    3) Fine tuned control over query plans if I want it.

I feel like when most people say this, what they really want is a better query planner.

The optimum query plan depends on a lot of dynamically changing factors: system load, free RAM, and of course the data in the tables themselves. Any hints we give the query planner are going to help at certain times and be somewhere between "suboptimial" and "disasterous" at most other times.

It's certainly true that the query planners in major RDBMS could be better or, at least, give insight into why they made the choices they did.

It would be cool if EXPLAIN ANALYZE also perhaps showed other query plans the planner considered but discarded, and why. Imagine if the planner tried multiple plans and adjusted itself accordingly.

For Postgres in particular, I think the user-specified costs in pg.conf like `seq_page_cost` and `random_page_cost` feel like one obvious area for improvement: why am I guessing at these values? Postgres should be determining and adjusting these costs on the fly. But, I could be wrong.

hamburglar · on Sept 5, 2021

> The optimum query plan depends on a lot of dynamically changing factors

Except … I’m part of a large organization with a very thorough incident post-mortem process and I’ve read a lot of analyses that end up blamed on the dreaded “query plan flip.” This ends up being an unplanned change that has an unexpected perf cost (and no real “rollback”) and causes an outage. The lesson I’ve seen learned over and over is to have very good understanding (and constant re-evaluation) of your hot queries’ perf , and to lock the query plan so it can only change when you intend it to.

magicalhippo · on Sept 6, 2021

Yeah we've had to add some weekly forced statistics recalculation on certain key tables just to prevent the large customers from going down due to indexer suddenly not wanting to use an index.

Hasn't happened often, but the few occasions have of course been at the worst possible time.

tpetry · on Sept 6, 2021

> It would be cool if EXPLAIN ANALYZE also perhaps showed other query plans the planner considered but discarded, and why. Imagine if the planner tried multiple plans and adjusted itself accordingly.

The interesting point is in that regard MySQL is really better than PostgreSQL. MySQL can give you more execution plans it evaluated and tell you why it didn't use them (cost value is higher) and even tell you why it didn't use a specific index it. Combined with the enforcing specific indexes you can sometimes trick it into a more efficient query plan even if it believed to be worse.

But to be honest, PostgreSQL query planner is a lot more intelligent and does stupid things very seldom. And the ability to instruct PostgreSQL collecting more statistics on some attributes (https://www.postgresql.org/docs/13/sql-createstatistics.html) is a huge improvement to getting PostgreSQL make more intelligent plans.

What i really miss in PostgreSQL is:

* Seeing other query plans it discarded as you described to get a feeling how to hint PostgreSQL into a direction if the query planner is doing stupid things

* MySQLs query profile showing the execution time of multiple subtasks of the query: https://dev.mysql.com/doc/refman/8.0/en/show-profile.html

* The ability to enforce specific query plans. The PostgreSQL devs stated they are against this but pg_hint_plan is really usefull and i was able to drastically improve some very complex queries.

thom · on Sept 5, 2021

It's true a better query planner is what I want, i.e. one that always does what I want it to do. That not being the case, I will settle for being in control, footguns and all.

I agree the config is hard to wrangle though, you effectively find yourself doing grid search with a bunch of common workloads, it does feel like something the machine should be doing for me (the most common config variable we end up tweaking is "how much money we give Amazon").

taffer · on Sept 5, 2021

Another interesting approach is HyPer[1]. HyPer uses many new techniques to combine OLTP and OLAP in one database. For example, to achieve good OLAP performance, a columnar storage layout is used, but the columns are chunked for locality to achieve good OLTP performance at the same time. OLTP queries are executed in memory, but cold data that is not used for OLTP is automatically compressed and moved to secondary storage for OLAP.

[1] https://hyper-db.de/

gigatexal · on Sept 6, 2021

Not much one can do with Hyper as it sold the commercial license to Tableau and I can’t find any mention of an OSS version to play with anywhere on the site.

nicoburns · on Sept 5, 2021

Subscriptions. Databases like Firebase will automatically push changes to query results down to clients. You can add this to Postgres with tools like Hasura, but it's poll based and not very efficient. It's a super-useful feature for keeping UIs in sync with database state.

alerighi · on Sept 6, 2021

Why can't you do that with triggers?

By the way subscriptions seems a good idea in theory, in practice polling is often the best solution. There are not a lot of applications where you need the data that real time (I mean with a latency that is less of a couple of seconds), and for that applications you should build something custom.

Subscriptions to work are based on websockets that have its problems, and also requires a constant connection to the server. The application then needs to handle the incoming data properly, another thing that is not entirely obvious.

Another thing is that most of the times you don't have a 1:1 mapping between the database schema and the API (and if you do, you shouldn't, since a change of the internal database representation will require a change of the API and will break clients). You have in practice an application server in between. Well is that application server that can send subscriptions to clients. And it can do that without subscribing to updates on database tables (assuming that the tables are only changed by the application itself, that seems reasonable). The application when updates the data can publish it to whatever subscription system it wants to notify clients.

vaughan · on Sept 6, 2021

> and for that applications you should build something custom.

Given the spirit of the original post, I think avoiding "build something custom" is the point.

It should be a toggle in the database.

> you don't have a 1:1 mapping between the database schema and the API (and if you do, you shouldn't, since a change of the internal database representation will require a change of the API and will break clients)

I'd argue this is a huge problem. When DB and API get out of sync it creates so much complexity, especially when working in a relational model, and things become impossible to change, or make client caching really difficult.

> The application when updates the data can publish it to whatever subscription system it wants to notify clients.

This is so much complexity though, compared with the experience of Firebase. Good for keeping more devs employed though.

alerighi · on Sept 7, 2021

> This is so much complexity though, compared with the experience of Firebase. Good for keeping more devs employed though.

The solution of Google is not less complex. Rather the complexity is hidden, that if all goes well is good, when things start to not work as expected it means weeks spent in debugging and trying to work around the issue. And we are not even talking about the possibility that Google changes API or closes down the service entirely, or it makes it more expensive, the so called vendor lock-in.

At the other side a custom solution takes more time to develop initially, but then is entirely under your control, if something doesn't work you know how to fix it because you built it, you are not bounded to a particular platform, and there is not the possibility that Google decides to change the API in an incompatible way and you have to do extra work just to make things work as they did before, you decide when and if to update the software.

kiwicopple · on Sept 6, 2021

We built a real-time server to solve this using Postgres Write Ahead Log: https://github.com/supabase/realtime. We also have some neat stuff coming with integrated Row Level Security.

Disclosure (Supabase cofounder)

pornel · on Sept 5, 2021

Postgres has LISTEN and NOTIFY. People build DIY pub-sub with this.

nicoburns · on Sept 6, 2021

Sure, but it's a very manual process. I want to be able to write an artbitrary SQL query (or subset of SQL query), and have the subscription aspect "just work".

acjohnson55 · on Sept 6, 2021

Materialize.io does something like this, although in a separate system from Postgres, and I'm not sure it's possible to push live queries to a client.

gigatexal · on Sept 6, 2021

Alas RethinkDB was supposed to do this but the project imploded. It’s been OSSed I think though.

vaughan · on Sept 6, 2021

I feel like Rethink would have done a lot better if it were started closer to today. Funnily enough, in 2009 it was originally an SSD-optimized storage engine for MySQL.

The dev experience and API were great. It was the "relational mongodb".

Though no one feels daring enough to part from PSQL as a backend these days. With all the cloud deployment options and optimizations in PSQL, it's hard to justify writing your own db engine, but for a great real-time experience, that's what needs to happen.

dkh · on Sept 6, 2021

Yes, and as a happy early adopter with many projects running off it, I was very happy about this. However the pace of development since then has been absolutely glacial and seemingly without direction. It's truly a shame.

rishav_sharan · on Sept 6, 2021

Have you tried Postgres Notify? Its the native event subscription.

https://www.postgresql.org/docs/current/sql-notify.html

XCSme · on Sept 5, 2021

Can't you add something like this to MySQL using triggers or some similar system?

spiffytech · on Sept 5, 2021

Hasura evaluated Postgres' listen/notify feature to power their subscriptions, but chose polling instead:

https://github.com/hasura/graphql-engine/blob/master/archite...

> Listen/Notify: Requires instrumenting all tables with triggers, events consumed by consumer (the web-server) might be dropped in case of the consumer restarting or a network disruption.

It was substantially non-trivial for them to implement subscriptions that were both robust and efficient using this approach.

Many applications need the extra step on top of listen/notify of relaying subscriptions to an untrusted client (e.g., a browser). I'd like to see more DBs bake that feature in, like RethinkDB did.

nicoburns · on Sept 5, 2021

The Superbase (https://github.com/supabase/realtime) approach is really interesting. It listens to the logical replication stream. Makes a lot of sense to me. Unfortunately our postgres instances hosted on heroku don't expose this, so I've been unable to try it out.

vaughan · on Sept 6, 2021

Hasura discusses why they chose polling here (https://github.com/hasura/graphql-engine/blob/master/archite...).

> WAL: Reliable stream, but LR slots are expensive which makes horizontal scaling hard, and are often not available on managed database vendors. Heavy write loads can pollute the WAL and will need throttling at the application layer.

Would like to hear Supabase's response to this.

kiwicopple · on Sept 6, 2021

Hasura's approach is great and there are just tradeoffs between approaches.

Sure, slots (can be) expensive; GCP doesn't offer slots but all the others do as far as I know (and I see that as a GCP issue, not a Supabase issue); a lot of writes can definitely fill the WAL.

Stepping back, the main comment I can give is that we each approach problems with a slightly different philosophy. Hasura tackles problems using middleware and Supabase tackles problems using the database. Both are fine.

Our approach is important (to us) as we evolve the product. For example, we are adding Row Level Security to our Realtime instance. Because it all sits in the database it's essentially just a Postgres extension, minimizing chatter between a middleware server and the database. Also because it's at the "bottom of the stack", other vendors/integrations can make use of the functionality we provide.

(supabase cofounder)

vlasky · on Sept 6, 2021

I'm the defacto maintainer of the Meteor MySQL integration and the Node.js package mysql-live-select.

These implement pub/sub and reactive queries using the MySQL binary log as the event source:

https://github.com/vlasky/meteor-mysql

https://github.com/vlasky/mysql-live-select

reilly3000 · on Sept 5, 2021

At least speaking for Postgres, it _is_ modern in that it’s very actively developed with all kinds of innovations layered on top of the core system. It can be a timeseries db, a real-time db, horizontally sharded, a self-contained REST/graphql API server, a graph db, an interface to many APIs via foreign data wrappers, and much more. In itself it has many analytics functions, and most cloud OLAP dbs use its syntax over a columnar storage scheme. I’m afraid most would-be database projects would be cloned by a Postgres extension before they could make a dent in its share.

anaganisk · on Sept 6, 2021

My opinion is Postgres is Jack of many trades, but master of none. I would be more afraid of a DB that tries to be everything. Every feature adds a baggage.

reilly3000 · on Sept 6, 2021

It is a master of plain old rbdms, but able to accomplish more with specific extensions. Sure, it probably makes more sense to use Elastic than Postgres for full-text search, but there is a solid extension for it. In a lot of operational contexts it’s easier to host another Postgres instance with a specialized extension than to try deploy an entirely new DB.

anaganisk · on Sept 6, 2021

I agree with you, when you say its a master of RDBMS. But each extension brings its own baggage. Specialised databases are written for a reason. For majority of the people the extensions may fit . But its not the best way once you want to optimise.

OrvalWintermute · on Sept 6, 2021

Postgresql can do everything I need, but not everything I want, and apologies to mysql fans, but I refuse to touch it.

And, I think over the years they have been such an awesome community that has helped me irregularly when I popped into IRC every year or two for a problem I was having. https://github.com/davidfetter ; https://github.com/RhodiumToad ; https://github.com/bmomjian - you are great peeps!

What I would like to see are more security usability enhancements over and above what you find in https://www.crunchydata.com/products/hardened-postgres/

Security Enhancements List:

Attack Resistance

Better protections for the DBMS itself in assume compromised scenarios, or malicious users

Customized implementations per compliance/security regime

Accelerators for various security configurations

Self-audit, self-secure

Functionality Enhancements List:

Continuous Sync Postgres to other serious DB (not a one-way sync, not a DB move)

aserafini · on Sept 5, 2021

One thing I find really interesting from Redis that I would love in a relational database is the concept of ‘blocking’ queries, which block until they get a result.

For example: https://redis.io/commands/BLPOP

If I could do something like:

    Select id, msg
    From job
    Where id > 1224

And have that query block until there actually was a job with an id > 1224, it would open up some interesting use cases.

natmaka · on Sept 6, 2021

I would love to be able to "nice" queries, a high "nice" attribute meaning, for the engine, "work on this whenever it does not slow down anything else", especially on the I/O side (à la Linux "ionice").

A "niced" query blocks (and its working set may even be "swapped out") as long as: - the read buffer does not contain anything pertinent for it, - there are "too much" pending I/O requests (<=> submitting an I/O request useful for this query would slow down other less-niced queries)

The "niceness" of a query may be dynamically modified by an authorized user.

Bonus: - any role is associated to a minimal level of niceness (GRANT'able) - the underlying logic interacts with the OS corresponding logic in order to take into account all I/Os (this is especially useful if PG runs on a non-dedicated server)

eitland · on Sept 6, 2021

Not exactly what you are asking for but worth mentioning anyway: https://wiki.postgresql.org/wiki/Priorities

bigblind · on Sept 5, 2021

It's always been insane to me that databases aren't better at migrations. There are so many blog posts out there on how to do zero downtime migrations, and I feel like a lot more of that could probably be abstracted away.

vaughan · on Sept 6, 2021

There are so many times I wish a blog post was an app that just had a nice gui to do guide me through the steps they suggest, instead of manually modifying config files and running bash scripts. SQL GUI tooling is lacking.

zamalek · on Sept 5, 2021

Write-expensive, read-cheap[1]: the exact opposite of the mentioned.

Once your developers have completed an iteration, your DB will see the same queries over and over again (if it doesn't, then it should be an OLAP aggregate). These databases optimize for writes, and defer complexity to reads and, considering that you could see millions more reads than writes, makes no sense whatsoever.

[1]: https://github.com/mit-pdos/noria

klysm · on Sept 5, 2021

I think automatically maintaining materialized views isn’t well supported for a lot of reasons. But I also don’t think that it’s impossible for something like Postgres to support it more in the future.

vaughan · on Sept 6, 2021

https://wiki.postgresql.org/wiki/Incremental_View_Maintenanc...

reilly3000 · on Sept 5, 2021

PipelineDB!

cafxx · on Sept 6, 2021

- Blurring (safely!) the line between database and the app using it: transparently switch between bringing data to compute, or compute to data.

- Comprehensive auto tuning: automatic index creation, automatic schema tuning, dynamically switching between column/row-oriented, etc. User specifies SLOs, database does the rest.

- Deeply related to the previous two points: perfect horizontal scalability

- Configurable per-query ACID properties (e.g. delayed indexing)

- All of the above while maintaining complex SQL features like arbitrary joins, large transactions, triggers, etc.

Sure, some of these are in some form in some existing databases. But none offer all of them.

vaughan · on Sept 6, 2021

> Blurring (safely!) the line between database and the app using it

What do you mean by this?

> Comprehensive auto tuning...automatic schema tuning

You should be able to maintain a logical database schema, and then flip a toggle for things you want to have denormalized (and eventually have the db just do it automatically). Or maybe even take a blob of unstructured data and automatically normalize it.