There is one thing I'm curious about and have never found a good answer to: why is there no VACUUM equivalent in MySQL or MS SQL? I guess VACUUM is necessary because of the way Postgres stores data? How do those other RDBMSs clean up indexes?
When you update a row in a SQL database, you need to keep the old row around until the transaction is committed.
In Postgres, a new row is written, and both the new and old coexist in the table. Even after the transaction is committed, the old row stays in the table. VACUUM scans the table, finds rows that are no longer needed, and marks them as okay to overwrite.
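A quick sketch of what that looks like in practice (the table and column names are just made up for illustration); the old versions show up as "dead tuples" in the stats views until VACUUM gets to them:

  -- Every UPDATE leaves the old row version behind in the table.
  CREATE TABLE vacuum_demo (id int PRIMARY KEY, val int);
  INSERT INTO vacuum_demo SELECT g, 0 FROM generate_series(1, 100000) AS g;
  UPDATE vacuum_demo SET val = 1;

  -- The old versions are counted as dead tuples (the stats are updated
  -- asynchronously, so the counter can lag a little).
  SELECT n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
  WHERE relname = 'vacuum_demo';

  -- VACUUM marks the space used by the dead versions as reusable;
  -- it does not shrink the file on disk (VACUUM FULL does that).
  VACUUM vacuum_demo;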
In MySQL, the row is overwritten in the table immediately, and then the old row is temporarily written to a separate area (the UNDO Segment), so the old record is available if needed. Since old rows don't accumulate in the main table, there's no need for VACUUM.
Your point about MyRocks is a good one. It's also true that MyRocks takes a similar approach to Postgres when it comes to implementing the higher isolation levels. It wouldn't make sense to add InnoDB style next-key locks to MyRocks, either.
Systems that closely adhere to traditional 2PL designs such as InnoDB or DB2 end up with tight coupling between recovery, concurrency control, and storage. This has many consequences, both good and bad. It's no coincidence that Postgres can support quite a variety of index access methods, including support for transactional full text search. You get the transactional stuff "as standard" with the Postgres approach to versioned storage. There is no need to bake concurrency control into each and every index access method.
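To make that concrete, here's a rough sketch of the kind of thing that "just works" (names invented for illustration): a GIN index over a tsvector expression participates in transactions like any other index, because visibility is decided by the row versions in the heap rather than by locking logic baked into the index access method:

  -- Full text search via the GIN index access method.
  CREATE TABLE docs (id serial PRIMARY KEY, body text);
  CREATE INDEX docs_fts_idx ON docs USING gin (to_tsvector('english', body));

  -- Ordinary transactional statements; no index-specific locking protocol needed.
  BEGIN;
  INSERT INTO docs (body) VALUES ('vacuum reclaims dead row versions');
  SELECT id FROM docs
  WHERE to_tsvector('english', body) @@ to_tsquery('english', 'vacuum');
  COMMIT;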
Thank you for your answer, this makes sense! I guess that there must be some tradeoff, right? What I mean is that while in MySQL you don't need to vacuum, perhaps updates would be slightly slower (due to the need to write into the UNDO segment)?
Recovery in Postgres is simplified and sped up considerably by the approach taken to MVCC. The same is true of transaction rollback. SQL Server users (on recent SQL Server versions) can optionally get a similar trade-off by enabling constant time recovery (CTR), which is called Accelerated Database Recovery (ADR) in the MS marketing material. See the paper:
SQL Server has something called "background cleanup" when this feature is enabled, which takes place in a separate process called "the cleanup process" - just like VACUUM. The paper itself explicitly acknowledges the similarity. All of the advantages the paper claims for CTR/ADR are also advantages for Postgres.
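For anyone who wants to try it, ADR is a per-database setting. This is the T-SQL as I remember it from the docs (SQL Server 2019+; the database name is a placeholder), so double-check against your version:

  -- Enable Accelerated Database Recovery for one database.
  ALTER DATABASE [YourDatabase]
      SET ACCELERATED_DATABASE_RECOVERY = ON;

  -- Check which databases have it enabled.
  SELECT name, is_accelerated_database_recovery_on
  FROM sys.databases;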
It's also true that both traditional SQL Server (SS with ADR/CTR disabled) and MySQL have background cleanup processes of their own for indexes, but that is a little different. It's about asynchronously reclaiming space for index entries that were already delete-marked.
Conceptually you still need to clean up the old row versions, regardless of where they're located. InnoDB has purge threads to clean up undo log entries that are older than the oldest active transaction. If too many of them accumulate, such as due to a long-running transaction preventing purge progress, it increasingly harms performance. This can be a major problem in MySQL for workloads which mix a high volume of quick UPDATE and/or DELETE operations concurrently with other slower/longer transactions.
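If you want to see whether this is happening to you, the backlog that purge hasn't worked through yet is exposed as the "history list length". A rough sketch (the metric and table names are as I remember them from MySQL 5.7/8.0; verify against your version):

  -- Undo records awaiting purge ("History list length", also visible
  -- in the output of SHOW ENGINE INNODB STATUS).
  SELECT name, `count`
  FROM information_schema.INNODB_METRICS
  WHERE name = 'trx_rseg_history_len';

  -- Long-running transactions that can hold purge back.
  SELECT trx_id, trx_started, trx_mysql_thread_id
  FROM information_schema.INNODB_TRX
  ORDER BY trx_started
  LIMIT 5;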
I think this might provide the additional context you're looking for (and your suspicion seems spot on!):
> This means more writes to do on updates and makes access to old row versions quite a lot slower, but gets rid of the need for asynchronous vacuum and means you don't have table bloat issues. Instead you can have huge rollback segments or run out of space for rollback.
“In MySQL, the row is overwritten in the table immediately, and then the old row is temporarily written to a separate area (the UNDO Segment)”
I would hope and expect MySQL does that in reverse order, as “yeah, we have that row in memory” isn’t a good answer to “are you sure you can roll back that change if needed?”
It would be great if we stopped adding features now. Are there any pieces of software that have started doing this? Why are we perpetually in this loop of feature creep?
Like a painter, you must know when to stop painting. This is a real dilemma all painters face, especially when working with watercolor. If you've ever tried painting, you know exactly the problem: how do you know you're done? It applies to writing and music as well, but it is more evident in the visual arts than in any other field.
I wish there was a Postgres branch that took a previous version and then just applied optimizations and bugfixes. No more.
> It would be great if we stopped adding features now... I wish there was a Postgres branch that took a previous version and then just applied optimizations and bugfixes. No more.
To be fair, quoting from the article:
> There are no big new features in Postgres 13, but there are a lot of small but important incremental improvements. Let's take a look.
But also, in general, yes there are pieces of software that do this - most recently Moment.js[1]. There was some discussion earlier this week[2].
Actually, Postgres does not add any performance features to point releases at all. Point releases are only for bug fixes. You may be thinking of MySQL.
Occasionally, there are cases that could be argued to be exceptions to the general rule. But that's a hard argument to make -- everything committed to a back branch is officially a bug fix. Things like optimizer regression fixes are "performance enhancements" in a certain sense, but are nevertheless justified as bug fixes.
I've reopened your comment upthread for editing. Please don't post off-topic complaints like this, though. The guidelines ask you not to go on about downvotes and also not to use all caps for emphasis.
It's not as if Postgres is compromising on quality to add features; the project is still very good at keeping Postgres stable even at the impressive rate at which new features are added.
And while there are cases where I might agree with the general sentiment, I strongly disagree with this for Postgres. The new features are important and useful. Postgres is not a very specialized tool, it is a general purpose database that is used in many different ways. It isn't just done and feature-complete, there's still a lot of potential to improve it for various use cases.
Seriously, any analogy would have been better than an artistic painting one. They could have said "like a goatherd, you need to know when to stop feeding your goats" and been closer to the mark.
Suppose they had done that 10 years ago; would we be better off? Don't you want to know where Postgres will be with 20 more years of innovation?
I'm afraid that if Postgres stopped adding features now, in 10 years it would no longer be relevant in software development. Most developers would probably jump ship to a more innovative database.
The world changes, which means requirements change. What was once good enough will be insufficient in the future if you don't update it.
Well, there are branches that just apply bug fixes. We have five years of support for our (yearly) releases.
Optimizations kind of are features (sometimes kinda bugs too). Why should those be singled out vs other features?
The primary "bottlenecks" using postgres are different for different people. For you it may be performance. For others it's easier administration. For others it's SQL level query capabilities. Etc.
You need to RTFA. Also, database development is absolutely nothing like painting. "Postgres 13 features and enhancements :: Making a watercolor painting" is Top Ten Worst HN Analogies material.
Why? There's lots of features that Postgres doesn't have, but could, which would fit perfectly well into the current offering. Right off the top of my head, the built-in bitemporal support recently added to MariaDB would be welcome in Postgres and would have no negative impact on any existing functionality. "Stop adding features" is sensible when adding features degrades existing features or the experience thereof; it makes no sense at all when it has no negative consequences.
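For reference, roughly what that looks like in MariaDB (table and column names invented; syntax from memory, so check the MariaDB docs): system versioning covers the transaction-time dimension, an application-time period covers the valid-time dimension, and combining the two is what makes a table bitemporal.

  -- Bitemporal table: application-time period plus system versioning.
  CREATE TABLE room_rates (
      room_id    INT,
      rate       DECIMAL(10,2),
      valid_from DATE,
      valid_to   DATE,
      PERIOD FOR valid_period (valid_from, valid_to)
  ) WITH SYSTEM VERSIONING;

  -- Query the table as it looked at a past point in system time.
  SELECT * FROM room_rates
  FOR SYSTEM_TIME AS OF TIMESTAMP '2020-01-01 00:00:00';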
Some people need or benefit from things that don't interest you. If that's the case with a RDBMS, why not just... not upgrade?
If you go through the changelog, v13 as a release is mostly about improving existing features, but I don't think pg is at the point yet where it could "stop" adding features. It almost stopped 8-10 years back, and that is when MySQL overtook it completely. I believe it is still the best open source SQL engine, but features like native partitioning and JSON support that have been added in the last few years were absolutely needed to stay competitive. And now that they're there, they need to be improved and made stable.