Hacker News new | past | comments | ask | show | jobs | submit login
Things we learned running Postgres 13 (pganalyze.com)
377 points by lfittl on Sept 21, 2020 | hide | past | favorite | 49 comments



As with any Postgres update, I recommend Noriyoshi Shinoda‘s super detailed change log [PDF] with examples as a reference for DBAs looking to upgrade. This version it’s 81 pages (so far): https://www.hpe.com/content/dam/hpe/download/pdf/japan/linux...


Is there a list of these somewhere? I can find a few editions on search engines, just curious if there's a more canonical list.

And thanks for sharing, this is a great resource!


I've only started reading them back when PG11 was in beta and found them mostly by googling.

Here's what I can find:

- PG 9.6: https://www.slideshare.net/noriyoshishinoda/postgresql-96-ne...

- PG 10: https://www.hpe.com/content/dam/hpe/download/pdf/japan/linux...

- PG 11: https://h50146.www5.hpe.com/products/software/oe/linux/mains...

- PG 12: https://h50146.www5.hpe.com/products/software/oe/linux/mains...

There are more, but those are in Japanese.



Wow, thanks for sharing this!


This is a treasure. Thank you for sharing.


There is one thing I'm curious about and never found a good answer: why there is no VACUUM equivalent in MySQL or MS SQL? I guess VACUUM is necessary because of the way Postgres stores data? How do those other RDBMS clean up indexes?


When you update a row in a SQL database, you need to keep the old row around until the transaction is committed.

In Postgres, a new row is written, and both the new and old coexist in the table. Even after the transaction is committed, the old row stays in the table. VACUUM scans the table, finds rows that are no longer needed, and marks them as okay to overwrite.

In MySQL, the row is overwritten in the table immediately, and then the old row is temporarily written to a separate area (the UNDO Segment), so the old record is available if needed. Since old rows don't accumulate in the main table, there's no need for VACUUM.


Actually for MySQL it depends on the storage engine, what you described is how InnoDB works.

MyRocks storage, however, only appends data, and requires regular Compaction to clean up old rows, similar to Postgres VACUUM.


Your point about MyRocks is a good one. It's also true that MyRocks takes a similar approach to Postgres when it comes to implementing the higher isolation levels. It wouldn't make sense to add InnoDB style next-key locks to MyRocks, either.

Systems that closely adhere to traditional 2PL designs such as InnoDB or DB2 end up with tight coupling between recovery, concurrency control, and storage. This has many consequences, both good and bad. It's no coincidence that Postgres can support quite a variety of index access methods, including support for transactional full text search. You get the transactional stuff "as standard" with the Postgres approach to versioned storage. There is no need to bake concurrency control into each and every index access method.


Yeah, I should have specified that. Thanks for mentioning it.


Thank you for your answer, this makes sense! I guess that there must be some tradeoff, right? What I mean is that while in MySQL you don't need to vacuum, perhaps updates would be slightly slower (due to the need to write into the UNDO segment)?


Recovery in Postgres is simplified and sped up considerably by the approach taken to MVCC. Same is true with transaction rollback. SQL Server users (on recent SQL Server versions) can optionally get a similar trade-off by enabling constant time recovery (CTR), which is called Accelerated Database recovery (ADR) in the MS marketing material. See the paper:

https://www.microsoft.com/en-us/research/uploads/prod/2019/0...

SQL Server has something called "background cleanup" when this feature is enabled, which takes place in a separate process called "the cleanup process" - just like VACUUM. The paper itself explicitly acknowledges the similarity. All of the advantages the paper claims for CTR/ADR are also advantages for Postgres.

It's also true that both traditional SQL Server (SS with ADR/CTR disabled) and MySQL have background cleanup processes of their own for indexes, but that is a little different. It's about asynchronously reclaiming space for index entries that were already delete-marked.


Conceptually you still need to clean up the old row versions, regardless of where they're located. InnoDB has purge threads to clean up undo log entries that are older than the oldest active transaction. If too many of them accumulate, such as due to a long-running transaction preventing purge progress, it increasingly harms performance. This can be a major problem in MySQL for workloads which mix a high-volume of quick UPDATE and/or DELETE operations concurrently with other slower/longer transactions.


I think this might provide more context which you are looking for (and your suspicion seems spot on!):

> This means more writes to do on updates and makes access to old row versions quite a lot slower, but gets rid of the need for asynchronous vacuum and means you don't have table bloat issues. Instead you can have huge rollback segments or run out of space for rollback.

https://stackoverflow.com/questions/25153532/why-is-it-a-vac...

Edit: Just wanted to explicitly call out that the Robert Haas article linked in the answer is excellent (I am still reading my way through it).


There are plans for postgres to move to a more similar strategy to MySQL, at least in some cases:

http://amitkapila16.blogspot.com/2018/03/zheap-storage-engin...


“In MySQL, the row is overwritten in the table immediately, and then the old row is temporarily written to a separate area (the UNDO Segment)”

I would hope and expect MySQL does that in reverse order, as “yeah, we have that row in memory” isn’t a good answer to “are you sure you can roll back that change if needed?”


For future reference, I have found this excellent post about the same topic https://rhaas.blogspot.com/2011/02/mysql-vs-postgresql-part-...


The best change imo is the efforts to verify backups now with the pg_verifybackup command. https://www.postgresql.org/docs/13/app-pgverifybackup.html


[Off Topic] Postgres 13 doesn't work with the PSequel client; PSequel is no longer developed and that kinda sucks.


line 24 takes 5m on 155M records, it's not a pg problem, how can it be made faster?

https://github.com/jakeogh/pubchemmer/blame/master/README.md


It would be great if we stop adding features now. Are there any pieces of software that has started doing this? Why are we perpetually in this loop of feature creep?

Like a painter, you must know when to stop painting. This is a real dilemma all painters face, especially when dealing with watercolor. If anyone has tried painting here, you know exactly the problem - how do you know you're done? It applies to writing and music as well, but it is more evident in visual arts than any other field.

I wish there was a Postgres branch that took previous version and then just applied optimizations and bugfixes. No more.


> It would be great if we stop adding features now... I wish there was a Postgres branch that took previous version and then just applied optimizations and bugfixes. No more.

To be fair, quoting from the article:

> There are no big new features in Postgres 13, but there are a lot of small but important incremental improvements. Let's take a look.

But also, in general, yes there are pieces of software that do this - most recently Moment.js[1]. There was some discussion earlier this week[2].

[1] https://momentjs.com/docs/#/-project-status/

[2] https://news.ycombinator.com/item?id=24477941


It's unfortunate that performance is so rarely considered to be a feature.


There's plenty performance work going on in postgres. Most major releases had significant improvements in some aspect of that.


Postgres usually adds plenty of performance from one version to another.

Not much on .0 versions, those are more feature packed, but all the other time.


Actually, Postgres does not add any performance features to point releases at all. Point releases are only for bug fixes. You may be thinking of MySQL.

Occasionally, there are cases that could be argued to be exceptions to the general rule. But that's a hard argument to make -- everything committed to a back branch is officially a bug fix. Things like optimizer regression fixes are "performance enhancements" in a certain sense, but are nevertheless justified as bug fixes.


“.0” versions are the only versions that would have a major performance change. Anything else would be a fix version.


[flagged]


I've reopened your comment upthread for editing. Please don't post off topic complaints like this though. The guidelines ask you not to go on about downvotes and also not to use allcaps for emphasis.

https://news.ycombinator.com/newsguidelines.html

Edit: I've re-closed the comment but if you want to add to it, you're welcome to email hn@ycombinator.com.


The moment.js author just said that no more changes would be added, so that's one recent example. People were praising the decision in the HN thread.

https://news.ycombinator.com/item?id=24477941


It's not as if Postgres is compromising on quality to add features, they're still very good at keeping Postgres stable even at the impressive rate that new features are added.

And while there are cases where I might agree with the general sentiment, I strongly disagree with this for Postgres. The new features are important and useful. Postgres is not a very specialized tool, it is a general purpose database that is used in many different ways. It isn't just done and feature-complete, there's still a lot of potential to improve it for various use cases.


> Like a painter, you must know when to stop painting.

You presuppose that Postgres (or all software) is comparable to art, without making or explaining the comparison.

You say that "you must know when to stop painting", but what if the developers behind Postgres know when to stop, and it just isn't finished?

What makes you reason that Postgres is finished?


Seriously, any analogy would have been better than an artistic painting one. They could have said "like a goatherd, you need to know when to stop feeding your goats" and been closer to the mark.


Suppose they had done that 10 years ago, would we be better of? Don't you want to know wher Postgres will be with 20 more years of innovation?

I'm afraid if postgres stopped adding features now, in 10 years it would no longer be relevant in software development. Most developers would probably jump ship to a more innovative database.

The world changes, which means requirements change. What was once good enough, will be insufficient in the future if you don't update it.


Well, there are branches that just apply bug fixes. We have five years of support for our (yearly) releases.

Optimizations kind of are features (sometimes kinda bugs too). Why should those be singled out vs other features?

The primary "bottlenecks" using postgres are different for different people. For you it may be performance. For others it's easier administration. For others it's SQL level query capabilities. Etc.


Postgres 9.5 is still getting updates and bug fixes, each major version has 5 years of support.

If you want to move very slowly and deliberately, you can do, depending on your requirements.

https://www.postgresql.org/docs/9.5/release-9-5-23.html


How does the introduction of new features negatively impact your use of Postgres?


You need to RTFA. Also, database development is absolutely nothing like painting. "Postgres 13 features and enhancements :: Making a watercolor painting" is Top Ten Worst HN Analogies material.


Why? There's lots of features that Postgres doesn't have, but could, which would fit perfectly well into the current offering. Right off the top of my head, built-in bitemporal support recently added to MariaDB would be welcome in Postgres, and would have no negative impact on any existing functionality. "Stop adding features" is sensible when adding features degrades existing features or the experience thereof, it makes no sense at all when it has no negative consequence.

Some people need or benefit from things that don't interest you. If that's the case with a RDBMS, why not just... not upgrade?


If you go through the changelog, v13 as a release is mostly about improving existing features but I don't think pg is at that point yet where it could "stop" adding features. It almost stopped 8-10 years back and that is when mysql had overtaken it completely. I believe it still is the best open source SQL engine but features like native partitioning, JSON support etc that have been added in the last few years were absolutely needed in this century to be competitive. And now that they're there, they need to be improved and made stable.


There is this special biologist word we use for 'stable'. It is 'dead'.

-- Jack Cohen


Fork source and organize like minded developers (if any) around that concept.

Of course, that is a lot more work than posting comment...


I'd be game if the project was called "Postpostgres", but only for that reason.


If postgres stopped adding new features, we'd be forced to adopt Oracle


Good news! It’s an open source project so you are free to do this yourself!


Be the change you wish to see in the world.


Be the change you wish to find in the couch.


I feel like a million bucks!


[Insert default "its OSS, just create a fork" here]




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: