PostgreSQL Subtransactions Considered Harmful (postgres.ai)
110 points by samokhvalov on Aug 31, 2021 | 41 comments



I agree with this. I think we get baited into using subtransactions by how we structure our code. Each function feels like a transaction -- it gets its own local variables, and if it fails, it doesn't have any effect on the rest of the program. (Not strictly true, of course, I'm sure some failing functions modify global state, or their receiver.)

We then mindlessly copy that to our database code -- each mutation function takes a "database object", which could be a direct database connection, or it could be an in-progress transaction. It's generic so that you don't have to care. Functions that think their stuff needs to be a transaction just start one, and if it errors out, hey, it's rolled back.

(Whenever you have a "don't care" type, it means half the functions will be documented "// must be run in a transaction" and the other half will pessimistically create a transaction "just in case" it was invoked with a raw database connection instead of a transaction object.)

Thinking about it more critically, 100% of the times I've wanted to write this, I've wanted to abort the parent transaction as soon as the first child fails. I tend to retry transactions, and doing that twice doesn't make a lot of sense (parent transaction starts, calls a helper function, that starts a transaction, it has a conflict and has to be rolled back, helper function is re-run, that ends up committing, parent transaction fails because of a conflict... and the update gets rolled back anyway).

I basically structure my database APIs to take transaction objects, and make each public API member a transactional unit. Then, the very top level creates and commits the transaction, and can add whatever retry logic it deems necessary.

(Using the classic example, TransferFunds() would be public, and addMoney() and withdrawMoney() would be private. That way, the runner of a transaction can't do "doTx(addMoney); doTx(withdrawMoney)"; it would be forced to do "doTx(TransferFunds)". And all three would take a Transaction object instead of a TransactionOrDatabase object, so the type system enforces the transactional expectations of a money transfer operation.)
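
Roughly, as a sketch (Python with sqlite3-style placeholders; Tx, do_tx, and the accounts table are all illustrative, not any particular library):

    class Tx:
        """Wraps a connection with an open transaction; deliberately no commit()."""
        def __init__(self, conn):
            self._conn = conn
        def execute(self, sql, params=()):
            return self._conn.execute(sql, params)

    def transfer_funds(tx: Tx, src: int, dst: int, amount: int) -> None:
        # public, transactional unit; the helpers below stay module-private
        _withdraw_money(tx, src, amount)
        _add_money(tx, dst, amount)

    def _withdraw_money(tx: Tx, acct: int, amount: int) -> None:
        tx.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, acct))

    def _add_money(tx: Tx, acct: int, amount: int) -> None:
        tx.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, acct))

    def do_tx(conn, fn, *args, retries=3):
        # only the very top level begins/commits and owns the retry policy
        for _ in range(retries):
            try:
                with conn:  # DB-API style: commit on success, rollback on error
                    fn(Tx(conn), *args)
                return
            except Exception:
                continue
        raise RuntimeError("transaction kept failing after retries")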


I do basically the same thing. If a method tries to start a transaction when a transaction is already open then it's immediately rejected.

I don't really like automatic transaction joining behaviour because it makes it hard to reason about the application behaviour. Will this transactional method commit when it returns? It's impossible to tell without looking at who's calling it.

It also encourages annoying behaviour like, oh this method uses the database, better make it @Transactional.
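
A minimal sketch of that rejection guard (Python; the wrapper class is invented):

    class Database:
        def __init__(self, conn):
            self._conn = conn
            self._in_tx = False

        def begin(self):
            if self._in_tx:
                raise RuntimeError("transaction already open; pass it down instead")
            self._in_tx = True
            self._conn.execute("BEGIN")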


This is a deep and excellent post. Thanks for publishing it and all the research required.

Maybe too much of a 30,000-foot perspective, but in general everything that is "state" in a session is a potential limit to performance and scalability.

In this blog post, it is the "state" associated with subtransactions. In another area, it may be prepared statements. Often regarded as better performing than non-prepared statements (and strictly speaking they generally are), they impose hard limits on connection scalability: connection poolers cannot hold this state, so you cannot use transaction pooling, which boosts performance and resource usage far more than prepared vs. non-prepared does.

Anyway, that's another topic. But session state is something I'm becoming more and more against. YMMV.


Using subtransactions appropriately is unlikely to trigger any noticeable performance difference, any more than having fewer transactions overall will.

When doing multiple steps that need to track external state, subtransactions can greatly simplify the schema you need as well as make the database interactions more efficient. You need to safely record progress at each step, and you can't afford to lose previous steps because a transaction gets rolled back unexpectedly. You could start a transaction and commit between each step, but this gets pretty annoying if you need to SELECT FOR UPDATE or otherwise make sure what you are doing is concurrency-safe, since it would require you to re-track all the resources you need as you go, which would greatly complicate the design of your non-SQL code. Not to mention the added failure modes of not being able to reacquire those resources after step N for whatever reason.
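
Roughly the shape I mean, as a sketch (psycopg2-flavored; the jobs table, its progress column, and the step loop are all invented for illustration):

    # One outer transaction holds the FOR UPDATE lock for the whole run;
    # a savepoint per step lets a failed step roll back on its own without
    # losing the progress recorded by earlier steps.
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT id FROM jobs WHERE id = %s FOR UPDATE", (42,))
        for step in (1, 2, 3):
            cur.execute("SAVEPOINT step")
            try:
                # ... the actual external work for this step ...
                cur.execute("UPDATE jobs SET progress = %s WHERE id = %s", (step, 42))
                cur.execute("RELEASE SAVEPOINT step")
            except psycopg2.Error:
                cur.execute("ROLLBACK TO SAVEPOINT step")
                break  # earlier steps' progress survives; the lock was never dropped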

Where things get stupid is ORMs and frameworks that start subtransactions all over the place when you don't need them and aren't even trying to use them. I'm looking at you, Django transaction.atomic(), you absolute idiot.


> When doing multiple steps that need to track external state

Do you have an example of this? Many explanations for SAVEPOINT say it is useful when 'recalculation is deemed too expensive', but I cannot come up with a satisfactory example where this is truly the case.


What does "appropriately" mean in this context?


Not automatically inserting subtransactions that aren't necessary, especially when you aren't setting your application up to ever actually use them, as a side effect of something else. Like Django's transaction.atomic(), which at the top nesting level starts and commits a transaction, but in which any dynamically nested calls start and commit subtransactions. This is almost never what anybody wants: I can count on one hand the number of times I've seen code that handles a failed subtransaction and continues the overall transaction, and most programmers don't even know how it works or what it does.
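
For the curious, this is what the nesting does under the hood (simplified sketch; create_order, charge_card, ChargeError, and finish_order are hypothetical app code):

    from django.db import transaction

    with transaction.atomic():          # outermost block: BEGIN
        create_order()
        try:
            with transaction.atomic():  # nested block: SAVEPOINT (a subtransaction)
                charge_card()
        except ChargeError:
            pass                        # Django issued ROLLBACK TO SAVEPOINT
        finish_order()                  # the outer transaction carries on
    # exiting the outermost block: COMMIT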


Apparently Django uses subtransactions by default when you use atomic=True, which can result in scary behavior.


Lame title, but a really good, in-depth examination of the lurking issues with subtransactions on postgres.


A unique-constraint violation aborts your whole transaction. Put your `INSERT` in a nested transaction and you can fall back to `SELECT` and keep going without throwing away previous work. In multi-table, write-heavy workloads, this insert-or-select path begs for nested transactions so you can lean on uniqueness constraints while keeping your transaction afloat. So far this article hasn't convinced me to abandon that pattern, but I'll keep an eye out for a happier path, I guess.
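
For reference, the pattern spelled out as a sketch (psycopg2-flavored; the tags table is invented):

    from psycopg2 import errors

    def get_or_create_tag(cur, name):
        cur.execute("SAVEPOINT ins")
        try:
            cur.execute("INSERT INTO tags (name) VALUES (%s) RETURNING id", (name,))
            tag_id = cur.fetchone()[0]
            cur.execute("RELEASE SAVEPOINT ins")
            return tag_id
        except errors.UniqueViolation:
            cur.execute("ROLLBACK TO SAVEPOINT ins")  # undoes only the INSERT
            cur.execute("SELECT id FROM tags WHERE name = %s", (name,))
            return cur.fetchone()[0]

Each SAVEPOINT here is one of the subtransactions the article is talking about.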


If that's what you want, you can happily use Postgres' UPSERT implementation (INSERT ... ON CONFLICT) and abandon subtransactions very easily.

https://www.postgresql.org/docs/current/sql-insert.html#SQL-...


It's a good idea to revisit, but I think I need SELSERT or something. Are you saying the `DO NOTHING` clause can still result in a row being returned with `RETURNING *`? My impression is that it can't, but I'd be excited to be wrong!

edit: Ah, I bet the implicit suggestion was to do `INSERT ... ON CONFLICT DO NOTHING RETURNING *` and if no row was returned, you hope to know/guess the conflict and do an extra `SELECT`. I'll think more on this, thanks.
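
Something like this, as a sketch (same invented tags table as above; no savepoint, hence no subtransaction, though there is a small race window between the INSERT and the SELECT if rows can be deleted concurrently):

    def get_or_create_tag(cur, name):
        cur.execute(
            "INSERT INTO tags (name) VALUES (%s) "
            "ON CONFLICT (name) DO NOTHING RETURNING id",
            (name,))
        row = cur.fetchone()  # None when the row already existed
        if row is None:
            cur.execute("SELECT id FROM tags WHERE name = %s", (name,))
            row = cur.fetchone()
        return row[0]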


> The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row using the table's name (or an alias), and to rows proposed for insertion using the special excluded table.

Therefore you can do DO UPDATE SET column = EXCLUDED.column WHERE table_name.column IS DISTINCT FROM EXCLUDED.column RETURNING * (note that the SET target must be a bare column name; only the WHERE clause can table-qualify it).
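
Spelled out as a sketch (invented table; note that row comes back as None when the existing values already match, because the guarded UPDATE doesn't run):

    cur.execute("""
        INSERT INTO tags (name, color) VALUES (%s, %s)
        ON CONFLICT (name) DO UPDATE
            SET color = EXCLUDED.color
            WHERE tags.color IS DISTINCT FROM EXCLUDED.color
        RETURNING *
    """, ("x", "red"))
    row = cur.fetchone()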


Am I right to interpret this as "do a non-changing UPDATE so that you get a returned row"? It's another great idea I hadn't considered, thank you. I'm worried this causes the row to get re-written on disk (an actual UPDATE, even though values aren't changing). That could be a moot concern, I'll dig more.


Yeah, that's a valid concern and perhaps you might find an answer here https://stackoverflow.com/questions/34708509/how-to-use-retu...


Really dislike these "considered harmful" type titles.


You may enjoy this classic then:

“Considered Harmful” Essays Considered Harmful https://meyerweb.com/eric/comment/chech.html


I'm the co-author of one such piece: "Schema Later Considered Harmful" [1]. I don't see what's wrong with this type of title; could you elaborate? :)

[1] https://www.enterprisedb.com/blog/schema-later-considered-ha...


I've seen plenty of these over my 20+ years as a professional developer, and for the most part they are opinion pieces. I'd rather see a title along the lines of "Why you should avoid subtransactions".

Drinking bleach is harmful.


The original was Dijkstra's "Go To Statement Considered Harmful". Many considered that title unnecessarily provocative, but it ended up creating a new CS/programming meme.

Now, I see "Considered Harmful" as just a concise way of alluding to that style of article, basically saying "here are the not-well-known downsides of a commonly-used thing". Many "Considered Harmful" articles do not live up to that promise, but I think this one does.

Another title meme: "Zen and the Art of ...".


The funny thing is that Dijkstra originally titled the article "A Case against the GO TO Statement". The ACM editor (Niklaus Wirth) didn't find that clickbaity enough and changed the title.


That original title is so much better. If articles like this used it instead of the clickbait one, it would give them more credibility.


In my opinion:

It’s smug.

You’re publishing an opinion with a title that suggests an authority has formed a consensus.


It reads to me the same as "is deprecated" or neovim's claim of "literally the future of vim": it makes bold claims about consensus with no support


They're hyperbolic and don't actually give you any useful information on why they're "harmful" or what's wrong with the subject being discussed.


That they are hyperbolic, maybe. I disagree with the rest; I believe both examples contain tons of information about why their subjects are harmful.

(edit, typo)


Not in the titles. "X is harmful" could instead be "X is unsafe", "X is often slow", "X has non-obvious corner cases", "X led us to maintenance hell", ...

"Harmful" is less information and kind of suggests an overall judgement for all cases vs "things to consider" (which was e.g. also a criticism of the original letter, that it lead to rules like "never use goto" which forced people to do bad workarounds instead of discussion on where to use it and where not)


The problem with titles like that is that they don't convey the same thing. "considered harmful" has entered the lexicon at this point, so you know what you're going to get when you see it, which is someone attempting to make a reasoned case about why some behavior that people do and consider okay is actually worse than people may suspect.

In that way, it's much more informative and less likely to have people arguing pedantic points about the title being "correct" than "is harmful" or "is often slow" or "has non-obvious corner cases" or "can create maintenance problems", which might all be items together in an "is considered harmful" article.

In other words, it's come full circle. There was the first, then there were the numerous copycats, then the backlash, and now we're all the way around to the point where it's a fairly succinct way to describe exactly the type of article people use it on, and most people know what it means and what to expect when they see it.

All that's left is for people to realize it's not going anywhere and actually has some beneficial use and that there's no point in complaining about it anymore, but that might be asking too much.


Fair enough. Yet it might be difficult to summarize the tradeoffs succinctly enough for a title.

"Harmful" conveys enough information to raise awareness of a topic which might not be usually considered. Both examples show this pattern: schema-less is often considered a good thing, when it is not (in Stonebraker's and my opinion); also subtransactions are usually considered a good thing, while the OP shows they can have notable negative effects (of diverse nature) in the database.


They should start calling them "problematic"


You mean, you actually have to read the article?


I was confused enough by the word subtransactions to ask DDG for help. This discussion is result #1, and #3 is https://core.ac.uk/download/pdf/14445799.pdf, which is a more direct overview of where subtransactions are at. The paper's title, "Transactions Design For Databases With High Performance and Availability", isn't desperately helpful, but its tabular survey of isolation, concurrency, and atomicity is.

EDIT:

The very first reference in the above paper, "A Critique of ANSI SQL Isolation Levels" (1995), which I have only just started but which looks like important reading, is here: https://www.microsoft.com/en-us/research/wp-content/uploads/...


> It is recommended to learn if subtransactions are used in your systems. If they are, it does not immediately mean that they need to be eliminated – it all depends on the risks of your particular case. Evaluating such risks is a complex task that requires deep Postgres expertise, so if you need such an analysis, consider hiring a PostgreSQL expert.

It is quite disheartening that this essentially echoes this quote from the related blog post https://buttondown.email/nelhage/archive/22ab771c-25b4-4cd9-...:

> It is entirely possible to operate Postgres safely and with high performance and throughput at scale … but essentially the only way to do it is to have ready access to deep PostgreSQL experience and expertise on your team; people who Just Know where the landmines are because they’ve seen them before.

And https://blog.nelhage.com/post/some-opinionated-sql-takes/:

> My personal choice: MySQL

> The devil is in the details, but for me, as a default, and despite all these pet peeves, I would start with MySQL.

> As for Postgres, I have enormous respect for it and its engineering and capabilities, but, for me, it’s just too damn operationally scary. In my experience it’s much worse than MySQL for operational footguns and performance cliffs, where using it slightly wrong can utterly tank your performance or availability. In addition, because MySQL is, in my experience, more widely deployed, it’s easier to find and hire engineers with experience deploying and operating it. Postgres is a fine choice, especially if you already have expertise using it on your team, but I’ve personally been burned too many times.

Why should you need expert consultation in order to use core features, when other databases "just work" without all these caveats? Transaction ID exhaustion and VACUUM tuning come to mind here.


I believe problem 3 (MultiXact SLRU buffer cache lock contention) can occur even if you don't nest transactions in your code. We traced the problem back to foreign keys: when rows are updated, Postgres internally does SELECT ... FOR KEY SHARE on the referenced tables, and this requires MultiXacts.
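
A sketch of how it arises (psycopg2-flavored; the orders/customers tables are invented, with orders.customer_id referencing customers):

    import psycopg2

    a = psycopg2.connect("dbname=app")  # two concurrent sessions
    b = psycopg2.connect("dbname=app")
    ca, cb = a.cursor(), b.cursor()
    # each insert fires the FK check, which locks the parent row with
    # SELECT ... FOR KEY SHARE; once two transactions hold locks on the
    # same row, the set of lockers is stored as a MultiXact
    ca.execute("INSERT INTO orders (customer_id) VALUES (1)")
    cb.execute("INSERT INTO orders (customer_id) VALUES (1)")
    a.commit(); b.commit()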

I feel that we lack benchmarks and observability tools for these low-level locks, which are essentially single-threaded; if you have lock contention on them, the system will grind to a complete halt.

If you can recompile, raising the size of this buffer cache can help.


This isn’t wrong at all, but…

1. A lot of it is heavily dependent on implementation details. (Yeah we have to care about XID, but we shouldn’t have to care about it exponentially for nested scope workloads once that scope exits.)

2. Actual nested transactions could help alleviate that without too much burden on the MVCC implementation. (This would provide a mechanism for isolating and cleaning up scoped XIDs.)


"One of the basic issues with subtransactions is that they increment XID – the global transaction ID."

If the subtransaction is later rolled back, this is starting to sound like the kind of speculative execution information leak that led to the Spectre attack. I wonder if you could base a similar attack on this?


Why is the XID a 32-bit number? Surely we can move to 64-bit and punt the wraparound issue a bit?


My fuzzy take on a concern here could be: storage costs and memory usage. My impression is that it is written/copied to actual rows as xmin (https://www.postgresql.org/docs/current/ddl-system-columns.h...). So loading X rows from disk into memory is loading X transaction IDs into memory.

That said, I'm not sure the concern holds water. At 2 billion rows we're talking 7.4 GiB of disk vs 15 GiB (4 vs 8 bytes per row, for xmin alone). Depending on what % of your database you want to keep in memory, maybe it goes OK.
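
If you want to look at those IDs, they're exposed as ordinary system columns (sketch; some_table is any table you have):

    # xmin/xmax are the creating and deleting/locking XIDs of each row version
    cur.execute("SELECT xmin, xmax, * FROM some_table LIMIT 3")
    print(cur.fetchall())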


CockroachDB implements subtransactions, and I don't think any of these issues apply to its implementation.


They have an interesting section, "Beware: don't (over) use nested transactions", in their blog post: https://www.cockroachlabs.com/blog/nested-transactions-in-co...

> ...we do not recommend their use in new applications.

> The reality is that nested transactions are a product of the early days of software engineering, in the 1990s, back when systems were tightly coupled and the Internet and the Cloud were not yet very relevant. ...

> Nowadays, such tight coupling has a bad rep. This is because two additional decades of software engineering have taught us that implicit global state really, really does not play well with distributed services where some components may fail even as part of normal load ...

> Additionally, nested transactions can amplify performance anomalies ...

> Finally, nested transactions can run afoul of correctness in distributed apps. In fact, the idea of multi-component transactions in client code really evokes the idea of a bull in a china shop. As long as all is well and the transaction is due to commit, the idea somewhat makes sense. However, what happens when the database (and not the client) decides the transaction is un-committable and must be aborted, for example because of a serializability conflict or a node restarting for maintenance? It is not just the database state that must be rolled back; all the possible side effects performed by the components holding the transaction must also be rolled back.

They implemented it, but do not recommend using it.


Does anybody here understand the argument they're making? The language is vague and most examples seem to apply to un-nested transactions as well.

> Additionally, nested transactions can amplify performance anomalies...

This is a function of a transaction's duration and volume of writes, right? Does it matter whether there is nesting?

> It is not just the database state that must be rolled back; all the possible side effects performed by the components holding the transaction must also be rolled back.

Doesn't this apply to un-nested transactions as well? Having external API calls run "within" a DB transaction is a common source of inconsistency.



