That loop function should really have a Context so it can be cancelled; that's future work. But the idea stands -- it should be considered normal for transactions to fail, so you should always have a retry loop around them.
It's controversial for many good reasons. You make the general claim that retrying a db transaction should be the rule, when most experts agree that it should be the exception. Just in the context of web development it can be disputed on the account that a db transaction is just a part of a bigger contract that includes a user at the other end of a network, a request, a session, and a slew of other possible connected services. If one thing shows signs of being unstable, everything should fail. That's the general wisdom.
More specific to the code that you linked to, the retry happens in only two specific cases. Even then, I personally don't find what it's doing to be such great engineering. It hacks its way around something that should really be fixed by properly setting the db engine. By encroaching like this, it effectively hides the deeper problem that SQLite has been badly configured, which may come to bite you later.
Failing transactions would raise a stink earlier. Upon inquiry, you'd find the actual remedy, resulting in tremendous performance. Instead, this magic loop is trying to help SQLite be a database and it does this in Go! So you end up with these smart transactions that know to wait in a queue for their turn. And for some time, nobody in the dev team may be aware that this can become a problem, as everything seems to be working fine. The response time just gets slightly longer and longer as the load increases.
Code that tries to save failing things at all cost like this also tends to do this kind of glue and duct tape micromanaging of dependencies. Usually with worse results than simply adjusting some settings in the dependencies themselves. You end up with hard to diagnose issues. The code itself becomes hard to reason about as it's peppered with complicated ifs and buts to cover these strange cases.
Transactions are hard, and in reality there's a shit-ton of things people do that has no right to be close to a transaction (but still are), and transactions were a good imperative kludge at the time that has just warped into a monster that people kinda accept over the years.
A loop is a bad construct imho, something I like far better is the Mnesia approach that simply decides that transactional updates are self-contained functional blocks and the database manages the transactional issues (yes, this eschews the regular SQL interfaces and Db-application separation but could probably be emulated to a certain degree).
You'll just end up looping until your retry limit is reached. SQLite just isn't very good at upgrading read locks to write locks, so the appropriate fix really is to prevent that from happening.
I've needed the exact same loop on (an older) Postgres to stop production from hitting transient errors. It's fundamental to the concept of concurrent interactive transactions.
Yes, that's exactly what the linked code does: Calls your function, and if it returns an error, check through the wrapped errors to see if it's one of the SQLite errors which should be retried. If it is, try the transaction again; if not, pass the error up.
OK, a bunch of the replies here seem to be misunderstanding #1. In particular, the assumption is that the only reason a transaction might fail is that the database is too busy.
I come from the field of operating systems, and specifically Xen, where we extensively use lockless concurrency primitives. One prime example is a "compare-exchange loop", where you do something like this:
y = shared_state_var;
do {
oldx = y;
newx = f(oldx); // f may be arbitrarily complicated
} while((y = cmpxchg(&shared_state_var, oldx, newx)) != oldx);
Basically this reads oldx, mutates it into newx (using perhaps a quite complicated set of logic). Then the compare exchange will atomically:
- Read shared_state_var
- If and only if this value if equal to oldx, set it to newx
- In any case, return oldx
In the common case, when there's no contention, you read the old value, see that it hasn't changed, and then write the new value. In the uncommon case, you notice that someone else has changed the value, and so you'd better re-run the calculations.
From my perspective, database transactions are the same thing: You start a transaction, read some old values, you make some changes on those values. When you commit the transaction, if some of the the thing's you've read have been changed in the meantime, the transaction will fail and you start over again.
That's what I mean when I say "database transactions are designed to fail". Of course the transaction may fail because you have a connection issue, or a disk issue, or something like that; that's not really what I'm talking about. I'm saying specifically that there may be a data race due to concurrent accesses. Whenever there are more than one thing accessing the database, there is always the chance of this happening, regardless of how busy the system is -- even if in an entire week you only have two transactions, there's still a chance (no matter how small) that they'll be interleaved such that one transaction reads something which is then written to before the transaction is done.
Now SQLite can't actually have this sort of conflict, because it's always single-writer. But essentially what that means is that there's a conflict every time where there are two writes, not only when some data was overwritten by another process. Something that happens at a very very low rate when you're using a proper RDBMS like Postgres, now happens all the time. But the problem isn't with SQLite, it's with your code, which has assumed that transactions will never fail do to concurrency issues.
I always see SQLite as recommended, but every time I look into it there are some non-obvious subtleties around txn lock, retry behavior, and WAL mode. By default if you don't tweak things right getting frequent SQLITE_BUSY errors seems to occur at non-trivial QPS.
Is there a place that documents what the set-and-forget setting should be?
You shouldn't blindly retry things that fail as a default, and you should really not default into making the decision of what to do on a server that is just on the middle between the actual user and the database.
Handling errors on the middle is a dangerous optimization.
Others have said much about a transaction loop, but I also don't think that database transactions are necessarily designed to fail in the sense that the failure is a normal mode of operation. Failing transactions are still considered exceptional; their sole goal is to provide logical atomicity.
You're the first person I've heard say so. When I was learning DB stuff, there were loads of examples of things that looked at the error from a transaction. Not a single one of them then retried the transaction as a result.
The OP's comment is a symptom of this -- they did some writes or some transactions and were getting failures, which means they weren't retrying their transactions. And then when they searched to solve the problem, the advice they received wasn't "oh, you should be retrying your transactions" -- rather, it was some complicated technical thing to avoid the problem by preventing concurrent writes.
I'm in the opposite camp. Transactions can fail, however retrying them is not the solution. It hides the symptom (which is not a problem with proper monitoring), but more importantly, it can lead to overwhelming the db server. Sometimes failure happens because of the load, in which case retrying the query is counterproductive. And even in cases where retrying would be the correct approach, it is the responsibility of the app logic, not of the db connection layer, to retry the query. Imho of course. :)
OK, here's a potentially controversial opinion from someone coming into the web + DB field from writing operating systems:
1. Database transactions are designed to fail
Therefore
2. All database transactions should done in a transaction loop
Basically something like this:
https://gitlab.com/martyros/sqlutil/-/blob/master/txutil/txu...
That loop function should really have a Context so it can be cancelled; that's future work. But the idea stands -- it should be considered normal for transactions to fail, so you should always have a retry loop around them.