Hacker News new | past | comments | ask | show | jobs | submit login
ACID in Theory and Practice (danweinreb.org)
73 points by wglb on Nov 7, 2010 | hide | past | favorite | 17 comments



Good article. Too many people just keep using the "ACID" term without really understanding what their transactional semantics are.

A good example of bumps you might hit is a problem I encountered a while ago: the default level of transaction isolation in postgres is “Read Committed”, not “Serializable” as the SQL standard mandates. This sounds fine, until you get constraint violations under load because of race conditions. See http://jan.rychter.com/enblog/2009/6/6/postgresql-transactio... if you want the full story.

BTW, I suspect problems like the one I encountered exist in lots of software systems, they just rarely get discovered.


The VAST majority of database transactions operate correctly in READ COMMITTED mode. A vanishingly small amount need REPEATABLE READ, and even fewer need SERIALIZABLE, which, in reality, is not actually truly serializable in any MVCC database. The transaction isolation semantics in SQL were created before MVCC databases took hold and generally only roughly represent their intentions in an MVCC implementation.


Do any databases allow you to specify that a transaction needs a particular mode, and then that mode is implemented only for that transaction? e.g.,

  START TRANSACTION MODE READ REPEATABLE;
or something like that. It would seem to be a way to only use the extra resources when consistency is paramount. Of course it would take a lot of careful thought to determine which transactions need this mode...



Given that we have no way of reliably determining which systems require higher levels of serialization for correctness, I'm not sure we should draw much comfort from this.


MySQL/InnoDB defaults to REPEATABLE READ, I believe. We started running into locking/concurrency issues in this mode a long time ago and found that for our uses, 99.9% of the time READ COMMITTED would work just fine. We now use READ COMMITTED as the default and switch to REPEATABLE READ only when needed. We handle about 100x more traffic now than we did when we were forced to make the switch.


Posted this on Dan's blog, but will add here too:

A counterpoint to the nobody really uses full isolation levels is VoltDB. Due to it’s unique single-threaded architecture, it can ONLY perform transactions with full serializability guarantees, and it does it with remarkable performance. Our head of field engineering, Tim Callaghan, used to be a die-hard Oracle user and he keeps saying how refereshing it is to actually not have to think about concurrency.


It is a triumph of vendor marketing (Hi, John Hugg!) that reasonably informed engineers think ACID == SQL/relational and that NoSQL == EC. Both are false. As a couple of examples, have a look at Google Percolator http://research.google.com/pubs/pub36726.html and Scalaris http://code.google.com/p/scalaris/ .


I am John Hugg. I do work for VoltDB. I am proud of our transaction isolaton.

Not sure why this comment was a reply to me though. I've never claimed all NoSQL is EC, not all SQL/Relational is ACID. I totally agree there is a lot of unfortunate confusion, though I wouldn't limit the blame.


From your earlier comment:

"Due to it’s unique single-threaded architecture..."

That would be a single thread per partition, right? Are those partitions automatic or might a developer have to consider their creation carefully based on how data will be accessed? And replication? Seems concurrency _is_ something about which developers have to think. Here's the entry in the VoltDB FAQ: http://community.voltdb.com/faq#id545447

"...how refereshing it is to actually not have to think about concurrency."

Except you do. By figuring out the partitioning and replication up front. TANSTAAFL, regardless of our level of enthusiasm for our employers.


You have to think about partitioning up front. In a sense, this is a prerequisite for our concurrency. So you're right.

What I meant was, inside of a stored procedure itself, you never have to think about concurrent access to the data you are modifing during the lifetime of the procedure. You own it. Period.

I wouldn't ever try to make the argument that this functionality is "free" and that VoltDB is a system without tradeoffs. It just seemed particularly relavent when the original article was making claims that serializable transaction semantics in SQL-RDBMSs were rare due to performance issues. I didn't mention NoSQL or Key-Value stores anywhere in my original comment.


From the VoltDB FAQ (http://community.voltdb.com/faq#id545596):

Key-Value stores are a mechanism for storing arbitrary data (i.e. values) based on individual keys. Distributing Key-Value stores is simple, since there is only one key. However, there is no structure within the data store and no transactional reliability provided by the system.


Sorry. I didn't write it and I'll see what I can do about toning down the language. In the defense of the text, while some Key-Value stores provide atomic operations on individual values, I'm not aware of many that offer actual transactions in the traditional sense of the term.

If you want to complain about our marketing, feel free to email Fred Holahan (fholahan at voltdb.com). He didn't write everything on our website, but he's in charge.


I hadn't really looked into Scalaris much. I see they're trying to provide transactions on a DHT. Cool. Something to keep an eye on.


What I have seen in the real world is that different parts of those guarantees are used in different situations. ACID guarantees are often relaxed, as he rightly says, but for some transactions or reports it is absolutely critical to get it right and performance is secondary. A missed deadline or an error in an important financial report can be an existential threat to a company. That's why database technology is bound to be changing very very slowly for mission critical tasks. It's not an academic issue.


Most people don't operate in a 100% ACID environment. However, This has a pretty low impact on web applications, so many of us can breathe easy. PostgreSQL's default isolation level is READ COMMITTED and InnoDB's default isolation level is REPEATABLE READ. While this might seem like a big difference, the SQL standard is pretty loose in the definition of these, and because of their implementations, they have roughly equivalent consequences. From what I understand, most InnoDB users that demand better performance in high concurrency situations switch the default to READ COMMITTED.


"Oracle DB has more than one “isolation level”. The strongest, READ REPEATABLE, provides ACID transactions."

i couldn't read his blabber past this point. Before talking about something you don't know about, check the docs at least:

---

http://download.oracle.com/docs/cd/B28359_01/server.111/b283...

"Oracle Database offers the read committed and serializable isolation levels, as well as a read-only mode that is not part of SQL92. Read committed is the default."

---

To make it clear:

1. there is no READ REPEATABLE in Oracle

2. in general, among the isolation levels, READ REPEATABLE isn't the strongest




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: